overpass - Re: [overpass] compressed database info

Subject: Overpass API development

From: mmd <>
To:
Subject: Re: [overpass] compressed database info
Date: Tue, 03 Mar 2015 18:25:51 +0100

Hi Roland,

On 26.02.2015 at 06:13, Roland Olbricht wrote:

>
>> Noticed on earlier thread a mention of a compressed database.
>>
>> https://github.com/drolbr/Overpass-API/tree/backend_compression
>
> The whole world database including attic has a compressed size of 200
> GB, as opposed to 500 GB for the uncompressed database.

That's quite a nice compression ratio! On my local full planet instance
dating back to October 2014, with a few weeks of attic data plus meta
data and areas, I need about 270 GB on btrfs with lzo compression enabled.

I would assume this to increase even further when loading two years'
worth of history. So, at least in terms of compression ratio, lzo
definitely needs more space than the backend's zlib compression.

If I remember correctly, the compressed database has to fit on a 512 GB
SSD drive. Are there any other constraints on the upper size limit of a
compressed DB, e.g. do we have to keep the DB below, say, 300 GB?

> The performance of the code is not known yet, but the first tests
> suggest that it is similar to the uncompressed code.

I set up another local instance with backend compression (using my
branch test753_compression). However, due to the limited SSD capacity on
my laptop, I could only populate a rather small Germany database without
attic, but including meta data and areas. Total DB size is now at 67GB
(no file system compression used).

Running one example query [1] on both instances, I noticed much higher
CPU consumption (especially user time) with the default settings for
backend compression. 'perf top' shows up to 50% of CPU time being spent
in libz decompression while running a query.

Old Instance - full planet (btrfs with lzo compression enabled, no
backend compression):

real 0m45.318s (run time is really i/o bound)
user 0m14.923s
sys 0m4.998s

New Instance I - Germany only (backend compression, 512*1024):

real 0m57.715s
user 0m49.488s
sys 0m2.260s

Other queries [2] involving the 'foreach' statement seemed to be even
slower with backend compression.

> Both the performance and the compression rate can likely be improved by
> adjusting the values in settings.cc [...]

I also tried another DB build with a smaller block size setting:
256*1024. DB size was almost identical: 66 GB.
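
One way to explore how the block size affects the compressed size without
a full DB rebuild is to compress a sample payload block by block. The
following is only a minimal sketch using Python's zlib module, not the
Overpass backend code: the function names and the sample data are made up
for illustration, and 512*1024 and 256*1024 are simply the two values
mentioned above.

```python
import zlib
from typing import List


def compress_in_blocks(data: bytes, block_size: int, level: int = 6) -> List[bytes]:
    """Split data into fixed-size blocks and zlib-compress each one independently."""
    return [
        zlib.compress(data[offset:offset + block_size], level)
        for offset in range(0, len(data), block_size)
    ]


def total_compressed_size(data: bytes, block_size: int, level: int = 6) -> int:
    """Sum of the compressed block sizes, ignoring any index or padding overhead."""
    return sum(len(block) for block in compress_in_blocks(data, block_size, level))


# Hypothetical sample payload; real OSM data will compress differently.
sample = b"".join(
    b"node %d lat=%d lon=%d\n" % (i, i % 90, i % 180) for i in range(100_000)
)

for block_size in (512 * 1024, 256 * 1024):
    print(block_size, total_compressed_size(sample, block_size))
```

Both block sizes are far larger than deflate's 32 KiB window, so one would
expect the totals to differ only slightly, which would be in line with the
nearly identical DB sizes reported here.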

New Instance II - Germany only (backend compression, smaller block size):

real 0m51.332s
user 0m46.861s
sys 0m1.730s

Compared to 'New Instance I' there are some improvements. Also, response
time (real time) looks fairly similar to the Old Instance figures
above. However, user+sys times are still much higher, which may be an
issue on a high-traffic instance with only limited CPU capacity.
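
To make the CPU comparison explicit, the three 'time' results above can be
reduced to total CPU seconds (user + sys) and the share of wall-clock time
spent on CPU. This is just arithmetic on the figures quoted in this mail;
the dictionary keys and function names are made up for the example. Keep
in mind that the old instance holds the full planet while the new ones
hold Germany only, so this is a rough comparison at best.

```python
# (real, user, sys) in seconds, copied from the 'time' output above.
TIMINGS = {
    "old": (45.318, 14.923, 4.998),     # btrfs + lzo, no backend compression
    "new_1": (57.715, 49.488, 2.260),   # backend compression, 512*1024
    "new_2": (51.332, 46.861, 1.730),   # backend compression, 256*1024
}


def cpu_seconds(timing):
    """Total CPU time consumed: user + sys."""
    _real, user, sys_time = timing
    return user + sys_time


def cpu_share(timing):
    """Fraction of wall-clock time spent on CPU rather than waiting."""
    real, _user, _sys_time = timing
    return cpu_seconds(timing) / real


for name, timing in TIMINGS.items():
    print(f"{name}: cpu={cpu_seconds(timing):.3f}s share={cpu_share(timing):.0%}")
```

This gives roughly 19.9 s of CPU time (about 44% of real time) for the old
instance versus 51.7 s (about 90%) and 48.6 s (about 95%) for the two
backend compression runs, i.e. around 2.4 to 2.6 times the CPU time.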

> In general these adjustment tests are a lot of work because it needs a
> database rebuild each time.

This made me worry a bit, as it will make tests somewhat time consuming.
As the compression is now 'baked' right into the DB, it may have some
impact when cloning an existing DB and adjusting it to the specifics of
the underlying storage.

Would it be technically feasible to decompress and re-compress the DB
with different compression settings, maybe using some additional tool?
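
Conceptually, such a tool could stream through the compressed blocks and
re-emit them with a different block size. The sketch below only
illustrates that idea on a plain sequence of independently
zlib-compressed blocks. It assumes nothing about the actual Overpass file
layout or index structure (which a real tool would also have to rewrite),
and 'recompress' is a hypothetical name.

```python
import zlib
from typing import Iterable, Iterator


def recompress(blocks: Iterable[bytes], new_block_size: int,
               level: int = 6) -> Iterator[bytes]:
    """Decompress zlib blocks one by one and re-emit them at a new block size."""
    pending = bytearray()
    for block in blocks:
        pending += zlib.decompress(block)
        # Emit as many full-size blocks as the buffered data allows.
        while len(pending) >= new_block_size:
            yield zlib.compress(bytes(pending[:new_block_size]), level)
            del pending[:new_block_size]
    if pending:
        # Final, possibly shorter block.
        yield zlib.compress(bytes(pending), level)


# Usage: 1 MiB of sample data, originally in 512*1024 blocks.
raw = bytes(range(256)) * 4096
old_blocks = [zlib.compress(raw[i:i + 512 * 1024])
              for i in range(0, len(raw), 512 * 1024)]
new_blocks = list(recompress(old_blocks, 256 * 1024))
print(len(old_blocks), "->", len(new_blocks), "blocks")
```

Because only the buffered tail is kept in memory, this approach would not
need the whole uncompressed DB at once.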

More general comment:
I have to admit that I find file system compression quite useful,
especially as it provides some flexibility to move files to another
container with different compression characteristics, if needed.

Assuming attic data may be less frequently requested, I could even think
about a mixed setup, where that data would be stored using zlib
compression, while current data is stored using lzo/lz4 for faster
decompression speeds (at the expense of additional disk space).

I'd love to hear from others running some more performance tests, where
file system level compression (btrfs or even zfs, [3], [4]) is also
considered as an option. I don't have a clear idea which might win; I
just want to understand better the pros and cons of the different options.
When comparing query runtimes across different setups, it would also be
very useful to check the total runtime along with the CPU consumption in
each case.
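
For anyone who wants to collect real, user and sys times in a script
rather than via the shell's 'time', something like the following could
serve as a starting point. It is a sketch for Unix-like systems using
Python's resource module; 'measure' is a made-up helper name, and the
command in the usage line is a placeholder to be replaced with whatever
runs the query on the instance under test.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd and return wall-clock, user and sys seconds of the child process."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "real": real,
        "user": after.ru_utime - before.ru_utime,
        "sys": after.ru_stime - before.ru_stime,
    }


# Placeholder command; substitute the actual query invocation.
result = measure([sys.executable, "-c", "sum(range(10**6))"])
print("real %(real).3fs  user %(user).3fs  sys %(sys).3fs" % result)
```

Running each query several times, with cold and warm page caches, would
help separate I/O-bound from CPU-bound behaviour.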

Based on the good experience I have had with LZO compression on btrfs, I
would also very much welcome LZO or even LZ4 as additional compression
algorithms in the compression backend (optionally configurable per file
type). As you pointed out before, the design is already open for
different algorithms.

Best regards,
mmd

[1] http://wiki.openstreetmap.org/wiki/DE:Overpass_API/Beispielsammlung#Unn.C3.BCtze_associatedStreet-Relationen_ermitteln
[2] http://overpass-turbo.eu/s/7VO
[3] http://open-zfs.org/wiki/Performance_tuning#Compression
[4] https://btrfs.wiki.kernel.org/index.php/Compression
