Overpass API development



Re: [overpass] Setting up diffs


  • From: mmd < >
  • To:
  • Subject: Re: [overpass] Setting up diffs
  • Date: Sat, 29 Jul 2017 00:13:28 +0200

Hi,

On 26.07.2017 at 19:03, Igor Brejc wrote:

>
> 1. What is the best/preferred way of doing diffs (minutely, hourly,
> daily, doesn't matter)? I've found at least three ways:

That depends a bit on your setup. Some extract providers may only offer
hourly or daily updates. For the full planet, I've tried minutely, hourly
and daily diffs before without major issues.

One thing to keep in mind here is that in very rare cases (i.e. server-side
Osmosis crashes), some changes contained in minutely diffs don't show
up in hourly/daily diffs, leading to data loss. That's quite nasty in
the case of an attic database.

Also, using hourly/daily diffs increases main memory requirements; see
http://wiki.openstreetmap.org/wiki/User:Mmd/Overpass_API/Performance_Project_2016/Full_Attic_DB_Setup
for some details.

The replicate_id file provided by the server is based on minutely diffs.
You'd have to manually match this number to the corresponding hourly or
daily diff, as well as change the URL endpoint so that the diff files are
fetched from a different location.
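
For illustration, a minimal Python sketch of how a sequence number turns into a diff URL. It assumes the standard planet.openstreetmap.org replication layout (sequence number zero-padded to nine digits and split into AAA/BBB/CCC); a mirror would need its own base URL, and the function names are hypothetical. Because each granularity has its own numbering, a minutely sequence number cannot simply be reused against the hour/ or day/ endpoint.

```python
from __future__ import annotations

# Sketch: build replication URLs from a sequence number.
# Assumption: the standard planet.openstreetmap.org layout, where the sequence
# number is zero-padded to nine digits and split into AAA/BBB/CCC.

BASE_URLS = {
    "minute": "https://planet.openstreetmap.org/replication/minute",
    "hour": "https://planet.openstreetmap.org/replication/hour",
    "day": "https://planet.openstreetmap.org/replication/day",
}


def sequence_path(seq: int) -> str:
    """Return the AAA/BBB/CCC path fragment for a replication sequence number."""
    if not 0 <= seq <= 999_999_999:
        raise ValueError(f"sequence number out of range: {seq}")
    digits = f"{seq:09d}"
    return f"{digits[:3]}/{digits[3:6]}/{digits[6:]}"


def diff_urls(granularity: str, seq: int) -> tuple[str, str]:
    """Return (osc_url, state_url) for a sequence of the given granularity."""
    base = BASE_URLS[granularity]  # KeyError for anything but minute/hour/day
    path = sequence_path(seq)
    return f"{base}/{path}.osc.gz", f"{base}/{path}.state.txt"


if __name__ == "__main__":
    osc_url, state_url = diff_urls("minute", 2_512_345)
    print(osc_url)    # https://planet.openstreetmap.org/replication/minute/002/512/345.osc.gz
    print(state_url)  # https://planet.openstreetmap.org/replication/minute/002/512/345.state.txt
```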


> a) fetch_osc.sh + apply_osc_to_db.sh (as described on the wiki page)
> b) fetch_osc_and_apply.sh script (found that one in the bin directory,
> but it's otherwise undocumented)

On the dev instance, we run option a) to fetch and store diffs in a
directory and have multiple databases applying changes via
apply_osc_to_db.sh. If you don't care about storing minutely diffs,
fetch_osc_and_apply.sh may be a better option.

> c) in the "Populating the DB" section, the Overpass wiki page mentions
> the following, but I don't really understand to exactly which way of
> populating this refers to:
>
> As a side note, this also works for applying OSC files onto an
> existing database. Thus you can make daily updates by applying these
> diffs with a cronjob. This method takes fewer disk loads than minute
> updates, and the data is still pretty timely

I don't think that really matters: if you switch to hourly or daily
diffs, you're effectively using a different endpoint and an adjusted
replicate_id value.
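
One way to do that matching, sketched in Python under stated assumptions: read the timestamp from the minutely state.txt the database is at, then pick the newest hourly (or daily) state whose timestamp is not later than it. This assumes replicate_id holds the last sequence already applied, so the next diff fetched (that sequence + 1) starts at or before the database timestamp; the overlapping changes get applied again rather than skipped, and whether re-applying them is harmless is an assumption here, not something confirmed in this thread. The function names and all sequence numbers in the example are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timezone


def parse_state(text: str) -> tuple[int, datetime]:
    """Parse an Osmosis-style state.txt (Java properties, ':' escaped as '\\:')."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment/date header
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip().replace("\\:", ":")
    seq = int(props["sequenceNumber"])
    ts = datetime.strptime(props["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    return seq, ts.replace(tzinfo=timezone.utc)


def matching_replicate_id(db_time: datetime,
                          states: list[tuple[int, datetime]]) -> int:
    """Newest sequence whose state timestamp is not later than db_time.

    Assumption: replicate_id means "last applied", so the next diff fetched
    (result + 1) starts at or before db_time; changes between that state and
    db_time are re-applied rather than skipped.
    """
    candidates = [seq for seq, ts in states if ts <= db_time]
    if not candidates:
        raise ValueError("no state at or before the database timestamp")
    return max(candidates)


if __name__ == "__main__":
    # Illustrative minutely state file contents (values made up).
    minute_state = (
        "#Fri Jul 28 22:13:04 UTC 2017\n"
        "sequenceNumber=2512345\n"
        "timestamp=2017-07-28T22\\:13\\:02Z\n"
    )
    _, db_time = parse_state(minute_state)
    # Hypothetical hourly states (sequence numbers invented for illustration).
    hourly = [
        (100, datetime(2017, 7, 28, 21, 0, tzinfo=timezone.utc)),
        (101, datetime(2017, 7, 28, 22, 0, tzinfo=timezone.utc)),
        (102, datetime(2017, 7, 28, 23, 0, tzinfo=timezone.utc)),
    ]
    print(matching_replicate_id(db_time, hourly))  # 101
```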

>
>
> 2) When applying diffs on a cloned database, what value of
> replicate_id should I use initially? If replicate_id file holds a
> value (example) of 10, do I use 9, 10 or 11? When I used 10, the
> apply_osc_to_db.sh script simply looped without doing anything, but when
> I use 11, I get a lot of "Way X used in relation Y not found." messages
> (and also for nodes used in ways), which leads me to believe some diffs
> were not applied.

I always use the "auto" option rather than providing an actual
replicate_id number in the case of minutely updates. apply_osc_to_db.sh
assumes that you have fetch_osc.sh running in parallel, so it should
find new diff files every once in a while and start applying them.
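
The waiting behaviour can be modelled roughly as: starting from the last applied sequence, apply only the unbroken run of diffs already on disk and stop at the first gap. The Python sketch below assumes the stored replicate_id is the last sequence already applied and that diffs are stored as AAA/BBB/CCC.osc.gz below the diff directory; both are assumptions, the function names are hypothetical, and this is not the actual logic of apply_osc_to_db.sh.

```python
from __future__ import annotations

from pathlib import Path


def available_sequences(diff_dir: Path) -> set[int]:
    """Sequence numbers of diffs stored as AAA/BBB/CCC.osc.gz (assumed layout)."""
    found: set[int] = set()
    pattern = "[0-9][0-9][0-9]/[0-9][0-9][0-9]/[0-9][0-9][0-9].osc.gz"
    for f in diff_dir.glob(pattern):
        # Rebuild the nine-digit number from the two directory names and the file stem.
        found.add(int(f.parent.parent.name + f.parent.name + f.name[:3]))
    return found


def applicable_sequences(last_applied: int, available: set[int]) -> list[int]:
    """Unbroken run of sequences after last_applied that are present on disk.

    Stops at the first missing number, since applying a later diff before an
    earlier one could leave the database inconsistent.
    """
    run: list[int] = []
    nxt = last_applied + 1
    while nxt in available:
        run.append(nxt)
        nxt += 1
    return run


if __name__ == "__main__":
    # With replicate_id = 10 and diffs 11, 12 and 14 on disk, only 11 and 12
    # can be applied now; 14 has to wait until 13 has been fetched.
    print(applicable_sequences(10, {11, 12, 14}))  # [11, 12]
    # With nothing newer than 10 on disk, there is nothing to do yet.
    print(applicable_sequences(10, {9, 10}))       # []
```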

>
> 3) With both the a) and b) variants, I get this kind of error:
>
> ./apply_osc_to_db.sh: line 117: 28219 Segmentation fault (core
> dumped) ./update_from_dir --osc-dir=$1 --version=$DATA_VERSION $META
> --flush-size=0
> + EXITCODE=139

Can you provide some more details on this, e.g. dmesg output or stack
traces? Note that once a db has hit such an error, I'd usually throw it
away and start from scratch (or restore a backup). Otherwise you end up
chasing strange, hard-to-reproduce effects.
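
For reference, EXITCODE=139 in the quoted log follows the usual shell convention of 128 + signal number: 139 - 128 = 11, which is SIGSEGV, matching the "Segmentation fault (core dumped)" line. A small Python helper that decodes such statuses might look like this (a sketch; the function name is hypothetical):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret an exit status as reported by a POSIX shell ($?).

    Shells report a child killed by signal N as 128 + N, so 139 means
    signal 11. A program could also exit with a value above 128 on its own,
    so this is a convention, not a guarantee.
    """
    if status > 128:
        try:
            sig = signal.Signals(status - 128)
        except ValueError:
            return f"exit status {status} (no signal {status - 128} on this platform)"
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited with status {status}"


if __name__ == "__main__":
    print(describe_exit_status(139))  # killed by SIGSEGV (signal 11)
    print(describe_exit_status(0))    # exited with status 0
```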

>
> By the way, my server has 16 GB of RAM, 512 GB SSD and an 8-core CPU.
> Running Ubuntu 16.04.2

At one point you mentioned that you're using a 32-bit desktop image.
Is there any reason why you don't run a 64-bit server image? Depending on
your load, 16 GB may be a bit small for a full planet. The servers I know
have at least 32 GB.
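
To check which kind of system is actually running, `uname -m` (or its Python equivalent `platform.machine()`) reports the kernel architecture: x86_64 for a 64-bit x86 kernel, i686 or similar for a 32-bit one. One general point, not spelled out above: a 32-bit process can address at most 2^32 bytes, i.e. 4 GiB, regardless of how much RAM is installed. A minimal sketch (the set of 64-bit names is an assumption and not exhaustive; function names are hypothetical):

```python
import platform
import struct

# uname -m values commonly reported by 64-bit kernels (not an exhaustive list).
SIXTY_FOUR_BIT_MACHINES = {"x86_64", "amd64", "aarch64", "arm64", "ppc64le", "s390x"}


def is_64bit_machine(machine: str) -> bool:
    """True if the kernel architecture string denotes a 64-bit system."""
    return machine.lower() in SIXTY_FOUR_BIT_MACHINES


def address_space_gib(pointer_bits: int) -> float:
    """Largest virtual address space (in GiB) for a given pointer width."""
    return 2 ** pointer_bits / 2 ** 30


if __name__ == "__main__":
    machine = platform.machine()  # same value as `uname -m` on Linux
    print(f"kernel architecture: {machine} (64-bit: {is_64bit_machine(machine)})")
    print(f"this Python build: {struct.calcsize('P') * 8}-bit pointers")
    print(f"32-bit address space: {address_space_gib(32)} GiB")  # 4.0 GiB
```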

>
> Any hints would be very appreciated and I will document them on the
> installation page. Thanks in advance!
>


cheers,
mmd



