Subject: Overpass API developpement
- From: Roland Olbricht <>
- To:
- Subject: Re: [overpass] error with attic data
- Date: Sun, 15 May 2016 14:29:09 +0200
Hello Michael,
thank you for reporting the problem. There are four questions involved. I would like to disentangle them first.
> Things I learned: The first call to overpass with a fresh db directory
> has to be with --meta; --keep-attic (with and without
> --flush-size=0) will fail without a clear explanation (lots of "node x not
> found in way" etc. when they are clearly present in the file). This might
> be an improvement for overpass: returning an error for --keep-attic when
> there is no db yet.
This is indeed a bug. What should work is the following:
1. --meta --flush-size=nonzero_value, or just --meta
This should start a standard import. To get this done within a reasonable time, it uses striping and merging. It won't work with multiple versions of an object.
2. --keep-attic --flush-size=0, or just --keep-attic
This should not use striping. It should work with multiple versions of an element.
The updater used to decide whether to use striping based on whether it runs with a dispatcher or with an explicit directory. This turned out to be too confusing. I think the command line switches should take precedence, and the updater should report which mode it runs in.
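The proposed rule (switches decide, updater reports) can be sketched as a small decision function. This is a hypothetical illustration, not the updater's actual code: `choose_import_mode`, `mode_report`, and the report wording are made up, and any switch combination not listed in the two cases above is simply rejected here as an assumption.

```python
from enum import Enum
from typing import Optional


class ImportMode(Enum):
    STRIPED = "striping and merging; one version per object"
    SEQUENTIAL = "no striping; multiple versions per object"


def choose_import_mode(meta: bool, keep_attic: bool, flush_size: Optional[int]) -> ImportMode:
    """Hypothetical: derive the mode from the switches only, never from dispatcher vs. directory."""
    if keep_attic:
        # Case 2: --keep-attic, alone or with --flush-size=0.
        if flush_size in (None, 0):
            return ImportMode.SEQUENTIAL
        raise ValueError("--keep-attic with a nonzero --flush-size is not covered by this sketch")
    if meta:
        # Case 1: --meta, alone or with a nonzero --flush-size.
        if flush_size is None or flush_size > 0:
            return ImportMode.STRIPED
        raise ValueError("--meta needs a positive --flush-size in this sketch")
    raise ValueError("this sketch only covers --meta and --keep-attic")


def mode_report(mode: ImportMode) -> str:
    # Hypothetical message: the mail only proposes that the updater reports its mode.
    return f"updater running in mode: {mode.value}"


print(mode_report(choose_import_mode(meta=True, keep_attic=False, flush_size=None)))
# prints "updater running in mode: striping and merging; one version per object"
```

Rejecting unlisted combinations up front, instead of guessing, is what keeps the behaviour predictable.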
> So I found more time to look into the issue of why I can't import the cut
> planet history into overpass. After a weekend full of testing, I
> think I found a minimal example of what is going wrong. I put the
> example at https://osm.ch/share/way30913536.tar.gz for anyone to test.
I've narrowed the example down even further:
<?xml version='1.0' encoding='UTF-8'?>
<osmChange version="0.6" generator="osmium/1.3.0">
<bounds minlon="8.3897999" minlat="47.0760000" maxlon="8.7061000" maxlat="47.2534000"/>
<modify>
<node id="343651629" version="5" timestamp="2009-02-09T12:16:06Z" uid="98286" user="Jens Nickel" changeset="313699" lat="47.185351" lon="8.166322"/>
</modify>
<delete>
<node id="343651629" version="6" timestamp="2009-02-09T12:16:40Z" uid="98286" user="Jens Nickel" changeset="313699"/>
</delete>
<modify>
<node id="343651883" version="3" timestamp="2009-02-09T12:16:12Z" uid="98286" user="Jens Nickel" changeset="313699" lat="47.195648" lon="8.517674"/>
</modify>
<delete>
<node id="343651883" version="4" timestamp="2009-02-09T12:16:26Z" uid="98286" user="Jens Nickel" changeset="313699"/>
</delete>
<modify>
<node id="343651926" version="3" timestamp="2009-02-09T12:16:12Z" uid="98286" user="Jens Nickel" changeset="313699" lat="47.180688" lon="8.525247"/>
</modify>
<delete>
<node id="343651926" version="4" timestamp="2009-02-09T12:16:27Z" uid="98286" user="Jens Nickel" changeset="313699"/>
</delete>
<modify>
<node id="343651934" version="3" timestamp="2009-02-09T12:16:12Z" uid="98286" user="Jens Nickel" changeset="313699" lat="47.181117" lon="8.527593"/>
</modify>
<delete>
<node id="343651934" version="4" timestamp="2009-02-09T12:16:28Z" uid="98286" user="Jens Nickel" changeset="313699"/>
</delete>
<modify>
<node id="343651935" version="3" timestamp="2009-02-09T12:16:12Z" uid="98286" user="Jens Nickel" changeset="313699" lat="47.181182" lon="8.52764"/>
</modify>
<delete>
<node id="343651935" version="4" timestamp="2009-02-09T12:16:28Z" uid="98286" user="Jens Nickel" changeset="313699"/>
</delete>
<modify>
<way id="30913536" version="2" timestamp="2009-02-09T12:16:12Z" uid="98286" user="Jens Nickel" changeset="313699">
<nd ref="343651629"/>
<nd ref="343651883"/>
<nd ref="343651926"/>
<nd ref="343651934"/>
<tag k="created_by" v="Potlatch 0.10f"/>
</way>
</modify>
<delete>
<way id="30913536" version="3" timestamp="2009-02-09T12:16:28Z" uid="98286" user="Jens Nickel" changeset="313699"/>
</delete>
</osmChange>
If you look carefully, you can see that node 343651883 was deleted at 2009-02-09T12:16:26Z, but way 30913536 existed until 2009-02-09T12:16:28Z, i.e. for two seconds the way referenced a node that no longer existed. It is this invalid data that derails the representation of that way.
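Such inconsistencies can be spotted mechanically before import. The sketch below is a hypothetical helper (`dangling_refs` and `deletion_times` are made-up names): it parses a trimmed copy of the osmChange above with Python's standard `xml.etree.ElementTree` and reports way members whose deletion timestamp lies before the way's own deletion. It is a simplification: it ignores intermediate way versions that might have dropped a node before that node was deleted.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# Trimmed copy of the osmChange above: the way plus two of its nodes;
# user/uid/changeset attributes omitted.
OSC = """<osmChange version="0.6">
<modify><node id="343651629" version="5" timestamp="2009-02-09T12:16:06Z" lat="47.185351" lon="8.166322"/></modify>
<delete><node id="343651629" version="6" timestamp="2009-02-09T12:16:40Z"/></delete>
<modify><node id="343651883" version="3" timestamp="2009-02-09T12:16:12Z" lat="47.195648" lon="8.517674"/></modify>
<delete><node id="343651883" version="4" timestamp="2009-02-09T12:16:26Z"/></delete>
<modify><way id="30913536" version="2" timestamp="2009-02-09T12:16:12Z"><nd ref="343651629"/><nd ref="343651883"/></way></modify>
<delete><way id="30913536" version="3" timestamp="2009-02-09T12:16:28Z"/></delete>
</osmChange>"""


def deletion_times(root: ET.Element, kind: str) -> Dict[str, str]:
    """Map object id -> timestamp of its <delete> entry, for elements named `kind`."""
    times: Dict[str, str] = {}
    for delete in root.iter("delete"):
        for elem in delete.findall(kind):
            times[elem.get("id")] = elem.get("timestamp")
    return times


def dangling_refs(root: ET.Element) -> List[Tuple[str, str, str, Optional[str]]]:
    """Find way members that were deleted while the way still existed."""
    node_deleted = deletion_times(root, "node")
    way_deleted = deletion_times(root, "way")
    problems = []
    for modify in root.iter("modify"):
        for way in modify.findall("way"):
            way_id = way.get("id")
            way_end = way_deleted.get(way_id)  # None: the way is not deleted in this file
            for nd in way.findall("nd"):
                node_end = node_deleted.get(nd.get("ref"))
                # ISO 8601 UTC strings in one fixed format sort chronologically as plain strings.
                if node_end is not None and (way_end is None or node_end < way_end):
                    problems.append((way_id, nd.get("ref"), node_end, way_end))
    return problems


print(dangling_refs(ET.fromstring(OSC)))
# prints [('30913536', '343651883', '2009-02-09T12:16:26Z', '2009-02-09T12:16:28Z')]
```

Run against the full change file, the same check would also flag node 343651926 (deleted at 12:16:27), while node 343651934 (deleted at 12:16:28, the same second as the way) passes the strict comparison.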
> Interestingly, the .eu instance does not return this way, and I tried a
> few possible dates during which the way should exist.
As Daniel has pointed out, this is because the current Overpass API instance only delivers data since the license change.
> Another thing I observed: most OSM tools are tolerant to replays. In
> case something went wrong with a diff, one can go back a few diffs and
> reapply them chronologically, and everything is fine again. Overpass
> seems to have problems, complaining about duplicate objects. This replay
> tolerance seems important to me, as usually, when something goes wrong
> with the diffs, the OWG recommends replaying from a certain point in
> time to fix the data. What is behind Overpass's different behaviour? A
> missing feature, or some reason that makes replays undesirable?
About replaying diffs: I can imagine three different levels of replay capabilities.
1. Undefined behaviour. In practice this means that it almost always works, but sometimes painfully hard-to-find errors happen.
2. Predictable behaviour, even if not necessarily desirable. This includes emitting error messages as well as accepting that some constellations of data are wrong.
3. Doing everything faithfully. This can still be painful if the logically consistent behaviour differs from the intuitively expected one.
I want to bring Overpass API to level 2:
- Objects with lower version numbers than the already existing ones will be ignored with an error message.
- Objects with an increasing version number and the same timestamp are processed properly, but trigger error messages.
This ensures (or should ensure) that Overpass API is tolerant to replaying diffs of consistent data, no matter in which order. It does not work if the data in the diff itself is flawed.
To get into the dirty details: the data is organized to ensure two properties:
- An attic object stays immutable and an existing object should only change by eventually getting an expiration date.
- The database is modular: the current data is directly accessible, without any dependence on attic data.
Now imagine two diffs as follows:
diff 101:
node 42 version 1 lat=7 lon=8
diff 102:
node 42 version 2 lat=17 lon=18
If somebody now imports diff 101, then diff 102, then again diff 101 and again diff 102, then:
- From the point of view of a user of the current data, after step 3 the expected version would be 1.
- From the immutability point of view, we would like to keep version 1 as attic since diff 102, and version 2 as attic since the second application of diff 101, because another version is current again.
- Now we have two copies of version 1, one in attic and one current.
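The first three steps of this replay can be reproduced against a deliberately naive store that applies every object it sees. This is a hypothetical model, not Overpass API's storage: `NaiveStore` is a made-up name, and the expiration "date" is represented by the diff number because the example diffs carry no timestamps.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class NodeVersion:
    node_id: int
    version: int
    lat: float
    lon: float


@dataclass
class NaiveStore:
    """Hypothetical store that applies every object it receives, with no version check."""
    current: Dict[int, NodeVersion] = field(default_factory=dict)
    # Attic entries are never edited: (superseded version, diff that expired it).
    attic: List[Tuple[NodeVersion, int]] = field(default_factory=list)

    def apply(self, diff_no: int, new: NodeVersion) -> None:
        old = self.current.get(new.node_id)
        if old is not None:
            self.attic.append((old, diff_no))
        self.current[new.node_id] = new


DIFF_101 = NodeVersion(node_id=42, version=1, lat=7.0, lon=8.0)
DIFF_102 = NodeVersion(node_id=42, version=2, lat=17.0, lon=18.0)

store = NaiveStore()
# Steps 1-3 of the replay: diff 101, diff 102, diff 101 again.
for diff_no, obj in [(101, DIFF_101), (102, DIFF_102), (101, DIFF_101)]:
    store.apply(diff_no, obj)

print(store.current[42].version)                 # 1
print([(v.version, d) for v, d in store.attic])  # [(1, 102), (2, 101)]
```

Version 1 now exists twice (attic and current), and version 2 is recorded as expired by diff 101, which is older than the diff that created it.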
The problem gets even more intricate if you keep in mind that we have to construct the geometry of ways and relations from the referred objects and their timestamps.
With the chosen approach of ignoring objects with the wrong version, we at least get a consistent state after applying diff 102 again. To inform the user that applying diff 101 had no effect, Overpass API emits the error messages.
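For comparison, here is a sketch of a store following the two rules listed earlier (ignore lower versions with an error; process an increasing version with the same timestamp, but report it). It is a hypothetical model, not Overpass API's code: `GuardedStore` is a made-up name, the timestamps are invented because the example diffs give none, and ignoring an *equal* version (the second application of diff 102) is this sketch's assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class NodeVersion:
    node_id: int
    version: int
    timestamp: str  # ISO 8601 UTC
    lat: float
    lon: float


@dataclass
class GuardedStore:
    """Hypothetical store following the replay rules listed in the mail."""
    current: Dict[int, NodeVersion] = field(default_factory=dict)
    attic: List[Tuple[NodeVersion, str]] = field(default_factory=list)  # (old version, expired at)
    errors: List[str] = field(default_factory=list)

    def apply(self, new: NodeVersion) -> None:
        old = self.current.get(new.node_id)
        if old is not None and new.version <= old.version:
            # Lower versions are ignored with an error message; ignoring an
            # equal version as well is an assumption of this sketch.
            self.errors.append(f"node {new.node_id} v{new.version} ignored: v{old.version} already present")
            return
        if old is not None:
            if new.timestamp == old.timestamp:
                # Increasing version with the same timestamp: processed, but reported.
                self.errors.append(f"node {new.node_id} v{new.version} has the same timestamp as v{old.version}")
            # The old version moves to attic and only gains an expiration date.
            self.attic.append((old, new.timestamp))
        self.current[new.node_id] = new


# Made-up timestamps; the diffs in the mail do not give any.
DIFF_101 = NodeVersion(42, 1, "2016-05-01T00:00:00Z", 7.0, 8.0)
DIFF_102 = NodeVersion(42, 2, "2016-05-02T00:00:00Z", 17.0, 18.0)

store = GuardedStore()
for obj in [DIFF_101, DIFF_102, DIFF_101, DIFF_102]:
    store.apply(obj)

print(store.current[42].version)            # 2
print(len(store.attic), len(store.errors))  # 1 2
```

After the full replay there is exactly one attic copy of version 1, version 2 is current, and the two ignored applications are visible as error messages.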
The problem is that we have altogether four timelines:
- The state of the real objects
- The version numbers on the representation
- The timestamps on the representation
- The order in which the data has been applied
Best regards,
Roland