overpass - [overpass] [PerfProject2016] Area creation

Subject: Overpass API developpement

From: mmd <>
To:
Subject: [overpass] [PerfProject2016] Area creation
Date: Thu, 2 Jun 2016 23:22:47 +0200

Hi there,

in this week's mini-episode on the Overpass API Performance Project
2016, we'll take a look at area creation.

Areas are not native objects in OSM; they have to be created by a
dedicated job on the server, governed by a set of area creation rules.
Area creation is generally quite resource-intensive. In this post we
are going to investigate some options for improvement.

As a baseline, we measure the current official version 0.7.52 on a
dedicated server with SSD storage.

* Baseline runtime: 13.5 hours

Besides a different compression scheme (see last post), a number of CPU
profiling runs revealed an interesting issue. This is where the Linux
`perf` tool comes in handy: it quickly shows that a significant
percentage of time is spent in malloc/free operations. These are
essentially caused by excessive object creation.

As you probably know, Overpass API stores geographically close-by OSM
entities in blocks of several hundred KB each. A very common use case
is to load such a block from disk, iterate over all OSM
nodes/ways/relations in it and check whether each one's id is in a
given list of OSM ids. In the source code, Roland refers to these
functions as "collect_items_*".

While iterating over a block, we essentially create one object per
entry, e.g. a Node ("de-serialization"). If the Node's OSM id matches
one in our list, we keep the object. Otherwise we discard it, which is
what happens most of the time.
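A minimal C++ sketch of this naive pattern, for illustration only. The fixed 16-byte record layout, the `Node` struct and the helpers `append_record`, `deserialize` and `collect_naive` are all hypothetical and do not reflect Overpass API's actual on-disk format or its collect_items_* code; the point is only that every record costs one heap allocation, whether it is kept or not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <set>
#include <vector>

// Hypothetical record layout (NOT Overpass API's real format):
// 8-byte OSM id, then 4-byte lat and 4-byte lon, in native byte order.
constexpr std::size_t kRecordSize = 16;

struct Node {
  std::uint64_t id;
  std::uint32_t lat;
  std::uint32_t lon;
};

// Helper for building a test block: serializes one Node and appends it.
void append_record(std::vector<unsigned char>& block, const Node& n) {
  unsigned char rec[kRecordSize];
  std::memcpy(rec, &n.id, 8);
  std::memcpy(rec + 8, &n.lat, 4);
  std::memcpy(rec + 12, &n.lon, 4);
  block.insert(block.end(), rec, rec + kRecordSize);
}

// De-serialization: one heap allocation per record.
std::unique_ptr<Node> deserialize(const unsigned char* rec) {
  auto node = std::make_unique<Node>();
  std::memcpy(&node->id, rec, 8);
  std::memcpy(&node->lat, rec + 8, 4);
  std::memcpy(&node->lon, rec + 12, 4);
  return node;
}

// Naive scan: build an object for every record, then test its id.
std::vector<Node> collect_naive(const std::vector<unsigned char>& block,
                                const std::set<std::uint64_t>& wanted) {
  assert(block.size() % kRecordSize == 0);
  std::vector<Node> result;
  for (std::size_t off = 0; off < block.size(); off += kRecordSize) {
    std::unique_ptr<Node> node = deserialize(block.data() + off);
    if (wanted.count(node->id) != 0) result.push_back(*node);
    // The heap object is freed at the end of every iteration; for a
    // non-matching id the malloc/free pair bought nothing.
  }
  return result;
}
```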

This situation can be improved with the approach outlined in [2]:
whenever the OSM id doesn't match, we skip object creation altogether.

Here's one option I used during testing: we pass an evaluation
function to the templated database backend. This function reads the
OSM id directly from the serialized buffer and returns it, usually as a
64-bit value. This operation is very cheap, as it does not require any
object allocation. We can then compare the extracted id directly
against the list of requested OSM ids.
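A sketch of the id-peeking variant, assuming the same hypothetical 16-byte layout (repeated here so the block stands alone). `peek_id` plays the role of the evaluation function: it copies the 8 id bytes out of the buffer without allocating, and `collect_by_id` only de-serializes records whose id is wanted. All names are illustrative, not Overpass API's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <set>
#include <vector>

// Hypothetical layout: 8-byte id, 4-byte lat, 4-byte lon (native byte order).
constexpr std::size_t kRecordSize = 16;

struct Node {
  std::uint64_t id;
  std::uint32_t lat;
  std::uint32_t lon;
};

// Helper for building a test block: serializes one Node and appends it.
void append_record(std::vector<unsigned char>& block, const Node& n) {
  unsigned char rec[kRecordSize];
  std::memcpy(rec, &n.id, 8);
  std::memcpy(rec + 8, &n.lat, 4);
  std::memcpy(rec + 12, &n.lon, 4);
  block.insert(block.end(), rec, rec + kRecordSize);
}

// Evaluation function: read the OSM id straight from the buffer, no allocation.
std::uint64_t peek_id(const unsigned char* rec) {
  std::uint64_t id;
  std::memcpy(&id, rec, sizeof id);
  return id;
}

// Full de-serialization, performed only for matching records.
Node deserialize(const unsigned char* rec) {
  Node n;
  std::memcpy(&n.id, rec, 8);
  std::memcpy(&n.lat, rec + 8, 4);
  std::memcpy(&n.lon, rec + 12, 4);
  return n;
}

// Scan a block, asking the caller-supplied extractor for each record's id
// first and building a Node only when that id is in the wanted set.
template <typename IdFn>
std::vector<Node> collect_by_id(const std::vector<unsigned char>& block,
                                const std::set<std::uint64_t>& wanted,
                                IdFn id_of) {
  assert(block.size() % kRecordSize == 0);
  std::vector<Node> result;
  for (std::size_t off = 0; off < block.size(); off += kRecordSize) {
    const unsigned char* rec = block.data() + off;
    if (wanted.count(id_of(rec)) == 0) continue;  // mismatch: no object built
    result.push_back(deserialize(rec));
  }
  return result;
}
```

Taking the extractor as a template parameter rather than a `std::function` lets the compiler inline the id lookup inside the hot loop.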

* Improved runtime: 4.5 hours

Reality is a bit more complex, as some predicates need a full-blown
object rather than just an OSM id. Those cases were excluded during
testing.

In conclusion, by switching compression and avoiding object creation in
hot code paths, we can already reduce the runtime by about 66%, from
13.5 to 4.5 hours.
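The reduction follows directly from the two measured runtimes; here $T_{\text{base}}$ and $T_{\text{new}}$ are just shorthand for the baseline and improved figures:

```latex
\[
\frac{T_{\text{base}} - T_{\text{new}}}{T_{\text{base}}}
  = \frac{13.5\,\text{h} - 4.5\,\text{h}}{13.5\,\text{h}}
  = \frac{9}{13.5} \approx 0.667,
\qquad
\frac{T_{\text{base}}}{T_{\text{new}}} = \frac{13.5}{4.5} = 3
\]
```

In other words, roughly two thirds of the runtime is gone, which is the same as a 3x speedup.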

By leveraging the attic database, there's even further potential. So
far, area creation has always processed the whole database each time
("full run"). By considering only the changes since the last area
creation run, we'll end up with:

* Delta run: ~10 minutes (for one day's worth of OSM data).

This would enable scenarios where areas are updated (up to) several
times per day with little overall CPU usage (details in [3]).
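A rough sketch of how a delta run narrows the work, under clearly stated assumptions: the `Change` struct and `candidates_for_delta_run` are hypothetical, and each change is assumed to be already resolved to the area candidate (way or relation) it affects. A real implementation on top of the attic database would also have to follow modified nodes and ways up to every area that references them; the actual design is discussed in [3].

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical change record: which area candidate was touched, and when
// (seconds since the epoch).
struct Change {
  std::uint64_t candidate_id;
  std::int64_t timestamp;
};

// Full run: rebuild every area candidate in the database.
// Delta run: rebuild only candidates changed in (last_run_ts, now_ts].
std::set<std::uint64_t> candidates_for_delta_run(
    const std::vector<Change>& changes, std::int64_t last_run_ts,
    std::int64_t now_ts) {
  assert(last_run_ts <= now_ts);
  std::set<std::uint64_t> ids;  // a set, so repeated edits count only once
  for (const Change& c : changes) {
    if (c.timestamp > last_run_ts && c.timestamp <= now_ts) {
      ids.insert(c.candidate_id);
    }
  }
  return ids;
}
```

Using a set means a candidate edited several times within the window is rebuilt only once.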

Best,
mmd

[1] https://github.com/drolbr/Overpass-API/issues/258#issuecomment-197480938
[2] https://github.com/drolbr/Overpass-API/pull/208
[3] https://github.com/drolbr/Overpass-API/issues/141