@systemed (Owner) commented Aug 23, 2021

Currently, on reading each way from the .pbf, we add it to the way store. But we don't actually need to keep every way for later use - only those that are part of (multipolygon) relations.

This WIP PR adds an early relation-scanning phase when reading the PBF. This notes which ways are used in relations and records them in a std::vector<bool>. That means that when we come to read the ways, we only need to store those we know will be used in relations.

This tries to reserve an appropriate size for the vector on init, which depends on whether we're running in compact mode or not. In non-compact mode it's 0.25GB, to cope with the likely maximum way ID (std::vector<bool> packs one bit per element, so 0.25GB covers IDs up to about 2.1 billion). This is likely inefficient for the smallest extracts, where the way store itself would be less than 0.25GB. We could potentially guess this from a heuristic such as file size or number of nodes, and disable the relation-scanning phase in that case.

There's obviously a timing impact from the extra PBF phase. This, and memory usage, need measuring!

This also slightly breaks the progress display (which can halt for a long time while the PBF is re-read).

@systemed (Owner, Author) commented Sep 10, 2021

Memory usage ("Maximum resident set size (kbytes)" as reported by /usr/bin/time) with this:

Great Britain:
PR 292, in-memory:                        14257420
PR 292, in-memory, --compact:             13038972
PR 292, using --store on SSD:             11766208
Current master, in-memory:                16203216 (13% more)
Current master, in-memory, --compact:     15022600 (15% more)
Current master, using --store on SSD:     13907484 (18% more)
Current master, in-memory, with .clear(): 16157624
Current master, --store, with .clear():   13888208

Oxfordshire:
PR 292, in-memory:                         3849008
PR 292, in-memory, --compact:              3680760
PR 292, using --store on SSD:              1794492
Current master, in-memory:                 3748048 (2% less)
Current master, in-memory, --compact:      3745708 (2% more)
Current master, using --store on SSD:      1715492 (4% less)

@kleunen (Contributor) commented Sep 11, 2021

I am not completely sure this is a fair comparison for --store. For in-memory processing it will clearly make a difference. But when using the store, it is possible that only the amount of page-cached on-disk storage is reduced - and when you are constrained on memory, that actually makes no difference.

I tried this PR on my 8GB RAM + 8GB swap machine, converting africa-latest.osm.pbf, and it does not seem to make any difference: it still gets killed with OOM when generating the tiles.

After applying the outputobject optimizations, it did make a difference. Before, it was killed while loading the ways; after applying those optimizations, I was able to load the nodes/ways/relations and it started generating tiles, but then got killed with OOM during the tile generation phase.

So I guess it is useful for in-memory processing, but when using --store, I doubt you will get any actual benefit from this. When processing in-memory, though, you can actually clear all loaded nodes/ways/relations and just keep the generated geometries. I think only the nodes are cleared at the moment?

@systemed (Owner, Author) commented Sep 11, 2021

I've been running various configurations with /usr/bin/time -v on the GB and Oxfordshire extracts, as per the table above. I'd expected this PR not to benefit the smaller extract (because we have the added overhead of the used_ways structure, but read in comparatively few ways anyway), but to make more of a difference on the larger extract. That seems to be the case, though I'm pleasantly surprised that the increase in the Oxfordshire case is only around 2%.

As part of this PR I'd added a call to clear all loaded nodes/ways/relations after reading the .pbf: https://github.com/systemed/tilemaker/pull/292/files#diff-e10c099c7c412dae9f968fc46ec087a2092c75dbd17ec85d6c91af00089880f5R343

I've now experimented with backporting just that call to current master, too (see the lines in the GB table marked "with .clear()").

The results seem to be:

  • Backporting the .clear() call doesn't make much difference (a 0.2% saving)
  • This PR does make a big difference - around 2GB memory saving (=13-18%) on Great Britain
  • The saving applies both with --store and without

This wasn't in a memory-constrained situation, of course - I have 144GB and this was reaching 16GB maximum.

I'll experiment with a bigger extract next, where it'd hit the limit of physical memory, and see how that works. I think it's possible that, as you say, with --store this PR won't significantly change the size of extract that can be processed. But it might be quicker because we write less data to disk and, potentially, read less too (i.e. we need to skip about less in the mmap to find the ways we actually need).

@kleunen (Contributor) commented Sep 11, 2021

> But it might be quicker because we write less data to disk and, potentially, read less too (i.e. we need to skip about less in the mmap to find the ways we actually need).

Yes, that is definitely possible. I think it is worthwhile to add; just make sure that in the --store case it does not slow down the process.

@systemed systemed mentioned this pull request Sep 13, 2021
@systemed systemed merged commit f45032d into master Sep 13, 2021
@systemed (Owner, Author) commented
Seems to work well with --store, and we're going to need an extra relation pass like this for handling boundary relations anyway, so I think we're good to go.

@systemed systemed deleted the way_usage branch September 13, 2021 21:41