Description
Test case
[date:"2017-01-01T00:00:00"][out:json][timeout:4000];
area["type"="boundary"]["ISO3166-2"="DE-NW"];
foreach(
node(area)[~"^(opening_hours|opening_hours:kitchen|opening_hours:warm_kitchen|happy_hours|delivery_hours|opening_hours:delivery|lit|smoking_hours|collection_times|service_times|fee)$"~"."][fee!=no][fee!=yes][lit!=no][lit!=yes]->.t; .t out tags;
way(area)[~"^(opening_hours|opening_hours:kitchen|opening_hours:warm_kitchen|happy_hours|delivery_hours|opening_hours:delivery|lit|smoking_hours|collection_times|service_times|fee)$"~"."][fee!=no][fee!=yes][lit!=no][lit!=yes]->.t; .t out tags;
);
Issue
Query uses too much memory and aborts
Analysis
collect_items.h:
template < class Index, class Object, class Current_Iterator, class Attic_Iterator, class Predicate >
void collect_items_by_timestamp(const Statement* stmt, Resource_Manager& rman,
Current_Iterator current_begin, Current_Iterator current_end,
Attic_Iterator attic_begin, Attic_Iterator attic_end,
const Predicate& predicate, uint64 timestamp,
std::map< Index, std::vector< Object > >& result,
std::map< Index, std::vector< Attic< Object > > >& attic_result)
{
std::vector< std::pair< typename Object::Id_Type, uint64 > > timestamp_by_id;
reconstruct_items(stmt, rman, current_begin, current_end, predicate, result, timestamp_by_id, timestamp);
reconstruct_items(stmt, rman, attic_begin, attic_end, predicate, attic_result, timestamp_by_id, timestamp);
std::sort(timestamp_by_id.begin(), timestamp_by_id.end());
Line 205 reconstruct_items(stmt, rman, current_begin, current_end, predicate, result, timestamp_by_id, timestamp);
Result: timestamp_by_id = std::vector of length 72374789
Line 206 reconstruct_items(stmt, rman, attic_begin, attic_end, predicate, attic_result, timestamp_by_id, timestamp);
Result: timestamp_by_id = std::vector of length 75855450
Most entries in timestamp_by_id are stored with timestamp NOW. Splitting them into two separate vectors could save almost half of the memory:
- Vector 1: timestamp_by_id_attic: all entries where timestamp != NOW, same format as today
- Vector 2: timestamp_by_id_now: only entries where timestamp == NOW. Every object id in this vector implicitly has timestamp NOW, so only the object id needs to be stored, not the timestamp.
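The two-vector layout described above could be sketched roughly as follows. This is only an illustration under assumptions, not the actual patch: `Split_Timestamps`, `NOW_SKETCH`, `push`, and `is_current` are hypothetical names, and the sentinel value merely stands in for whatever NOW constant the codebase uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical stand-in for the codebase's NOW timestamp (assumption).
constexpr std::uint64_t NOW_SKETCH = std::numeric_limits<std::uint64_t>::max();

// Hypothetical replacement for the single timestamp_by_id vector.
template <class Id>
struct Split_Timestamps
{
  // Entries with timestamp != NOW, same (id, timestamp) format as today.
  std::vector<std::pair<Id, std::uint64_t>> attic;
  // Entries with timestamp == NOW; the timestamp is implied, so only the id is stored.
  std::vector<Id> now;
  bool sorted = false;

  void push(Id id, std::uint64_t timestamp)
  {
    if (timestamp == NOW_SKETCH)
      now.push_back(id);
    else
      attic.emplace_back(id, timestamp);
    sorted = false;
  }

  // Counterpart of std::sort(timestamp_by_id.begin(), timestamp_by_id.end()).
  void sort()
  {
    std::sort(attic.begin(), attic.end());
    std::sort(now.begin(), now.end());
    sorted = true;
  }

  // True if the id was recorded with timestamp NOW; requires sort() first.
  bool is_current(Id id) const
  {
    assert(sorted);
    return std::binary_search(now.begin(), now.end(), id);
  }

  // Bytes used by the stored elements (ignores vector capacity overhead).
  std::size_t payload_bytes() const
  {
    return attic.size() * sizeof(std::pair<Id, std::uint64_t>)
        + now.size() * sizeof(Id);
  }
};
```

Any code that currently reads timestamp_by_id would have to consult both vectors, which is the main cost of this layout.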
Calculation:
(timestamp_by_id_now * 8 bytes + timestamp_by_id_attic * 16 bytes) / (timestamp_by_id * 16 bytes)
(72374789 * 8 + (75855450 - 72374789) * 16) / (75855450 * 16)
= 0.5229427219797654
(This is based on 64-bit node ids; the percentage saved will be larger for way/relation ids.)
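The calculation above can be reproduced with a small helper. `split_memory_ratio` is a hypothetical function written for this illustration; the entry counts come from the analysis above, and the byte sizes (8 bytes per id, 16 bytes per id/timestamp pair) are the ones used in the formula. It assumes, as the calculation does, that all entries collected from the current data carry timestamp NOW.

```cpp
#include <cassert>
#include <cstdint>

// Ratio of memory needed by the split layout to memory needed by the single
// vector of (id, timestamp) pairs. Hypothetical helper, for illustration only.
double split_memory_ratio(std::uint64_t now_entries, std::uint64_t total_entries,
                          unsigned id_bytes, unsigned pair_bytes)
{
  assert(total_entries > 0 && now_entries <= total_entries);
  const double attic_entries = static_cast<double>(total_entries - now_entries);
  const double split = static_cast<double>(now_entries) * id_bytes
      + attic_entries * pair_bytes;
  const double single = static_cast<double>(total_entries) * pair_bytes;
  return split / single;
}
```

For the numbers above, `split_memory_ratio(72374789, 75855450, 8, 16)` gives roughly 0.523. If an id type were only 4 bytes while the pair still occupied 16 bytes because of alignment padding (an assumption about the platform ABI), the ratio would drop further.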
Testcase 2
[timeout:1200][adiff:"2017-08-06T22:39:23Z","2017-08-06T22:43:13Z"];
(node(36.3181693,5.5767073,47.8357181,18.9969694)(changed);
way(36.3181693,5.5767073,47.8357181,18.9969694)(changed););
out meta geom(36.3181693,5.5767073,47.8357181,18.9969694);
Current:
User time (seconds): 126.50
System time (seconds): 9.21
Percent of CPU this job got: 85%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:38.45
Maximum resident set size (kbytes): 8573240 <<<<<<
Minor (reclaiming a frame) page faults: 7166159
Page size (bytes): 4096
Exit status: 0
Improved:
User time (seconds): 110.13
System time (seconds): 5.31
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:55.71
Maximum resident set size (kbytes): 4379196 <<<<<<
Minor (reclaiming a frame) page faults: 3409217
Page size (bytes): 4096
Exit status: 0