stats: Reduce stat memory by removing duplicate string storage in map keys.#4711
stats: Reduce stat memory by removing duplicate string storage in map keys.#4711jmarantz wants to merge 40 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Joshua Marantz <jmarantz@google.com>
… as a key. Also reduce HeapStatAllocator key overhead by 8 bytes by just using the char*. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ction boundaries around HeapStatData construction/destruction. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…mbol-table mem. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
mattklein123
left a comment
There was a problem hiding this comment.
Cool stuff. Two comments from a quick skim.
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
Closing temporarily until (a) #4714 can be merged and (b) I write a stats impl overview per @mattklein123's suggestion above. |
…tats are robust at runtime. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ce/doc/stats/md (#4729) per @mattklein123 's suggestion on #4711 , adds a stats design doc. In this rev it only pulls in material from a comment in thread_local_store.h. A few TBD sections were added to suggest structure, but the intent is not to fill them out in this PR, which just moves the existing material and adds an overview for context. Future stats-related PRs should include edits to this file to keep it up-to-date. This is one step toward resolving #4196. Risk Level: low Testing: none Docs Changes: this is a doc change. Release Notes: n/a Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…tructure. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ce/doc/stats/md (envoyproxy#4729) per @mattklein123 's suggestion on envoyproxy#4711 , adds a stats design doc. In this rev it only pulls in material from a comment in thread_local_store.h. A few TBD sections were added to suggest structure, but the intent is not to fill them out in this PR, which just moves the existing material and adds an overview for context. Future stats-related PRs should include edits to this file to keep it up-to-date. This is one step toward resolving envoyproxy#4196. Risk Level: low Testing: none Docs Changes: this is a doc change. Release Notes: n/a Signed-off-by: Joshua Marantz <jmarantz@google.com> Signed-off-by: Yang Song <yasong@yasong00.cam.corp.google.com>
… name structure (envoyproxy#4741) Describe the current state of stat name representation, as a pre-cursor to envoyproxy#4711 which should be updated to modify the stat naming doc. Risk Level: low Testing: none Docs Changes: n/a Release Notes: n/a Signed-off-by: Joshua Marantz <jmarantz@google.com> Signed-off-by: Yang Song <yasong@yasong00.cam.corp.google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…eflect intent. Would appreciate suggestions on how to improve the name, if you have any. Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ings-in-map-keys Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
jmarantz
left a comment
There was a problem hiding this comment.
ptal -- this now directly shows memory-reduction in unit-tests, as well as an associated edit of the stats design doc.
mattklein123
left a comment
There was a problem hiding this comment.
In general this makes sense to me. However, do you want to get a speed benchmark in place before we merge this? Even for this change it seems worth it?
| auto data = std::make_unique<HeapStatData>(name); | ||
| // required to use this allocator. Note that data must be freed by calling | ||
| // its free() method, and not by destructing or unique_ptr. So this method | ||
| // cannot call anything that might throw, and thus it cannot through itself. |
| @@ -77,7 +77,7 @@ struct RawStatData { | |||
| /** | |||
| * Returns the name as a std::string. | |||
There was a problem hiding this comment.
done and switched to ersatz-doxygen here and elsewhere in the class.
| // required to use this allocator. | ||
| auto data = std::make_unique<HeapStatData>(name); | ||
| // required to use this allocator. Note that data must be freed by calling | ||
| // its free() method, and not by destructing or unique_ptr. So this method |
There was a problem hiding this comment.
Up to you, but you could use CSmartPtr (search code) to use RAII around calling free().
There was a problem hiding this comment.
CSmartPtr doesn't quite work in this scenario, but unique_ptr does support a custom deletion functor so I just used it directly.
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
There was a problem hiding this comment.
perf test in #4867 and merged here, showing a slight degradation in performance with non TLS, and a slight increase in performance with TLS, though these could be due to transient noisy neighbors as these runs were not properly interleaved.
The big win in both perf and memory comes from combining this and flat-hash-map. @mattklein123 would you prefer I just issue a combined PR for those?
old:
--------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------
BM_StatsNoTls_mean 16249144 ns 16248983 ns 37
BM_StatsNoTls_median 16176011 ns 16175443 ns 37
BM_StatsNoTls_stddev 484905 ns 484909 ns 37
BM_StatsWithTls_mean 20958468 ns 20958244 ns 30
BM_StatsWithTls_median 20899829 ns 20899217 ns 30
BM_StatsWithTls_stddev 576702 ns 576784 ns 30
new:
--------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------
BM_StatsNoTls_mean 17098922 ns 17099054 ns 34
BM_StatsNoTls_median 17049225 ns 17049419 ns 34
BM_StatsNoTls_stddev 354827 ns 354787 ns 34
BM_StatsWithTls_mean 20593702 ns 20593854 ns 30
BM_StatsWithTls_median 20527260 ns 20527525 ns 30
BM_StatsWithTls_stddev 532875 ns 532881 ns 30
flat-hash-map (#4715)
--------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------
BM_StatsNoTls_mean 13582401 ns 13582484 ns 40
BM_StatsNoTls_median 13519546 ns 13519464 ns 40
BM_StatsNoTls_stddev 269552 ns 269544 ns 40
BM_StatsWithTls_mean 18171743 ns 18171829 ns 28
BM_StatsWithTls_median 18116579 ns 18116550 ns 28
BM_StatsWithTls_stddev 257126 ns 257189 ns 28
combined (#4872)
--------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------
BM_StatsNoTls_mean 13779759 ns 13779854 ns 39
BM_StatsNoTls_median 13630165 ns 13630325 ns 39
BM_StatsNoTls_stddev 576447 ns 576471 ns 39
BM_StatsWithTls_mean 16336826 ns 16336967 ns 35
BM_StatsWithTls_median 16240568 ns 16240563 ns 35
BM_StatsWithTls_stddev 544943 ns 544942 ns 35
| // required to use this allocator. | ||
| auto data = std::make_unique<HeapStatData>(name); | ||
| // required to use this allocator. Note that data must be freed by calling | ||
| // its free() method, and not by destructing or unique_ptr. So this method |
There was a problem hiding this comment.
CSmartPtr doesn't quite work in this scenario, but unique_ptr does support a custom deletion functor so I just used it directly.
| auto data = std::make_unique<HeapStatData>(name); | ||
| // required to use this allocator. Note that data must be freed by calling | ||
| // its free() method, and not by destructing or unique_ptr. So this method | ||
| // cannot call anything that might throw, and thus it cannot through itself. |
| @@ -77,7 +77,7 @@ struct RawStatData { | |||
| /** | |||
| * Returns the name as a std::string. | |||
There was a problem hiding this comment.
done and switched to ersatz-doxygen here and elsewhere in the class.
…ass method comments. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
I decided to close this for now in favor of #4872 which incorporates the flat_hash_map change. That avoids the risk that this PR on its own has of slightly degrading one axis of performance (though improving several others). LMK if you still think this one makes sense to go in on its own; the argument for that is that this change is complicated but bounded in scope. |
**Note: this code is ready IMO, but #4714 should be checked in first to get a baseline. **
Description: A significant amount of storage is wasted in the maps in ThreadLocalStore with copies of the strings. This changes the maps to use const char* keys, which are held in every form of stat. This is one step toward resolving #4196. It reduces memory from 68k/cluster to 54k/cluster in the Istio test referenced #4196.
A significant follow-up to this would be to tackle the redundant storage for tagExtractedName and TagVector.
Risk Level: low
Testing: //test/...
Docs Changes: n/a
Release Notes:n/a