stats: Use a more compact rep for SymbolTable and add tests for memory usage#4696
Conversation
…mpl_test.cc. This just corrects that for two tests. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…an RAII to avoid leaks. The RAII concept is nice but it's expensive; need to keep 8-byte SymbolTable reference with each stat-name. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
@ambuc -- still looking at CI issues. |
…nd comment out broken SSL test for now. See envoyproxy#4703 Signed-off-by: Joshua Marantz <jmarantz@google.com>
…tions. I think virtualizing these interfaces has a significant cost in per-stat memory that I'd like to avoid. I don't at this point see a compelling need to mock symbol tables or have alternate representations. We could of course virtualize SymbolTable which there's only one of, but StatName does not want to be virtual. Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
This is great work -- One thing I would do is explicitly move the internal-only helper functions, bit-shifting magic, etc. into its own set of files, to make very explicit which parts are the interface for external use and which parts are private/protected. Ideally just the symbol table API would be exposed. I'd also make some differentiation between the two encode() functions for greppability. |
…ializtion) and clean up stale pure virtual interface. Signed-off-by: Joshua Marantz <jmarantz@google.com>
| bssl::UniquePtr<X509> cert = readCertFromFile("test/common/ssl/test_data/san_dns_cert.pem"); | ||
| EXPECT_EQ(270, Utility::getDaysUntilExpiration(cert.get())); | ||
| } | ||
| /* |
There was a problem hiding this comment.
I will revert this once #4701 is submitted.
|
RE " internal-only helper functions, ... own set of files" -- I am not sure exactly what you are suggesting. We could have a separate file for stat_name, but in general I've put the functions that will be needed in the stats system as public, and the ones that that are needed only internal to the symbol table as private. All the non-trivial impls are in the .cc file. Can you be more specific about what's exposed in a public function in the .h that shouldn't be? |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
You're right -- I looked at the header again and it's plenty differentiated, nvm on that. |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
@ambuc any more comments? Any senior committers want to start taking a look at some fun bit-hacking? |
…yping initially. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
… can be shared with tests for ThreadLocalStore improvements. Signed-off-by: Joshua Marantz <jmarantz@google.com>
…mbol-table mem. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…tats are robust at runtime. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
|
Ping
…On Sat, Oct 20, 2018, 3:08 PM stale[bot] ***@***.*** wrote:
This pull request has been automatically marked as stale because it has
not had activity in the last 7 days. It will be closed in 7 days if no
further activity occurs. Please feel free to give a status update now, ping
for review, or re-open when it's ready. Thank you for your contributions!
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#4696 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AB2kPc9QCgxXAvElEjTagdORt6rWIKZ4ks5um3S2gaJpZM4XYe42>
.
|
|
I will review. |
|
If there are follow-up edits I hope to get to them on Wednesday.... Don't
have access to my computer now.
I was pinging just to avoid stalebot. The benchmark PR would be good to
get in first anyway, which would show more clearly the incremental benefit
of this change to StatName.
…On Sat, Oct 20, 2018, 10:37 PM Matt Klein ***@***.*** wrote:
I will review.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#4696 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AB2kPd5RwZhhYmfbwe7qVMp4mo-rmt9Iks5um94DgaJpZM4XYe42>
.
|
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…eflect intent. Would appreciate suggestions on how to improve the name, if you have any. Signed-off-by: Joshua Marantz <jmarantz@google.com>
…le-uint8-array Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
… global symbol table. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
this should be ready to go; ptal. |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
mattklein123
left a comment
There was a problem hiding this comment.
At a high level this makes sense to me. Some random comments from a read through. Cool stuff!
| /** | ||
| * Decodes a uint8_t array into a SymbolVec. | ||
| */ | ||
| static SymbolVec decodeSymbols(const SymbolStorage array, size_t size); |
There was a problem hiding this comment.
nit: generally prefer explicit types. I would probably replace size_t everywhere with either uint32_t or uint64_t
There was a problem hiding this comment.
OK, done, but WDYT about loop variables when looping until you hit stl_container.size()?
| * overhead for the size itself. | ||
| */ | ||
| size_t numBytes() const { | ||
| return symbol_array_[0] | (static_cast<size_t>(symbol_array_[1]) << 8); |
There was a problem hiding this comment.
Do we care about endian-ness here? Can we just cast/store the 2 bytes and then retrieve them? Might be simpler to read?
There was a problem hiding this comment.
No not endianness, but alignment. In a future PR modeled off my experiments I was going to put all the StatNames needed for a Metric (tag-extracted name, and each tag and value) into a single allocated block. Will comment.
| next_symbol_ = pool_.top(); | ||
| pool_.pop(); | ||
| } | ||
| // This should catch integer overflow for the new symbol. |
There was a problem hiding this comment.
What happens if we do overflow? Can this happen with a very long running server? Sorry I didn't originally review this code so am unfamiliar with the details.
There was a problem hiding this comment.
nothing good will happen :). But we'd probably also run out of memory before we allocate 4B symbols (say averaging 32 bytes including std::string overhead).
Note that symbols are reference-counted and recycled, so there would have to be 4B referenced symbols.
| } | ||
|
|
||
| StatNameStorage::~StatNameStorage() { | ||
| // StatNameStorage is not fully RAII: you must call free(SymbolTable&) to |
There was a problem hiding this comment.
Sorry how does this save 8 bytes?
There was a problem hiding this comment.
If we want to make this RAII we need to store a reference to the SymbolTable& which would require 8 bytes overhead per StatNameStorage instance. Added comment.
| uint8_t* StatNameJoiner::alloc(size_t num_bytes) { | ||
| bytes_.reset(new uint8_t[num_bytes + 2]); | ||
| uint8_t* p = bytes_.get(); | ||
| *p++ = num_bytes & 0xff; |
There was a problem hiding this comment.
factored out length-assigner and commented there why this can't be done via uint16_t.
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Description: SymbolTables are conceptually and structurally really powerful ways to reduce memory overhead for repeated patterns. However the space savings was limited by some taxes:
With these changes the raw space taken by all the stats in a 1k cluster system are reduced by 4x.
This is one step toward resolving #4196.
We will also be able to concatenate StatNames without accessing the global symbol table, enabling lock-free scoped name lookup.
Risk Level: low for now as they are not used yet, but probably wants to be fuzz-tested.
Testing: //test/common/stats/...
Docs Changes: n/a
Release Notes: n/a