
stats: Use a more compact rep for SymbolTable and add tests for memory usage#4696

Merged
mattklein123 merged 39 commits intoenvoyproxy:masterfrom
jmarantz:symbol-table-uint8-array
Oct 29, 2018


@jmarantz
Contributor

@jmarantz jmarantz commented Oct 11, 2018

Description: SymbolTables are conceptually and structurally a powerful way to reduce memory overhead for repeated patterns. However, the space savings were limited by some taxes:

  • references in structures to aid in RAII -- we can use assert statements to get most of the value.
  • lots of very small integers held in arrays of uint32_t -- we can use a UTF-8-like encoding scheme instead.
  • 16 bytes of overhead for vectors, to store size and capacity -- we can store a fixed size in 2 uint8s.
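The second bullet can be made concrete with a LEB128-style scheme. This is a minimal sketch, assuming 7 payload bits per byte with the high bit as a "more bytes follow" flag; `encodeSymbol` and `decodeSymbols` are hypothetical names here, and the PR's actual bit layout may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Symbol = uint32_t;

// Hypothetical encoder: appends `symbol` to `out`, 7 data bits per byte,
// lowest bits first. The high bit (0x80) marks "more bytes follow", so a
// symbol below 128 costs one byte instead of four.
void encodeSymbol(Symbol symbol, std::vector<uint8_t>& out) {
  do {
    uint8_t byte = symbol & 0x7f;
    symbol >>= 7;
    if (symbol != 0) {
      byte |= 0x80;  // continuation bit
    }
    out.push_back(byte);
  } while (symbol != 0);
}

// Hypothetical decoder: the inverse of encodeSymbol over a whole array.
std::vector<Symbol> decodeSymbols(const uint8_t* array, uint64_t size) {
  std::vector<Symbol> symbols;
  Symbol current = 0;
  uint32_t shift = 0;
  for (uint64_t i = 0; i < size; ++i) {
    const uint8_t byte = array[i];
    current |= static_cast<Symbol>(byte & 0x7f) << shift;
    if (byte & 0x80) {
      shift += 7;
      assert(shift <= 28);  // a uint32_t never needs more than 5 bytes
    } else {
      symbols.push_back(current);
      current = 0;
      shift = 0;
    }
  }
  return symbols;
}
```

Encoding the symbols 1, 127, 128 and 300 this way takes 6 bytes (1 + 1 + 2 + 2) rather than the 16 bytes of a four-element uint32_t array.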

With these changes, the raw space taken by all the stats in a 1k-cluster system is reduced by 4x.
This is one step toward resolving #4196.

We will also be able to concatenate StatNames without accessing the global symbol table, enabling lock-free scoped name lookup.
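Because an encoded StatName is a self-contained, length-prefixed byte array, joining two names can in principle be done by copying payloads, with no SymbolTable lookup and therefore no lock. A minimal sketch, assuming a 2-byte low-byte-first length prefix like the one read by `numBytes()` later in this thread; `joinEncoded` and `payloadSize` are hypothetical names, and the sketch ignores reference counting (a real joiner would have to decide who holds references on the joined symbols).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical layout: [len_lo, len_hi, payload...], where the payload holds
// the encoded symbols and the 2-byte prefix is the payload length.
constexpr uint64_t kLengthBytes = 2;

uint64_t payloadSize(const uint8_t* name) {
  return name[0] | (static_cast<uint64_t>(name[1]) << 8);
}

// Concatenates the payloads of two encoded names into a new encoded name.
// Only bytes are copied; the SymbolTable is never consulted.
std::unique_ptr<uint8_t[]> joinEncoded(const uint8_t* a, const uint8_t* b) {
  const uint64_t size_a = payloadSize(a);
  const uint64_t size_b = payloadSize(b);
  const uint64_t total = size_a + size_b;
  assert(total <= UINT16_MAX);  // must fit in the 2-byte length prefix
  std::unique_ptr<uint8_t[]> joined(new uint8_t[total + kLengthBytes]);
  joined[0] = total & 0xff;
  joined[1] = (total >> 8) & 0xff;
  std::memcpy(joined.get() + kLengthBytes, a + kLengthBytes, size_a);
  std::memcpy(joined.get() + kLengthBytes + size_a, b + kLengthBytes, size_b);
  return joined;
}
```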

Risk Level: low for now, as these classes are not used yet, but they should probably be fuzz-tested.
Testing: //test/common/stats/...
Docs Changes: n/a
Release Notes: n/a

…mpl_test.cc.

This just corrects that for two tests.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…an RAII to avoid leaks.

The RAII concept is nice but it's expensive; need to keep 8-byte
SymbolTable reference with each stat-name.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz
Contributor Author

@ambuc -- still looking at CI issues.

…nd comment out broken SSL test for now.

See envoyproxy#4703

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Comment thread source/common/stats/symbol_table_impl.h Outdated
…tions.

I think virtualizing these interfaces has a significant cost in
per-stat memory that I'd like to avoid. I don't at this point see a
compelling need to mock symbol tables or have alternate
representations.

We could of course virtualize SymbolTable, of which there is only one
instance, but StatName should not be virtual.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
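The per-object cost of virtualizing StatName comes from the vtable pointer that each polymorphic instance carries. A minimal sketch with two hypothetical handle classes that differ only in having virtual member functions; on common 64-bit ABIs the virtual one is 8 bytes larger per instance, whereas a single virtual SymbolTable would pay that cost only once.

```cpp
#include <cstdint>

// A non-virtual handle: just a pointer to the encoded bytes.
class PlainStatName {
public:
  explicit PlainStatName(const uint8_t* bytes) : bytes_(bytes) {}
  const uint8_t* data() const { return bytes_; }

private:
  const uint8_t* bytes_;
};

// The same handle behind a virtual interface. Every instance now also holds
// a vtable pointer in addition to bytes_.
class VirtualStatName {
public:
  explicit VirtualStatName(const uint8_t* bytes) : bytes_(bytes) {}
  virtual ~VirtualStatName() = default;
  virtual const uint8_t* data() const { return bytes_; }

private:
  const uint8_t* bytes_;
};
```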
@ambuc
Contributor

ambuc commented Oct 12, 2018

This is great work -- one thing I would do is explicitly move the internal-only helper functions, bit-shifting magic, etc. into their own set of files, to make it very explicit which parts are the interface for external use and which parts are private/protected. Ideally just the symbol table API would be exposed. I'd also differentiate the two encode() functions for greppability.

…ialization) and clean up stale pure virtual interface.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Comment thread test/common/ssl/utility_test.cc Outdated
bssl::UniquePtr<X509> cert = readCertFromFile("test/common/ssl/test_data/san_dns_cert.pem");
EXPECT_EQ(270, Utility::getDaysUntilExpiration(cert.get()));
}
/*
Contributor Author


I will revert this once #4701 is submitted.

@jmarantz
Contributor Author

jmarantz commented Oct 12, 2018

RE "internal-only helper functions, ... own set of files" -- I am not sure exactly what you are suggesting. We could have a separate file for stat_name, but in general I've made the functions that the stats system will need public, and the ones needed only inside the symbol table private. All the non-trivial impls are in the .cc file.

Can you be more specific about what's exposed in a public function in the .h that shouldn't be?

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@ambuc
Contributor

ambuc commented Oct 12, 2018

You're right -- I looked at the header again and it's plenty differentiated, nvm on that.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz
Contributor Author

@ambuc any more comments?

Any senior committers want to start taking a look at some fun bit-hacking?

…yping initially.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
… can be shared with tests for ThreadLocalStore improvements.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…mbol-table mem.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz jmarantz changed the title stats: Use a more compact rep for SymbolTable and add tests for memory usage WIP stats: Use a more compact rep for SymbolTable and add tests for memory usage Oct 13, 2018
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…tats are robust at runtime.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@stale

stale Bot commented Oct 20, 2018

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale Bot added the stale stalebot believes this issue/PR has not been touched recently label Oct 20, 2018
@jmarantz
Contributor Author

jmarantz commented Oct 20, 2018 via email

@stale stale Bot removed the stale stalebot believes this issue/PR has not been touched recently label Oct 20, 2018
@mattklein123 mattklein123 self-assigned this Oct 21, 2018
@mattklein123
Member

I will review.

@jmarantz
Contributor Author

jmarantz commented Oct 21, 2018 via email

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…eflect intent.

Would appreciate suggestions on how to improve the name, if you have any.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…le-uint8-array

Signed-off-by: Joshua Marantz <jmarantz@google.com>
… global symbol table.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz jmarantz changed the title WIP stats: Use a more compact rep for SymbolTable and add tests for memory usage stats: Use a more compact rep for SymbolTable and add tests for memory usage Oct 29, 2018
@jmarantz
Contributor Author

this should be ready to go; ptal.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Member

@mattklein123 mattklein123 left a comment


At a high level this makes sense to me. Some random comments from a read through. Cool stuff!

Comment thread source/common/stats/symbol_table_impl.h Outdated
/**
* Decodes a uint8_t array into a SymbolVec.
*/
static SymbolVec decodeSymbols(const SymbolStorage array, size_t size);
Member


nit: generally prefer explicit types. I would probably replace size_t everywhere with either uint32_t or uint64_t

Contributor Author


OK, done, but WDYT about loop variables when looping until you hit stl_container.size()?

Member


Sure that's fine.

Comment thread source/common/stats/symbol_table_impl.h Outdated
Comment thread source/common/stats/symbol_table_impl.h Outdated
Comment thread source/common/stats/symbol_table_impl.h Outdated
* overhead for the size itself.
*/
size_t numBytes() const {
return symbol_array_[0] | (static_cast<size_t>(symbol_array_[1]) << 8);
Member


Do we care about endian-ness here? Can we just cast/store the 2 bytes and then retrieve them? Might be simpler to read?

Contributor Author


No, not endianness, but alignment. In a future PR, modeled on my experiments, I was going to put all the StatNames needed for a Metric (tag-extracted name, and each tag and value) into a single allocated block. Will comment.
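Storing the length as two separate bytes lets a StatName begin at any offset, including an odd one inside a single block shared by several names. A minimal sketch with hypothetical `writeLength`/`readLength` helpers: each byte is written individually, which needs no uint16_t alignment and yields the same low-byte-first layout on every host, so endianness does not matter either.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for a 2-byte length prefix that may sit at any byte
// offset, e.g. inside one allocation packing several StatNames together.
void writeLength(uint8_t* p, uint64_t length) {
  assert(length <= UINT16_MAX);
  p[0] = length & 0xff;         // low byte first
  p[1] = (length >> 8) & 0xff;  // then high byte
}

uint64_t readLength(const uint8_t* p) {
  return p[0] | (static_cast<uint64_t>(p[1]) << 8);
}

// By contrast, *reinterpret_cast<uint16_t*>(p) = length would be undefined
// behavior whenever p is not suitably aligned for uint16_t.
```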

Comment thread source/common/stats/symbol_table_impl.cc Outdated
next_symbol_ = pool_.top();
pool_.pop();
}
// This should catch integer overflow for the new symbol.
Member


What happens if we do overflow? Can this happen with a very long running server? Sorry I didn't originally review this code so am unfamiliar with the details.

Contributor Author


nothing good will happen :). But we'd probably run out of memory before allocating 4B symbols (say, averaging 32 bytes each including std::string overhead, which comes to about 128 GiB).

Note that symbols are reference-counted and recycled, so there would have to be 4B referenced symbols.
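Reference counting plus recycling means the counter only advances while every previously issued symbol is still live. A minimal sketch of that policy; `SymbolAllocator`, `allocate`, `release` and `monotonic_counter_` are hypothetical names loosely modeled on the `next_symbol_`/`pool_` excerpt above, not the PR's actual code. The overflow check is an assert, so it only fires in debug builds.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stack>

using Symbol = uint32_t;

// Freed symbols go onto pool_ and are reused before any new value is minted.
class SymbolAllocator {
public:
  Symbol allocate() {
    if (!pool_.empty()) {
      const Symbol recycled = pool_.top();
      pool_.pop();
      return recycled;
    }
    // Catch integer overflow before minting a brand-new symbol.
    assert(monotonic_counter_ != std::numeric_limits<Symbol>::max());
    return monotonic_counter_++;
  }

  // Called when a symbol's reference count drops to zero.
  void release(Symbol symbol) { pool_.push(symbol); }

private:
  Symbol monotonic_counter_ = 0;
  std::stack<Symbol> pool_;
};
```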

}

StatNameStorage::~StatNameStorage() {
// StatNameStorage is not fully RAII: you must call free(SymbolTable&) to
Member


Sorry how does this save 8 bytes?

Contributor Author


If we want to make this RAII we need to store a reference to the SymbolTable& which would require 8 bytes overhead per StatNameStorage instance. Added comment.
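The trade-off can be sketched as follows, assuming a storage class that owns its encoded bytes through a `std::unique_ptr<uint8_t[]>`. `StatNameStorage` and `free(SymbolTable&)` echo the names in the excerpt above, but the members shown, the empty stand-in `SymbolTable`, and `RaiiStatNameStorage` are hypothetical. The non-RAII form requires an explicit `free()` and only checks in debug builds that it was called; the RAII form carries a table reference in every instance.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

class SymbolTable {};  // stand-in for the real table in this sketch

// Owns encoded bytes but does NOT remember which SymbolTable they came from.
class StatNameStorage {
public:
  explicit StatNameStorage(std::unique_ptr<uint8_t[]> bytes) : bytes_(std::move(bytes)) {}
  ~StatNameStorage() { assert(bytes_ == nullptr); }  // free() must have been called

  void free(SymbolTable& /*table*/) {
    // A real implementation would decrement the symbols' reference counts in
    // the table here before releasing the bytes.
    bytes_.reset();
  }

private:
  std::unique_ptr<uint8_t[]> bytes_;
};

// The RAII alternative keeps the table next to the bytes, which typically
// costs one extra pointer-sized field (8 bytes on 64-bit) per instance.
struct RaiiStatNameStorage {
  std::unique_ptr<uint8_t[]> bytes_;
  SymbolTable& table_;
};
```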

Comment thread source/common/stats/symbol_table_impl.cc Outdated
Comment thread source/common/stats/symbol_table_impl.cc Outdated
uint8_t* StatNameJoiner::alloc(size_t num_bytes) {
bytes_.reset(new uint8_t[num_bytes + 2]);
uint8_t* p = bytes_.get();
*p++ = num_bytes & 0xff;
Member


uint16_t size assign?

Contributor Author


factored out length-assigner and commented there why this can't be done via uint16_t.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@mattklein123 mattklein123 merged commit 1bdc95a into envoyproxy:master Oct 29, 2018
@jmarantz jmarantz deleted the symbol-table-uint8-array branch October 29, 2018 22:16