Fix an allocator bug in hot_restart_impl, and add a test#1788
Fix an allocator bug in hot_restart_impl, and add a test#1788mattklein123 merged 9 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Greg Greenway <ggreenway@apple.com>
83a4acf to
32186df
Compare
there are unused entries followed by used entries, and the requested stat name already exists. Signed-off-by: Greg Greenway <ggreenway@apple.com>
32186df to
3da5304
Compare
|
Thanks this is a great find and a carry over from when we did not support stat removal. Will review soon. |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks very much for finding/fixing this. Also great to have unit test capability on this code now also.
| Stats::RawStatData* HotRestartImpl::alloc(const std::string& name) { | ||
| // Try to find the existing slot in shared memory, otherwise allocate a new one. | ||
| std::unique_lock<Thread::BasicLockable> lock(stat_lock_); | ||
| for (Stats::RawStatData& data : shmem_.stats_slots_) { |
There was a problem hiding this comment.
perf nit: I think you can still do this in 1 pass by keeping track of the first empty entry while you scan the entire range. Then if you don't find it, you immediately know where to put it or if you are OOM.
It occurs to me also that at startup we could built free list and a map from shared memory and it would all be O(1), but I probably wouldn't bother right now. Feel free to put in a TODO along those lines.
There was a problem hiding this comment.
I came up with the same ideas when I thought about this over the weekend.
I came across this bug in the context of making the number of stats tunable, and I was thinking about the O(n^2) complexity of allocating all the stats when N is larger. I'll keep discussions of how to make this substantially better over there, but I'll do this in one loop here.
| TEST(HotRestartImplTest, alloc) { | ||
| MockOsSysCalls os_sys_calls; | ||
| NiceMock<MockOptions> options; | ||
| CSmartPtr<void, ::free> buffer; |
There was a problem hiding this comment.
nit: for simplicity I would probably just make this a vector<uint8_t> or something like that (the you can just call resize(), return address of [0], etc.). But don't feel that strongly about it.
There was a problem hiding this comment.
Good idea. Side note: c++11 added std::vector::data() so we don't have to take the address of element 0 anymore: http://en.cppreference.com/w/cpp/container/vector/data
| public: | ||
| virtual ~OsSysCalls(){}; | ||
|
|
||
| virtual int shm_open(const char* name, int oflag, mode_t mode) PURE; |
There was a problem hiding this comment.
Nit: prefer to Envoy stylify the wrapper names, e.g. shmOpen.
There was a problem hiding this comment.
This I'm torn on. I like consistent naming (ie making this shmOpen), but its also useful that this is identical to the syscall it mirrors. Someone looking at the calling code knows exactly what manpage to lookup for semantics by looking at it. We get no guidance from Envoy::Filesystem::OsSysCalls because that happens to only use 1-word-named syscalls.
A middle ground could be shmOpen(), with a doc-comment pointing at the real function name and manpage.
There was a problem hiding this comment.
No real preference from me either way.
There was a problem hiding this comment.
If we envoy-stylify, is it fTruncate of ftruncate?
|
|
||
| class Instance; | ||
|
|
||
| class OsSysCalls { |
There was a problem hiding this comment.
Do you want to move this to its own file in somewhere more neutral than hotrestart? I think this would be a useful shim layer for syscalls to improve mocking and testing across the code base.
There was a problem hiding this comment.
I took the idea for this from Envoy::Filesystem::OsSysCalls. I can see the utility of both having one big OsSysCalls shim for all of envoy, and for having a smaller one per use. I'm fine with either; we should just decide which one we like better. @mattklein123 what's your opinion?
There was a problem hiding this comment.
It sounds good to me to have a single OS shim layer. Perhaps move into the Api namespace?
There was a problem hiding this comment.
Want me to do that in this PR or open a separate one?
There was a problem hiding this comment.
Either way. This PR is fine if it's easier for you.
There was a problem hiding this comment.
Ok, let's combine the OsSysCalls in a separate PR then. Keep the intent/scope of this one clean.
|
Do you guys like having the syscalls in their own compilation unit to avoid needing additional link libs in the tests, or would you rather they're combined with hot_restart_impl.cc? |
Signed-off-by: Greg Greenway <ggreenway@apple.com>
No big preference. How you have it now is fine with me. If you move it all into Api namespace it would likely end up being it's own impl anyway. |
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
|
It will be cleaner if #1791 gets merged first. |
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
|
@htuch @mattklein123 This is all merged with #1791 and I think all comments are addressed. |
| Stats::RawStatData* unused = nullptr; | ||
| std::unique_lock<Thread::BasicLockable> lock(stat_lock_); | ||
| for (Stats::RawStatData& data : shmem_.stats_slots_) { | ||
| if (!data.initialized()) { |
There was a problem hiding this comment.
Ugh, I put it back to the broken state when I was cleaning up the 2-->1 loops stuff, but I reversed the order so the test passes.
There was a problem hiding this comment.
Oh wait, now I've confused myself, I think this is still ok.. Either way, need a better test, which I'll write right now.
Signed-off-by: Greg Greenway <ggreenway@apple.com>
mattklein123
left a comment
There was a problem hiding this comment.
As long as you are making the tests better, do you mind also testing the OOM case also?
|
I have that in progress on my branch to make stats variable-sized. Ok if it waits for that? |
|
Sure that's fine. |
The android define logger conflicts with the logger lambda API such that the logger lambda would not be called if Envoy Mobile is built with the flag. Signed-off-by: Alan Chiu <achiu@lyft.com> For an explanation of how to fill out the fields, please see the relevant section in [PULL_REQUESTS.md](https://github.com/envoyproxy/envoy/blob/master/PULL_REQUESTS.md) Description: android: release builds avoids defining logger Risk Level: low Testing: local Docs Changes: n/a Release Notes: n/a [Optional Fixes #Issue] [Optional Deprecated:] Signed-off-by: JP Simard <jp@jpsim.com>
The android define logger conflicts with the logger lambda API such that the logger lambda would not be called if Envoy Mobile is built with the flag. Signed-off-by: Alan Chiu <achiu@lyft.com> For an explanation of how to fill out the fields, please see the relevant section in [PULL_REQUESTS.md](https://github.com/envoyproxy/envoy/blob/master/PULL_REQUESTS.md) Description: android: release builds avoids defining logger Risk Level: low Testing: local Docs Changes: n/a Release Notes: n/a [Optional Fixes #Issue] [Optional Deprecated:] Signed-off-by: JP Simard <jp@jpsim.com>
…ng (#1788) **Description** add unit tests for InferencePool edge cases and metadata handling --------- Signed-off-by: liuhy <liuhongyu@apache.org>
No description provided.