Skip to content

drain manager: Refactor DrainManagerImpl with simulated time#11138

Merged
mattklein123 merged 24 commits into
envoyproxy:masterfrom
aunu53:drain
Jun 1, 2020
Merged

drain manager: Refactor DrainManagerImpl with simulated time#11138
mattklein123 merged 24 commits into
envoyproxy:masterfrom
aunu53:drain

Conversation

@aunu53
Copy link
Copy Markdown

@aunu53 aunu53 commented May 11, 2020

drain_manager: Refactor DrainManagerImpl with simulated time
* Replace recurring timer in DrainManagerImpl with a single timer that can be tested via simulated time.
* Clarify the behaviour where probability of drain close increases as the deadline is approached, and add mocks on the random() call to test this
* Rely on the system scheduler rather than tick time
* Had to remove the "drain tick #x" trace log
* Remove overly explicit includes
* Add TODO for integration tests on this via a real time system, once TestDrainManager is removed (that's a separate thread of work)

Additional Description: Preparatory work for enhancing graceful drain functionality.
Risk Level: Medium. This is changing a core dependency (via the server) that hasn't been touched in a while, but the behaviour should be a noop.
Testing: I added a lot more testing coverage, and some aspects of this (the probabilistic draining) weren't being tested at all prior.

Auni Ahsan added 9 commits May 6, 2020 14:50
Signed-off-by: Auni Ahsan <auni@google.com>
wip
Signed-off-by: Auni Ahsan <auni@google.com>
Signed-off-by: Auni Ahsan <auni@google.com>
Signed-off-by: Auni Ahsan <auni@google.com>
Signed-off-by: Auni Ahsan <auni@google.com>
Signed-off-by: Auni Ahsan <auni@google.com>
Signed-off-by: Auni Ahsan <auni@google.com>
Signed-off-by: Auni Ahsan <auni@google.com>
Signed-off-by: Auni Ahsan <auni@google.com>
Signed-off-by: Auni Ahsan <auni@google.com>
Comment thread source/server/drain_manager_impl.cc
Signed-off-by: Auni Ahsan <auni@google.com>
@aunu53
Copy link
Copy Markdown
Author

aunu53 commented May 12, 2020

/retest

@repokitteh-read-only
Copy link
Copy Markdown

🔨 rebuilding ci/circleci: api (failed build)

🐱

Caused by: a #11138 (comment) was created by @auni53.

see: more, trace.

Comment thread source/server/drain_manager_impl.cc Outdated
Comment thread source/server/drain_manager_impl.cc Outdated
Comment thread source/server/drain_manager_impl.cc Outdated
@jmarantz
Copy link
Copy Markdown
Contributor

/wait

Signed-off-by: Auni Ahsan <auni@google.com>
@aunu53
Copy link
Copy Markdown
Author

aunu53 commented May 13, 2020

Sorry for the format presubmit fails...I thought I had it setup to automatically lint before pushing, trying to fix that.

Signed-off-by: Auni Ahsan <auni@google.com>
@jmarantz
Copy link
Copy Markdown
Contributor

/wait

Signed-off-by: Auni Ahsan <auni@google.com>
@aunu53
Copy link
Copy Markdown
Author

aunu53 commented May 14, 2020

/retest

@repokitteh-read-only
Copy link
Copy Markdown

🐴 hold your horses - no failures detected, yet.

🐱

Caused by: a #11138 (comment) was created by @auni53.

see: more, trace.

Signed-off-by: Auni Ahsan <auni@google.com>
@aunu53
Copy link
Copy Markdown
Author

aunu53 commented May 14, 2020

I don't understand why these presubmits are failing...unable to reproduce yet, and my local bazel //test/... is fine. Going to merge from master and push up again.

Signed-off-by: Auni Ahsan <auni@google.com>
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically looks good modulo one last naming nit.

Comment thread test/server/drain_manager_impl_test.cc Outdated
Auni Ahsan added 2 commits May 15, 2020 01:06
Signed-off-by: Auni Ahsan <auni@google.com>
Signed-off-by: Auni Ahsan <auni@google.com>
@aunu53
Copy link
Copy Markdown
Author

aunu53 commented May 15, 2020

Fix the naming nit, is this good to go?

jmarantz
jmarantz previously approved these changes May 15, 2020
@jmarantz
Copy link
Copy Markdown
Contributor

@envoyproxy/senior-maintainers

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks. LGTM with one comment.

/wait

Comment thread source/server/drain_manager_impl.cc Outdated
Signed-off-by: Auni Ahsan <auni@google.com>
@aunu53
Copy link
Copy Markdown
Author

aunu53 commented May 20, 2020

(I'm waiting to resolve #11240 before introducing more changes.)

Auni Ahsan added 2 commits May 28, 2020 13:42
Signed-off-by: Auni Ahsan <auni@google.com>
Signed-off-by: Auni Ahsan <auni@google.com>
@aunu53
Copy link
Copy Markdown
Author

aunu53 commented May 28, 2020

Fixed the ternary, added an expected_delay to the mock expect, and put up #11345 which addresses the tangential race condition. Hope this is good to go!

@aunu53 aunu53 requested a review from mattklein123 May 28, 2020 14:20
Signed-off-by: Auni Ahsan <auni@google.com>
@aunu53
Copy link
Copy Markdown
Author

aunu53 commented May 28, 2020

/retest

@repokitteh-read-only
Copy link
Copy Markdown

🤷‍♀️ nothing to rebuild.

🐱

Caused by: a #11138 (comment) was created by @auni53.

see: more, trace.

// P(return true) = elapsed time / drain timeout
const auto remaining_time =
std::chrono::duration_cast<std::chrono::seconds>(drain_deadline_ - current_time);
const auto elapsed_time = server_.options().drainTime() - remaining_time;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a guarantee that drainTime() >= current_time ? Or should we do a compare first? I don't totally understand the surrounding draining architecture, but it looks like drainClose() can be called by something not controlled by the timer, so current_time_ may just be later. In that case maybe we should skip the subtraction underflow and compare to random and just return true?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming you mean drain_deadline_ (the computed deadline time point) and not drainTime() (the server configuration)? This case is tested for at drain_manager_impl_test.cc:83. If current_time > drain_deadline, the probability of drain close is 100%,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I meant checking server_.options().drainTime() >= remaining_time

or is that guaranteed to be true due to the wait remaining_time is calculated?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(psuedo variables)

drain_deadline = drain_timeout + previous_time
remaining_time = drain_deadline - current_time
remaining_time = drain_timeout + previous_time - current_time
current_time >= previous time, therefore remaining_time <= drain_timeout

Math checks out to me!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome. add assert then?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's the exact ASSERT you wanted but

drain >= remaining
drain - remaining >= 0
elapsed = drain - remaining
elapsed >= 0

so I added ASSERT(elapsed_time >= 0);

Copy link
Copy Markdown
Contributor

@jmarantz jmarantz May 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably not, if elapsed_time is ultimately a uint_something it would be true by defintion.

I meant asserting the subtraction is non-underflowing.

ASSERT(server_.options().drainTime() >= remaining_time);

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Signed-off-by: Auni Ahsan <auni@google.com>
@aunu53
Copy link
Copy Markdown
Author

aunu53 commented Jun 1, 2020

@mattklein123 is this good to merge?

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

namespace {

class DrainManagerImplTest : public testing::Test {
constexpr int DrainTimeSeconds(600);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just use std::chrono::seconds here. Feel free to do this change in one of your other PRs.

@mattklein123 mattklein123 merged commit c6642ba into envoyproxy:master Jun 1, 2020
@aunu53 aunu53 deleted the drain branch June 1, 2020 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants