event: scaled timers using dynamic queues#13129
mattklein123 merged 21 commits into envoyproxy:master from
Conversation
Add an interface for timers that can be enabled for some timeout within a range. The actual choice of when the timer should be triggered is not specified by the interface. Possible implementations of the interface include:
- timers that are triggered when a thread is not busy
- timers that are triggered early/late in response to load

Signed-off-by: Alex Konradi <akonradi@google.com>
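As a rough illustration of the interface this commit describes, here is a minimal sketch; the `FakeRangeTimer` class and its members are hypothetical stand-ins, not Envoy's actual code:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical sketch of the interface described above: a timer that may fire
// anywhere within [min, max]; exactly when is left to the implementation.
class RangeTimer {
public:
  virtual ~RangeTimer() = default;
  // Arm the timer to fire once, no earlier than min_ms and no later than max_ms.
  virtual void enableTimer(std::chrono::milliseconds min_ms,
                           std::chrono::milliseconds max_ms) = 0;
  virtual void disableTimer() = 0;
  virtual bool enabled() const = 0;
};

// Trivial fake used only to illustrate the contract; a real implementation
// would register with an event loop.
class FakeRangeTimer : public RangeTimer {
public:
  void enableTimer(std::chrono::milliseconds min_ms,
                   std::chrono::milliseconds max_ms) override {
    min_ = min_ms;
    max_ = std::max(min_ms, max_ms); // max < min treated as max == min
    enabled_ = true;
  }
  void disableTimer() override { enabled_ = false; }
  bool enabled() const override { return enabled_; }

private:
  std::chrono::milliseconds min_{0};
  std::chrono::milliseconds max_{0};
  bool enabled_{false};
};
```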
The Google Test repository has endorsed Abseil's "live at head" philosophy. Bump to a newer commit that includes google/googletest/envoyproxy#2350 for ease of writing tests. Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
Add a ScaledRangeTimerManager class that produces RangeTimers that can be adjusted after the fact by changing the scaling factor on the manager. This class implements the dynamic queues approach suggested in envoyproxy#11427#issuecomment-691154144. The code is tested but not yet integrated anywhere. Signed-off-by: Alex Konradi <akonradi@google.com>
/assign @antoniovicente
DoAll now needs to be pulled into scope explicitly with a `using` declaration. Signed-off-by: Alex Konradi <akonradi@google.com>
antoniovicente
left a comment
Thanks Alex, this seems like huge progress towards implementation of overload manager aware timeouts. Sorry for the laundry list of comments; it's sometimes unavoidable on PRs of this size and complexity.
ASSERT(!queue.range_timers.empty());
auto item = std::move(queue.range_timers.front());
queue.range_timers.pop_front();
item.timer.trigger();
Do you need to trigger timers in a loop in cases where there are multiple timers at the head of the queue that have already expired?
Note that real timers always execute in the next iteration of the event loop. This implies that in the current implementation at most one scaled timer can trigger per event loop iteration.
Another thing to consider is how to integrate with #13104. Ideally we'd touch the watchdogs between each scaled timer execution; you'll need help from the dispatcher to do that.
Please add a test that verifies that multiple timers that are older than the current monotonic time fire as a group even if their exact trigger times are different. The way to verify that the timers fire on the same iteration requires hooking into the event loop's prepare callback as done in test/common/event/dispatcher_impl_test.cc test cases including SchedulableCallbackImplTest.RescheduleNext
Do we want to fire multiple timers as a group? It seems like we can get the same watchdog petting between callback executions by rescheduling the queue's timer with duration 0. I guess that might be extra expensive, though.
Timers with 0 duration run on the next iteration of the event loop after #11823
In order to schedule a second call in the same loop iteration you would need to use SchedulableCallback::scheduleCallbackCurrentIteration in cases where the remaining max-min duration is 0.
Triggering expired timers as a group seems like a good way to implement this. We can address the watchdog petting issues in a followup.
Changed this to trigger multiple timers when a queue timer expires.
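The group-trigger behavior settled on in this thread can be sketched as follows; `QueueItem` and `drainExpired` are hypothetical stand-ins for the PR's queue types, not the actual Envoy classes:

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <functional>

using MonotonicTime = std::chrono::steady_clock::time_point;

struct QueueItem {
  MonotonicTime trigger_time;
  std::function<void()> callback;
};

// When the queue's timer fires, drain every entry whose trigger time has
// passed, so timers that expired together fire in the same event-loop
// iteration instead of one per iteration.
inline void drainExpired(std::deque<QueueItem>& queue, MonotonicTime now) {
  while (!queue.empty() && queue.front().trigger_time <= now) {
    auto item = std::move(queue.front());
    queue.pop_front();
    item.callback();
  }
  // A real implementation would re-arm the queue's timer for
  // queue.front().trigger_time here when the queue is non-empty.
}
```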
WaitingForMin(std::chrono::milliseconds duration) : duration(duration) {}

// The number for the bucket this timer will be placed in.
const std::chrono::milliseconds duration;
nit: Could use a more descriptive name like scalable_duration. Also, the comment seems out of date.
ScaledRangeTimerManager& manager_;
const TimerCb callback_;
const TimerPtr pending_timer_;
nit: Possibly better name: min_duration_timer_
const ScopeTrackedObject* scope_;
};

ScaledRangeTimerManager::ScaledRangeTimerManager(Dispatcher& dispatcher, double scale_factor)
I would have expected scale factor to always be 1.0 on construction. Consider removing scale_factor constructor argument.
EXPECT_FALSE(timer->enabled());
}

TEST_F(ScaledRangeTimerManagerTest, DisableWhileActive) {
What does pending and active mean in this context? Seems different from the states encoded in the RangeTimerImpl absl::variant
Pending was from an old implementation, should be fixed now.
What does "active" mean in this context?
I see other references to "active" below, it seems that it refers to ScalingMax.
timer->enableTimer(std::chrono::seconds(5), std::chrono::seconds(100));

simTime().advanceTimeAsync(std::chrono::seconds(5));
Worth adding some assertions about the queue state after the advance and run?
It's not part of the public interface, so I'm hesitant to test it directly.
TEST_F(ScaledRangeTimerManagerTest, ScheduledWithScalingFactorZero) {
  ScaledRangeTimerManager manager(dispatcher_, 0.0);
Please add a test that covers use of setScaleFactor. There seems to be no tests that exercise that function.
Also, prefer use of setScaleFactor instead of passing in a scale value to the constructor.
I recommend exercising some in-between scale factors like 0.5 with timers that are expected to fire in different orders when scale = 1.0 and scale = 0.5.
Cover both the case of timers added before the scale change and timers added after the scale change. Bonus points for mixing the two.
MultipleTimersWithScaling does some of this. I've added some more tests.
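For context on the scale-factor discussion, here is a hedged sketch of the scaling rule the tests suggest, assuming (as the test names and comments imply, not confirmed by this thread) that scale 1.0 means waiting the full max duration and 0.0 means firing as soon as min elapses:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: with scale factor s in [0, 1], an enabled (min, max)
// range timer fires after min + s * (max - min); s == 1.0 waits the full max,
// s == 0.0 fires as soon as min elapses.
inline std::chrono::milliseconds scaledDuration(std::chrono::milliseconds min,
                                                std::chrono::milliseconds max,
                                                double scale) {
  const auto delta = max - min;
  return min + std::chrono::duration_cast<std::chrono::milliseconds>(delta * scale);
}
```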
EXPECT_THAT(*timers[0].trigger_times, ElementsAre(T + std::chrono::seconds(3)));
EXPECT_THAT(*timers[1].trigger_times, ElementsAre(T + std::chrono::seconds(3)));
EXPECT_THAT(*timers[2].trigger_times, ElementsAre(T + std::chrono::seconds(3)));
See the comment I added to ScaledRangeTimerManager::onQueueTimerFired
The bit about heterogeneous lookup? Please clarify.
Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
…aled-timer Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
antoniovicente
left a comment
Almost there. Thanks again for this very exciting change.
timers.emplace_back(manager, simTime());
}

const MonotonicTime T = simTime().monotonicTime();
The variable name 'T' doesn't follow style guidelines. I recommend changing the name to "start".
EXPECT_THAT(*timers[0].trigger_times, ElementsAre(T + std::chrono::seconds(2)));
EXPECT_THAT(*timers[1].trigger_times, ElementsAre(T + std::chrono::seconds(2)));
EXPECT_THAT(*timers[2].trigger_times, ElementsAre(T + std::chrono::seconds(2)));
Please add a test that verifies that multiple timers that are older than the current monotonic time fire as a group even if their exact trigger times are different. The way to verify that the timers fire on the same iteration requires hooking into the event loop's prepare callback as done in test/common/event/dispatcher_impl_test.cc test cases including SchedulableCallbackImplTest.RescheduleNext
Thanks for the pointer. Added a test.
ScaledRangeTimerManager::ScalingTimerHandle handle_;
};

void onPendingTimerComplete() {
nit about naming consistency: I think you used to refer to WaitingForMin as Pending. Consider changing name of this function to onMinTimerComplete
* Implementation of RangeTimer that can be scaled by the backing manager object.
*
* Instances of this class exist in one of 3 states:
* - disabled: not enabled
change disabled to inactive for consistency with name of state_ alternatives.
* Instances of this class exist in one of 3 states:
* - disabled: not enabled
* - waiting-for-min: enabled, min timeout not elapsed
* - scaling-max: enabled, min timeout elapsed, max timeout not elapsed
Worth commenting about the expected transitions from inactive to waiting-for-min to scaling-max; usually we start inactive, enter wait-for-min for the requested min duration and transition to scaling-max when the min duration timer expires.
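The three states and the typical transitions described in this comment can be sketched with std::variant (the PR itself uses absl::variant; these struct names follow the state names above and are otherwise illustrative):

```cpp
#include <cassert>
#include <chrono>
#include <variant>

// Sketch of the timer's three states as variant alternatives.
struct Inactive {};
struct WaitingForMin {
  std::chrono::milliseconds scalable_duration; // max - min, scaled on expiry
};
struct ScalingMax {};

using TimerState = std::variant<Inactive, WaitingForMin, ScalingMax>;

// Typical lifecycle: inactive -> waiting-for-min (enableTimer) ->
// scaling-max (min duration elapses) -> inactive (trigger or disable).
```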
EXPECT_FALSE(timer->enabled());
}

TEST_F(ScaledRangeTimerManagerTest, DisableWhileWaitingForMax) {
I think you meant: DisableWhileWaitingForMin
timer->enableTimer(std::chrono::seconds(1), std::chrono::seconds(1));

simTime().advanceTimeAndRun(std::chrono::seconds(1), dispatcher_, Dispatcher::RunType::Block);
Please add: EXPECT_FALSE(timer->enabled());
EXPECT_CALL(callback, Call).WillOnce([&] { EXPECT_EQ(dispatcher_.scope_, getScope()); });

timer->enableTimer(std::chrono::seconds(0), std::chrono::seconds(1), getScope());
I think this is the only test case that has min time set to 0. Should we have some regular trigger and disable tests with min time of 0?
simTime().advanceTimeAndRun(std::chrono::seconds(5), dispatcher_, Dispatcher::RunType::Block);
simTime().advanceTimeAndRun(std::chrono::seconds(5), dispatcher_, Dispatcher::RunType::Block);

timer1->disableTimer();
It seems that you want to cover behavior in cases where the scaling-max timer responsible for the queue timer is disabled, and check that the next timer in the queue takes over. There are no assertions on changes to the queue timer.
Possible suggestion: Advance time to T=30 and verify that nothing triggers, move to T=35 and verify that a timer fires.
EXPECT_THAT(*timer.trigger_times, ElementsAre(T + std::chrono::seconds(4)));
}

TEST_F(ScaledRangeTimerManagerTest, ScheduledWithMaxBeforeMin) {
Please add comment explaining expected behavior; cases where max < min are treated as if max == min.
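The clamping behavior described here can be shown in a tiny sketch (`effectiveMax` is a hypothetical helper name, not from the PR):

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Per the review comment: cases where max < min are treated as if max == min,
// so the effective upper bound is never below the lower bound.
inline std::chrono::milliseconds effectiveMax(std::chrono::milliseconds min,
                                              std::chrono::milliseconds max) {
  return std::max(min, max);
}
```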
Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
const MonotonicTime start = simTime().monotonicTime();

std::vector<TrackedTimer> timers;
timers.reserve(3);
I think you mean 4 instead of 3.
*
* Some methods combine multiple state transitions; enableTimer(0, max) on a
* timer in the scaling-max state will logically execute the transition sequence
* [scaling-max -> inactive -> waiting-for-min -> scaling-max] in a single
Where does the transition from scaling-max to inactive come from?
If the timer is already enabled, the first step is to disable it. Then we can enable it again for the new duration.
I missed that the timer described in the comment is initially in the scaling-max state. The example may have been easier to follow if the initial state had been waiting-for-min
dispatcher_.createSchedulableCallback([&] { schedulable_watcher.ready(); });

testing::Expectation first_prepare = EXPECT_CALL(prepare_watcher, ready());
testing::ExpectationSet after_first_prepare;
.After is new to me. Cool stuff!
// Now that the scale factor is 0.5, fire times are 0: T+10, 1: T+13, 2: T+14, 3: T+13.
// Advance to timer 2's min.
// Now that the scale factor is 0.5, fire times are 0: start 10, 1: start 13, 2: start 14, 3:
nit: You lost some of the + after the T to start rename.
Signed-off-by: Alex Konradi <akonradi@google.com>
…aled-timer Signed-off-by: Alex Konradi <akonradi@google.com>
antoniovicente
left a comment
Change looks really good. Thanks for sticking with it until the end, looking forward to softer overload actions that protect the proxy while reducing query loss.
Off to @mattklein123 or @envoyproxy/senior-maintainers for review and merge.
/cc @htuch since you had left comments on the doc
…aled-timer Signed-off-by: Alex Konradi <akonradi@google.com>
@mattklein123 @htuch this implements the first scheme from the doc, which it sounds like might be sufficient for our needs. Tagging you both since you left feedback on the doc, PTAL.
Sure, happy to take a look.
htuch
left a comment
I'll defer to others on implementation review, but I'm happy with this after a high-level read over the PR, nice work!
mattklein123
left a comment
I just reviewed the code. Overall LGTM with a few small, mostly comment-related, comments. Thanks!
/wait
virtual void enableTimer(const std::chrono::milliseconds& min_ms,
                         const std::chrono::milliseconds& max_ms,
Done, and fixed Event::Timer::enableTimer as well.
/**
 * Class for creating RangeTimer objects that can be adjusted towards either the minimum or maximum
 * of their range by the owner of the manager object.
 */
Can you add a brief summary of the design doc here so the reader has some idea of the underlying implementation?
/**
 * Sets the scale factor for all timers created through this manager. The value should be between
 * 0 and 1, inclusive.
 */
Can you add some comments on the effect this API has on new/existing timers? Possibly covered by the top level comment I asked for.
double value() const { return value_; }

private:
  double value_;
nit: const, same elsewhere where you can.
I wish, but then assignment doesn't work.
ASSERT(dispatcher_.isThreadSafe());
auto it = queues_.find(duration);
if (it == queues_.end()) {
  auto queue = std::make_unique<Queue>(duration, *this, dispatcher_);
Let me try to explain this code back to you to see if I understand it:
- Every range timer has an individual min timer that fires independently.
- Once the min timer fires, if there is a gap between min and max, we take the duration of the min/max delta, and put it in a queue unique to that delta. The assumption here is that there will be a small number of min/max configured so the number of queues is not that big.
- Each queue has a timer that fires for the min/max duration scaled, and flushes all timers that have expired, and then resets for the next duration that needs to fire.
If so that makes sense, but per my other comments can you add a bunch more comments in these two files?
Yep, that's correct. Added plenty of comments.
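The design recapped above — one queue per distinct max-min delta, each with a single timer draining expired entries — can be condensed into a hedged sketch; class and member names here are hypothetical simplifications of the real ScaledRangeTimerManager:

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <functional>
#include <map>

using MonotonicTime = std::chrono::steady_clock::time_point;

struct ScalingTimer {
  MonotonicTime active_time; // when the min duration elapsed
  std::function<void()> callback;
};

// One queue per distinct max-min delta; FIFO order within a queue is also
// trigger order, since all entries share the same (scaled) duration.
class ScaledQueueManager {
public:
  void setScaleFactor(double scale) { scale_ = scale; }

  void add(std::chrono::milliseconds delta, MonotonicTime now,
           std::function<void()> cb) {
    queues_[delta].push_back({now, std::move(cb)});
  }

  // Called when a queue's timer fires: drain everything already expired at the
  // current scale, leaving later entries for the re-armed timer.
  void onQueueTimerFired(std::chrono::milliseconds delta, MonotonicTime now) {
    auto& q = queues_[delta];
    const auto scaled = std::chrono::duration_cast<std::chrono::milliseconds>(
        delta * scale_);
    while (!q.empty() && q.front().active_time + scaled <= now) {
      auto item = std::move(q.front());
      q.pop_front();
      item.callback();
    }
  }

private:
  double scale_{1.0};
  std::map<std::chrono::milliseconds, std::deque<ScalingTimer>> queues_;
};
```

The map-per-delta layout matches the assumption discussed above: configurations use a small number of distinct min/max pairs, so the number of queues stays small.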
Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
mattklein123
left a comment
Awesome, thanks for the comments. LGTM modulo format fix.
/wait
Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
/retest
Retrying Azure Pipelines; to retry CircleCI checks, use
Signed-off-by: Alex Konradi <akonradi@google.com>
Looks like the failing test is
Commit Message: Add ScaledRangeTimerManager using dynamic queues
Additional Description:
The ScaledRangeTimerManager will be used to implement overload actions
that scale timeouts, as suggested for #11427. The timer objects returned remain
attached to the manager so that their actual trigger points can be adjusted via
changes to the manager's scale factor.
Risk Level: low (unused code)
Testing: ran unit tests
Docs Changes: none
Release Notes: none
#11427
/cc @antoniovicente