event: scaled timers using dynamic queues#13129
mattklein123 merged 21 commits into envoyproxy:master from
Conversation
Add an interface for timers that can be enabled for some timeout within a range. The actual choice of when the timer should be triggered is not specified by the interface. Possible implementations of the interface include:
- timers that are triggered when a thread is not busy
- timers that are triggered early/late in response to load

Signed-off-by: Alex Konradi <akonradi@google.com>
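As a rough illustration of the interface this commit describes, here is a minimal sketch; the `FakeRangeTimer` class and its members are hypothetical stand-ins, not Envoy's actual code:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical sketch of the interface described above: a timer that may fire
// anywhere within [min, max]; exactly when is left to the implementation.
class RangeTimer {
public:
  virtual ~RangeTimer() = default;
  // Arm the timer to fire once, no earlier than min_ms and no later than max_ms.
  virtual void enableTimer(std::chrono::milliseconds min_ms,
                           std::chrono::milliseconds max_ms) = 0;
  virtual void disableTimer() = 0;
  virtual bool enabled() const = 0;
};

// Trivial fake used only to illustrate the contract; a real implementation
// would register with an event loop.
class FakeRangeTimer : public RangeTimer {
public:
  void enableTimer(std::chrono::milliseconds min_ms,
                   std::chrono::milliseconds max_ms) override {
    min_ = min_ms;
    max_ = std::max(min_ms, max_ms); // max < min treated as max == min
    enabled_ = true;
  }
  void disableTimer() override { enabled_ = false; }
  bool enabled() const override { return enabled_; }

private:
  std::chrono::milliseconds min_{0};
  std::chrono::milliseconds max_{0};
  bool enabled_{false};
};
```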
The Google Test repository has endorsed Abseil's "live at head" philosophy. Bump to a newer commit that includes google/googletest/envoyproxy#2350 for ease of writing tests. Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
Add a ScaledRangeTimerManager class that produces RangeTimers that can be adjusted after the fact by changing the scaling factor on the manager. This class implements the dynamic queues approach suggested in envoyproxy#11427#issuecomment-691154144. The code is tested but not yet integrated anywhere. Signed-off-by: Alex Konradi <akonradi@google.com>
/assign @antoniovicente
DoAll now needs to be pulled into scope explicitly with a `using` declaration. Signed-off-by: Alex Konradi <akonradi@google.com>
antoniovicente
left a comment
Thanks Alex, this seems like huge progress towards implementation of overload manager aware timeouts. Sorry for the laundry list of comments; it's sometimes unavoidable on PRs of this size and complexity.
ASSERT(!queue.range_timers.empty());
auto item = std::move(queue.range_timers.front());
queue.range_timers.pop_front();
item.timer.trigger();
Do you need to trigger timers in a loop in cases where there are multiple timers at the head of the queue that have already expired?
Note that real timers always execute in the next iteration of the event loop. This implies that in the current implementation at most one scaled timer can trigger per event loop iteration.
Another thing to consider is how to integrate with #13104. Ideally we'd touch the watchdogs between each scaled timer execution; you'll need help from the dispatcher to do that.
Please add a test that verifies that multiple timers that are older than the current monotonic time fire as a group even if their exact trigger times are different. The way to verify that the timers fire on the same iteration requires hooking into the event loop's prepare callback as done in test/common/event/dispatcher_impl_test.cc test cases including SchedulableCallbackImplTest.RescheduleNext
Do we want to fire multiple timers as a group? It seems like we can get the same watchdog petting between callback executions by rescheduling the queue's timer with duration 0. I guess that might be extra expensive, though.
Timers with 0 duration run on the next iteration of the event loop after #11823
In order to schedule a second call in the same loop iteration you would need to use SchedulableCallback::scheduleCallbackCurrentIteration in cases where the remaining max-min duration is 0.
Triggering expired timers as a group seems like a good way to implement this. We can address the watchdog petting issues in a followup.
Changed this to trigger multiple timers when a queue timer expires.
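The group-trigger behavior settled on in this thread can be sketched as follows; `QueueItem` and `drainExpired` are hypothetical stand-ins for the PR's queue types, not the actual Envoy classes:

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <functional>

using MonotonicTime = std::chrono::steady_clock::time_point;

struct QueueItem {
  MonotonicTime trigger_time;
  std::function<void()> callback;
};

// When the queue's timer fires, drain every entry whose trigger time has
// passed, so timers that expired together fire in the same event-loop
// iteration instead of one per iteration.
inline void drainExpired(std::deque<QueueItem>& queue, MonotonicTime now) {
  while (!queue.empty() && queue.front().trigger_time <= now) {
    auto item = std::move(queue.front());
    queue.pop_front();
    item.callback();
  }
  // A real implementation would re-arm the queue's timer for
  // queue.front().trigger_time here when the queue is non-empty.
}
```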
WaitingForMin(std::chrono::milliseconds duration) : duration(duration) {}

// The number for the bucket this timer will be placed in.
const std::chrono::milliseconds duration;
nit: Could use a more descriptive name like scalable_duration. Also, the comment seems out of date.
ScaledRangeTimerManager& manager_;
const TimerCb callback_;
const TimerPtr pending_timer_;
nit: Possibly better name: min_duration_timer_
const ScopeTrackedObject* scope_;
};

ScaledRangeTimerManager::ScaledRangeTimerManager(Dispatcher& dispatcher, double scale_factor)
I would have expected scale factor to always be 1.0 on construction. Consider removing scale_factor constructor argument.
EXPECT_FALSE(timer->enabled());
}

TEST_F(ScaledRangeTimerManagerTest, DisableWhileActive) {
What does pending and active mean in this context? Seems different from the states encoded in the RangeTimerImpl absl::variant
Pending was from an old implementation, should be fixed now.
What does "active" mean in this context?
I see other references to "active" below, it seems that it refers to ScalingMax.
timer->enableTimer(std::chrono::seconds(5), std::chrono::seconds(100));

simTime().advanceTimeAsync(std::chrono::seconds(5));
Worth adding some assertions about the queue state after the advance and run?
It's not part of the public interface, so I'm hesitant to test it directly.
TEST_F(ScaledRangeTimerManagerTest, ScheduledWithScalingFactorZero) {
  ScaledRangeTimerManager manager(dispatcher_, 0.0);
Please add a test that covers use of setScaleFactor. There seems to be no tests that exercise that function.
Also, prefer use of setScaleFactor instead of passing in a scale value to the constructor.
I recommend exercising some in-between scale factors like 0.5 with timers that are expected to fire in different orders when scale = 1.0 and scale = 0.5.
Cover both the case of timers added before the scale change and timers added after the scale change. Bonus points for mixing the two.
MultipleTimersWithScaling does some of this. I've added some more tests.
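For context on the scale-factor discussion, here is a hedged sketch of the scaling rule the tests suggest, assuming (as the test names and comments imply, not confirmed by this thread) that scale 1.0 means waiting the full max duration and 0.0 means firing as soon as min elapses:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: with scale factor s in [0, 1], an enabled (min, max)
// range timer fires after min + s * (max - min); s == 1.0 waits the full max,
// s == 0.0 fires as soon as min elapses.
inline std::chrono::milliseconds scaledDuration(std::chrono::milliseconds min,
                                                std::chrono::milliseconds max,
                                                double scale) {
  const auto delta = max - min;
  return min + std::chrono::duration_cast<std::chrono::milliseconds>(delta * scale);
}
```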
EXPECT_THAT(*timers[0].trigger_times, ElementsAre(T + std::chrono::seconds(3)));
EXPECT_THAT(*timers[1].trigger_times, ElementsAre(T + std::chrono::seconds(3)));
EXPECT_THAT(*timers[2].trigger_times, ElementsAre(T + std::chrono::seconds(3)));
See the comment I added to ScaledRangeTimerManager::onQueueTimerFired
The bit about heterogeneous lookup? Please clarify.
Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
…aled-timer Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
antoniovicente
left a comment
Almost there. Thanks again for this very exciting change.
timers.emplace_back(manager, simTime());
}

const MonotonicTime T = simTime().monotonicTime();
The variable name 'T' doesn't follow style guidelines. I recommend changing the name to "start".
EXPECT_THAT(*timers[0].trigger_times, ElementsAre(T + std::chrono::seconds(2)));
EXPECT_THAT(*timers[1].trigger_times, ElementsAre(T + std::chrono::seconds(2)));
EXPECT_THAT(*timers[2].trigger_times, ElementsAre(T + std::chrono::seconds(2)));
Please add a test that verifies that multiple timers that are older than the current monotonic time fire as a group even if their exact trigger times are different. The way to verify that the timers fire on the same iteration requires hooking into the event loop's prepare callback as done in test/common/event/dispatcher_impl_test.cc test cases including SchedulableCallbackImplTest.RescheduleNext
Thanks for the pointer. Added a test.
ScaledRangeTimerManager::ScalingTimerHandle handle_;
};

void onPendingTimerComplete() {
nit about naming consistency: I think you used to refer to WaitingForMin as Pending. Consider changing name of this function to onMinTimerComplete
* Implementation of RangeTimer that can be scaled by the backing manager object.
*
* Instances of this class exist in one of 3 states:
* - disabled: not enabled
change disabled to inactive for consistency with name of state_ alternatives.
* Instances of this class exist in one of 3 states:
* - disabled: not enabled
* - waiting-for-min: enabled, min timeout not elapsed
* - scaling-max: enabled, min timeout elapsed, max timeout not elapsed
Worth commenting about the expected transitions from inactive to waiting-for-min to scaling-max; usually we start inactive, enter wait-for-min for the requested min duration and transition to scaling-max when the min duration timer expires.
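The three states and the typical transitions described in this comment can be sketched with std::variant (the PR itself uses absl::variant; these struct names follow the state names above and are otherwise illustrative):

```cpp
#include <cassert>
#include <chrono>
#include <variant>

// Sketch of the timer's three states as variant alternatives.
struct Inactive {};
struct WaitingForMin {
  std::chrono::milliseconds scalable_duration; // max - min, scaled on expiry
};
struct ScalingMax {};

using TimerState = std::variant<Inactive, WaitingForMin, ScalingMax>;

// Typical lifecycle: inactive -> waiting-for-min (enableTimer) ->
// scaling-max (min duration elapses) -> inactive (trigger or disable).
```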
EXPECT_FALSE(timer->enabled());
}

TEST_F(ScaledRangeTimerManagerTest, DisableWhileWaitingForMax) {
I think you meant: DisableWhileWaitingForMin
timer->enableTimer(std::chrono::seconds(1), std::chrono::seconds(1));

simTime().advanceTimeAndRun(std::chrono::seconds(1), dispatcher_, Dispatcher::RunType::Block);
Please add: EXPECT_FALSE(timer->enabled());
EXPECT_CALL(callback, Call).WillOnce([&] { EXPECT_EQ(dispatcher_.scope_, getScope()); });

timer->enableTimer(std::chrono::seconds(0), std::chrono::seconds(1), getScope());
I think this is the only test case that has min time set to 0. Should we have some regular trigger and disable tests with min time of 0?
simTime().advanceTimeAndRun(std::chrono::seconds(5), dispatcher_, Dispatcher::RunType::Block);
simTime().advanceTimeAndRun(std::chrono::seconds(5), dispatcher_, Dispatcher::RunType::Block);

timer1->disableTimer();
It seems that you want to cover behavior in cases where the scaling-max timer responsible for the queue timer is disabled, and check that the next timer in the queue takes over. There are no assertions on changes to the queue timer.
Possible suggestion: Advance time to T=30 and verify that nothing triggers, move to T=35 and verify that a timer fires.
EXPECT_THAT(*timer.trigger_times, ElementsAre(T + std::chrono::seconds(4)));
}

TEST_F(ScaledRangeTimerManagerTest, ScheduledWithMaxBeforeMin) {
Please add comment explaining expected behavior; cases where max < min are treated as if max == min.
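The clamping behavior described here can be shown in a tiny sketch (`effectiveMax` is a hypothetical helper name, not from the PR):

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Per the review comment: cases where max < min are treated as if max == min,
// so the effective upper bound is never below the lower bound.
inline std::chrono::milliseconds effectiveMax(std::chrono::milliseconds min,
                                              std::chrono::milliseconds max) {
  return std::max(min, max);
}
```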
Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
const MonotonicTime start = simTime().monotonicTime();

std::vector<TrackedTimer> timers;
timers.reserve(3);
I think you mean 4 instead of 3.
*
* Some methods combine multiple state transitions; enableTimer(0, max) on a
* timer in the scaling-max state will logically execute the transition sequence
* [scaling-max -> inactive -> waiting-for-min -> scaling-max] in a single
Where does the transition from scaling-max to inactive come from?
If the timer is already enabled, the first step is to disable it. Then we can enable it again for the new duration.
I missed that the timer described in the comment is initially in the scaling-max state. The example may have been easier to follow if the initial state had been waiting-for-min
dispatcher_.createSchedulableCallback([&] { schedulable_watcher.ready(); });

testing::Expectation first_prepare = EXPECT_CALL(prepare_watcher, ready());
testing::ExpectationSet after_first_prepare;
.After is new to me. Cool stuff!
// Now that the scale factor is 0.5, fire times are 0: T+10, 1: T+13, 2: T+14, 3: T+13.
// Advance to timer 2's min.
// Now that the scale factor is 0.5, fire times are 0: start 10, 1: start 13, 2: start 14, 3:
nit: You lost some of the + after the T to start rename.
Signed-off-by: Alex Konradi <akonradi@google.com>
…aled-timer Signed-off-by: Alex Konradi <akonradi@google.com>
antoniovicente
left a comment
Change looks really good. Thanks for sticking with it until the end, looking forward to softer overload actions that protect the proxy while reducing query loss.
Off to @mattklein123 or @envoyproxy/senior-maintainers for review and merge.
/cc @htuch since you had left comments on the doc
…aled-timer Signed-off-by: Alex Konradi <akonradi@google.com>
@mattklein123 @htuch this implements the first scheme from the doc, which it sounds like might be sufficient for our needs. Tagging you both since you left feedback on the doc, PTAL.
Sure, happy to take a look.
htuch
left a comment
I'll defer to others on implementation review, but I'm happy with this after a high-level read over the PR, nice work!
mattklein123
left a comment
I just reviewed the code. Overall LGTM with a few small, mostly comment-related, comments. Thanks!
/wait
virtual void enableTimer(const std::chrono::milliseconds& min_ms,
                         const std::chrono::milliseconds& max_ms,
Done, and fixed Event::Timer::enableTimer as well.
/**
 * Class for creating RangeTimer objects that can be adjusted towards either the minimum or maximum
 * of their range by the owner of the manager object.
 */
Can you add a brief summary of the design doc here so the reader has some idea of the underlying implementation?
/**
 * Sets the scale factor for all timers created through this manager. The value should be between
 * 0 and 1, inclusive.
 */
Can you add some comments on the effect this API has on new/existing timers? Possibly covered by the top level comment I asked for.
double value() const { return value_; }

private:
  double value_;
nit: const, same elsewhere where you can.
I wish, but then assignment doesn't work.
ASSERT(dispatcher_.isThreadSafe());
auto it = queues_.find(duration);
if (it == queues_.end()) {
  auto queue = std::make_unique<Queue>(duration, *this, dispatcher_);
Let me try to explain this code back to you to see if I understand it:
- Every range timer has an individual min timer that fires independently.
- Once the min timer fires, if there is a gap between min and max, we take the duration of the min/max delta, and put it in a queue unique to that delta. The assumption here is that there will be a small number of min/max configured so the number of queues is not that big.
- Each queue has a timer that fires for the min/max duration scaled, and flushes all timers that have expired, and then resets for the next duration that needs to fire.
If so that makes sense, but per my other comments can you add a bunch more comments in these two files?
Yep, that's correct. Added plenty of comments.
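The design recapped above — one queue per distinct max-min delta, each with a single timer draining expired entries — can be condensed into a hedged sketch; class and member names here are hypothetical simplifications of the real ScaledRangeTimerManager:

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <functional>
#include <map>

using MonotonicTime = std::chrono::steady_clock::time_point;

struct ScalingTimer {
  MonotonicTime active_time; // when the min duration elapsed
  std::function<void()> callback;
};

// One queue per distinct max-min delta; FIFO order within a queue is also
// trigger order, since all entries share the same (scaled) duration.
class ScaledQueueManager {
public:
  void setScaleFactor(double scale) { scale_ = scale; }

  void add(std::chrono::milliseconds delta, MonotonicTime now,
           std::function<void()> cb) {
    queues_[delta].push_back({now, std::move(cb)});
  }

  // Called when a queue's timer fires: drain everything already expired at the
  // current scale, leaving later entries for the re-armed timer.
  void onQueueTimerFired(std::chrono::milliseconds delta, MonotonicTime now) {
    auto& q = queues_[delta];
    const auto scaled = std::chrono::duration_cast<std::chrono::milliseconds>(
        delta * scale_);
    while (!q.empty() && q.front().active_time + scaled <= now) {
      auto item = std::move(q.front());
      q.pop_front();
      item.callback();
    }
  }

private:
  double scale_{1.0};
  std::map<std::chrono::milliseconds, std::deque<ScalingTimer>> queues_;
};
```

The map-per-delta layout matches the assumption discussed above: configurations use a small number of distinct min/max pairs, so the number of queues stays small.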
Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
mattklein123
left a comment
Awesome, thanks for the comments. LGTM modulo format fix.
/wait
Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
/retest
Retrying Azure Pipelines; to retry CircleCI checks, use
Signed-off-by: Alex Konradi <akonradi@google.com>
Looks like the failing test is
Commit Message: Add ScaledRangeTimerManager using dynamic queues
Additional Description:
The ScaledRangeTimerManager will be used to implement overload actions
that scale timeouts, as suggested for #11427. The timer objects returned remain
attached to the manager so that their actual trigger points can be adjusted via
changes to the manager's scale factor.
Risk Level: low (unused code)
Testing: ran unit tests
Docs Changes: none
Release Notes: none
#11427
/cc @antoniovicente