KAFKA-12319: Change calculation of window size used to calculate Rate #12045

Closed
divijvaidya wants to merge 9 commits into apache:trunk from divijvaidya:KAFKA-12319-PR

Conversation

@divijvaidya
Member

@divijvaidya divijvaidya commented Apr 13, 2022

Why does the test fail?

ConnectionQuotasTest.testListenerConnectionRateLimitWhenActualRateAboveLimit() sends 600 connection creation requests at a rate of 40/s with listenerQuotaConnectionRateLimit set to 30/s. The test asserts that, even though the request rate is higher than the threshold, correct throttling keeps the measured rate at the completion of the 600 requests within 30 ± epsilon. Epsilon is set to 7, which is exceeded from time to time, leading to flaky test failures.

The problem

Currently, the rate calculation function (used for rate limiting) relies on the following assumptions:

  1. The start time of a new sample window is the time at which the first event in that window is recorded.
  2. If we don't have quota.window.num retained windows, assume prior windows with zero records exist when calculating the rate.

These assumptions lead to an incorrect rate in certain scenarios, as described below.

Consider a scenario when we have some initial requests, followed by a small gap without any requests and then another bunch of requests. More specifically:

Configuration: quota.window.size.seconds = 1s, quota.window.num = 2, listenerName.max.connection.creation.rate = 30/s

Record events (E) at timestamps:
E1 | CurrentTimeStamp (T1) | Window#1 (start time = T1)
E2 | T2 = T1 + 30ms | Window#1
E3 | T3 = T1 + 995ms | Window#1
< No events from T3 to T4 >
E4 | T4 = T1 + 1020ms | Window#2 (start time = T1 + 1020ms)
E5 | T5 = T1 + 2010ms | Window#2

Rate calculated as per current implementation:
Rate at T1 = 1 / (length of hypothetical prior samples + time elapsed for current sample) = 1 / (1 + 0) = 1 event per second
Rate at T2 = 2 / (1 + 0.030) = 1.94 events per second
Rate at T3 = 3 / (1 + 0.995) = 1.50 events per second
Rate at T4 = 4 / (now - start time of oldest window) = 4 / 1.02 = 3.92 events per second

When calculating the rate at T5, "obsolete" windows are first purged, i.e. any window with start time < T5 - (quota.window.size.seconds * quota.window.num). Thus Window#1 is purged (because T1 < T5 - 2s).

Rate at T5 = 2 / (length of hypothetical prior samples + time elapsed for current sample) = 2 / 1.99 = 1.005 events per second

Note how the rate calculation at T5 has fallen back to assuming prior windows with zero events (due to the purge), even though we actually had a previous window with more than zero events in it. Hence, the rate calculated at T5 is incorrect. In the worst case, Window#1 could contain a large number of events, yet the rate calculated towards the end of Window#2 would ignore all of them, yielding an artificially low current rate. For throttling use cases, this admits more events (since the observed rate is low) and thus violates the contract to maintain a sustained max.connection.creation.rate.
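The arithmetic above can be reproduced with a small standalone model of the current windowing logic. This is a sketch written for this description only; the class and method names are made up, and Kafka's actual SampledStat/Rate code differs in detail:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative model only; not Kafka's real implementation.
public class OldRateModel {
    static final long WINDOW_MS = 1000;  // quota.window.size.seconds = 1s
    static final int NUM_WINDOWS = 2;    // quota.window.num = 2

    static class Window {
        final long startMs;
        long count;
        Window(long startMs) { this.startMs = startMs; }
    }

    private final Deque<Window> windows = new ArrayDeque<>();

    /** Records one event at nowMs and returns the measured rate in events/sec. */
    double recordAndMeasure(long nowMs) {
        // Assumption 1 (the buggy one): a new window starts at its first event's timestamp.
        if (windows.isEmpty() || nowMs - windows.peekLast().startMs >= WINDOW_MS)
            windows.addLast(new Window(nowMs));
        windows.peekLast().count++;
        // Purge windows that started before the retention horizon.
        while (windows.peekFirst().startMs < nowMs - NUM_WINDOWS * WINDOW_MS)
            windows.removeFirst();
        long events = windows.stream().mapToLong(w -> w.count).sum();
        double elapsedSec = (nowMs - windows.peekFirst().startMs) / 1000.0;
        // Assumption 2: pad the elapsed time with hypothetical prior windows of zero events.
        elapsedSec += (NUM_WINDOWS - windows.size()) * (WINDOW_MS / 1000.0);
        return events / elapsedSec;
    }

    public static void main(String[] args) {
        OldRateModel model = new OldRateModel();
        for (long t : new long[] {0, 30, 995, 1020, 2010})  // E1..E5 relative to T1 = 0
            System.out.printf("rate at T1+%dms = %.3f%n", t, model.recordAndMeasure(t));
        // The final call yields ~1.005 events/sec at T5, reproducing the incorrect drop.
    }
}
```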

The flaky test ConnectionQuotasTest.testListenerConnectionRateLimitWhenActualRateAboveLimit suffers from this problem from time to time, leading to a higher rate of connection creation than expected.

The solution

The solution is to replace assumption 1 stated earlier with:

Start time of a new sample window is the nearest time at which the window should have started assuming no gaps.

The nearest time is calculated as

currentWindowStartTimeMs = recordingTimeMs - ((recordingTimeMs - previousWindowEndtime) % config.timeWindowMs())

where
  • recordingTimeMs is the time of the first record in the new window
  • previousWindowEndtime is the end time of the previous window, i.e. previousWindowStartTime + quota.window.size.seconds
  • config.timeWindowMs() is quota.window.size.seconds

With the solution, Window#2 snaps back to start at T1 + 1000ms, window rollover occurs at T1 + 2000ms, and E5 at T5 moves into a 3rd window. The rate at T5 becomes:
Rate at T5 = 2 / (now - start time of oldest window) = 2 / 1.010 = 1.98 events per second

This scenario has also been added as a unit test in MetricsTest.java.
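The proposed window-start calculation can be checked against the timestamps above with a short sketch. Names here follow the formula in this description; this is illustrative, not the final code:

```java
// Sketch of the proposed window-start snapping; class and method names are made up.
public class WindowStartSketch {
    static long newWindowStart(long recordingTimeMs, long previousWindowStartMs, long windowMs) {
        long previousWindowEndMs = previousWindowStartMs + windowMs;
        return recordingTimeMs - ((recordingTimeMs - previousWindowEndMs) % windowMs);
    }

    public static void main(String[] args) {
        long windowMs = 1000;
        // E4 arrives 20ms after Window#1 [0, 1000) ends: Window#2 snaps back to 1000, not 1020.
        long w2Start = newWindowStart(1020, 0, windowMs);
        // E5 arrives 10ms after Window#2 [1000, 2000) ends: Window#3 starts exactly at 2000.
        long w3Start = newWindowStart(2010, w2Start, windowMs);
        System.out.println(w2Start + " " + w3Start);  // prints: 1000 2000
        // Rate at T5 over the retained windows holding E4 and E5 (no zero-event padding needed):
        double rate = 2 / ((2010 - w2Start) / 1000.0);
        System.out.printf("%.2f%n", rate);  // ~1.98 events/sec, matching the walkthrough
    }
}
```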

Code changes

  1. Changes in SampledStat#record() to implement the revised assumption 1 described above; the change takes effect when rollover into a new window occurs.
  2. Add new tests in MetricsTest.java
  3. Cosmetic syntax changes across files.

Testing

  • New test added to validate the change in assumption.
  • ./gradlew unitTest passes.
  • ./gradlew integrationTest passes.

Longer term solutions

  1. Longer term, I think we should move to a sliding-window approach to calculate the rate instead of the fixed-window approach used today.
  2. The current rate limiting approach also allows short bursts of traffic. There should be a configurable option for users to choose between an approach that allows short bursts and one where the system tries to maintain a smooth rate over time, never exceeding the allocated threshold.
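For illustration only (not part of this PR or of Kafka), a sliding-window rate could keep the exact event timestamps for the trailing window, so there are no window boundaries to roll over or purge; the trade-off is O(events per window) memory:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sliding-window rate meter; hypothetical class, not part of this PR.
public class SlidingWindowRate {
    private final long windowMs;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowRate(long windowMs) { this.windowMs = windowMs; }

    /** Records an event at nowMs and returns events/sec over the trailing window. */
    public double recordAndMeasure(long nowMs) {
        timestamps.addLast(nowMs);
        // Drop events that have slid out of the trailing window.
        while (timestamps.peekFirst() < nowMs - windowMs)
            timestamps.removeFirst();
        return timestamps.size() / (windowMs / 1000.0);
    }
}
```

With the timestamps from the walkthrough, the rate at T5 would count exactly the events in [T5 - 1s, T5], so a burst in an earlier window can never be silently forgotten by a purge.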

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@divijvaidya
Member Author

Requesting review from @mjsax since you commented on the associated JIRA: https://issues.apache.org/jira/browse/KAFKA-12319

Requesting review from @ijuma @jjkoshy since you folks reviewed the last set of changes in Rate.java file.

Please take a look when you get a chance 🙏

@ijuma
Member

ijuma commented Apr 19, 2022

cc @apovzner @dajac

@divijvaidya
Member Author

Hey @apovzner @dajac, did you get a chance to take a look at this? Please let me know if I can make explanation simpler or if you have any questions.

Comment thread clients/src/main/java/org/apache/kafka/common/metrics/stats/Rate.java Outdated
Comment thread clients/src/main/java/org/apache/kafka/common/metrics/stats/SampledStat.java Outdated
Comment thread clients/src/main/java/org/apache/kafka/common/metrics/stats/SampledStat.java Outdated
Comment thread clients/src/main/java/org/apache/kafka/common/metrics/stats/SampledStat.java Outdated
Member

@mimaison mimaison left a comment


Thanks for the PR!
Unfortunately we can't change public APIs without a KIP. I wonder if we could fix the flaky test while keeping the current sampling logic by adjusting the assertions slightly.

You make a few good points in the description for tweaking how quotas are computed. It's maybe worth starting a discussion on the dev mailing list and if we get consensus then follow up with a KIP. WDYT?

Comment thread clients/src/main/java/org/apache/kafka/common/metrics/stats/SampledStat.java Outdated
Comment thread clients/src/main/java/org/apache/kafka/common/metrics/stats/SampledStat.java Outdated
@divijvaidya
Member Author

@mimaison Thinking about it, I can actually reduce the code changes so that no modifications to any public interface are made. Do you still think a KIP is required for this change in that case? (I am new to Kafka, so I am not fully sure what qualifies for a KIP and what doesn't.)

@divijvaidya
Member Author

CC'ing a couple of folks who may be interested to review this.
@mimaison @showuon @dengziming @apovzner @wyuka @satishd

Member

@tombentley tombentley left a comment


Thanks @divijvaidya, I left a few comments.

I think this would be an improvement, but it would be great to have more eyes on this (cc @guozhangwang @dajac).

Comment thread clients/src/main/java/org/apache/kafka/common/metrics/stats/SampledStat.java Outdated
Comment thread clients/src/test/java/org/apache/kafka/common/metrics/MetricsTest.java Outdated
Comment thread clients/src/test/java/org/apache/kafka/common/metrics/MetricsTest.java Outdated
-     oldest = curr;
- }
- return oldest;
+ return samples.stream().min(Comparator.comparingLong(s -> s.lastWindowMs)).orElse(samples.get(0));
Member


I wonder whether this is worth doing. I know it's shorter, but I think it will be slower, at least until the JIT optimizes it.

Member Author


I find the new code more readable, since we can immediately see that a min is being calculated, whereas in the previous version we had to understand the assignments and logic in the for loop to work out what was going on.

Nevertheless, I don't have a strong opinion on this one. If you still think we should revert it, I will do it. Let me know.

Comment thread clients/src/main/java/org/apache/kafka/common/metrics/stats/Rate.java Outdated
if (sample.isComplete(recordingTimeMs, config)) {
final long previousWindowStartTime = sample.lastWindowMs;
final long previousWindowEndtime = previousWindowStartTime + config.timeWindowMs();
final long startTimeOfNewWindow = recordingTimeMs - ((recordingTimeMs - previousWindowEndtime) % config.timeWindowMs());
Member


recordingTimeMs seems to usually come from Time.milliseconds, thus from System.currentTimeMillis:

  • It's therefore not guaranteed to be monotonic.
  • This could be exacerbated by the synchronized blocks in Sensor.recordInternal, because synchronized provides no fairness guarantee for blocked threads.

sample.isComplete(recordingTimeMs, config) could return true based on the number of samples, not the time.

So I think it's possible that recordingTimeMs < previousWindowEndtime, so that startTimeOfNewWindow ends up ahead of recordingTimeMs. I don't think that is intended; if it is, it's definitely worthy of a comment.
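A quick numeric check (made-up timestamps, not from the PR) shows the hazard: Java's % operator keeps the sign of the dividend, so a recordingTimeMs behind previousWindowEndtime yields a window start in the future:

```java
// Made-up timestamps demonstrating the non-monotonic clock hazard described above.
public class NonMonotonicCheck {
    public static void main(String[] args) {
        long windowMs = 1000;
        long previousWindowEndMs = 2000;
        long recordingTimeMs = 1990;  // clock stepped back, or the thread was delayed
        // (1990 - 2000) % 1000 == -10 in Java, so the "start" lands after recordingTimeMs.
        long startTimeOfNewWindow =
            recordingTimeMs - ((recordingTimeMs - previousWindowEndMs) % windowMs);
        System.out.println(startTimeOfNewWindow);  // prints: 2000
    }
}
```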

Member Author


That is a great observation Tom! Ideally the code should be written to ensure that recording a metric should not block because the operation is wall clock time sensitive. But as you observed, we have synchronized at multiple places which may lead to sample being recorded in a window which has already completed in the past.

For cases when the sensor is used for calculating the ConnectionQuota, this problem wouldn't occur, because Time.milliseconds is called inside a synchronized block [1], which ensures that only one thread, holding the latest timestamp, accesses sensor.record at a time.

But I don't know about code paths other than ConnectionQuota that use the sensor, so your observation is valid.

Since this problem is independent of this code change, and breaks existing logic if/when recordingTimeMs < endTimeOfPreviousWindow, I have created a JIRA to address this in a separate PR: https://issues.apache.org/jira/browse/KAFKA-13994

[1] https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/SocketServer.scala#L1541-L1542

@divijvaidya
Member Author

@machi1990 since you opened a PR to fix this flaky test, you might be familiar with this part of code. May I request you to review this PR please.

@machi1990
Contributor

@machi1990 since you opened a PR to fix this flaky test, you might be familiar with this part of code. May I request you to review this PR please.

Hey @divijvaidya, I am new to Kafka and to this part of the code. It'll be good to get another round of reviews from committers, since some of them have already started looking at this PR. My attempt to fix the flaky test in #13702 was to slightly modify the assertions, more of a quick win to stabilize the test, while this PR attempts to sort out the underlying issue with the quota computation. I think it'll be good to get more eyes on the PR as suggested in #12045 (review) and #12045 (review). What do you think?

@machi1990
Contributor

machi1990 commented Jun 20, 2023

One of my PRs [1] was bitten again by the test failure that this change is attempting to fix.
Would people be open to getting a tactical fix [2] merged to stabilize CI while this PR is being reviewed? @mimaison @tombentley @divijvaidya any thoughts?

  1. https://ci-builds.apache.org/blue/rest/organizations/jenkins/pipelines/Kafka/pipelines/kafka-pr/branches/PR-13700/runs/5
  2. KAFKA-14985: attempt to fix ConnectionQuotasTest.testListenerConnectionRateLimitWhenActualRateAboveLimit() test by bumping episilon to 8 from 7 #13702

@github-actions

This PR is being marked as stale since it has not had any activity in 90 days. If you
would like to keep this PR alive, please leave a comment asking for a review. If the PR has
merge conflicts, update it with the latest from the base branch.

If you are having difficulty finding a reviewer, please reach out on the [mailing list](https://kafka.apache.org/contact).

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.

@github-actions github-actions Bot added the stale Stale PRs label Dec 26, 2024
@github-actions

This PR has been closed since it has not had any activity in 120 days. If you feel like this
was a mistake, or you would like to continue working on it, please feel free to re-open the
PR and ask for a review.

@github-actions github-actions Bot added the closed-stale PRs that were closed due to inactivity label Jan 26, 2025
@github-actions github-actions Bot closed this Jan 26, 2025