# KAFKA-12319: Change calculation of window size used to calculate Rate #12045

divijvaidya wants to merge 9 commits into apache:trunk
Requesting review from @mjsax since you commented on the associated JIRA: https://issues.apache.org/jira/browse/KAFKA-12319. Also requesting review from @ijuma and @jjkoshy since you folks reviewed the last set of changes. Please take a look when you get a chance 🙏
**mimaison** left a comment
Thanks for the PR!
Unfortunately we can't change public APIs without a KIP. I wonder if we could fix the flaky test while keeping the current sampling logic by adjusting the assertions slightly.
You make a few good points in the description for tweaking how quotas are computed. It's maybe worth starting a discussion on the dev mailing list and if we get consensus then follow up with a KIP. WDYT?
@mimaison Thinking about it, I can actually reduce the code changes such that no modifications to any public interface are made. Do you still think a KIP is required for this change in that case? (I am new to Kafka so I am not fully sure what qualifies for a KIP vs. what doesn't.)
**tombentley** left a comment
Thanks @divijvaidya, I left a few comments.
I think this would be an improvement, but it would be great to have more eyes on this (cc @guozhangwang @dajac).
```java
        oldest = curr;
    }
    return oldest;
```
replaced with:
```java
return samples.stream().min(Comparator.comparingLong(s -> s.lastWindowMs)).orElse(samples.get(0));
```
I wonder whether this is worth doing. I know it's shorter, but I think it will be slower, at least until the JIT optimizes it.
I find the new code more readable since we can immediately eyeball that a min is being calculated, whereas in the previous version we have to follow the assignments and logic in the for loop to determine what is going on.
Nevertheless, I don't have a strong opinion on this one. If you still think we should revert it, I will do it. Let me know.
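To make the comparison concrete, here is a minimal self-contained sketch (using a simplified stand-in `Sample` class, not Kafka's actual `SampledStat.Sample`) contrasting the loop and stream forms:

```java
import java.util.Comparator;
import java.util.List;

public class OldestSampleDemo {
    // Simplified stand-in for Kafka's SampledStat.Sample (hypothetical, for illustration).
    static class Sample {
        final long lastWindowMs;
        Sample(long lastWindowMs) { this.lastWindowMs = lastWindowMs; }
    }

    // Loop form: explicit scan for the minimum lastWindowMs.
    static Sample oldestLoop(List<Sample> samples) {
        Sample oldest = samples.get(0);
        for (Sample curr : samples)
            if (curr.lastWindowMs < oldest.lastWindowMs)
                oldest = curr;
        return oldest;
    }

    // Stream form: the intent (a min over lastWindowMs) is visible at a glance.
    static Sample oldestStream(List<Sample> samples) {
        return samples.stream()
                .min(Comparator.comparingLong(s -> s.lastWindowMs))
                .orElse(samples.get(0));
    }

    public static void main(String[] args) {
        List<Sample> samples = List.of(new Sample(300), new Sample(100), new Sample(200));
        System.out.println(oldestLoop(samples).lastWindowMs);   // 100
        System.out.println(oldestStream(samples).lastWindowMs); // 100
    }
}
```

Both forms return the same sample; the difference is purely one of readability versus (pre-JIT) overhead.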
```java
if (sample.isComplete(recordingTimeMs, config)) {
    final long previousWindowStartTime = sample.lastWindowMs;
    final long previousWindowEndtime = previousWindowStartTime + config.timeWindowMs();
    final long startTimeOfNewWindow = recordingTimeMs - ((recordingTimeMs - previousWindowEndtime) % config.timeWindowMs());
```
`recordingTimeMs` seems to usually come from `Time.milliseconds`, thus from `System.currentTimeMillis`:
- It's therefore not guaranteed to be monotonic.
- Which could be exacerbated by the `synchronized` blocks in `Sensor.recordInternal`, because `synchronized` provides no guarantee about fairness for blocked threads.
`sample.isComplete(recordingTimeMs, config)` could return true based on the number of samples, not the time.
So I think it's possible that `recordingTimeMs < previousWindowEndtime`, so that `startTimeOfNewWindow` ends up ahead of `recordingTimeMs`, which I don't think is intended. Or, if it is intended, it's definitely worthy of a comment.
That is a great observation, Tom! Ideally the code should ensure that recording a metric does not block, because the operation is wall-clock-time sensitive. But as you observed, we have `synchronized` at multiple places, which may lead to a sample being recorded in a window that has already completed in the past.
For cases where the sensor is used to calculate the ConnectionQuota, this problem wouldn't occur, because the calculation of `Time.milliseconds` is done inside a synchronized block, which ensures that only one thread, with the latest timestamp, accesses `sensor.record` at a time.
But I don't know about code paths other than ConnectionQuota that use the sensor, so your observation is valid there.
Since this problem is independent of this code change, and breaks existing logic if/when `recordingTimeMs < endTimeOfPreviousWindow`, I have created a JIRA to address it in a separate PR: https://issues.apache.org/jira/browse/KAFKA-13994
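The interaction between the rollover formula and a non-monotonic clock can be sketched as a self-contained toy (with `TIME_WINDOW_MS` standing in for `config.timeWindowMs()`). Note how a record time slightly behind the previous window's end still yields a window start ahead of the record time, because Java's `%` keeps the sign of the dividend:

```java
public class WindowRolloverDemo {
    static final long TIME_WINDOW_MS = 1000; // assumed window length for this sketch

    // The rollover formula from the PR: align the new window's start to the
    // boundary grid defined by the end of the previous window.
    static long startTimeOfNewWindow(long recordingTimeMs, long previousWindowEndMs) {
        return recordingTimeMs - ((recordingTimeMs - previousWindowEndMs) % TIME_WINDOW_MS);
    }

    public static void main(String[] args) {
        // Normal case: record at t=1020 after a window ending at t=1000
        // -> the new window starts on the boundary at t=1000.
        System.out.println(startTimeOfNewWindow(1020, 1000)); // 1000

        // Non-monotonic clock: record at t=990 "after" a window ending at t=1000.
        // (990 - 1000) % 1000 is -10 in Java, so the result is 990 - (-10) = 1000,
        // which is AHEAD of recordingTimeMs -- the case flagged in the review.
        System.out.println(startTimeOfNewWindow(990, 1000)); // 1000
    }
}
```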
@machi1990 since you opened a PR to fix this flaky test, you might be familiar with this part of the code. Could you please review this PR?
Hey @divijvaidya, I am new to Kafka and to this part of the code. It'll be good to get another round of reviews from committers, since some of them have already started to look at this PR. My attempt to fix the flaky test in #13702 slightly modified the assertions, which was more of a quick win to stabilize the test, while this PR attempts to sort out the underlying issue with the quota computation. I think it'll be good to get more eyes on the PR as suggested in #12045 (review) and #12045 (review). What do you think?
One of my PRs [1] was bitten again by the test failure that this change is attempting to fix.
This PR is being marked as stale since it has not had any activity in 90 days. If you are having difficulty finding a reviewer, please reach out on the [mailing list](https://kafka.apache.org/contact). If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.
This PR has been closed since it has not had any activity in 120 days. If you feel like this |
### Why does the test fail?

`ConnectionQuotasTest.testListenerConnectionRateLimitWhenActualRateAboveLimit()` sends 600 connection creation requests at a rate of 40/s, with a listener quota connection rate limit set to 30/s. The test asserts that, even though the request rate is higher than the threshold, correct throttling keeps the measured rate at the completion of the 600 requests within 30 ± epsilon. The value of epsilon is set to 7, which is exceeded from time to time, leading to flaky test failures.

### The problem

Currently, the rate calculation (used for rate limiting) relies on a set of assumptions, notably (assumption 1) that windows prior to the oldest retained sample can be treated as hypothetical windows containing zero events. These assumptions lead to an incorrect rate calculation in certain scenarios, as described below.
Consider a scenario where we have some initial requests, followed by a small gap without any requests, and then another batch of requests. More specifically:
Configuration: `quota.window.size.seconds = 1s`, `quota.window.num = 2`, `listenerName.max.connection.creation.rate = 30/s`
Record events (E) at timestamps:

| Event | Timestamp | Window |
|-------|-----------|--------|
| E1 | T1 | Window#1 (start time = T1) |
| E2 | T2 = T1 + 30ms | Window#1 |
| E3 | T3 = T1 + 995ms | Window#1 |
| | < No events from T3 to T4 > | |
| E4 | T4 = T1 + 1020ms | Window#2 (start time = T1 + 1020ms) |
| E5 | T5 = T1 + 2010ms | Window#2 |
Rate as calculated by the current implementation:

Rate at T1 = 1 / (length of hypothetical prior samples + time elapsed in current sample) = 1 / (1 + 0) = 1 event per second
Rate at T2 = 2 / (1 + 0.030) = 1.94 events per second
Rate at T3 = 3 / (1 + 0.995) = 1.50 events per second
Rate at T4 = 4 / (now - start time of oldest window) = 4 / 1.02 = 3.92 events per second
When calculating the rate at T5, "obsolete" windows are first purged, i.e. any window with start time < T5 - (quota.window.size.seconds * quota.window.num); thus Window#1 is purged (because T1 < T5 - 2s).
Rate at T5 = 2 / (length of hypothetical prior samples + time elapsed in current sample) = 2 / (1 + 0.990) = 1.005 events per second
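The arithmetic of the walkthrough above can be checked with a few lines of plain Java (a sketch of the scenario's numbers, not Kafka's actual `Rate` code; the divisors are taken directly from the timestamps above):

```java
public class RateScenarioDemo {
    public static void main(String[] args) {
        // Rates per the current implementation, in events/second.
        double rateT1 = 1 / (1 + 0.0);    // 1 hypothetical prior window + 0s elapsed
        double rateT2 = 2 / (1 + 0.030);  // 30ms into Window#1
        double rateT3 = 3 / (1 + 0.995);  // 995ms into Window#1
        double rateT4 = 4 / 1.020;        // now - start of oldest window (T1)
        // Window#1 purged at T5: falls back to a hypothetical zero-event prior
        // window plus the 990ms elapsed in Window#2 (which started at T1+1020ms).
        double rateT5 = 2 / (1 + 0.990);
        System.out.printf("%.2f %.2f %.2f %.2f %.3f%n",
                rateT1, rateT2, rateT3, rateT4, rateT5);
    }
}
```

Note the drop from 3.92 at T4 to roughly 1.0 at T5 even though events kept arriving, which is the incorrect dip the description is pointing at.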
Note how the rate calculation at T5 has fallen back to assuming prior windows with zero events (because of the purge), even though a previous window with more than zero events actually exists. Hence, the rate calculated at T5 is incorrect. In the worst case, Window#1 could contain a large number of events, but the rate calculated towards the end of Window#2 would ignore all of them, yielding an incorrectly low current rate. For throttling use cases, this leads to allowing more events through (since the observed rate is low), violating the contract to maintain a sustained `max.connection.creation.rate`.

### The flaky test

`ConnectionQuotasTest.testListenerConnectionRateLimitWhenActualRateAboveLimit` suffers from this problem from time to time, leading to a higher rate of connection creation than expected.

### The solution
The solution is to remove assumption 1 stated earlier and replace it with: when rolling over, the new window starts at the nearest window boundary (a multiple of the window length measured from the end of the previous window) that is not later than the current record time, rather than at the record time itself.

The nearest time is calculated as `startTimeOfNewWindow = recordingTimeMs - ((recordingTimeMs - previousWindowEndtime) % config.timeWindowMs())`.
With the solution, T5 moves to the 3rd window (window rollover occurs at T1 + 2000ms) and the rate at T5 becomes:
Rate at T5 = 2/ (now - start time of oldest window) = 2 / 1.010 = 1.98 events per second
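For comparison, the T5 rate under the current logic versus the proposed logic can be computed side by side (plain arithmetic mirroring the numbers above, not Kafka code):

```java
public class FixedRateDemo {
    public static void main(String[] args) {
        // Current logic: Window#1 was purged at T5, so the rate falls back to a
        // hypothetical zero-event prior window plus the 990ms elapsed in
        // Window#2, which started at the record time T1+1020ms.
        double rateT5Current = 2 / (1 + 0.990);

        // Proposed logic: Window#2 starts on the boundary at T1+1000ms, so the
        // oldest retained window start is T1+1000ms and now - start = 1010ms.
        double rateT5Proposed = 2 / 1.010;

        System.out.printf("current=%.3f proposed=%.2f%n", rateT5Current, rateT5Proposed);
    }
}
```

The proposed value (about 1.98 events/s) no longer collapses toward the hypothetical-empty-window baseline, which is the behavior the PR is after.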
This scenario has also been added as a unit test in `MetricsTest.java`.

### Code changes

- Modified `SampledStat#record()` to implement the change to assumption 1 described above. The change is applied when rollover into a new window occurs.
- Added a unit test for this scenario in `MetricsTest.java`.

### Testing

- `./gradlew unitTest` passes.
- `./gradlew integrationTest` passes.

### Longer term solutions