KAFKA-2443 Expose windowSize on Rate; KAFKA-2567 - Throttle time should not return NaN #213
auradkar wants to merge 6 commits into apache:trunk from
nit: extra blank line. ;)
I'll address this on my next commit. Thanks.
I think this comment should now be moved to measurableAsRate.
I had to stare at the SampledStat code most of today. I actually think the original code is correct. The elapsed time should be the time elapsed in the current window plus the previous (non-obsolete) windows. It seems this is exactly what the current code is doing. Your change I think would effectively almost double the actual elapsed time. Maybe we can discuss this offline.
I second Joel's comment. In particular, the difference in lastWindowMs between different samples is not necessarily a multiple of config.timeWindowMs().
Take a look at the test case RateWindowing I added. Is the output time, i.e. 75 seconds, correct?
Let me provide an example: Assume we have 3 samples of size 30 seconds each.
T0 - Start of time
T60 - 60 seconds have elapsed. 2 full sample windows
T75 - record() is called at this time. Note that there was no activity from 60-75.
The current code will produce a windowSize of 60 seconds because:
long elapsedCurrentWindowMs = 0 seconds (because the sample was just created). The timestamp used is the creation timestamp of the current sample, not the end timestamp of the previous sample. Shouldn't the samples be contiguous?
long elapsedPreviousWindow = (numSamples - 1) * sampleSize = 2 * 30 = 60 seconds
With my change, the code produces this output:
long elapsedPreviousWindow = 60 seconds (same as above)
long elapsedCurrentWindow = 15 seconds.
Shouldn't this be valid because we actually have 75 seconds of data? IIUC, the current code creates a blackout of 15 seconds.
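To make the arithmetic in this example concrete, here is a minimal sketch. It is not the code in this patch: the class and method names are hypothetical, and it takes the example's premise as stated (3 samples of 30 seconds, the current sample created at T75). The first method mirrors the create-timestamp calculation described above; the second treats samples as contiguous by taking the time since T0 modulo the sample size.

```java
// Hypothetical illustration of the two window-size calculations discussed
// in this thread. None of these names come from the Kafka code base.
class WindowSizeSketch {

    // Elapsed time in the current window is measured from the current
    // sample's creation timestamp, so idle time before it is not counted.
    static long createTimestampBasedMs(long nowMs, long currentSampleCreatedMs,
                                       int numSamples, long sampleSizeMs) {
        long elapsedCurrentWindowMs = nowMs - currentSampleCreatedMs;
        long elapsedPreviousWindowsMs = (numSamples - 1) * sampleSizeMs;
        return elapsedPreviousWindowsMs + elapsedCurrentWindowMs;
    }

    // Samples are treated as contiguous: the current window started at the
    // last multiple of sampleSizeMs after startMs, whether or not a record
    // arrived at that moment.
    static long contiguousMs(long nowMs, long startMs,
                             int numSamples, long sampleSizeMs) {
        long elapsedCurrentWindowMs = (nowMs - startMs) % sampleSizeMs;
        long elapsedPreviousWindowsMs = (numSamples - 1) * sampleSizeMs;
        return elapsedPreviousWindowsMs + elapsedCurrentWindowMs;
    }

    public static void main(String[] args) {
        int numSamples = 3;
        long sampleSizeMs = 30_000L;
        long nowMs = 75_000L; // T75: first record after 15 idle seconds

        // Assumption from the example: the current sample is created at T75.
        System.out.println(createTimestampBasedMs(nowMs, 75_000L, numSamples, sampleSizeMs)); // 60000
        System.out.println(contiguousMs(nowMs, 0L, numSamples, sampleSizeMs));                // 75000
    }
}
```

The 15 second difference between the two results is the "blackout" mentioned above.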
Ah - you are not doubling the elapsed time because you are actually doing a modulo on the window size. That said, I think the current code should still be correct. Note that in your test you haven't actually created three samples because you didn't call record at the 60 second or later mark, i.e., if you debug through you will find only two samples. So the "current" elapsed time is taken as now minus the lastWindowMs of the "current" sample, which is the second sample, and that ends up being 105 seconds for me (which is correct because the current sample has not rolled over due to the absence of a record).
For anyone reading: Joel and I had a very long chat wherein we agreed that the patch is correct. I've changed the code a little and added a bunch of comments to make it easier to read.
I see the problem with the original approach after looking at your test case. Now I think the new approach is correct.
Also, I just left a comment on the original delay computation RB. I think it will be helpful to update the comment on the delay computation as described (https://reviews.apache.org/r/33049/).
Thanks for adding the comment and the fix. Minor typo: "gives". I'm going to check-in with this though.
…pache#213) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
This is a followup ticket from KAFKA-2084 to improve the windowSize calculation in Quotas. I've made the following changes: