Skip to content

KAFKA-2443 Expose windowSize on Rate; KAFKA-2567 - Throttle time should not return NaN#213

Closed
auradkar wants to merge 6 commits intoapache:trunkfrom
auradkar:K-2443
Closed

KAFKA-2443 Expose windowSize on Rate; KAFKA-2567 - Throttle time should not return NaN#213
auradkar wants to merge 6 commits intoapache:trunkfrom
auradkar:K-2443

Conversation

@auradkar
Copy link
Copy Markdown
Contributor

This is a followup ticket from KAFKA-2084 to improve the windowSize calculation in Quotas. I've made the following changes:

  1. Added a windowSize function on Rate
  2. Calling Rate.windowSize in ClientQuotaManager to return the exact window size to use when computing the delay time.
  3. Changed the window size calculation subtly. The current calculation had a bug wherein, it used the number of elapsed seconds from the "lastWindowSeconds" of the most recent Sample object. However, the lastWindowSeconds is the time when the sample is created.. this causes an issue because it implies that the current window elapsed time is always "0" when the sample is created. This is incorrect as demonstrated in a testcase I added in MetricsTest. I've fixed the calculation to count the elapsed time from the "oldest" sample in the set since that gives us an accurate value of the exact amount of time elapsed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra blank line. ;)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll address this on my next commit. Thanks.

@auradkar auradkar changed the title KAFKA-2443 Expose windowSize on Rate KAFKA-2443 Expose windowSize on Rate; KAFKA-2567 - Throttle time should not return NaN Oct 6, 2015
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment should now be moved to measurableAsRate.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to stare at the SampledStat code most of today. I actually think the original code is correct. The elapsed time should be the time elapsed in the current window plus the previous (non-obsolete) windows. It seems this is exactly what the current code is doing. Your change I think would effectively almost double the actual elapsed time. Maybe we can discuss this offline.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second Joel's comment.. In particular, the difference in lastWindowMs of different samples is not necessarily multiple of config.timeWindowMs()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at the testcase RateWindowing I added. Is the output time correct? i.e. 75 seconds.

Let me provide an example: Assume we have 3 samples of size 30 seconds each.
T0 - Start of time
T60 - 60 seconds have elapsed. 2 full sample windows
T75 - A record is called at this time. Note that there was no activity from 60-75.

the current code will produce windowSize of 60 seconds because:
long elaspedCurrentWindowMs = 0 seconds (because the sample just got created). The timestamp used is the create timestamp of the current sample and not the end timestamp of the previous sample. Shouldn't the samples be contiguous?

long elapsedPreviousWindow = (numSamples - 1) * sampleSize = 2 * 30 = 60 seconds

The current code produces this output:
long elapsedPreviousWindow = 60 seconds (same as above)
long elapsedCurrentWindow = 15 seconds.

Shouldn't this be valid because we actually have 75 seconds of data? IIUC, the current code creates a blackout of 15 seconds.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - you are not doubling the elapsed time because you are actually doing a modulo on the window size. That said, I think the current code should still be correct. Note that in your test you haven't actually created three samples because you didn't call record at the 60 second or later mark. i.e., if you debug through you will find only two samples. So the "current" time is taken off now minus the lastWindowMs of the "current" sample which is the second sample and that ends up being 105 seconds for me (which is correct because the current sample has not rolled over due to the absence of a record).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For anyone reading - Joel and I had a very long chat wherein we agreed that the patch is correct. I've changed the code a little and added a bunch of comments to make it easier to read

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the problem with original approach after looking your test case. Now I think the new approach is correct.

@jjkoshy
Copy link
Copy Markdown
Contributor

jjkoshy commented Oct 8, 2015

Also, just left a comment on the original delay computation RB - I think it will be helpful to update the comment on the delay computation as described (https://reviews.apache.org/r/33049/)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the comment and the fix. Minor typo: "gives". I'm going to check-in with this though.

@asfgit asfgit closed this in 9fde92f Oct 12, 2015
jsancio pushed a commit to jsancio/kafka that referenced this pull request Aug 6, 2019
…pache#213)

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
davide-armand pushed a commit to aiven/kafka that referenced this pull request Dec 1, 2025
jeqo pushed a commit to aiven/kafka that referenced this pull request Jan 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants