Skip to content

KAFKA-6948 - Change comparison to avoid overflow inconsistencies#5183

Closed
thisthat wants to merge 2 commits intoapache:trunkfrom
thisthat:KAFKA-6948
Closed

KAFKA-6948 - Change comparison to avoid overflow inconsistencies#5183
thisthat wants to merge 2 commits intoapache:trunkfrom
thisthat:KAFKA-6948

Conversation

@thisthat
Copy link
Copy Markdown

@thisthat thisthat commented Jun 11, 2018

Change timestamp comparison following what the Java documentation recommends to help preventing such errors.

@guozhangwang I create the new PR as requested 😄

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

maybeAutoCommitOffsetsSync(timeoutMs);
now = time.milliseconds();
if (pendingAsyncCommits.get() > 0 && endTimeMs > now) {
if (pendingAsyncCommits.get() > 0 && now - endTimeMs < 0) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've added a remainingTimeAtLeastZero function in multiple classes of consumer, maybe we can use that function?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to use it in line 575 to protect the call to ensureCoordinatorReady but not for the if guard due to the strict comparator. If it is correct to write

now - endTimeMs <= 0

then I can replace it with

remainingTimeAtLeastZero(now, endTimeMs) == 0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, makes sense.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: thinking about this again, I think we need to consider two cases separately

  1. for a parameter passed into a function call like poll() higher in the call trace like line 575 below, we need to make sure it is not a negative value, and hence it's better to use remainingTimeAtLeastZero.

  2. for a condition comparing two values, we need to first of all make sure none of the comparing values may overflow, for example in KerberosLogin.java below if now + minTimeBeforeRelogin already overflows, then expiry - (now + minTimeBeforeRelogin) < 0 is not safe either. So I think a better solution would be, for any comparing values we makes sure they are directly from timer than as a result of sum, before going on to the condition check: for example in NetworkClientUtils.java above, the comparing expiryTime is a result of startTime + timeoutMs, instead, we should refactor it as:

long startTime = time.milliseconds();

// some calls that may take time

long attemptStartTime = time.milliseconds();

while (!client.isReady(node, attemptStartTime) && attemptStartTime - startTime < timeoutMs)

WDYT?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first point I'm totally agreed with you.
Regarding the second instead, the inequation expiry - (now + minTimeBeforeRelogin) < 0 is still safe also if now + minTimeBeforeRelogin already overflowed. The critical bit with a timestamp comparison is to compare always with a constant/safe value.
Due to the finite arithmetic of computers, the safest way to compare timestamp variables is to perform operations on them only on one side of the inequation. I created a tool that automatically identifies the critical program points and performs these changes. Therefore, I use 0 as safe value. However, your suggestion of comparing values directly from timers such as attemptStartTime - startTime < timeoutMs is even better because it is more natural and readable.

If you agree, I would go though the code again and apply the changes according to the two cases.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, thanks.

@hachikuji
Copy link
Copy Markdown
Contributor

For reference, #5087 would allows us to consolidate these checks and avoid all the inconsistent usage.

@guozhangwang
Copy link
Copy Markdown
Contributor

Thanks for pointing out @hachikuji , is this PR going to be subsumed by #5087 ?

@thisthat
Copy link
Copy Markdown
Author

I checked the changes in #5087 and, currently, there is an overlap only in the ConsumerCoordinator class.
I think the solution of @hachikuji that centralizes the handling of time operations will reduce the probability of introducing errors.
We may think to gradually replace the time comparisons with the Timer class provided in #5087

@guozhangwang
Copy link
Copy Markdown
Contributor

The current PR LGTM. @hachikuji could you make a second pass before merging?

@guozhangwang
Copy link
Copy Markdown
Contributor

cc @hachikuji again.

@ableegoldman
Copy link
Copy Markdown
Member

A number of these fixes are no longer relevant since switching to Timers (I guess that was #10537 ), the remaining ones I just tacked onto this PR addressing a different overflow bug. So I think we can close this in favor of #11111

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants