KAFKA-6948 - Change comparison to avoid overflow inconsistencies#5183
thisthat wants to merge 2 commits into apache:trunk from thisthat:KAFKA-6948
  maybeAutoCommitOffsetsSync(timeoutMs);
  now = time.milliseconds();
- if (pendingAsyncCommits.get() > 0 && endTimeMs > now) {
+ if (pendingAsyncCommits.get() > 0 && now - endTimeMs < 0) {
We've added a remainingTimeAtLeastZero function in multiple classes of consumer, maybe we can use that function?
I think it is better to use it in line 575 to protect the call to ensureCoordinatorReady, but not for the if guard, because of the strict comparator. If it is correct to write now - endTimeMs <= 0, then I can replace it with remainingTimeAtLeastZero(now, endTimeMs) == 0.
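To make the strict versus non-strict point concrete, here is a minimal sketch. It assumes a clamping helper of the form max(0, a - b); the thread does not show the actual signature of remainingTimeAtLeastZero, so differenceAtLeastZero and the class name below are hypothetical stand-ins, not Kafka code.

```java
// Sketch only: differenceAtLeastZero is a hypothetical stand-in for a
// clamping helper; the real Kafka method may have a different signature.
final class ClampGuardSketch {
    private ClampGuardSketch() {}

    // Difference of two values, clamped so it is never negative.
    static long differenceAtLeastZero(long a, long b) {
        return Math.max(0L, a - b);
    }

    // Strict guard, as written in the diff: true only while now is before endTimeMs.
    static boolean strictGuard(long now, long endTimeMs) {
        return now - endTimeMs < 0;
    }

    // Guard expressed through the clamp: equivalent to now - endTimeMs <= 0.
    static boolean clampGuard(long now, long endTimeMs) {
        return differenceAtLeastZero(now, endTimeMs) == 0;
    }
}
```

The two guards differ only when now equals endTimeMs: the strict guard is false there, while the clamp-based guard is true. That is why the clamp cannot replace the strict comparison without changing behavior at the boundary.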
EDIT: thinking about this again, I think we need to consider two cases separately:
- For a parameter passed into a function call like poll() higher in the call trace, like line 575 below, we need to make sure it is not a negative value, and hence it's better to use remainingTimeAtLeastZero.
- For a condition comparing two values, we first need to make sure that neither of the compared values may overflow. For example, in KerberosLogin.java below, if now + minTimeBeforeRelogin already overflows, then expiry - (now + minTimeBeforeRelogin) < 0 is not safe either. So I think a better solution would be to make sure that any compared values come directly from the timer rather than from a sum, before going on to the condition check. For example, in NetworkClientUtils.java above, the compared expiryTime is the result of startTime + timeoutMs; instead, we should refactor it as:
long startTime = time.milliseconds();
// some calls that may take time
long attemptStartTime = time.milliseconds();
while (!client.isReady(node, attemptStartTime) && attemptStartTime - startTime < timeoutMs)
WDYT?
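A self-contained sketch of the refactoring suggested above might look like the following. The clock and readiness check are passed in as functional interfaces so the loop can be exercised without a Kafka client; awaitReady, clockMs and ready are illustrative names and not part of NetworkClientUtils.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Sketch only: names are hypothetical, and a real loop would poll the
// network client inside the body instead of just re-reading the clock.
final class ElapsedTimeLoopSketch {
    private ElapsedTimeLoopSketch() {}

    // Waits until ready reports true or timeoutMs has elapsed on clockMs.
    static boolean awaitReady(BooleanSupplier ready, LongSupplier clockMs, long timeoutMs) {
        long startTime = clockMs.getAsLong();
        long attemptStartTime = startTime;
        // Both compared values come straight from the clock; no deadline is
        // precomputed as startTime + timeoutMs, so that sum cannot overflow.
        while (!ready.getAsBoolean() && attemptStartTime - startTime < timeoutMs) {
            attemptStartTime = clockMs.getAsLong();
        }
        return ready.getAsBoolean();
    }
}
```

Because the loop compares elapsed time (attemptStartTime - startTime) with the timeout, it still terminates correctly if the clock value wraps past Long.MAX_VALUE during the wait.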
On the first point, I totally agree with you.
Regarding the second point, the inequality expiry - (now + minTimeBeforeRelogin) < 0 is still safe even if now + minTimeBeforeRelogin has already overflowed. The critical part of a timestamp comparison is to always compare against a constant/safe value.
Because computers use finite arithmetic, the safest way to compare timestamp variables is to perform operations on them on only one side of the inequality. I created a tool that automatically identifies the critical program points and performs these changes, which is why I use 0 as the safe value. However, your suggestion of comparing values taken directly from timers, such as attemptStartTime - startTime < timeoutMs, is even better because it is more natural and readable.
If you agree, I will go through the code again and apply the changes according to the two cases.
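The overflow argument can be checked with a small sketch. The class and method names are illustrative only; the difference-against-zero form is the idiom the System.nanoTime() documentation recommends, and it holds as long as the two timestamps are less than 2^63 apart.

```java
// Sketch only: expiredNaive and expiredSafe are hypothetical names used to
// contrast the two comparison styles discussed in this thread.
final class OverflowCompareSketch {
    private OverflowCompareSketch() {}

    // Direct comparison: gives the wrong answer once deadline has wrapped around.
    static boolean expiredNaive(long now, long deadline) {
        return now >= deadline;
    }

    // Difference compared against the constant 0: wraparound cancels out.
    static boolean expiredSafe(long now, long deadline) {
        return now - deadline >= 0;
    }

    public static void main(String[] args) {
        long now = Long.MAX_VALUE - 10;
        long deadline = now + 100; // overflows to a negative value
        System.out.println("naive says expired: " + expiredNaive(now, deadline));
        System.out.println("safe says expired:  " + expiredSafe(now, deadline));
    }
}
```

With now close to Long.MAX_VALUE, the deadline wraps to a negative number, so the naive check reports the deadline as already passed, while the subtraction-based check still sees 100 ms remaining.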
For reference, #5087 would allow us to consolidate these checks and avoid all the inconsistent usage.
Thanks for pointing that out, @hachikuji. Is this PR going to be subsumed by #5087?
I checked the changes in #5087 and, currently, there is an overlap only in the
The current PR LGTM. @hachikuji could you make a second pass before merging?
cc @hachikuji again.
Change timestamp comparison to follow what the Java documentation recommends, to help prevent such errors.
@guozhangwang I created the new PR as requested 😄
Committer Checklist (excluded from commit message)