KAFKA-4777 fix client heartbeat non-stop retry issue.#2564
KAFKA-4777 fix client heartbeat non-stop retry issue.#2564allenxiang wants to merge 6 commits intoapache:trunkfrom allenxiang:client-heartbeat-fix
Conversation
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
| lookupCoordinator(); | ||
| else | ||
| // coordinator may still be unknown, instead of an immediate retry, let's wait for a bit. | ||
| if (coordinatorUnknown()) |
There was a problem hiding this comment.
Are we attempting to handle the case that there are no brokers available to send the GroupCoordinator request to?
There was a problem hiding this comment.
Yes. In that case, the while loop will come back to this part immediately. This loop is non-stop and will chew up all the cpu cycles.
| else | ||
| AbstractCoordinator.this.wait(retryBackoffMs); | ||
| // coordinator may still be unknown, instead of an immediate retry, let's wait for a bit. | ||
| AbstractCoordinator.this.wait(retryBackoffMs); |
There was a problem hiding this comment.
Instead of waiting in all cases, I'm wondering if it makes sense to check the result of lookupCoordinator. If the future has failed, then we backoff.
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
| lookupCoordinator(); | ||
| else | ||
| if (findCoordinatorFuture == null) { | ||
| if (lookupCoordinator().failed()) |
There was a problem hiding this comment.
Might be worth a comment explaining that the immediate check ensures that we backoff properly in the case that no brokers are available to connect to. I think these checks could also be condensed:
if (findCoordinatorFuture != null || lookupCoordinator().failed())
AbstractCoordinator.this.wait(retryBackoffMs);There was a problem hiding this comment.
Good point. Updated pull request.
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
hachikuji
left a comment
There was a problem hiding this comment.
LGTM. Thanks for the patch!
…rs are available Author: Allen Xiang <allen.xiang@monsanto.com> Reviewers: Jason Gustafson <jason@confluent.io> Closes apache#2564 from allenxiang/client-heartbeat-fix
No description provided.