KAFKA-18569: New consumer close may wait on unneeded FindCoordinator #18590
lianetm merged 2 commits into apache:trunk
|
Thanks for the PR @frankvicky! I was curious if you had a chance to look into using the Thanks! |
|
Hi @kirktrue |
kirktrue
left a comment
Thanks for the refresh on the PR @frankvicky! This looks much more succinct.
I'm still unsure what the behavior is for this sequence of events:
1. The coordinator is marked as unknown
2. CoordinatorRequestManager.poll() is called and creates a new FindCoordinatorRequest
3. The NetworkClientDelegate sends the request to the broker
4. Consumer.close() is called with a timeout of 30 seconds
5. ConsumerNetworkThread.sendUnsentRequests() is called
In step 5, won't it continue to loop for ~30 seconds because the find request created in step 2 (and sent in step 3) is still inflight when ConsumerNetworkThread.sendUnsentRequests() is called?
    do {
        networkClientDelegate.poll(timer.remainingMs(), timer.currentTimeMs());
        timer.update();
    } while (timer.notExpired() && networkClientDelegate.hasAnyPendingRequests());
NetworkClientDelegate.hasAnyPendingRequests() will return true while there are any in-flight requests.
Any thoughts?
Thanks!
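To make the concern above concrete, here is a minimal simulation of that loop. It is only a sketch: `FakeDelegate`, `drainOnClose`, and the simulated clock are hypothetical stand-ins invented for illustration, not the actual `NetworkClientDelegate` or `Timer` classes. It shows that a request which never completes keeps the loop spinning until the close timeout expires, while a request that completes early lets the loop exit right away.

```java
public class CloseLoopSketch {

    /** Hypothetical stand-in for the network client: one request stays in flight until a response "arrives". */
    public static final class FakeDelegate {
        private final long responseAtMs; // simulated time at which the in-flight request completes
        private boolean inFlight = true;

        public FakeDelegate(long responseAtMs) {
            this.responseAtMs = responseAtMs;
        }

        public void poll(long nowMs) {
            if (nowMs >= responseAtMs) {
                inFlight = false;
            }
        }

        public boolean hasAnyPendingRequests() {
            return inFlight;
        }
    }

    /** Mirrors the shape of the do/while loop quoted above; returns the simulated milliseconds spent in the loop. */
    public static long drainOnClose(FakeDelegate delegate, long closeTimeoutMs, long pollIntervalMs) {
        long nowMs = 0;
        do {
            nowMs += pollIntervalMs; // each poll advances the simulated clock
            delegate.poll(nowMs);
        } while (nowMs < closeTimeoutMs && delegate.hasAnyPendingRequests());
        return nowMs;
    }

    public static void main(String[] args) {
        // A FindCoordinator request that never gets a response: the loop runs for the full close timeout.
        System.out.println(drainOnClose(new FakeDelegate(Long.MAX_VALUE), 30_000, 100));
        // A request that completes quickly: the loop exits as soon as nothing is pending.
        System.out.println(drainOnClose(new FakeDelegate(200), 30_000, 100));
    }
}
```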
|
Hi @kirktrue, Thanks for the review. |
|
Currently, it seems that the behavior described in the comment is not followed: kafka/core/src/test/scala/integration/kafka/api/ConsumerBounceTest.scala, lines 300 to 304 in 3276759. Updated: Now the |
|
Hey here, I don't quite get how the The To find a solution, let's look at the classic consumer first, this is my understanding:
Correct me there, but if that's the behaviour, could it be achieved in the new consumer by allowing the |
|
Hi @lianetm @kirktrue, In the classic consumer, the timeout respects request.timeout.ms. However, in the async consumer, this logic is either missing or only applies to individual requests. Should we align the behavior between async and classic consumers? |
|
Hey @frankvicky, good finding. Agreed that the behaviour is not aligned in the close timeout handling: in practice, the classic consumer.close will never wait for more than the request timeout, even if close is called with a larger timeout (and that's indeed missing in the async close timeout handling). Actually, the behaviour is explicitly called out in one of the tests: So I do agree that we need to align this. But just for my understanding, this is something else we need here to unblock these tests (the If my understanding is right, then I think we should file a separate jira for the close timeout considering the request timeout, and if you can validate locally that it's the only fix required to enable the |
|
Hi @lianetm |
I agree that we should align the behavior with how it has functioned for a long time (f72203e). Additionally, we should document this behavior for both |
…assic consumer
JIRA: KAFKA-18645
See discussion: apache#18590 (comment)
In the classic consumer, the timeout respects request.timeout.ms. However, in the async consumer, this logic is either missing or only applies to individual requests. Unlike the classic consumer, where request.timeout.ms works for the entire coordinator closing behavior, the async implementation handles timeouts differently. We should align the close timeout handling to enable ConsumerBounceTest#testClose.
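As a rough illustration of the alignment discussed in this thread, the sketch below caps the caller's close timeout by `request.timeout.ms`, so that close never waits longer than the request timeout. The class and the `effectiveCloseTimeoutMs` helper are hypothetical names introduced only for this example, and the simple `min` is an assumption about how such a cap could be expressed, not a copy of the consumer code.

```java
import java.time.Duration;

public class CloseTimeoutSketch {

    /**
     * Hypothetical helper: the time close() may spend on coordinator work,
     * assuming it is bounded by both the caller's timeout and request.timeout.ms.
     */
    public static long effectiveCloseTimeoutMs(Duration closeTimeout, int requestTimeoutMs) {
        return Math.min(closeTimeout.toMillis(), requestTimeoutMs);
    }

    public static void main(String[] args) {
        // close(30s) with request.timeout.ms=5000 waits at most 5000 ms.
        System.out.println(effectiveCloseTimeoutMs(Duration.ofSeconds(30), 5_000));
        // close(1s) with request.timeout.ms=5000 waits at most 1000 ms.
        System.out.println(effectiveCloseTimeoutMs(Duration.ofSeconds(1), 5_000));
    }
}
```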
|
The old/new approach to include a specialized event makes sense. Thanks for the suggestion @lianetm! |
lianetm
left a comment
Thanks @frankvicky! Just one nit left. Also, please merge the latest trunk changes to get the latest test fixes, and I will check the build again. Thanks!
     * limitations under the License.
     */
    package org.apache.kafka.clients.consumer.internals.events;

    public class StopFindCoordinatorOnCloseEvent extends ApplicationEvent {
Should we add a Javadoc here? Mainly to describe that the purpose of this event is to ensure that the CoordinatorRequestManager does not generate FindCoordinator requests when the consumer is closing and has already completed the operations that require a coordinator.
Sure, I have just written some description for it. PTAL 😺
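The intent described in this review comment can be sketched as follows. This is a simplified, hypothetical model: the `RequestManager` class, its `signalClose()` method, and the string-valued request are assumptions made for illustration and do not reproduce the actual `CoordinatorRequestManager` or event-processing code. The idea is that once the stop-on-close event has been handled, polling no longer produces FindCoordinator requests.

```java
import java.util.Optional;

public class StopFindCoordinatorSketch {

    /** Hypothetical, simplified stand-in for a coordinator request manager. */
    public static final class RequestManager {
        private boolean coordinatorKnown = false;
        private boolean closing = false;

        /** Would be invoked when the stop-on-close event is processed (assumed wiring). */
        public void signalClose() {
            closing = true;
        }

        /** Returns the name of a request to send, or empty if there is nothing to send. */
        public Optional<String> poll() {
            if (closing || coordinatorKnown) {
                return Optional.empty(); // no lookup needed, or no longer wanted
            }
            return Optional.of("FindCoordinator");
        }
    }

    public static void main(String[] args) {
        RequestManager manager = new RequestManager();
        System.out.println(manager.poll().isPresent()); // coordinator unknown: a lookup is generated
        manager.signalClose();
        System.out.println(manager.poll().isPresent()); // closing: no further lookups
    }
}
```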
JIRA: KAFKA-18569 Please refer to ticket for further details
Co-authored-by: Lianet Magrans <98415067+lianetm@users.noreply.github.com>
|
Failed test is handled by #18735 |
…18590) Reviewers: Lianet Magrans <lmagrans@confluent.io>, Kirk True <ktrue@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
|
Merged to trunk and cherry-picked to 4.0 |
`ShareConsumers` may wait on an unneeded `FindCoordinator` during `close()` (i.e. after the acknowledgements are sent). #18590 added the `StopFindCoordinatorOnClose` event, which is used by the regular consumers. We are using the same event in `ShareConsumers` as well to prevent sending the `FindCoordinator` request when the coordinator is no longer needed. Reviewers: Andrew Schofield <aschofield@confluent.io>
JIRA: KAFKA-18569
Please refer to ticket for further details.
In short, the new consumer's close may currently wait for an unsent FindCoordinator request to go out when closing the consumer, even after the commit/leaveGroup stages of close are done.