KAFKA-15696: Refactor AsyncConsumer close procedure#14920
KAFKA-15696: Refactor AsyncConsumer close procedure#14920philipnee wants to merge 3 commits intoapache:trunkfrom
Conversation
|
Hi @lucasbru - I intended to open this for review however, I'm having a hard time to get testCommitAsyncLeaderEpochUpdate because it somehow seems to try to send a commit request with -1 epoch in it (as well as several other tests). I thought the intention was to just send an async request, i wonder why is the epoch = -1... |
|
@philipnee I don't see I'd suggest getting this PR in shape and test the close functionality specifically. If there is something in the existing test setup that needs to change to make the close go through smoothly in the other tests, we can do that (once we see it failing on CI), but ideally we just move the problematic tests out of the spy-based test framework, which should resolve the problem (as no close functionality will be required). I'm working on a PR to move some of those tests out of |
|
Hi @lucasbru - Thanks for the response. unfortunately, the way the mock is setup and how other tests are written, close will have dependencies to the network thread, which means we are actually testing all related components. I'll get the PR in shape and have you review it asap. |
7eb1e25 to
4e2a6f6
Compare
kirktrue
left a comment
There was a problem hiding this comment.
Thanks for the PR, @philipnee. I left a few comments/questions.
| // Ensure all async commit callbacks are invoked | ||
| swallow(log, Level.ERROR, "Failed invoking asynchronous commit callback", this::maybeInvokeCommitCallbacks, firstException); |
There was a problem hiding this comment.
These should be done before we shut down the network, right?
| /** | ||
| * Prior to closing the network thread, we need to make sure the following operations happen in the right sequence: | ||
| * 1. autocommit offsets | ||
| * 2. revoke all partitions | ||
| */ | ||
| private void prepareShutdown(final Timer timer) { | ||
| if (!groupMetadata.isPresent()) | ||
| return; | ||
|
|
||
| maybeAutoCommitSync(timer); | ||
| timer.update(); | ||
| if (!subscriptions.hasAutoAssignedPartitions() || subscriptions.assignedPartitions().isEmpty()) | ||
| return; | ||
|
|
||
| try { | ||
| // If the consumer is in a group, we will pause and revoke all assigned partitions | ||
| onLeavePrepare().get(timer.remainingMs(), TimeUnit.MILLISECONDS); | ||
| timer.update(); | ||
| } catch (Exception e) { | ||
| Exception exception = e; | ||
| if (e instanceof ExecutionException) | ||
| exception = (Exception) e.getCause(); | ||
| throw new KafkaException("User rebalance callback throws an error", exception); | ||
| } finally { | ||
| subscriptions.assignFromSubscribed(Collections.emptySet()); | ||
| } | ||
| } | ||
|
|
||
| private void maybeAutoCommitSync(final Timer timer) { | ||
| if (autoCommitEnabled) { | ||
| Map<TopicPartition, OffsetAndMetadata> allConsumed = subscriptions.allConsumed(); | ||
| try { | ||
| log.debug("Sending synchronous auto-commit of offsets {} on closing", allConsumed); | ||
| commitSync(allConsumed, Duration.ofMillis(timer.remainingMs())); | ||
| } catch (Exception e) { | ||
| // consistent with async auto-commit failures, we do not propagate the exception | ||
| log.warn("Synchronous auto-commit of offsets {} failed: {}", allConsumed, e.getMessage()); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| private CompletableFuture<Void> onLeavePrepare() { | ||
| SortedSet<TopicPartition> droppedPartitions = new TreeSet<>(MembershipManagerImpl.TOPIC_PARTITION_COMPARATOR); | ||
| droppedPartitions.addAll(subscriptions.assignedPartitions()); | ||
| if (!subscriptions.hasAutoAssignedPartitions() || droppedPartitions.isEmpty()) { | ||
| return CompletableFuture.completedFuture(null); | ||
| } | ||
| // TODO: Invoke rebalanceListener via KAFKA-15276 | ||
| return CompletableFuture.completedFuture(null); | ||
| } | ||
|
|
There was a problem hiding this comment.
I would imagine that the leave group process could be mostly performed using an event and a callback execution, like in #14931.
We'd need to submit an event to the background thread (e.g. PrepareCloseEvent) so that the member manager can orchestrate the callback request and the heartbeat request.
There was a problem hiding this comment.
Thanks. I mostly agree with your idea. Though - I think simply firing the callback revocation from the close() should be enough - but I think sending leave-group and closing events as you suggested is a good idea.
There was a problem hiding this comment.
Several people seem to agree that we should solve this as much as possible via an event. Is the new draft PR going to replace this PR, or should we try to merge this one?
There was a problem hiding this comment.
I want to be consistent with Kirk's approach for unsubscribe() in terms of callback invocation - So I took the previous comment back (sorry about the confusion)
| } | ||
| } | ||
|
|
||
| private CompletableFuture<Void> onLeavePrepare() { |
There was a problem hiding this comment.
May miss a bit of context, but I'm not yet sure what this function is achieving. If this is for KAFKA-15276, maybe we can implement this as part of that ticket, because this function is mostly confusing me.
There was a problem hiding this comment.
I need to speak to kirk about how he wants to implement the callback invocation.
| metadata, | ||
| subscriptions, | ||
| fetchConfig, | ||
| deserializers, |
There was a problem hiding this comment.
It seems like you have slightly different formatting settings than somebody else. I wouldn't do these kinds of whitespace changes unless it's obviously unclean
| commitFuture.complete(null); | ||
|
|
||
| prepareCommit(Arrays.asList(t1, t0), DEFAULT_GROUP_ID, Errors.NONE); | ||
| // TODO: The log shows NPE thrown from the CommitRequestManager, which is caused by the use of mock. |
There was a problem hiding this comment.
I think as long as the test isn't failing this is fine, since the problem should go away once we move to mocks
80e6162 to
364782a
Compare
Update AsyncKafkaConsumer.java test ensure heartbeat manager is polled Clean up Update AsyncKafkaConsumerTest.java clean up Update ConsumerNetworkThreadTest.java leave group on close stuff
364782a to
e22a19f
Compare
This ticket encompasses several different tickets:
KAFKA-15696/KAFKA-15548
When closing the consumer we need to perform a few tasks