KIP-396: Add AlterConsumerGroup/List Offsets to AdminClient #7296
hachikuji merged 12 commits into apache:trunk
Conversation
@guozhangwang @vahidhashemian @bbejeck @harshach @cmccabe @hachikuji As you all voted on this KIP, can a few of you review the PR? Thanks
ryannedolan
left a comment
Looking forward to this!
guozhangwang
left a comment
I took a quick look at the PR and it looks good; I'll try to squeeze out some time for a thorough review.
@guozhangwang @vahidhashemian @bbejeck @harshach @cmccabe @hachikuji I'd love to get this in 2.4, can you take a look? It's a relatively straightforward KIP/PR.
I couldn't help reviewing; this would be really helpful for the Kafka integration in Spark. Looking forward to the feature!
hachikuji
left a comment
Thanks, left a few initial comments.
force-pushed from e0c6be4 to 412608e
Thanks @hachikuji and @guozhangwang for the feedback! I've pushed an update:
I'm flying back to the UK tomorrow evening, but I should be able to make more changes tomorrow if needed. If we're not happy with the metadata retry logic, a temporary solution would be to remove it and keep using the Consumer in the consumer group tool for now; then I can revisit it next week. What do you think?
hachikuji
left a comment
Thanks for the updates, left some more comments.
@hachikuji Thanks again for the review. I've pushed an update. I'll start adding coverage in
Not something we have to do here, but one way we could improve this in the future is by taking into account leader epoch information from individual partitions. We can ensure that epochs increase monotonically in order to prevent using stale information during retry.
Another thing we could do is reduce the topics we are fetching metadata for as the ListOffsets requests complete. Ideally we'd only be refetching metadata for topics with metadata errors.
Yes, these improvements would be nice. At the moment I've kept it very simple and just made it retry the full metadata request every time.
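The leader-epoch idea discussed above can be sketched as follows. This is a hypothetical standalone helper, not the actual AdminClient code: it keeps the highest leader epoch seen per partition and flags metadata whose epoch has gone backwards, which is exactly the stale-metadata case a retry should ignore.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: track the highest leader epoch observed per
// partition and reject stale metadata encountered during a retry.
public class EpochTracker {
    private final Map<String, Integer> lastSeenEpoch = new HashMap<>();

    // Returns true if the metadata is fresh (epoch did not go backwards)
    // and records the new epoch; returns false for stale metadata.
    public boolean maybeUpdate(String topicPartition, int leaderEpoch) {
        Integer last = lastSeenEpoch.get(topicPartition);
        if (last != null && leaderEpoch < last) {
            return false; // metadata from an earlier leader epoch: stale
        }
        lastSeenEpoch.put(topicPartition, leaderEpoch);
        return true;
    }
}
```

The class and method names here are invented for illustration; the real improvement would live inside the AdminClient's metadata retry path.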
I'm a bit confused about what's going on with this API. We first wait on the aggregate future and then we wrap it in another future. That seems wrong, right? The call to all() shouldn't itself block.
I don't think we need the nested futures here. The new API just works with a single group, so it seems like the type should just be KafkaFuture&lt;Map&lt;TopicPartition, Void&gt;&gt;. Also, note that we don't want to return KafkaFutureImpl directly.
Yes. For consistency, it's actually best to have KafkaFuture&lt;Map&lt;TopicPartition, Errors&gt;&gt;, so it's the same as deleteConsumerGroupOffsets().
Hmm... actually I think that's a mistake in deleteConsumerGroupOffsets. We don't want to expose Errors directly. I will submit a separate PR.
Let's just use the same two APIs from deleteConsumerGroupOffsets:

```java
public KafkaFuture<Void> partitionResult(final TopicPartition partition);
public KafkaFuture<Void> all();
```
This is a good catch, thanks @hachikuji. We can address 8992 within the 2.4 deadline.
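The agreed result shape can be sketched as a small standalone class. This is an illustration only, using CompletableFuture as a stand-in for Kafka's KafkaFuture, and the class and partition-key types are hypothetical: each partition gets its own future, partitionResult() rejects partitions that were not in the request, and all() succeeds only if every per-partition future succeeds.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Sketch of the result shape discussed above. CompletableFuture stands in
// for KafkaFuture; String stands in for TopicPartition.
public class AlterOffsetsResultSketch {
    private final Map<String, CompletableFuture<Void>> futures;

    public AlterOffsetsResultSketch(Map<String, CompletableFuture<Void>> futures) {
        this.futures = futures;
    }

    // Future for a single partition; fails fast if the partition was
    // not part of the original request.
    public CompletableFuture<Void> partitionResult(String partition) {
        CompletableFuture<Void> future = futures.get(partition);
        if (future == null)
            throw new IllegalArgumentException("Partition " + partition + " was not included in the request");
        return future;
    }

    // Completes only when every per-partition future has succeeded.
    public CompletableFuture<Void> all() {
        return CompletableFuture.allOf(futures.values().toArray(new CompletableFuture[0]));
    }
}
```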
@mimaison Note the compilation failure:
Thanks @hachikuji, fixed
hachikuji
left a comment
Thanks, I think we're almost there, but there are still a couple of problems to fix.
Hmm... this is a little different from what we have in DeleteConsumerGroupOffsetsResult. I think it makes sense to check all the partition-level errors. cc @dajac
That's a fair point, but I am not sure what the best option is. The rationale behind not looking at individual topic/partitions was that it allows using all() to wait for the completion of the request and then checking the individual results. In this case, all() fails only if the whole group has failed.
To be more concrete, it allows doing the following:

```java
DeleteConsumerGroupOffsetsResult result = ...;
try {
    // wait for the whole group; only raises when a group-level or
    // transport-level exception affecting the whole request occurs
    result.all().get();
    // inspect an individual topic/partition
    try {
        result.partitionResult(...).get();
    } catch (Exception e) {
        // handle partition exception
    }
} catch (Exception e) {
    // handle group-level exception
}
```

I think that this facilitates the error handling. What do you think?
That's an interesting point. I think the usual semantics of all() is to only succeed if all individual operations have succeeded. It's sort of designed for lazy error handling, I guess. If users care about the individual operations, they can check them individually; otherwise they have a convenient way to check for any errors. Based on what I've seen, this tends to be the most frequent use. I think part of the idea is also to abstract away from the underlying requests: some of the admin APIs result in multiple broker requests, which makes exposing the full granularity of errors quite cumbersome.
I just made a pass on all XXXResult classes and I think the API semantics are a bit inconsistent in general. Originally I thought we only need the all() function if the result contains futures in the form of Map&lt;..., KafkaFuture&lt;...&gt;&gt;, which potentially requires one trip for each nested future, with all() used as a lazy way to check that all entries have completed successfully. But some (e.g. RemoveMemberFromGroupResult, in the form of Map&lt;MemberIdentity, KafkaFuture&lt;Void&gt;&gt;) actually only require one request too, so all the futures would always complete at the same time. For those cases we do not need an all() function either.
But it seems that for results that only contain a KafkaFuture&lt;Object&gt; we also have a dummy all() function, and many of their all() semantics differ too.
Honestly, I think not all results need an all() function, but it seems we are already a bit messy here...
Yeah, unfortunately the admin APIs have such a big surface area that it's hard to maintain consistency. I think the original intent is what I described, though.
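The two all() semantics debated above can be contrasted in a small sketch. This is an illustration, not Kafka code, with CompletableFuture standing in for KafkaFuture: the "strict" variant fails if any individual operation failed, while the "lenient" variant only waits for completion and leaves failures to per-partition inspection.

```java
import java.util.Collection;
import java.util.concurrent.CompletableFuture;

// Sketch contrasting the two all() semantics discussed in this thread.
public class AllSemantics {
    // Usual semantics: all() fails if any individual operation failed.
    public static CompletableFuture<Void> allStrict(Collection<CompletableFuture<Void>> futures) {
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
    }

    // Alternative discussed above: all() only waits for completion and
    // swallows individual failures, so callers inspect each result themselves.
    public static CompletableFuture<Void> allLenient(Collection<CompletableFuture<Void>> futures) {
        CompletableFuture<?>[] settled = futures.stream()
            .map(f -> f.handle((v, e) -> null)) // complete normally either way
            .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(settled);
    }
}
```

With a mix of one succeeded and one failed future, allStrict() completes exceptionally while allLenient() completes normally, which is the distinction between failing fast on any partition error and only surfacing group-level failures.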
@mimaison I see the recent comments were marked resolved, but I don't see the changes. Are you still working on an update?
Yes, I started making the changes but I haven't had the time to finish them yet. I'll push an update tomorrow or Friday. Sorry for the delay.
@hachikuji I've pushed an update
Thanks @hachikuji for the feedback, I've pushed another update
The user is trying to access a partition that was not requested. I think we could raise IllegalArgumentException directly to the user.
This is a bit subtle, but I think we want to raise the InvalidMetadataException rather than constructing a new Call. The problem is that we lose the retry bookkeeping, which means these retries will not respect the backoff. By throwing the exception, we let the retry logic in Call.fail kick in. This would be consistent with the logic in getFindCoordinatorCall.
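The retry-bookkeeping point above can be illustrated with a toy model. This is a hypothetical sketch, not the actual AdminClient internals: a call object counts its own attempts, so rethrowing into the existing failure path preserves the attempt count and backoff, whereas constructing a brand-new call would silently reset both.

```java
// Hypothetical sketch of per-call retry bookkeeping. A fresh instance
// starts with tries == 0, which is exactly the state that is lost when
// a new call object is constructed instead of failing the existing one.
public class RetriableCall {
    private final int maxTries;
    private final long retryBackoffMs;
    int tries = 0;

    public RetriableCall(int maxTries, long retryBackoffMs) {
        this.maxTries = maxTries;
        this.retryBackoffMs = retryBackoffMs;
    }

    // Invoked when the call fails with a retriable error, mirroring the
    // role of Call.fail. Returns the earliest time the retry may be sent,
    // or -1 if the retry budget is exhausted.
    public long fail(long nowMs) {
        tries++;
        if (tries >= maxTries)
            return -1; // give up: too many attempts
        return nowMs + retryBackoffMs; // honor the configured backoff
    }
}
```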
Thanks @hachikuji, I've updated the PR and rebased on trunk
hachikuji
left a comment
LGTM. Thanks for the patch!
retest this please
LGTM!
Amazing! Thanks all for the efforts on this!
Thanks guys, starting the Spark integration part.