KAFKA-4633: Always using regex pattern subscription in StreamThread#2379
KAFKA-4633: Always using regex pattern subscription in StreamThread#2379guozhangwang wants to merge 17 commits intoapache:trunkfrom
Conversation
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
| for (TopicPartition tp : assignments) | ||
| if (!this.subscription.contains(tp.topic())) | ||
| throw new IllegalArgumentException("Assigned partition " + tp + " for non-subscribed topic."); | ||
| // for (TopicPartition tp : assignments) |
There was a problem hiding this comment.
Why remove this check? We still know all topics we want to subscribe to.
There was a problem hiding this comment.
I'm still pondering on it, the issue is that during rebalance streams leader can possibly assign internal topic partitions while the consumer’s not aware of the topic yet (since these topics are not created in Kafka), the above check will fail. This is actually not introduced in this topic as even before we could hit this issue but it was not exposed by tests.
I think in general this check is valid, and in that case maybe I have to get an extended consumer class as we discussed offline..
There was a problem hiding this comment.
@hachikuji @mjsax Updated to use the subscriptionPattern if possible; this is discussed offline and please see if it looks "simple enough".
|
Also by doing this PR now I see more failures that is being worked on with #2371. This is because now Streams rebalance gets shorter and the race condition is then more likely to happen. |
|
retest this please. |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
retest this please |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
| private final Deserializer<?> valDeserializer; | ||
|
|
||
| private SourceNodeFactory(String name, String[] topics, Pattern pattern, Deserializer keyDeserializer, Deserializer valDeserializer) { | ||
| private SourceNodeFactory(String name, String[] topics, Pattern pattern, Deserializer<?> keyDeserializer, Deserializer<?> valDeserializer) { |
There was a problem hiding this comment.
Any reason why you don't make the topics param a list?
There was a problem hiding this comment.
Input parameters are directly from the addSource calls, whose param is String[]. I feel like deferring the asList to its constructor than letting the caller to do it. No strong opinions though.
|
retest this please |
…into K4633-regex-pattern
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
|
||
| Map<String, Assignment> assignment = assignor.assign(metadata.fetch(), subscriptions); | ||
|
|
||
| // check if the assignor has realized some new topics, if yes update the snapshot |
There was a problem hiding this comment.
@hachikuji This is just to show my current thoughts about how to work around the unnecessary rebalance without changing Assignor APIs, this is really not that elegant but just as a demo, and I'm happy to revert this back for future general solution; my take is that if we would allow assign() functions to update the metadatasnapshot then we probably would need to change the API.
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Build finished. 4117 tests run, 0 skipped, 0 failed. |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
|
||
| // update partition assignment | ||
| subscriptions.assignFromSubscribed(assignment.partitions()); | ||
| this.joinedSubscription = subscriptions.subscription(); |
There was a problem hiding this comment.
I'm not sure this works. The purpose of the joinedSubscription field is to remember the exact list of topics that were used when joining the group. If a metadata update arrives after the rebalance has begun, then we can notice the fact that the joined subscription does not match the current subscription and we can trigger another rebalance. With this change, we will no longer be able to detect this case, which means that consumption from a topic matching the subscribed regex will be delayed (perhaps indefinitely).
It seems the behavior we want is to only add to joinedSubscription those topics which were added to the assignment by the leader.
There was a problem hiding this comment.
@hachikuji Here is my understanding, and correct me if I'm wrong:
-
In
assignFromSubscribed, we will only change thesubscriptionif it is regex pattern and the collected topics as intopicsToSubscribeis a superset ofthis.subscription. -
Besides this added line,
joinedSubscriptionis only set tothis.subscriptionbefore sending the join group request, and thethis.subscriptionitself is only modified either insidesubscribe()calls on inside the previously line if the assigned topics is a super set of the subscription itself.
So given these two, I think this line will only update joinedSubscription with those topics that were added to the assignment by the leader If that does happen, right?
There was a problem hiding this comment.
@guozhangwang A metadata fetch could return in the middle of a rebalance (i.e. with the JoinGroup on the wire). In this case, we may end up discovering a new topic that we didn't know about before and adding it to our subscription (see subscribeFromPattern). We are able to catch this case currently because although the subscription has changed, joinedSubscription has not. With this patch, this is no longer true.
There was a problem hiding this comment.
That makes sense. I can go ahead and tackle on this.
Just clarifying: After the group has formed, both leader and follower can still trigger a rebalance: leader will trigger a rebalance if the existing topics number of partitions has changed (including the topic is deleted); follower will trigger a rebalance if the subscription has changed (both due to a metadata refresh with regex pattern or user called subscribe again). Is that right?
And if we change the consumer coordinator to allow passing regex to the leader, I think joinedSubscription can be removed completely and only leader need to trigger rebalances unless users call subscribe again on any of the consumer member, is that right?
There was a problem hiding this comment.
Just clarifying: After the group has formed, both leader and follower can still trigger a rebalance: leader will trigger a rebalance if the existing topics number of partitions has changed (including the topic is deleted); follower will trigger a rebalance if the subscription has changed (both due to a metadata refresh with regex pattern or user called subscribe again). Is that right?
Yes, right.
And if we change the consumer coordinator to allow passing regex to the leader, I think joinedSubscription can be removed completely and only leader need to trigger rebalances unless users call subscribe again on any of the consumer member, is that right?
The leader will still have to deal with the potential for a metadata update during a rebalance, so I'm not sure we can remove joinedSubscription. At least we won't need this funky logic to try to change joinedSubscription after the rebalance though.
| // TODO this part of the logic should be removed once we allow regex on leader assign and remove joinedSubscription | ||
| Set<String> addedTopics = new HashSet<>(); | ||
| for (TopicPartition tp : subscriptions.assignedPartitions()) { | ||
| if (!subscriptions.subscription().contains(tp.topic())) |
There was a problem hiding this comment.
I'm puzzling a bit over whether we should be using joinedSubscription here instead of subscriptions.subscription(), or whether it matters.
Seems it should be joinedSubscription. Suppose that joinedSubscription contained only [A] at the time of rebalance. Topic B is then created by the leader and assigned. Before the rebalance completes, the consumer discovers topic B, and subscription is updated to [A, B], while joinedSubscription is still [A]. The consumer then receives the assignment containing partitions from both A and B, but addedTopics will be empty (since B was already added to subscription). The consumer will then notice that joinedSubscription is [A], while subscription is [A, B], and request an unneeded rebalance.
|
|
||
| if (!addedTopics.isEmpty()) { | ||
| Set<String> newSubscription = new HashSet<>(subscriptions.subscription()); | ||
| Set<String> newJoinedSubsciprtion = new HashSet<>(joinedSubscription); |
There was a problem hiding this comment.
nit: typo newJoinedSubsciprtion
|
@hachikuji Found an issue with Could you please take another look? |
|
|
||
| // check if the assignment contains some topics that is not in the original | ||
| // subscription, if yes we will obey what leader has decided and add these topics | ||
| // into the subscriptions |
There was a problem hiding this comment.
Add to the end, "as long as they still match the subscribed pattern"
| // update partition assignment | ||
| subscriptions.assignFromSubscribed(assignment.partitions()); | ||
|
|
||
| // check if the assignment contains some topics that is not in the original |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
@hachikuji Addressed, thanks. |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
hachikuji
left a comment
There was a problem hiding this comment.
Minor comments. This is looking as good as it will, I guess.
| final String consumerId = "leader"; | ||
|
|
||
| subscriptions.subscribe(singleton(topicName), rebalanceListener); | ||
| subscriptions.subscribe(Pattern.compile("test.*"), rebalanceListener); |
There was a problem hiding this comment.
Not sure why this was changed. Do we have "normal" subscription covered elsewhere?
| Map<String, List<String>> memberSubscriptions = Collections.singletonMap(consumerId, singletonList(topicName)); | ||
| partitionAssignor.prepare(Collections.singletonMap(consumerId, singletonList(tp))); | ||
| Map<String, List<String>> memberSubscriptions = Collections.singletonMap(consumerId, singletonList(topic1)); | ||
| partitionAssignor.prepare(Collections.singletonMap(consumerId, Arrays.asList(t1p, t2p))); |
There was a problem hiding this comment.
Related to comment above. Can we do this in a separate test case? This is not the "normal" case, but it should be covered somewhere.
|
|
||
| Map<String, Assignment> assignment = assignor.assign(metadata.fetch(), subscriptions); | ||
|
|
||
| // user-customized assignor may have created some topics that are not in the subscription list |
There was a problem hiding this comment.
Can we mention in this comment that this is a hack and not something we want to support long-term unless we push regex into the protocol?
|
@hachikuji addressed. |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
1. In StreamThread, always use subscribe(Pattern, ..) function in order to avoid sending MetadataRequest with specific topic names and cause brokers to possibly auto-create subscribed topics; the pattern is generated as "topic-1|topic-2..|topic-n". 2. In ConsumerCoordinator, let the leader to refresh its metadata if the generated assignment contains some topics that is not contained in the subscribed topics; also in SubscriptionState, modified the verification for regex subscription to against the regex pattern instead of the matched topics since the returned assignment may contain some topics not yet created when joining the group but existed after the rebalance; also modified some unit tests in `KafkaConsumerTest` to accommodate the above changes. 3. Minor cleanup: changed String[] to List<String> to avoid overloaded functions. 4. Minor cleanup: enforced strong typing in SinkNodeFactory and removed unnecessary unchecked tags. 5. Minor cleanup: augmented unit test error message and fixed a potential transient failure in KafkaStreamTest. Author: Guozhang Wang <wangguoz@gmail.com> Reviewers: Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Jason Gustafson <jason@confluent.io> Closes #2379 from guozhangwang/K4633-regex-pattern (cherry picked from commit 3400d0c) Signed-off-by: Jason Gustafson <jason@confluent.io>
1. In StreamThread, always use subscribe(Pattern, ..) function in order to avoid sending MetadataRequest with specific topic names and cause brokers to possibly auto-create subscribed topics; the pattern is generated as "topic-1|topic-2..|topic-n". 2. In ConsumerCoordinator, let the leader to refresh its metadata if the generated assignment contains some topics that is not contained in the subscribed topics; also in SubscriptionState, modified the verification for regex subscription to against the regex pattern instead of the matched topics since the returned assignment may contain some topics not yet created when joining the group but existed after the rebalance; also modified some unit tests in `KafkaConsumerTest` to accommodate the above changes. 3. Minor cleanup: changed String[] to List<String> to avoid overloaded functions. 4. Minor cleanup: enforced strong typing in SinkNodeFactory and removed unnecessary unchecked tags. 5. Minor cleanup: augmented unit test error message and fixed a potential transient failure in KafkaStreamTest. Author: Guozhang Wang <wangguoz@gmail.com> Reviewers: Damian Guy <damian.guy@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Jason Gustafson <jason@confluent.io> Closes apache#2379 from guozhangwang/K4633-regex-pattern
In StreamThread, always use subscribe(Pattern, ..) function in order to avoid sending MetadataRequest with specific topic names and cause brokers to possibly auto-create subscribed topics; the pattern is generated as "topic-1|topic-2..|topic-n".
In ConsumerCoordinator, let the leader to refresh its metadata if the generated assignment contains some topics that is not contained in the subscribed topics; also in SubscriptionState, modified the verification for regex subscription to against the regex pattern instead of the matched topics since the returned assignment may contain some topics not yet created when joining the group but existed after the rebalance; also modified some unit tests in
KafkaConsumerTestto accommodate the above changes.Minor cleanup: changed String[] to List to avoid overloaded functions.
Minor cleanup: enforced strong typing in SinkNodeFactory and removed unnecessary unchecked tags.
Minor cleanup: augmented unit test error message and fixed a potential transient failure in KafkaStreamTest.