KAFKA-7317: Use collections subscription for main consumer to reduce metadata#7969
Conversation
There was a problem hiding this comment.
This is just moved since it's only used by a single class
There was a problem hiding this comment.
Also update sourceTopicNames to keep in sync with nodeToSourceTopics
There was a problem hiding this comment.
Do we need to do this at this point? I guess so at it makes sense to have sourceTopicNames match what's in nodeToSourceTopics. I'm only asking as we never had this before and I'm curious as to why.
There was a problem hiding this comment.
Well I did switch #sourceTopicPattern and the new #sourceTopicCollection to use sourceTopicNames instead of nodeToSourceTopics.values, in which case we do still need to do this.
It shouldn't matter, although I will say the hardest part of working with this code/class was figuring out which data structures did/did not contain what contents or updates. I'm inclined to leave this in so that the two similar data structures are kept in sync, but I'm fine with removing it just to keep the changes minimal/necessary
There was a problem hiding this comment.
This can be simplified by just using sourceTopicNames, which is identical to nodeToSourceTopics.values() but without the global topics, which we remove
There was a problem hiding this comment.
The SubscriptionUpdates class and various updateXXX methods were unnecessarily complex given they all boiled down to just the 4 lines of actual actions below
There was a problem hiding this comment.
This is the actual fix; if the user has not themselves added pattern source topics we will go back to using regular subscription (having safely disabled auto topic creation)
bbejeck
left a comment
There was a problem hiding this comment.
Thanks for the PR @ableegoldman. I've made a quick pass.
The build didn't run but when I ran the tests locally I get some checkstyle failures for InternalTopologyBuilder.
There was a problem hiding this comment.
qq - what does this mean for users that have enabled auto topic creation? Although it's not a best practice, this seems it could lead to unexpected behavior.
There was a problem hiding this comment.
Do you mean on the server side or on the client? I agree we should consider checking for this in the user supplied configs before overriding it, but I think the reasonable default behavior is to disable it (by client config, since it's enabled server-side by default).
If users really do want it then it's on them to enable it through the main consumer config. On the other hand, prior to this it was effectively disabled permanently by the workaround of using pattern subscription. WDYT?
There was a problem hiding this comment.
Yeah, that SGTM I forgot that auto topic creation is the default server-side.
|
Retest this please. |
|
Strange, the PR builder didn't run on PR submission but I was able to trigger the build with "Retest..." |
|
@ableegoldman failures are related - checkstyle failures. |
|
retest this please |
bbejeck
left a comment
There was a problem hiding this comment.
Thanks for this fix @ableegoldman. LGTM, I only have minor comments. We need to get a build running before merging though.
There was a problem hiding this comment.
Nice to get rid of the use of reflection in the tests here and below
There was a problem hiding this comment.
I think in a previous commit this was static and I had concerns that it wouldn't work, so I'm glad to see you changed this.
There was a problem hiding this comment.
nit: I understand the change here for compactness, but I find it a little hard to follow. This is subjective however so feel free to keep as is.
There was a problem hiding this comment.
I kind of thought this way was easier to understand, but I did go back and forth on it. I'm happy to change it back
There was a problem hiding this comment.
Do we need to do this at this point? I guess so at it makes sense to have sourceTopicNames match what's in nodeToSourceTopics. I'm only asking as we never had this before and I'm curious as to why.
|
Retest this please. |
1 similar comment
|
Retest this please. |
|
Java 8 build https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/321/
EDIT: Builds just showed up now. ??? |
|
retest this please |
|
Retest this please. |
|
Java 8 passed Retest this please. |
05dc812 to
9ba3ee4
Compare
|
Retest this please. |
|
Java 8 failed with Java 11 passed https://builds.apache.org/view/All/job/kafka-pr-jdk11-scala2.13/4355/ |
|
Java 11 failed with |
|
Retest this please. |
|
Java 8 on previous PR build passed. On this run it failed on an environmental issue Java 11 passed this run. So I'm merging this PR. |
|
Merged #7969 into trunk. |
|
|
||
| synchronized void updateSubscribedTopics(final Set<String> topics, | ||
| final String logPrefix) { | ||
| log.debug("{}found {} topics possibly matching subscription", logPrefix, topics); |
There was a problem hiding this comment.
nit: missing space between logPrefix and 'found'
There was a problem hiding this comment.
topics is a Set.
What's your intention for the second parameter ?
If you want the number of topics logged, you should use topics.size().
There was a problem hiding this comment.
The spacing kept bothering me too but it's actually correct (logPrefix has a space at the end) -- we should actually clean up this class at some point to use log = new LogContext(logPrefix).logger(getClass()); rather than explicitly insert the prefix everywhere.
I didn't write this log message, just moved things around, but I think the intention was to list the actual topics not just the size. I probably could've improved the wording of it though
There was a problem hiding this comment.
Since topics Set can be quite large, I doubt the intention was to show the contents.
'{} topics' reads like the count of entries should be shown.
There was a problem hiding this comment.
Well the topics set should only include the matching topics that the Streams app is actually subscribed to, but I agree the wording suggests it should be the size/count. Since this PR is already merged do you want to submit a quick follow-up PR?
| subscriptionUpdates.clear(); | ||
| subscriptionUpdates.addAll(topics); | ||
|
|
||
| log.debug("{}updating builder with {} topic(s) with possible matching regex subscription(s)", |
There was a problem hiding this comment.
Similar comment as above two.
Conflicts and/or compiler errors due to the fact that we temporarily reverted the commit that removes Scala 2.11 support: * Exit.scala: replace SAMs with anonymous inner classes. * MiniKdc.scala: take upstream changes. # By A. Sophie Blee-Goldman (1) and others # Via Jason Gustafson * apache-github/trunk: KAFKA-9254; Overridden topic configs are reset after dynamic default change (apache#7870) MINOR: MiniKdc JVM shutdown hook fix (apache#7946) KAFKA-9152; Improve Sensor Retrieval (apache#7928) Correct exception message in DistributedHerder (apache#7995) KAFKA-7317: Use collections subscription for main consumer to reduce metadata (apache#7969) KAFKA-9181; Maintain clean separation between local and group subscriptions in consumer's SubscriptionState (apache#7941) KAFKA-7737; Use single path in producer for initializing the producerId (apache#7920) # Conflicts: # core/src/test/scala/kafka/security/minikdc/MiniKdc.scala
Also addresses KAFKA-8821
Note that we still have to fall back to using pattern subscription if the user has added any regex-based source nodes to the topology. Includes some minor cleanup on the side