KAFKA-6958: Allow to name operation using parameter classes #6410
bbejeck merged 1 commit into apache:trunk
Java 8 and Java 11 failed; test results already cleaned up. Retest this please.
\cc @vvcephei @ableegoldman for reviews
bbejeck left a comment
Thanks @fhussonnois, I've made a pass and have a few minor comments.
While I'm in favor of code re-use, in this case, the code in Topic.validate is not too large and could be easily ported to a Named.validate method. By doing so, Kafka Streams can change naming rules as needed.
I realize that Materialized.as uses Topic.validate to validate the name of the store, but I'd suggest updating it to use Named.validate there as well.
NOTE: If we do this, we'd need to update the KIP
EDIT: Actually I'm not sure we'd need to update the KIP as most likely this method would not be publicly accessible.
@fhussonnois thinking about this some more, what is the motivation for doing a validation here for processor names?
When Streams starts up the AdminClient will attempt to create any internal topics and the full topic names are validated at that point, so we don't need this check up front.
\cc @guozhangwang
@bbejeck, actually I've followed the same logic as for the Materialized class. In general, we should try to throw the exception as soon as possible. Also, here the exception is thrown while the topology is being built, so a bad name will be detected during unit tests. If we do not check the name in the Named/Materialized classes, then an invalid name will only be detected at runtime. Depending on the semantics we would like to have, should we consider an invalid name a topology exception or an invalid topic exception?
Also, I'm in favor of duplicating Topic.validate, as this avoids tying the operation name to the topic name in the exception message.
You have a point about checking via unit tests, but when the AdminClient attempts to create the topic, the check is against the full name.
The check here only looks at the user-supplied name, which is only part of the topic name, so we could have a situation, admittedly rare, where the unit tests pass but the full name fails.
So I'm inclined not to have the check here or in the Materialized class. Let's see what others think.
\cc @guozhangwang @mjsax @vvcephei @ableegoldman
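To make the partial-name concern concrete, here is a minimal sketch. It assumes the usual Kafka topic rules (ASCII letters, digits, '.', '_' and '-', at most 249 characters) and the `<application.id>-<name>-repartition` pattern for internal repartition topics. The class and method names are hypothetical and are not part of this PR.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only: not the Topic.validate implementation.
final class TopicNameCheckSketch {
    // Assumed topic rules: at most 249 characters from [a-zA-Z0-9._-].
    static final int MAX_LENGTH = 249;
    private static final Pattern LEGAL_CHARS = Pattern.compile("[a-zA-Z0-9._-]+");

    static boolean isLegalTopicName(final String name) {
        return !name.equals(".") && !name.equals("..")
            && name.length() <= MAX_LENGTH
            && LEGAL_CHARS.matcher(name).matches();
    }

    // Assumed shape of an internal repartition topic name.
    static String repartitionTopic(final String applicationId, final String userName) {
        return applicationId + "-" + userName + "-repartition";
    }

    // Java 8 compatible replacement for String.repeat.
    static String repeat(final char c, final int count) {
        final StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(final String[] args) {
        final String userName = repeat('a', 245);
        // The user-supplied part passes on its own (245 <= 249): prints true
        System.out.println(isLegalTopicName(userName));
        // The full internal topic name is 264 characters long: prints false
        System.out.println(isLegalTopicName(repartitionTopic("my-app", userName)));
    }
}
```

An early check on the user-supplied name therefore catches illegal characters, but only the check against the full name can catch a length overflow.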
I agree, that we should check when using AdminClient -- however, I also agree that we should check early if possible. (I don't see an issue with double checking.) Sometimes a name is used as part of a topic name, and we can check for invalid characters here already. Could we add a flag like "usedInTopicName" and do different check accordingly? On the other hand, I don't see a big issue with enforcing the topic naming conventions to processor names, too, even if it's just a processor name and not used as part of a topic name.
I've thought about this some more, and the comments from @mjsax and @fhussonnois have convinced me that checking earlier is a good idea as well.
But I still think we should duplicate the current logic and put the name checking logic in the Named class.
To re-start this thread, I also feel like we should have our own check for Named operations to use.
- we may want to make operation names more restrictive than topic names, for example to prevent collisions with automatically-generated partition names
- the topic validation throws an exception that mentions the name is "an invalid topic name". This statement is nonsense if I'm naming an operator. We should throw an exception that says it's an "invalid operator name" or similar.
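A dedicated check along these lines could look like the sketch below. It is only an illustration: the exception type, the class name, and the exact rules (copied here from the topic naming rules, which is an assumption) are hypothetical and do not reflect what was merged.

```java
import java.util.Objects;
import java.util.regex.Pattern;

// Hypothetical exception type so that the message talks about operators, not topics.
final class InvalidOperatorNameException extends RuntimeException {
    InvalidOperatorNameException(final String message) {
        super(message);
    }
}

// Hypothetical stand-in for a Named.validate method.
final class OperatorNameValidatorSketch {
    // Assumption: start with the same limits as topic names; Streams could tighten these later.
    private static final int MAX_NAME_LENGTH = 249;
    private static final Pattern LEGAL_CHARS = Pattern.compile("[a-zA-Z0-9._-]+");

    static void validate(final String name) {
        Objects.requireNonNull(name, "name must not be null");
        if (name.isEmpty()) {
            throw new InvalidOperatorNameException("Invalid operator name: it can't be empty");
        }
        if (name.equals(".") || name.equals("..")) {
            throw new InvalidOperatorNameException("Invalid operator name: it cannot be \".\" or \"..\"");
        }
        if (name.length() > MAX_NAME_LENGTH) {
            throw new InvalidOperatorNameException(
                "Invalid operator name: it can't be longer than " + MAX_NAME_LENGTH + " characters: " + name);
        }
        if (!LEGAL_CHARS.matcher(name).matches()) {
            throw new InvalidOperatorNameException(
                "Invalid operator name '" + name + "': only ASCII alphanumerics, '.', '_' and '-' are allowed");
        }
    }
}
```

Keeping the rules in a Streams-owned class means they can diverge from the topic rules without touching Topic.validate.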
According to the KIP, Printed should add a Printed.as(final String name) method.
@bbejeck If we add a Printed.as method, there is currently no withFile or withSysOut method to set the output stream afterwards.
I think we should not add the Printed.as method, and should update the KIP instead?
That sounds good to me, and if I recall correctly we had similar reasons for not adding an as method to Suppressed.
SGTM.
If we update the KIP, we should send a follow up email to the VOTE thread summarizing the changes (we can do this after all PRs are merged in case there is more)
Suffixing KTable source operators with -table-source is not in the KIP, so we'll either need to remove this or update the KIP.
I think we need to use some suffix, because otherwise we would generate the same name twice, i.e., end up with a naming conflict. The problem is that a KTable results in two processors and we need a name for each.
@bbejeck actually, the table() method creates one SourceNode and one ProcessorNode. We need a way to differentiate those two nodes. I should update the KIP to mention this particularity.
Ack, I get it now. Thanks for clarifying.
Depending on the decision regarding naming sources for builder.table, we'll need to update this test.
Java 8 passed, Java 11 failed; results already cleaned up. Retest this please.
The empty constructor is used to build an empty NamedInternal, see NamedInternal.empty()
Maybe just call new NamedInternal(null) instead?
mjsax left a comment
Overall LGTM.
I am personally not convinced that the pattern we use for NamedInternal is the best one. But it's highly subjective. I would just pass some Strings around -- seems easier to me personally.
I am personally a little confused about what orElseGenerateWithPrefix means. It's a personal preference, but I don't think it's easy to read. The same goes for suffixWithOrElseGet. (Maybe it's just me, not being used to the fancy Java 8 constructs that are mimicked here...)
Curious to hear what others think.
I'm ok with the names, but I don't have a strong opinion. We still have time to address between now and the final PR though.
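For readers unfamiliar with the two method names, the sketch below shows one plausible reading of their semantics, modeled on Optional.orElse: use the user-supplied name if present, otherwise fall back to a generated one. All classes here are hypothetical stand-ins, the generated-name format is an assumption, and the real NamedInternal signatures may differ.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the builder's internal name generator.
final class NameGeneratorSketch {
    private final AtomicInteger index = new AtomicInteger(0);

    // Assumed format: the prefix followed by a zero-padded, increasing index.
    String newProcessorName(final String prefix) {
        return prefix + String.format(Locale.ROOT, "%010d", index.getAndIncrement());
    }
}

// Hypothetical sketch of a NamedInternal-like holder; a null name means "not named by the user".
final class NamedInternalSketch {
    private final String name;

    private NamedInternalSketch(final String name) {
        this.name = name;
    }

    static NamedInternalSketch empty() {
        return new NamedInternalSketch(null);
    }

    static NamedInternalSketch with(final String name) {
        return new NamedInternalSketch(name);
    }

    // User-supplied name if present, otherwise a generated name with the given prefix.
    String orElseGenerateWithPrefix(final NameGeneratorSketch generator, final String prefix) {
        return name != null ? name : generator.newProcessorName(prefix);
    }

    // User-supplied name plus suffix if present, otherwise a generated name with the given prefix.
    String suffixWithOrElseGet(final String suffix, final NameGeneratorSketch generator, final String prefix) {
        return name != null ? name + suffix : generator.newProcessorName(prefix);
    }
}
```

Under this reading, a single user-supplied name such as "orders" can yield two distinct node names ("orders" and "orders-table-source"), which is the situation discussed earlier in the thread for table(). Passing plain Strings around, as suggested above, would move the null checks to each call site instead.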
@fhussonnois #6409 is merged, can you rebase this PR? Also, this PR is very close to merging, can you add the name verification logic into the
@bbejeck @mjsax This PR has been rebased (following #6409). The I think we should (for now) enforce the topic naming conventions on processor names, because they are also used for naming some metrics. I also added simple tests for Thanks for the review.
bbejeck left a comment
Thanks for the updated PR @fhussonnois!
This looks good to me. If we can address the final comments, we can get this PR merged.
Not sure about this test: the title says shouldUseSpecifiedNameForGlobalTableSourceProcessor, but it's asserting the names of state stores. But we can fix this in one of the following PRs.
We should fix right away -- otherwise it might slip.
bbejeck left a comment
Latest updates LGTM. There are a few additional issues, but we should fix them in the next PR.
mjsax left a comment
Thanks for rebasing, and sorry for the delay in reviewing. Some more follow-ups, mostly nits.
Seems not to align with the JavaDoc. I think passing in null should be fine.
Is this change documented in the KIP?
The KIP has been completed.
Sub-task required to allow defining custom processor names with the KStreams DSL (KIP-307):
- update existing classes Consumed/Grouped/Printed/Produced to implement NamedOperation
- introduce new public class Named
Java 8 failed with a known flaky test. Retest this please.
Failed with retest this please
Merged #6410 into trunk
Thank you @fhussonnois!
Hi @mjsax @bbejeck
This is the 2nd PR for the KIP-307.
NOTE: PR 6409 should be merged first
Thanks a lot for the review.