KAFKA-4607: Validate the names of auto-generated internal topics #2331
nixsticks wants to merge 10 commits into apache:trunk from
Conversation
|
Refer to this link for build results (access rights to CI server needed): |
|
I think it is better to add this check closer to the user to avoid deep stack traces. Thus, I would add it to |
|
Furthermore, can you please add a test for this? And please update the corresponding JavaDocs and state that store names must be valid topic names (and explain what this means, i.e., which characters are allowed and which are not). Thx. |
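For readers following along, here is a minimal sketch of what "valid topic name" could mean for a store name. It is not the class added in this PR: `StoreNameSketch` is a hypothetical name, and the rules it encodes (ASCII alphanumerics plus `.`, `_` and `-`, at most 249 characters, not `.` or `..`) are an assumption based on the topic-name rules Kafka documents, which the broker may tighten or relax.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the utility class from this PR.
final class StoreNameSketch {
    // Assumed legal characters: ASCII letters, digits, '.', '_' and '-'.
    private static final Pattern LEGAL = Pattern.compile("[a-zA-Z0-9._-]+");
    // Assumed maximum topic name length.
    private static final int MAX_LENGTH = 249;

    static void validate(final String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("Name must not be null or empty");
        }
        if (name.equals(".") || name.equals("..")) {
            throw new IllegalArgumentException("Name cannot be \".\" or \"..\"");
        }
        if (name.length() > MAX_LENGTH) {
            throw new IllegalArgumentException("Name is longer than " + MAX_LENGTH + " characters: " + name);
        }
        if (!LEGAL.matcher(name).matches()) {
            // Include the offending name so the user can see what to fix.
            throw new IllegalArgumentException("Name \"" + name + "\" contains illegal characters");
        }
    }
}
```

Under these assumed rules, `StoreNameSketch.validate("my-store_1.v2")` returns normally, while a name containing a space, such as `"my store"`, is rejected with an `IllegalArgumentException`.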
|
Will look into it. Thanks @mjsax ! |
|
@mjsax Do you think it might be good to put it both in |
|
Sounds like a good idea to me. Maybe we can put it as an assertion into |
5ffb1cc to
2f2d7c3
Compare
I did a couple of cleanups like this in the cases where it seemed like the test name did not match the actual test, since I am technically writing and editing tests regarding the state store names -- see also shouldNotAcceptNullStateStoreSupplierWhenAggregatingSessionWindows and shouldNotAcceptNullStateStoreSupplierWhenCountingSessionWindows
The basic rule is to have a single PR for a single JIRA -- changes like this add "noise" and complicate reviewing in general (reviewers need to know/judge what is part of the actual PR and what is cleanup). For this case it is ok though IMHO :)
|
@mjsax There are now ... a lot of changes. I am fairly new to contributing to open source so please let me know if I have updated the pull request in the wrong way. I left the full validation in |
This was taken from the Scala version of this utility class and I am not sure what best to call it
That's a tricky one... we do not want to have code duplication -- but we also do not want to depend on kafka-core module...
\cc @guozhangwang What do you think?
Should we change the Scala version to use this one in common? I believe kafka-core depends on common. It would be better to not have the code duplicated if we can avoid it. @ijuma
Yes, if we move the logic here, then we should reuse it and not duplicate it (including tests if they exist). A couple more things:
- This class should be in common.internals since it's an internal class.
- We are doing this validation client-side, which could potentially be different from the broker validation (e.g. a future version could allow longer topic names or restrict it further due to file system or OS restrictions). It would be good to explain what the current behaviour is to make it clear how this improves on that.
2f2d7c3 to
e664456
Compare
mjsax
left a comment
Thanks for the update. I have some comments. Overall, it already looks quite good. Feel free to wait with an update until @guozhangwang has replied.
Please add this to StateStoreSupplier#name(), too (plus add a test for it).
Nit: we try to follow the following policy to make reviewing easier:
- each new sentence starts on a new line (you do follow this already)
- no line longer than 120 characters (otherwise lines are longer than GitHub displays and you need to scroll to the right)
This adds code redundancy. Please remove. Same below. Only add validation to the already present not-null checks.
Use a class member and use in all test methods to avoid code duplication.
Please add final keyword wherever possible.
|
What's the status of this PR? It'd be great to get it merged soon. |
|
Seems @nixsticks has not addressed the comments yet. |
A meta comment: instead of checking validity in the builder, could we just check validity once in the InternalTopicManager, before calling StreamsKafkaClient to create the topics, as is done in the Scala AdminUtils pattern?
The javadocs addition looks good to me.
|
@guozhangwang I prefer shallow stack traces and thus IMHO we should check as soon as we can if names are valid. (Of course, just my personal opinion.) |
I'm convinced. |
|
@guozhangwang Sorry, I was going to wait for an update on the kafka-core module vs code duplication, and then I went on PTO for two weeks! I will try to get my changes in soon (I'm returning to the US in a few days), but before that, would you mind addressing the comment from @mjsax about the above? |
852f24c to
0cbccc7
Compare
|
Not sure what happened to my previous comment, but: I ended up addressing most of the comments here tonight. Could you let me know whether the changes are acceptable, and what you'd like regarding the addition of |
|
@nixsticks There was a checkstyle error in the build. Can you have a look? Thx. |
|
@mjsax I did, last night, but it is for a file (ByteArrayConverter on Connect) that I did not touch.
|
|
Try to rebase to |
e2ec08f to
427c72b
Compare
|
@mjsax @guozhangwang Rebased |
|
BTW, what is the generally followed process for squashing/rebasing commits before merging? |
|
We don't have an official policy. But these are my two cents:
When a PR gets merged, the committer doing the merge will squash all commits anyway (so you don't need to squash by yourself). The commit message will contain the GitHub title and GitHub description for the PR (IIRC). So overall, you don't need to worry about it too much :) |
|
The details can be found in the following wiki page: https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes What @mjsax said is mostly right (apart from the |
|
By the way, one issue I raised early on is that doing this client-side won't necessarily work in every case. Here's a concrete example: https://issues.apache.org/jira/browse/KAFKA-4893 Probably still worthwhile doing it, but we still need to handle errors by the broker if it's stricter for whatever reason. |
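The point about the broker possibly being stricter can be sketched as follows. This is an illustration under stated assumptions, not code from this PR: `InternalTopicCreationSketch` and `TopicCreator` are hypothetical names, `TopicCreator` stands in for whatever client call actually asks the broker to create the topic, and the character rule is the assumed client-side one.

```java
import java.util.Objects;
import java.util.regex.Pattern;

// Hypothetical illustration; none of these names exist in Kafka.
final class InternalTopicCreationSketch {
    /** Stand-in for the client call that asks the broker to create a topic. */
    interface TopicCreator {
        void create(String topic);
    }

    // Assumed client-side rule; the broker may apply different limits.
    private static final Pattern LEGAL = Pattern.compile("[a-zA-Z0-9._-]+");

    static void createInternalTopic(final String topic, final TopicCreator creator) {
        Objects.requireNonNull(topic, "topic can't be null");
        // Client-side check: early feedback with a shallow stack trace.
        if (!LEGAL.matcher(topic).matches()) {
            throw new IllegalArgumentException("Invalid internal topic name: " + topic);
        }
        try {
            creator.create(topic);
        } catch (final RuntimeException e) {
            // The broker has the final say, so its rejection is still surfaced,
            // here wrapped with the offending name for context.
            throw new IllegalStateException("Broker rejected internal topic '" + topic + "'", e);
        }
    }
}
```

The client-side check only improves the common case. A name that passes it can still fail at the broker, which is why the `catch` branch remains necessary.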
|
@ijuma Thanks for clarification! :) About KAFKA-4893 -- I guess, we should add a check to Streams, but this would be a different PR. The store names checked here are only part of topic names, and thus, we cannot check the topic name length at this level. I created https://issues.apache.org/jira/browse/KAFKA-4912 for this. |
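To illustrate why a store name that is valid on its own can still produce an over-long topic name, here is a small sketch. It assumes the changelog topic is named `<application.id>-<storeName>-changelog` and that the topic length limit is 249 characters. `ChangelogNameSketch` and its methods are hypothetical names, not part of this PR.

```java
// Hypothetical illustration of why length cannot be checked on the store name alone.
final class ChangelogNameSketch {
    // Assumed maximum topic name length.
    static final int MAX_TOPIC_LENGTH = 249;

    // Assumed naming scheme: <application.id>-<storeName>-changelog
    static String changelogTopic(final String applicationId, final String storeName) {
        return applicationId + "-" + storeName + "-changelog";
    }

    // True if the full internal topic name stays within the assumed limit.
    static boolean fits(final String applicationId, final String storeName) {
        return changelogTopic(applicationId, storeName).length() <= MAX_TOPIC_LENGTH;
    }
}
```

With an application id of `app`, a 245-character store name is below the limit by itself, but the full topic name has 4 + 245 + 10 = 259 characters, so the length check only makes sense once the application id is known.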
I considered catching errors to add further information about naming internal state stores. However, Topic.validate() will throw an error that prints the offending name, so I decided not to add too much complexity.