Skip to content

KAFKA-4607: Validate the names of auto-generated internal topics#2331

Closed
nixsticks wants to merge 10 commits intoapache:trunkfrom
nixsticks:internal-topics
Closed

KAFKA-4607: Validate the names of auto-generated internal topics#2331
nixsticks wants to merge 10 commits intoapache:trunkfrom
nixsticks:internal-topics

Conversation

@nixsticks
Copy link
Copy Markdown
Contributor

I considered catching errors to add further information about naming internal state stores. However, Topic.validate() will throw an error that prints the offending name, so I decided not to add too much complexity.

@asfbot
Copy link
Copy Markdown

asfbot commented Jan 6, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/590/
Test FAILed (JDK 8 and Scala 2.11).

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Jan 6, 2017

I think it is better to add this check closer to the user to avoid deep stack traces. Thus, I would add it to KGroupedStreamImpl and KGroupedKTabelImpl.

\cc @guozhangwang @dguy @enothereska

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Jan 6, 2017

Furthermore, can you please add a test for this. And please update the corresponding JavaDocs and state that store names but be valid topic names (and explain what this means, ie, which characters are allowed and which not). Thx.

@asfbot
Copy link
Copy Markdown

asfbot commented Jan 7, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/588/
Test FAILed (JDK 7 and Scala 2.10).

@nixsticks
Copy link
Copy Markdown
Contributor Author

Will look into it. Thanks @mjsax !

@asfbot
Copy link
Copy Markdown

asfbot commented Jan 7, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/589/
Test FAILed (JDK 8 and Scala 2.12).

@nixsticks
Copy link
Copy Markdown
Contributor Author

nixsticks commented Jan 7, 2017

@mjsax Do you think it might be good to put it both in KGroupedStreamImpl/KGroupedTableImpl and also in InternalTopicConfig in case other, new code tries to use InternalTopicConfig as well?

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Jan 7, 2017

Sounds like a good idea to me. Maybe we can put it as an assertion into InternalTopicConfig ?

Copy link
Copy Markdown
Contributor Author

@nixsticks nixsticks Jan 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a couple of cleanups like this in the cases where it seemed like the test name did not match the actual test, since I am technically writing and editing tests regarding the state store names -- see also shouldNotAcceptNullStateStoreSupplierWhenAggregatingSessionWindows and shouldNotAcceptNullStateStoreSupplierWhenCountingSessionWindows

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The basic rule is to have a single PR for a single JIRA -- changes like this add "noise" and complicate reviewing in general (review need to know/judge what is part of the actual PR and what is cleanup). For this case it is ok though IMHO :)

@nixsticks
Copy link
Copy Markdown
Contributor Author

@mjsax There are now ... a lot of changes. I am fairly new to contributing to open source so please let me know if I have updated the pull request in the wrong way. I left the full validation in InternalTopicConfig since I figured that internal topics should also follow the rules of regular topics.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was taken from the Scala version of this utility class and I am not sure what best to call it

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a tricky one... we do not want to have code duplication -- but we also do not want to depend on kafka-core module...
\cc @guozhangwang What do you think?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @ijuma

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change the scala version to use this one in common? i believe kafka-core depends on common. It would be better to not have the code duplicated if we can avoid @ijuma

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if we move the logic here, then we should reuse it and not duplicate it (including tests if they exist). A couple more things:

  1. This class should be in common.internals since we it's an internal class.
  2. We are doing this validation client-side, which could potentially be different from the broker validation (e.g. a future version could allow longer topic names or restrict it further due to file system or OS restrictions). It would be good to explain what is the current behaviour to make it clear how this improves on that.

@asfbot
Copy link
Copy Markdown

asfbot commented Jan 9, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/638/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot
Copy link
Copy Markdown

asfbot commented Jan 9, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/639/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot
Copy link
Copy Markdown

asfbot commented Jan 9, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/637/
Test FAILed (JDK 7 and Scala 2.10).

Copy link
Copy Markdown
Member

@mjsax mjsax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I have some comment. Overall look already quite good. Fee free to wait with an update until @guozhangwang replied.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this to StatsStoreSupplier#name(), too? (plus add a test for it)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we try to follow the following policy to make reviewing easier:

  • each new sentence start a new line (you do follow this already)
  • no line longer then 120 character (otherwise lines are longer then Github displays and you need to scroll to the right)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds code redundancy. Please remove. Same blow. Only add validation to the already present not-null checks.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove: same as above.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again. Remove.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a class member and use in all test methods to avoid code duplication.
Please add final keyword wherever possible.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use class final class member.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a tricky one... we do not want to have code duplication -- but we also do not want to depend on kafka-core module...
\cc @guozhangwang What do you think?

@miguno
Copy link
Copy Markdown
Contributor

miguno commented Jan 24, 2017

What's the status of this PR? I'd be great to get it merged soon.

@asfbot
Copy link
Copy Markdown

asfbot commented Jan 24, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1141/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot
Copy link
Copy Markdown

asfbot commented Jan 24, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1139/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot
Copy link
Copy Markdown

asfbot commented Jan 24, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1139/
Test FAILed (JDK 7 and Scala 2.10).

@guozhangwang
Copy link
Copy Markdown
Contributor

Seems @nixsticks has not addressed the comments yet.

Copy link
Copy Markdown
Contributor

@guozhangwang guozhangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A meta comment: instead of checking validity in the builder, could we just check validity once in the InternalTopicManager, before calling StreamsKafkaClient to create them, just like what we did in the scala's AdminUtils pattern?

The javadocs addition looks good to me.

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Jan 24, 2017

@guozhangwang I prefer shallow stack traces and thus IMHO we should check as soon as we can if names are valid. (Of course, just my personal opinion.)

@guozhangwang
Copy link
Copy Markdown
Contributor

@guozhangwang I prefer shallow stack traces and thus IMHO we should check as soon as we can if names are valid. (Of course, just my personal opinion.)

I'm convinced.

@nixsticks
Copy link
Copy Markdown
Contributor Author

nixsticks commented Feb 3, 2017

@guozhangwang Sorry, I was going to wait for an update on the kafka-core module vs code duplication, and then I went on PTO for two weeks! I will try to get my changes in soon (I'm returning to the US in a few days), but before that, would you mind addressing the comment from @mjsax about the above?

@nixsticks
Copy link
Copy Markdown
Contributor Author

nixsticks commented Feb 3, 2017

Not sure what happened to my previous comment, but: I ended up addressing most of the comments here tonight. Could you let me know whether the changes are acceptable, and what you'd like regarding the addition of Topic.java vs using the kafka-core module? Thanks!

@asfbot
Copy link
Copy Markdown

asfbot commented Feb 3, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1471/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot
Copy link
Copy Markdown

asfbot commented Feb 3, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1468/
Test PASSed (JDK 7 and Scala 2.10).

@asfbot
Copy link
Copy Markdown

asfbot commented Mar 15, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/2190/
Test FAILed (JDK 8 and Scala 2.11).

@asfbot
Copy link
Copy Markdown

asfbot commented Mar 15, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/2188/
Test FAILed (JDK 8 and Scala 2.12).

@asfbot
Copy link
Copy Markdown

asfbot commented Mar 15, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/2187/
Test FAILed (JDK 7 and Scala 2.10).

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Mar 15, 2017

@nixsticks There was a checkstyle error in the build. Can you have a look. Thx.

@nixsticks
Copy link
Copy Markdown
Contributor Author

nixsticks commented Mar 15, 2017

@mjsax I did, last night, but it is for a file (ByteArrayConverter on Connect) that I did not touch.

checkstyleMain[ant:checkstyle] [ERROR] /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk8-scala2.11/connect/runtime/src/main/java/org/apache/kafka/connect/converters/ByteArrayConverter.java:1: Line does not match expected header line of '/*'. [Header]

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Mar 15, 2017

Try to rebase to trunk

@guozhangwang
Copy link
Copy Markdown
Contributor

@mjsax I did, last night, but it is for a file (ByteArrayConverter on Connect) that I did not touch.

I think this is not from your code, but another PR merged recently that has a header issue, and it has been fixed in a most recent commit: 962c378

@asfbot
Copy link
Copy Markdown

asfbot commented Mar 15, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/2200/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot
Copy link
Copy Markdown

asfbot commented Mar 15, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/2198/
Test PASSed (JDK 8 and Scala 2.12).

@asfbot
Copy link
Copy Markdown

asfbot commented Mar 15, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/2197/
Test PASSed (JDK 7 and Scala 2.10).

@nixsticks
Copy link
Copy Markdown
Contributor Author

@mjsax @guozhangwang Rebased

Copy link
Copy Markdown
Member

@mjsax mjsax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@nixsticks
Copy link
Copy Markdown
Contributor Author

nixsticks commented Mar 15, 2017

BTW, what is the generally followed process for squashing/rebasing commits before merging?

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Mar 16, 2017

We don't have an official policy. But this are my two cents:

  • As long as you work on the PR, it's best to just append new commits. This simplifies review process as the reviewer can inspect the deltas
  • If rebasing is required (eg., github shows some merge conflicts), just rebase (if possible don't squash commits -- but of course, if you squash first and rebase then, rebasing is simpler and it's also ok if you do this)

When a PR get's merged, the committer doing the merge will squash all commits anyway (so you don't need to squash by yourself). The commit message will contain the github title and github description for the PR (IRRC).

So overall, you don't need to worry about it too much :)

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Mar 16, 2017

The details can be found in the following wiki page:

https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes

What @mjsax is mostly right (apart from the no official policy comment :)). I'd just git merge instead of git rebase when you intend to update the branch with changes from trunk. Since commits are squashed automatically, there's not much point in using rebase and it doesn't make clear when the branch was updated to include more trunk commits.

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Mar 16, 2017

By the way, one issue I raised early on is that doing this client-side won't necessarily work in every case. Here's a concrete example:

https://issues.apache.org/jira/browse/KAFKA-4893

Probably still worthwhile doing it, but we still need to handle errors by the broker if it's stricter for whatever reason.

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Mar 16, 2017

@ijuma Thanks for clarification! :)

About KAFKA-4893 -- I guess, we should add a check to Streams, but this would be a different PR. The store names checked here are only part of topic names, and thus, we cannot check the topic name length at this level. I created https://issues.apache.org/jira/browse/KAFKA-4912 for this.

Copy link
Copy Markdown
Contributor

@guozhangwang guozhangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged to trunk.

@asfgit asfgit closed this in 9e78771 Mar 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants