KIP-221 / Add KStream#repartition operation by lkokhreidze · Pull Request #7170 · apache/kafka

lkokhreidze · 2019-08-06T18:20:30Z

KIP-221: Enhance DSL with Connecting Topic Creation and Repartition Hint

Description

This is the PR for KIP-221. The goal of this PR is to introduce the new KStream#repartition operator and the underlying machinery that can be used to configure repartitioning on a KStream instance.

Notable Changes

Introduced org.apache.kafka.streams.kstream.internals.graph.UnoptimizableRepartitionNode. This node is NOT subject to the optimization algorithm; therefore, each repartition operation is excluded from optimization.
Introduced the org.apache.kafka.streams.processor.internals.InternalTopicProperties class, which can be used to capture repartition topic configurations passed via DSL operations.
Added the org.apache.kafka.streams.processor.internals.InternalTopologyBuilder#internalTopicNamesWithProperties map for storing the mapping between internal topics and their corresponding configuration. If a configuration is present, RepartitionTopicConfig is enriched with the configurations passed via DSL operations (in this case, via the org.apache.kafka.streams.kstream.Repartitioned class).
Added KStreamRepartitionIntegrationTest for testing different scenarios of KStream#repartition:

Should create a repartition topic if a key-changing operation was NOT performed.
Should perform the key-select operation when the repartition operation is used with a key selector.
Should create the repartition topic with the specified number of partitions.
Should inherit the repartition topic partition count from the upstream topic when the number of partitions is not specified.
Should create only one repartition topic when repartition is followed by groupByKey.
Should generate a repartition topic when a name is not specified.
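The "only one repartition topic" scenario depends on the repartition operation telling downstream operators that no further repartitioning is needed (one of the commits passes repartitionRequired as false downstream after a repartition operation). The following is a minimal, self-contained Java model of that bookkeeping, not Kafka Streams code: `MiniStream`, `selectKey`, `repartition`, `groupByKey` and the topic names are hypothetical stand-ins that only record which repartition topics would be created.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not the Kafka Streams API: it tracks only whether a
// repartition is required and which repartition topics would be created.
final class MiniStream {
    private final List<String> createdTopics;   // shared along one chain
    private final boolean repartitionRequired;

    private MiniStream(final List<String> createdTopics, final boolean repartitionRequired) {
        this.createdTopics = createdTopics;
        this.repartitionRequired = repartitionRequired;
    }

    static MiniStream source() {
        return new MiniStream(new ArrayList<>(), false);
    }

    // A key-changing operation marks the stream as needing a repartition.
    MiniStream selectKey() {
        return new MiniStream(createdTopics, true);
    }

    // repartition() always creates a topic, even without a prior key change,
    // and passes repartitionRequired = false downstream.
    MiniStream repartition(final String name) {
        createdTopics.add(name + "-repartition");
        return new MiniStream(createdTopics, false);
    }

    // groupByKey() creates a topic only if an upstream key change is pending.
    MiniStream groupByKey(final String name) {
        if (repartitionRequired) {
            createdTopics.add(name + "-repartition");
        }
        return new MiniStream(createdTopics, false);
    }

    List<String> createdTopics() {
        return createdTopics;
    }
}
```

With this model, `source().selectKey().repartition("by-new-key").groupByKey("agg")` records a single topic, `by-new-key-repartition`, whereas `source().selectKey().groupByKey("agg")` records `agg-repartition`.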

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

mjsax · 2019-08-06T20:55:25Z

@lkokhreidze Thanks for the PR. There are some checkstyle errors. Can you please fix them before we review your PR?

lkokhreidze · 2019-08-07T06:12:46Z

@mjsax done

mjsax

Thanks for the PR.

Made an initial pass. I still need to wrap my head around the optimization layer and how we merge repartition nodes. We need to add more tests to RepartitiontTopicNameTest and/or StreamsGraphTest IMHO, to verify that the new repartition() operator works as intended.

Also, it seems you forgot to update groupBy() and groupByKey().

Finally, thinking about the KIP once more: as we extend groupBy to configure the internal repartition topic, I am wondering if we should extend the KIP and also allow this for join(), which may also create repartition topics? \cc @guozhangwang @bbejeck @vvcephei @ableegoldman @cadonna @abbccdda

lkokhreidze · 2019-08-07T18:45:30Z

Also, it seems you forgot to update groupBy() and groupByKey().

@mjsax this is the first PR (as noted in the PR description); the KStream#groupBy changes will be part of the next PR if this one passes. I didn't want to create a big PR without first validating the underlying machinery and logic. I'm okay doing all the changes here; I was just thinking it would make the review harder.

lkokhreidze · 2019-08-07T18:56:12Z

Made an initial pass. I still need to wrap my head around the optimization layer and how we merge repartition nodes. We need to add more tests to RepartitiontTopicNameTest and/or StreamsGraphTest IMHO, to verify that the new repartition() operator works as intended.

Agreed, I'll do that. I wanted to tag @bbejeck, as it seems he's the main author of the optimization logic. I'll add tests for the optimization logic to make sure nothing breaks.

mjsax · 2019-08-07T23:19:50Z

Ah. I guess, I skipped the PR description... Sorry for that.

I discussed the proposal with @vvcephei in person, and thinking about the semantics once more, I am actually wondering if it is wise to change groupBy at all (this thought also affects my previous comment about including join() -- I would like to retract this idea :) ).

We basically have two dimensions, with two cases each, to consider for groupBy (and also join()):

(1) repartition not required -- no Repartitioned object provided
(2) repartition not required -- Repartitioned object is provided
(3) repartition required -- no Repartitioned object provided
(4) repartition required -- Repartitioned object is provided

Cases (1) and (3) are straightforward. However, case (2) is somewhat awkward, because we actually want to treat Repartitioned as a configuration object, but specifying it in groupBy should not enforce a repartitioning (if one wants to enforce a repartitioning, they should use the new repartition() operator). Hence, for case (2) the Repartitioned configuration would be ignored.

Therefore, only case (4) is left, in which passing in Repartitioned would have an effect. For this case, it would be possible to change the number of partitions or to specify a different partitioner. (I ignore the serde and naming because this can be achieved via Grouped anyway.) However, if one wants to change the number of partitions or set a specific partitioner, it seems they want to apply (i.e., enforce) this configuration independent of the upstream topic configuration; i.e., it is actually a case in which the user wants to enforce a repartitioning. Hence, it seems perfectly fine (and actually better, because it's semantically more explicit) that a user should call repartition() instead.

Therefore, I don't see a good use case for which it makes sense to pass Repartitioned into groupBy.
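The four-case argument above can be written down as a tiny decision function. This is a hypothetical Java model (the class, enum and method names are made up and are not part of Kafka Streams); it only encodes the claim that a Repartitioned passed to groupBy would take effect in case (4) alone.

```java
// Hypothetical model of the four cases; not Kafka Streams code.
final class RepartitionedInGroupBy {
    enum Outcome { NO_TOPIC, CONFIG_IGNORED, DEFAULT_TOPIC, CONFIGURED_TOPIC }

    static Outcome outcome(final boolean repartitionRequired, final boolean repartitionedProvided) {
        if (!repartitionRequired) {
            // Cases (1) and (2): no topic is created, so a Repartitioned
            // object, if given, has nothing to configure.
            return repartitionedProvided ? Outcome.CONFIG_IGNORED : Outcome.NO_TOPIC;
        }
        // Cases (3) and (4): a topic is created; only (4) applies the config.
        return repartitionedProvided ? Outcome.CONFIGURED_TOPIC : Outcome.DEFAULT_TOPIC;
    }
}
```

Enumerating all four combinations yields exactly one `CONFIGURED_TOPIC`, which is the case the comment suggests expressing as an explicit repartition() call instead.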

It would be great if you could share your thoughts on this.

A second point I discussed with @vvcephei is the optimization. We both have the impression that repartition() should not be subject to the repartition topic optimization. Instead, an enforced repartitioning step should be added to the topology in a hard-coded way, similar to a call to through(). Maybe we can add some advanced optimization rules later, but it seems difficult (and potentially unsafe/incorrect) to apply the repartition topic optimization in this case. Hence, we suggest skipping this optimization in this PR.

ableegoldman · 2019-08-07T23:40:44Z

However, if one wants to change the number of partitions or wants to set a specific partitioner, it seems they want to apply (i.e., enforce) this configuration independent of the upstream topic configuration; i.e., it is actually a case for which the user wants to enforce a repartitioning.

Not sure I agree @mjsax -- maybe you just want to control the parallelism in case a repartition is required? You could force users to step through their whole topology, figure out when/where repartitioning is needed, and use repartition to set the parallelism. Or, you could let Streams do this for you (as it currently does, far more conveniently and probably in a less error-prone way) and supply a configuration to be used if Streams figures out you need to repartition.

mjsax · 2019-08-07T23:54:42Z

maybe you just want to control the parallelism in case a repartition is required

I don't see this as a use case in practice. Why would one want to change the parallelism? Because the aggregation operation is over- or under-provisioned, and thus one wants to decrease or increase the parallelism. If I am OK with the "default" parallelism in case there is no repartitioning, why would I not be OK with it if data is repartitioned?

You could force users to step through their whole topology, figure out when/where repartitioning is needed, and use repartition to set the parallelism.

This is less of an issue IMHO, because if I want to scale up, for example, it's sufficient to insert repartition() once upstream, and all downstream auto-created topics would inherit the parallelism implicitly (as they inherit it from source topics now). Hence, I don't need to insert repartition all over the place.
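The inheritance argument can be illustrated with a deliberately simplified, hypothetical model: a linear chain of stages that each create an internal topic, where a stage either sets an explicit partition count (like repartition() with a partition count) or inherits the count of the topic it reads from. Real Kafka Streams resolves these counts per sub-topology and can combine several upstream topics; this sketch ignores that, and `PartitionInheritance.resolve` is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of partition-count inheritance along a
// single linear chain of internal topics; not Kafka Streams code.
final class PartitionInheritance {
    // explicitCounts.get(i) is null if stage i does not specify a count.
    static List<Integer> resolve(final int sourcePartitions, final List<Integer> explicitCounts) {
        final List<Integer> resolved = new ArrayList<>();
        int upstream = sourcePartitions;
        for (final Integer explicit : explicitCounts) {
            if (explicit != null && explicit <= 0) {
                throw new IllegalArgumentException("partition count must be positive: " + explicit);
            }
            // Not specified: inherit from the upstream topic.
            upstream = explicit != null ? explicit : upstream;
            resolved.add(upstream);
        }
        return resolved;
    }
}
```

For example, `resolve(50, Arrays.asList(null, 4, null, null))` gives `[50, 4, 4, 4]`: the first auto-created topic inherits the 50 source partitions, and every topic downstream of the single explicit count of 4 inherits 4.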

ableegoldman · 2019-08-08T00:24:40Z

If I am ok with the "default" parallelism in case there is no repartitioning, why would I not be ok with it if data is repartitioned?
Maybe you're processing data from a topic on your company's cluster, which has a huge number of partitions to begin with. Maybe your workload needs nowhere near this many partitions (you're filtering out most records, the topic was overpartitioned to begin with, or you're just testing). You run your Streams app, which creates some N topics, all of which have this huge number of partitions. Your brokers struggle and your boss gets mad?
Why would anyone ever want to change this (i.e., with repartition)?

it's sufficient to insert repartition() once upstream
That's a good point, though. So the repartition operator is really an "auto-create topic" + "set parallelism" operator -- we should be sure to document this well. Now, what if someone wants to configure the parallelism of certain repartition topic(s) but would like to continue using the source topic's parallelism as the default?
Not saying we should necessarily support that, but we should at least make it very clear to users how using this will affect downstream topics.

mjsax · 2019-08-08T00:44:35Z

Why would anyone ever want to change this (i.e., with repartition)?

My question is, why do you need groupByKey(Repartitioned.with())? If you want to scale down, it seems better to explicitly call repartition(Repartitioned.with()).groupByKey() -- otherwise, you might not scale down if no auto-repartition topic is created, and it seems rather error-prone to allow specifying the number of partitions and then ignore the config entirely.

Similar to your second comment, if you want to "scale up" again later, you call repartition() again. I agree that we need to document this explicitly if we follow this route. However, this is similar to the behavior we have as of now: if you insert a through(), all downstream operators inherit the parallelism from it.

ableegoldman · 2019-08-08T01:34:05Z

OK, well, I am fine with framing it as a "set parallelism" operation... I don't want to stall this KIP/PR further, but what if this were split into a new pair of setParallelism and repartition operators, where repartition just auto-creates the topic while an upstream setParallelism is responsible for setting the number of partitions?
Just wondering if it's worth making this more explicit, since there's really two new functionalities being introduced here. Doesn't hurt to bundle them into one operator, as long as users know what two things it actually does...

lkokhreidze · 2019-08-08T06:30:43Z

Hello @mjsax @ableegoldman @vvcephei
Thanks a lot for valuable insights and interesting discussion.
@mjsax - your arguments make sense, but I'm leaning more towards @ableegoldman points.
In addition, one other important point we need to take into account is not only parallelism but the general configuration of repartition topics. In my mind, this KIP can be the foundation for giving users control over the configuration of each individual repartition topic. To be honest, I was tempted to propose deprecating the KStream#groupBy(Grouped) operation altogether. Let me explain my reasoning a bit. After this KIP, I don't see any actual benefit of, or need for, using KStream#groupBy(Grouped). With KStream#groupBy(Repartitioned), a user can do exactly the same things, plus more. Right now, KStream has the Grouped and Joined configuration classes (and maybe some others that I'm missing) that are used for specifying:
a) topology name (which translates to repartition topic naming)
b) producer serdes
c) partitioner
All those configurations can be encapsulated in Repartitioned, in addition to all the other topic-level configurations that a user may want to pass to internal topic creation. Maybe this was discussed before, and there's a good reason why there are separate configuration classes for each operation (besides giving the API a nice look, of course :) ).
One argument that comes to mind for why we may actually want Repartitioned in groupBy is simple syntactic sugar. For example, there isn't a fundamental difference between these two topologies:

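// (a) explicit repartition with a key selector, followed by groupByKey()
builder
    .stream("input-topic")   // hypothetical topic name
    .repartition((key, value) -> value.newKey(), Repartitioned.as("by-new-key").withNumberOfPartitions(2))
    .groupByKey()

// (b) the same, via a groupBy overload that accepts Repartitioned (as proposed here)
builder
    .stream("input-topic")
    .groupBy((key, value) -> value.newKey(), Repartitioned.as("by-new-key").withNumberOfPartitions(2))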

For me, as a user, the second option looks much more appealing, similar to how the key selector in KStream#groupBy merges two operations (selectKey().groupByKey()).

Again, your arguments are totally valid, and everything can be achieved just by having the repartition(Repartitioned) operation. But on the other hand, I don't see anything wrong with adding a Repartitioned option to groupBy. It won't break API semantics (at least I think it won't) and will give the user extra flexibility in controlling repartition topics.

lkokhreidze · 2020-03-15T16:48:04Z

Thanks @vvcephei for the update and no worries :)

vvcephei

Haha, well. I did start the review, and made a fair amount of progress before getting sidetracked by a global catastrophe...

It's still in my "actively working on this" bucket, and I'll commit to not starting new work until I finish my review. For now, I'll go ahead and ask this one question, which came up early in my review. I skimmed over the KIP and discussion thread, but didn't see a specific discussion of the overload in question.

vvcephei · 2020-03-27T19:35:17Z

test this please

vvcephei

Hey, @lkokhreidze , I finally finished my review, and it looks good to me. I'm not sure if @mjsax wants to make another pass.

mjsax

Sorry for the delay in reviewing!! And thanks to @vvcephei for helping push this through.

Overall LGTM. Thanks for writing extensive tests!!!

mjsax · 2020-04-01T05:30:40Z

+    }
+
+    @Test
+    public void shouldCreateOnlyOneRepartitionTopicWhenRepartitionIsFollowedByGroupByKey() throws ExecutionException, InterruptedException {


Similar to above: we should be able to test this via unit tests using Topology#describe()

Thought about that, but somehow it felt "safer" with integration tests. Mainly because I was more comfortable verifying that topics actually get created when using repartition operation.

I had a similar thought, that it looks like good fodder for unit testing, but I did like the safety blanket of verifying the actual partition counts. I guess I'm fine either way, with a preference for whatever is already in the PR ;)

Mainly because I was more comfortable verifying that topics actually get created when using repartition operation.

I guess that is fair. (I just try to keep test runtime short if we can -- let's keep the integration test.)

mjsax · 2020-04-01T05:31:15Z

+    }
+
+    @Test
+    public void shouldGenerateRepartitionTopicWhenNameIsNotSpecified() throws ExecutionException, InterruptedException {


Seems to be unit-testable via Topology#describe()?

Thought about that, but somehow it felt "safer" with integration tests. Mainly because I was more comfortable verifying that topics actually get created when using repartition operation.

mjsax · 2020-04-01T05:31:46Z

+    }
+
+    @Test
+    public void shouldGoThroughRebalancingCorrectly() throws ExecutionException, InterruptedException {


Not sure what this test is about, i.e., how does it relate to the repartition() feature?

It's related to this comment #7170 (comment)

Thanks for clarifying!

mjsax · 2020-04-01T05:34:11Z

+    }
+
+    @Test
+    public void shouldInvokePartitionerWhenSet() {


Not sure what this test actually verifies?

This was the "easiest" way I could figure out to verify that custom partitioner is invoked when it's set
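One common way to check that a custom partitioner is actually invoked is to wrap it in a decorator that counts calls. The sketch below is plain Java so it stands alone: `KeyValuePartitioner` is a hypothetical interface whose `partition(topic, key, value, numPartitions)` method mirrors the shape of Kafka Streams' `StreamPartitioner#partition`, and `CountingPartitioner` is a made-up name. It is not the approach taken in the PR's test, just an illustration of the idea.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical interface with the same shape as StreamPartitioner#partition.
interface KeyValuePartitioner<K, V> {
    Integer partition(String topic, K key, V value, int numPartitions);
}

// Decorator that delegates to another partitioner and counts invocations.
// AtomicInteger keeps the count correct if several threads call it.
final class CountingPartitioner<K, V> implements KeyValuePartitioner<K, V> {
    private final KeyValuePartitioner<K, V> delegate;
    private final AtomicInteger invocations = new AtomicInteger();

    CountingPartitioner(final KeyValuePartitioner<K, V> delegate) {
        this.delegate = delegate;
    }

    @Override
    public Integer partition(final String topic, final K key, final V value, final int numPartitions) {
        invocations.incrementAndGet();
        return delegate.partition(topic, key, value, numPartitions);
    }

    int invocations() {
        return invocations.get();
    }
}
```

A test would pass the counting wrapper wherever the partitioner is configured, send some records, and then assert that `invocations()` is greater than zero.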

lkokhreidze · 2020-04-03T23:03:02Z

Hi @mjsax, I've addressed your comments, would appreciate another review.

lkokhreidze · 2020-04-07T14:54:41Z

Hi @mjsax @vvcephei

Small update: in commit f2bcdfe I've added the topology optimization option as a test parameter. This PR touches topology optimization (indirectly), so to make sure that everything works as expected, I thought it would be beneficial for the integration tests to verify both the topology.optimization: all and topology.optimization: none configurations. Hope this makes sense.

Regards,
Levani

vvcephei · 2020-04-07T17:33:24Z

Wow, that's great. Thanks, @lkokhreidze !

mjsax · 2020-04-09T23:47:20Z

+        Arrays.asList(StreamsConfig.OPTIMIZE, StreamsConfig.NO_OPTIMIZATION)
+              .forEach(x -> values.add(new Object[]{x}));
+
+        return values;


Seems unnecessarily complex? A simple

return Arrays.asList(new String[][] { {StreamsConfig.OPTIMIZE}, {StreamsConfig.NO_OPTIMIZATION} });

would do, too :)

(Feel free to ignore the comment.)
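A self-contained sketch of the simplified parameter data, plus how one parameter would become a config entry. To compile without Kafka on the classpath, the literals "all" and "none" stand in for StreamsConfig.OPTIMIZE and StreamsConfig.NO_OPTIMIZATION, and the "topology.optimization" key is written out as a string; `OptimizationParameters` and `configFor` are hypothetical names. It uses `Object[][]` rather than `String[][]` so the result is plainly a `Collection<Object[]>`.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Properties;

final class OptimizationParameters {
    // Stand-ins for StreamsConfig.OPTIMIZE / StreamsConfig.NO_OPTIMIZATION.
    static final String OPTIMIZE = "all";
    static final String NO_OPTIMIZATION = "none";

    // Each inner array is one parameter set, i.e., one test run per mode.
    static Collection<Object[]> data() {
        return Arrays.asList(new Object[][] {{OPTIMIZE}, {NO_OPTIMIZATION}});
    }

    // Turns one parameter into the config entry the test would set.
    static Properties configFor(final String optimization) {
        final Properties props = new Properties();
        props.put("topology.optimization", optimization);
        return props;
    }
}
```

Note that in Kafka Streams the optimization setting only takes effect if the same properties are also passed to StreamsBuilder#build(Properties), not just to the KafkaStreams constructor.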

mjsax · 2020-04-09T23:49:15Z

+        return values;
+    }
+
+    public KStreamRepartitionIntegrationTest(final String topologyOptimization) {


A simple

@Parameter public String topologyOptimization;

would be sufficient instead of adding a constructor, and those lines could go into before().

(As above, feel free to ignore this comment.)

mjsax · 2020-04-09T23:53:25Z

Merged to trunk. Congrats @lkokhreidze! And thanks a lot for your hard work and patience!

vvcephei · 2020-04-10T14:20:21Z

Yes, thank you @lkokhreidze for seeing this through!

lkokhreidze added 15 commits August 3, 2019 13:09

KAFKA-8611 / add Repartitioned configuration class

7bc0e41

KAFKA-8611 / add NO-OP KStream#repartition operations

597fb18

KAFKA-8611 / fix Repartitioned class checkstyle violations

457d824

KAFKA-8611 / introduce InternalTopicProperties class; enrich topic number of partitions based on InternalTopicProperties

771e8e6

KAFKA-8611 / Add implementation for KStream#repartition(Repartitioned, KeyValueMapper)

261aafd

KAFKA-8611 / Added integration tests

ae73bee

KAFKA-8611 / Added OptimizableRepartitionNodeBuilder#withStreamsPartitioner

56887ee

KAFKA-8611 / update InternalStreamsBuilder#maybeOptimizeRepartitionOperations method with InteralTopicConfig and StreamPartitioner parameters

c3e4bd1

KAFKA-8611 / update KStream#repartition implementations with StreamPartitioner

6fac93f

KAFKA-8611 / Add require not null checks

0c369e1

KAFKA-8611 / Minor formatting fixes

f641eb9

KAFKA-8611 / Added more tests

ef4fa8d

KAFKA-8611 / Add small doc about InternalTopologyBuilder#internalTopicNamesWithProperties; Moved InternalTopicProperties class to dedicated file

595736d

KAFKA-8611 / Minor refactoring

a4a661a

KAFKA-8611 / pass repartitionRequired as false to downstream topology after repartition operation is performed

2ee3c31

mjsax added the streams label Aug 6, 2019

KAFKA-8611 / Fix checkstyle violations

33050f2

KAFKA-8611 / Make RepartitionedInternal package private

2dd9bbe

mjsax reviewed Aug 7, 2019

vvcephei self-requested a review March 14, 2020 18:09

lkokhreidze added 3 commits March 18, 2020 10:30

KAFKA-8611 / sync with trunk

6448ea9

KAFKA-8611 / sync with trunk

ebeb327

KAFKA-8611 / sync with trunk

162b554

vvcephei reviewed Mar 24, 2020

Comment thread streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java Outdated

lkokhreidze added 3 commits March 27, 2020 08:56

KAFKA-8611 / remove KeyValueMapper overloads for KStream#repartition operation

5b4e415

KAFKA-8611 / update javadocs

5054ef1

KAFKA-8611 / sync with trunk

6c81be8

vvcephei approved these changes Mar 31, 2020

Comment thread streams/src/test/java/org/apache/kafka/streams/kstream/internals/graph/StreamsGraphTest.java Outdated

mjsax reviewed Apr 1, 2020

Comment thread streams/src/main/java/org/apache/kafka/streams/kstream/internals/KStreamImpl.java Outdated

KAFKA-8611 / revert accidental code change

cdbd24c

Co-Authored-By: John Roesler <vvcephei@users.noreply.github.com>

andrewchoi5 approved these changes Apr 3, 2020

lkokhreidze added 2 commits April 3, 2020 09:57

KAFKA-8611 / CR comments

591e8ff

KAFKA-8611 / CR comments: part 2

9fdc7ec

KAFKA-8611 / Add optimize as test parameter

f2bcdfe

mjsax reviewed Apr 9, 2020

mjsax merged commit e131a99 into apache:trunk Apr 9, 2020

lkokhreidze mentioned this pull request Apr 12, 2020

KAFKA-8611 / Refactor KStreamRepartitionIntegrationTest #8470

Merged

mjsax added the kip Requires or implements a KIP label Jun 12, 2020

lkokhreidze mentioned this pull request Jun 1, 2021

KAFKA-6718 / A Rack awareness for Kafka Streams #10785

Closed
