KAFKA-5488: Add type-safe split() operator by inponomarev · Pull Request #9107 · apache/kafka

inponomarev · 2020-07-30T22:30:31Z

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

inponomarev · 2020-07-30T22:34:52Z

⚠️ Two differences with KIP specification, discussion needed⚠️

Instead of multiple overloaded variants of Branched.with we now have Branched.withFunction and Branched.withConsumer. This is because of compiler warnings about overloading (Function and Consumer being indistinguishable when supplied as lambdas)
'Fully covariant' signatures like Consumer<? super KStream<? super K, ? super V>> don't work as expected. Used Consumer<? super KStream<K, V>> instead
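
The naming point in the first item can be illustrated with a minimal sketch. `BranchSpec` below is a hypothetical stand-in, not the real `Branched` class, and it depends only on the JDK. It shows why two distinctly named factories are easier to call with lambdas than two overloads of a single `with` method: a lambda whose body is a method call fits both `Function` and `Consumer`.

```java
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical stand-in for Branched; not the Kafka Streams class.
final class BranchSpec<T> {
    final Function<T, T> chain; // transforms the branch; the result is kept
    final Consumer<T> sink;     // consumes the branch; nothing is kept

    private BranchSpec(Function<T, T> chain, Consumer<T> sink) {
        this.chain = chain;
        this.sink = sink;
    }

    // Had both factories been overloads named with(...), a call such as
    // with(s -> handle(s)) would fit Function and Consumer alike, because a
    // method-call lambda body is valid whether or not its value is used.
    static <T> BranchSpec<T> withFunction(Function<T, T> chain) {
        return new BranchSpec<>(chain, null);
    }

    static <T> BranchSpec<T> withConsumer(Consumer<T> sink) {
        return new BranchSpec<>(null, sink);
    }

    public static void main(String[] args) {
        StringBuilder seen = new StringBuilder();
        BranchSpec<String> f = BranchSpec.withFunction(s -> s.toUpperCase());
        // append(...) returns a value, yet the lambda is still a valid Consumer.
        BranchSpec<String> c = BranchSpec.withConsumer(s -> seen.append(s));
        System.out.println(f.chain.apply("coffee"));
        c.sink.accept("tea");
        System.out.println(seen);
    }
}
```

With distinct names, the caller states the intent explicitly and the compiler never has to choose between the two functional interfaces.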

vvcephei · 2020-07-31T20:54:23Z

Hey @mjsax , do you have time to give this a first pass?

mjsax · 2020-08-04T00:17:40Z

I'll put it into my backlog. But I am the main reviewer for two other KIPs (216 and 466) that I should review first, as they were approved earlier and their PRs have already been open for longer.

mjsax

Thanks for the PR @inponomarev and sorry for the long wait for a review

Many comments are about JavaDocs, so it's mostly small suggestions. A few comments about the code structure are there, too.

mjsax · 2020-12-23T01:21:33Z

nit supposed -> used ?

this function -> the provided function

Actually, I am wondering if we should allow passing in null at all? Thoughts?

See my reply below, where we discuss null consumers: #9107 (comment)

(in short: I agree, I think we shouldn't)

mjsax · 2020-12-23T01:24:46Z

If a non-null branch is provided here? (branch -> consumer?)

But I would propose to simplify it, and just use By default (as passing in a non-null consumer should be the "default" usage).

As above, should we even allow a null consumer?

see #9107 (comment)

mjsax · 2020-12-23T02:32:48Z

Should we call branch((k,v) -> true, branched) instead to just add a predicate and branch? This way, the default branch is nothing special at runtime any longer.

The default branch should have index 0 (so it will be stable when branches are added or removed), but it should always be checked after all other branches. And when we come to the default branch during message processing, there is actually no need to dereference a predicate and call test... that's why I treat the default branch differently.

I guess it's fine both ways. -- The point about the index is a good one that I missed. But would still be doable I guess.

I don't think that there would be any measurable runtime difference if you use a "default predicate" (which is what we also do in the current implementation) -- the code is just a little "cleaner" as we don't need an extra "if" at the end -- but it's also not the end of the world as the process method is fairly simple anyway.
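
The routing discussed in this thread can be sketched as a toy model. This is not the actual processor code from the PR: `BranchRouter`, `route`, and the `DROPPED` sentinel are hypothetical names chosen for illustration. It only shows the idea of a default branch that keeps index 0 but is reached last, without a predicate being evaluated for it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical toy router; not the Kafka Streams branch processor.
final class BranchRouter {
    static final int DEFAULT_BRANCH = 0; // stable index, whatever else is added
    static final int DROPPED = -1;       // no branch matched and no default exists

    // predicates.get(i) guards branch i + 1; the default branch has no predicate.
    static <T> int route(List<Predicate<T>> predicates, boolean hasDefault, T value) {
        for (int i = 0; i < predicates.size(); i++) {
            if (predicates.get(i).test(value)) {
                return i + 1; // first matching branch wins
            }
        }
        // Reached only after every predicate failed: no test() call is needed.
        return hasDefault ? DEFAULT_BRANCH : DROPPED;
    }

    public static void main(String[] args) {
        List<Predicate<String>> predicates =
            Arrays.asList(s -> s.startsWith("c"), s -> s.length() > 5);
        System.out.println(route(predicates, true, "coffee"));
        System.out.println(route(predicates, true, "tea"));
    }
}
```

The alternative raised in the review, appending an always-true predicate for the default branch, would remove the final conditional at the cost of one extra `test` call per unmatched record.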

mjsax · 2020-12-23T02:36:08Z

result -> outputBranches ?

mjsax · 2020-12-23T02:39:51Z

I am wondering if it might be better to move this code into a build method that would be called within defaultBranch() / noDefaultBranch() ?

The pattern of passing in an empty list that we modify later seems undesirable; we should first build the list and then pass it in -- otherwise, we make assumptions about how ProcessorParameters and ProcessorGraphNode might be implemented, which we should avoid.

I clearly remember that something made me write it this way, but I have to recall what it was...

Would love to learn about it. -- In general, it's easier to follow the same pattern throughout the code base. It is easier to reason about the code that way, and also easier for people to learn the code base.

Just saw your other comment: #9107 (comment)

inponomarev · 2020-12-24T18:02:45Z

Hi @mjsax, thanks for your thorough review! I have fixed everything according to your comments, except:

KAFKA-5488: Add type-safe split() operator #9107 (comment) -- just want to be sure!
KAFKA-5488: Add type-safe split() operator #9107 (comment) -- see my explanation
KAFKA-5488: Add type-safe split() operator #9107 (comment) -- I agree that this is ugly but there was a good reason why I made it this way. Just give me a day and I'll return with comments (or will fix it).

inponomarev · 2020-12-25T15:10:20Z

OK @mjsax concerning #9107 (comment) I remembered why it was implemented this way!

The problem is that it is not necessary to invoke defaultBranch() / noDefaultBranch() when we use consumers, like in this simple example (I just added a new unit test for this case):

source.split()
    .branch(isCoffee, Branched.withConsumer(issuer::setCoffeePurchases))
    .branch(isElectronics, Branched.withConsumer(issuer::setElectronicsPurchases));

mjsax · 2020-12-29T02:22:08Z

About the original comment: #9107 (comment)

I am fine with those changes.

About #9107 (comment) -- that is a good point. Thanks for explaining. I guess it's a "philosophical" question if we want to allow this pattern though, or if we want to require that either defaultBranch() or noDefaultBranch() is called? -- I did consider calling branch() like a builder pattern, and the final [noD|d]efaultBranch call is basically build()?

Curious to hear what @vvcephei thinks about it.

mjsax · 2020-12-29T02:41:26Z

@inponomarev -- Can you also update the docs for Kafka Streams and the 2.8 upgrade guide in this PR.

inponomarev · 2020-12-29T14:34:10Z

Can you also update the docs for Kafka Streams and the 2.8 upgrade guide in this PR.

The documentation has already been updated (see changes in docs/streams/developer-guide/dsl-api.html)

I also modified docs/upgrade.html -- should I add something more here, like code examples?

Another question: CI checks fail because of usage of deprecated branch method in streams/streams-scala/src/main/scala/org/apache/kafka/streams/scala/kstream/KStream.scala. Since I'm not a Scala user, I have no idea of what should be done here.

Most likely we should deprecate the branch method and add a wrapper for the new split method, but I don't know how to do this correctly.

mjsax · 2020-12-29T21:09:57Z

To make the build pass, for now, it should be sufficient to just deprecate the method via @nowarn("cat=deprecation") -- But it seems we should update the Scala API, too. If you cannot handle it, we can do a follow up PR.

It seems we need to add split() to KStream.scala and introduce new BranchedKStream.scala and Branch.scala classes, and maybe some translations from Java Consumer/Function to their Scala variants. But I also don't really know Scala; @vvcephei should know better.

inponomarev · 2020-12-30T09:04:27Z

As far as I can judge from the name, @nowarn is not for deprecation, but rather for a warning suppression 🤔 apparently we need to mirror the changes in Java KStream interface here. Never wrote anything in Scala before. OK, it's better to wait for @vvcephei !

mjsax · 2020-12-30T19:22:46Z

Maybe I misunderstood your question. I thought the build fails because we are using some deprecated method -- for this case, we can make the build pass by suppressing the warning. If you want to deprecate a method in the Scala API, you just add @deprecated, similar to Java. -- I guess it makes sense to also deprecate the KStream.scala#branch() method, but suppressing the warning should also make the build pass, and we can deprecate this method when we add the new split() method.

vvcephei · 2021-01-08T16:31:58Z

Hey @inponomarev and @mjsax ! I'm glad to see this is moving along.

Regarding #9107 (comment) :

My understanding was that defaultBranch/noDefaultBranch were the terminal operators, in that they close out the context of a BranchedKStream, and you can't add any more branches after one of those methods.

But also, the whole branching construct is an incremental builder like the rest of the Kafka Streams API. In other words, just like this is a valid program:

builder.stream("input")
       .filter(myPredicate)

so would be Ivan's example:

builder.split()
       .branch("myBranch", ...)

What I mean by "incremental builder" is that each time you call a chained method in the DSL, it immediately adds nodes to the program, as opposed to having to call any kind of build() method to actually add stuff to the program. I think there are pros and cons to this design, but it seems more in line with the rest of the DSL not to require the terminal operators.
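
The "incremental builder" idea can be sketched with a toy DSL. `Program` and `ToySplit` are hypothetical classes invented for this illustration and do not exist in Kafka Streams. Each chained call adds a node immediately, and the terminal operators only close the chain by returning a different type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy DSL; none of these classes exist in Kafka Streams.
final class Program {
    final List<String> nodes = new ArrayList<>();
}

final class ToySplit {
    private final Program program;
    private final List<String> branches = new ArrayList<>();

    ToySplit(Program program) {
        this.program = program;
        program.nodes.add("split");
    }

    // Adds the branch node right away; no later build() step is required.
    ToySplit branch(String name) {
        program.nodes.add("branch:" + name);
        branches.add(name);
        return this;
    }

    // Terminal operators return the branch names rather than ToySplit,
    // so no further branch() call can be chained after them.
    List<String> defaultBranch(String name) {
        program.nodes.add("default:" + name);
        branches.add(name);
        return branches;
    }

    List<String> noDefaultBranch() {
        return branches;
    }

    public static void main(String[] args) {
        Program program = new Program();
        // No terminal operator is called, yet both branches are registered.
        new ToySplit(program).branch("coffee").branch("electronics");
        System.out.println(program.nodes);
    }
}
```

Under this design, omitting the terminal call leaves a complete program, which is the behaviour the consumer-only example earlier in the thread relies on.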

inponomarev · 2021-01-11T20:57:12Z

Hi @vvcephei , thank you for your comment. There's another question that we were unable to solve without you -- see #9107 (comment) from the words 'CI checks fail' and further discussion. Can you clarify, what's expected from KStream.scala ?

mjsax · 2021-01-13T00:50:13Z

@vvcephei -- hope you are also ok with the proposed changes to the KIP as per the PR description on top: #9107 (comment)

inponomarev · 2021-01-13T18:45:09Z

@vvcephei @mjsax I added a full Scala wrapper for the new API: the split method, BranchedKStream and Branched. I also added Scala unit tests that verify the main use cases.

vvcephei · 2021-01-16T21:44:46Z

Hey @inponomarev , I just took a look at the Scala API. Thanks for adding that!

I figured it'd be just easier to push a few tweaks than to describe what needs to be done.

You asked me offline if we could avoid the overloads in Branched, and indeed, we can with a default argument of null for the name.
The Scala test was inadvertently using the Java Branched class, but you meant to test the Scala one.
I happened to notice a small typo: The opposite of "prefix" is "suffix", not "postfix"
I also noticed that your files all had CRLF (windows) return characters, so I fixed them. You might want to configure git for autocrlf (git config --global core.autocrlf true) (see https://www.git-scm.com/book/en/v2/Customizing-Git-Git-Configuration)

These are all separate commits above, so you can scrutinize each one. This PR is your work, so feel free to protest any of my suggestions.

inponomarev · 2021-01-17T12:23:15Z

Hi @vvcephei thank you for your commits! Is everything else OK, especially #9107 (comment)?

@mjsax I pushed small fixes to Javadoc/Scaladoc, and AFAICS only tests not related to the changes are failing.

vvcephei · 2021-01-19T15:30:16Z

Thanks @inponomarev ,

Ah, I didn't notice that method signature name. I actually prefer it this way :)

Thanks also for pointing out the covariance change. This is also fine. Java's type system only contains a partial implementation of variance, so we do the best we can.
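
A small sketch of the variance limitation, under the assumption that the trouble with the "fully covariant" signature was of the kind shown in the comment below (the PR itself does not spell out the failure). `Box` is a hypothetical one-parameter stand-in for `KStream<K, V>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for KStream<K, V>, reduced to one type parameter.
final class Box<V> {
    final V value;

    Box(V value) {
        this.value = value;
    }
}

final class VarianceDemo {
    // Shape adopted in the PR: contravariant in the consumer's argument only.
    static <V> void feed(Box<V> box, Consumer<? super Box<V>> consumer) {
        consumer.accept(box);
    }

    // The 'fully covariant' shape would be
    //   Consumer<? super Box<? super V>>
    // which rejects a plain Consumer<Box<String>>: Box<String> is a subtype of
    // Box<? super String>, not a supertype, so the wildcard bound is not met.

    public static void main(String[] args) {
        List<Object> log = new ArrayList<>();
        Consumer<Box<String>> exact = b -> log.add(b.value); // sees a String
        Consumer<Object> general = log::add;                 // accepts anything
        Box<String> box = new Box<>("coffee");
        feed(box, exact);
        feed(box, general);
        System.out.println(log.size());
    }
}
```

The simpler bound still accepts consumers of supertypes such as `Consumer<Object>`, while keeping the element type visible to consumers declared against the exact stream type.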

Did you already update the KIP? If not, please do.

I'm +1 on this PR.

inponomarev · 2021-01-19T15:49:28Z

Thank you @vvcephei, I have updated the KIP and now it reflects the actual implementation.

I just wasn't sure if it's ok to edit specification text after it has been formally approved :-)

* consumers cannot be null * typo: "function"->"consumer"

inponomarev · 2021-02-02T19:42:50Z

Hi @mjsax , I have rebased and manually merged conflicts, and also removed FunctionConverters

JDK8 build still fails, but this time much later -- something related to integration testing

mjsax · 2021-02-03T02:04:19Z

What failure do you see exactly? It seems Jenkins is still running.

inponomarev · 2021-02-03T09:54:47Z

I was talking about build 17 (triggered by Commit db573f5, see https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-9107/)

Where did build 18 come from, and why did it take 8 hours and then time out -- I can't understand 😃

mjsax · 2021-02-03T18:21:17Z

Ah I see -- well, we do have some flaky tests, so nothing to worry about I guess. The last run timed out, so I retriggered the build. However, I could build it locally with Java 8/Scala 2.12, so I guess we can merge. Just waiting for @vvcephei to take a quick look at the last Scala commit.

vvcephei · 2021-02-04T19:26:58Z

Thanks, all, that Scala fix looks perfect to me.

mjsax · 2021-02-05T00:24:15Z

Merged to trunk.

Congrats for getting this into the 2.8.0 release @inponomarev -- great work!

vvcephei added the streams label Jul 31, 2020

vvcephei requested a review from mjsax July 31, 2020 20:53

inponomarev force-pushed the kip-418 branch from 369e2ee to 8a77301 Compare December 20, 2020 15:53

mjsax changed the title ~~KAFKA-5488: KIP-418 implementation~~ KAFKA-5488: Add type save branch() operator Dec 23, 2020

mjsax mentioned this pull request Dec 23, 2020

KAFKA-8651: Add predicate map to #branch #7068

Closed

3 tasks

mjsax reviewed Dec 23, 2020

mjsax changed the title ~~KAFKA-5488: Add type save branch() operator~~ KAFKA-5488: Add type-safe branch() operator Dec 29, 2020

mjsax reviewed Dec 29, 2020

Comment thread streams/src/main/java/org/apache/kafka/streams/kstream/internals/BranchedInternal.java Outdated

inponomarev changed the title ~~KAFKA-5488: Add type-safe branch() operator~~ KAFKA-5488: Add type-safe split() operator Dec 29, 2020

mjsax added the kip Requires or implements a KIP label Jan 7, 2021

inponomarev force-pushed the kip-418 branch from ab21d90 to 0759478 Compare January 13, 2021 19:49

inponomarev and others added 19 commits February 2, 2021 14:32

update reference in JavaDoc

876421a

remove compiler warnings

73c0349

implementation and test

74b21f9

more tests

c60a1a0

Documentation update

15dd152

code review fixes

106ca5d

requireNonNull for chain function and consumer

184728d

test for branching with no defaultBranch()/noDefaultBranch()

89a79e3

changelog, naming convention for BranchedInternal

cb5f1a6

Scala API wrapper

9179a33

spotlessApply

ad6d6cc

Correct the test to use the Scala Branched, not the Java one.

bb684e0

Use default arguments instead of method overloads

4780f52

Use "suffix" instead of "postfix"

f4fbe74

convert CRLF -> LF

aa3309e

fixed JavaDoc/Scaladoc

c0540b1

* consumers cannot be null * typo: "function"->"consumer"

Rewrite StreamsGraphTest using split()

68a830a

Javadoc/Scaladoc and Developer Guide updates

a82e991

unused Named import

a99018f

inponomarev force-pushed the kip-418 branch from f56be68 to a99018f Compare February 2, 2021 15:04

remove FunctionConverters

db573f5

vvcephei approved these changes Feb 4, 2021

mjsax merged commit 5552da3 into apache:trunk Feb 5, 2021

showuon mentioned this pull request Aug 10, 2021

MINOR: update the KStream#branch(split) doc and java doc and tests #11195

Merged

3 tasks
