KAFKA-3705: Added a foreignKeyJoin implementation for KTable. #5527
bbejeck merged 64 commits into apache:trunk
Conversation
    final ValueMapper<V, KO> keyExtractor,
    final ValueJoiner<V, VO, VR> joiner,
    final Materialized<K, VR, KeyValueStore<Bytes, byte[]>> materialized,
    final Serde<K> thisKeySerde,
    @Override
    public CombinedKey<KF, KP> deserialize(final String topic, final byte[] data) {
        //{4-byte foreignKeyLength}{foreignKeySerialized}{4-bytePrimaryKeyLength}{primaryKeySerialized}
Can skip the second length, it's known anyway.
True, I could just do it as the remainder. Thanks
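Dropping the second length prefix and treating the primary key as the remainder could look like this hedged sketch (class and method names are illustrative, not the PR's actual code):

```java
import java.nio.ByteBuffer;

// Sketch of the combined-key wire format with the second length omitted:
// {4-byte foreignKeyLength}{foreignKey}{primaryKey}. The primary key's
// length is recovered as whatever bytes remain after the foreign key.
public class CombinedKeyDeserializerSketch {
    public static byte[][] split(final byte[] data) {
        final ByteBuffer buf = ByteBuffer.wrap(data);
        final byte[] foreignKey = new byte[buf.getInt()]; // read 4-byte length prefix
        buf.get(foreignKey);
        final byte[] primaryKey = new byte[buf.remaining()]; // remainder = primary key
        buf.get(primaryKey);
        return new byte[][]{foreignKey, primaryKey};
    }
}
```

This saves 4 bytes per record on the repartition topic at no cost, since the total payload length is always known.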
        //{4-byte foreignKeyLength}{foreignKeySerialized}{4-bytePrimaryKeyLength}{primaryKeySerialized}
        final byte[] fkCount = Arrays.copyOfRange(data, 0, 4);
        final int foreignKeyLength = fourBytesToInt(fkCount);
What about treating the whole thing as a Buffer?
I don't understand your question, can you elaborate?
I think the whole section could look nicer if you started with something like ByteBuffer.allocate(totalLength).putInt(keyLength).put(foreignKey).put(primaryKey)... something along those lines.
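The reviewer's suggestion, roughly: build the combined key in one pass with ByteBuffer instead of Arrays.copyOfRange plus a manual int conversion. A minimal sketch (names are illustrative, not the PR's):

```java
import java.nio.ByteBuffer;

// Fluent ByteBuffer construction of the combined key, replacing the
// copyOfRange/fourBytesToInt approach quoted in the review.
public class CombinedKeySerializerSketch {
    public static byte[] combine(final byte[] foreignKey, final byte[] primaryKey) {
        return ByteBuffer.allocate(4 + foreignKey.length + primaryKey.length)
                .putInt(foreignKey.length) // only the first length is needed
                .put(foreignKey)
                .put(primaryKey)
                .array();
    }

    public static int readForeignKeyLength(final byte[] data) {
        // replaces: fourBytesToInt(Arrays.copyOfRange(data, 0, 4))
        return ByteBuffer.wrap(data).getInt();
    }
}
```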
        //TODO - Can reduce some of the parameters, but < 13 is not possible at the moment.
        //Would likely need to split into two graphNodes - ie: foreignKeyJoinNode and foreignKeyJoinOrderResolutionNode.
I think the step an optimizer could potentially exploit here is the repartitioning. So one could try to factor out only the repartitioning.
I'll have to look more into the optimizer. TBH I built this originally in 1.0 and just did a functional port, not necessarily a best practices one. Thanks
I would not recommend spending too much energy. At the moment I really don't expect the optimizer to be able to exploit any of this. Probably also not in the future. It was just a thought popping into my head.
    @Override
    public KeyValueIterator<K, V> prefixScan(final K prefix) {
        return this.inner.prefixScan(prefix);
Probably need to wrap this in a DelegatingPeekingKeyValueIterator.
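For context on the suggestion: Kafka's internal DelegatingPeekingKeyValueIterator wraps a store iterator so callers can peek at the next entry without consuming it. A self-contained toy version of the same wrapping idea (this is not the Kafka class, just an illustration over a plain Iterator):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Toy peeking wrapper: buffers one element from the delegate so that
// peek() can inspect it without advancing, mirroring what the Kafka
// internal iterator does for KeyValue entries.
public class PeekingIteratorSketch<T> implements Iterator<T> {
    private final Iterator<T> inner;
    private T next; // buffered element, or null when not yet fetched

    public PeekingIteratorSketch(final Iterator<T> inner) {
        this.inner = inner;
    }

    public T peek() {
        if (next == null) {
            if (!inner.hasNext()) {
                throw new NoSuchElementException();
            }
            next = inner.next();
        }
        return next;
    }

    @Override
    public boolean hasNext() {
        return next != null || inner.hasNext();
    }

    @Override
    public T next() {
        final T result = peek();
        next = null; // consume the buffered element
        return result;
    }
}
```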
    final Materialized foreignMaterialized = Materialized.<CombinedKey<KO, K>, V, KeyValueStore<Bytes, byte[]>>as(prefixScannableDBRef.name())
        //Need all values to be immediately available in the RocksDB store.
        //No easy way to flush the cache prior to prefixScan, so caching is disabled on this store.
        .withCachingDisabled()
I notice that in 2.x I may be able to rework this to allow an enabled cache, using a prefixScan function similar to ThreadCache.range. I will have to look into this a bit more, though I don't think it will affect performance much, since I anticipate the RocksDB prefixScan will take the longest overall.
Might be; it's one of the places I got stuck once. From experience I can tell that it works sufficiently well without the cache. I think RocksDB does a pretty good job of not seeking around randomly on disk.
I'll leave it out for now. If someone else thinks otherwise, they can speak up or it can be done in a subsequent PR.
    final byte[] offset = longSerializer.serialize(null, context().offset());
    context().headers().add(KTableRepartitionerProcessorSupplier.this.offset, offset);
    context().headers().add(propagate, falseByteArray);
    context().forward(combinedOldKey, change.newValue);
Yes, cleaner to do so. The value is not relevant. I have fixed that and added a clarification comment (I can remove all comments if required before final submission).
We should just remove all the
I had to add all the final keywords to pass the linting check - IIRC, my first run had dozens of linting errors preventing compilation.
Force-pushed: a36f5ee to 3d465f5
Force-pushed: 8719e16 to 56b76fa
Force-pushed: 3e1f62d to e718610
Force-pushed: 8024097 to 3925a3a
@bellemare What is your JIRA ID? Would like to assign the ticket to you.
@mjsax JIRA ID is abellemare
Hi, sorry to intrude on a potentially stale PR, but is this functionality still in development? It would be extraordinarily useful for joining two changelog-like entities.
I sure hope so, my team is looking forward to it as well! Given that the KIP was accepted a few weeks ago, I think it's safe to say it will make it in fairly soon. I would definitely pick up development if @bellemare can't continue.
Hey folks - I'm still trying to get the code put together and finalize some of the changes that were outlined in the KIP. Stay tuned!
Hi All - I'm at a point where I need some feedback on a couple of things:
Feedback is very much appreciated, as this is the first PR I've put up against Kafka and I'm sure I've violated a number of things.
    @Override
    public byte[] serialize(String topic, SubscriptionResponseWrapper<V> data) {
        //{16-bytes Hash}{n-bytes serialized data}
        byte[] serializedData = serializer.serialize(null, data.getForeignValue());
Why is the topic passed as null? It causes issues with GenericRecord AVRO serializer, since it tries to register schemas under "null-value" subject, and the schema registry responds with "version not compatible" error
The issue is actually with the Confluent implementation of the SerDe, as they incorrectly attempt to register when null topics are passed in. Read confluentinc/schema-registry#1061 for more details. That being said, it has been extremely quiet in that git repo, I am not sure how much effort Confluent puts into supporting work on that product.
If this does not get fixed either way, this PR will be unusable for most practical use cases. What is the downside of passing the topic name to the serializer? I tried it, and it seemed to work as expected.
Is there a workaround if confluentinc/schema-registry#1061 is not fixed?
I think the main issue would be the large amount of internal topic schemas registered to the schema registry. This, combined with any breaking changes to the schema (due to normal business requirement changes) would make it such that you are now needing to manually delete schema registry entries made to internal topics. This is a workflow that I do not believe was ever intended to be done with the Confluent Serde.
As it stands right now, there are allegedly other functionalities that require null serialization ("There are several places in Streams where we need to serialize a value for purposes other than sending it to a topic (KTableSuppressProcessor comes to mind), and using null for the topic is the convention we have."). These too will not work with the confluent Serde.
If they do not fix it, then the next best thing to do would be wrap it in your own implementation and intercept null-topic values to avoid registration. I do not see why it wouldn't be fixed since the current behaviour of registering "null-topic" is fundamentally useless.
Anyways, with all that being said, for this particular line I can certainly pass in the topic since it's fairly well-defined. If you wish to have your internal topics registered to the schema registry, no big deal. For other parts, such as
, there is no solution using the current Confluent Serde.
Confluent serde needs a schema id, and looks like it is not stored in GenericData.Record instance - it may not be trivial to fix confluentinc/schema-registry#1061...
I discussed this with some other people, and somebody mentioned that the value we serialize is actually also stored in RocksDB (the input KTable). We also know that the corresponding byte[] are written into the store changelog topic. Hence, instead of using the repartition topic, using the changelog topic should be a better option, as it does not leak anything into SR (or whatever other Serdes might do with the topic name).
Even if there is no changelog topic for the input KTable (we do some optimizations and sometimes don't create one; eg, the store might actually be a logical view and is not materialized), using the changelog topic name still seems to be safe.
context().topic() gives the repartition topic name in the serializer, which is what I want. In the processor sections, where I use null, context().topic() gives me the input-topic name for the KTable... which is also fine, since the serializer will check against the input topic schema, which must be valid by definition of the data being within the topic... so I suspect this issue can be laid to rest, in line with adaniline-traderev's suggestion.
This removes any requirement for the upstream serializer to have to do special work for null values.
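The workaround discussed earlier in this thread (wrapping the Confluent serializer to intercept null topics) could be sketched as follows. This is a hedged illustration: the Serializer interface is reduced to its (topic, data) -> byte[] shape for self-containment, and the fallback topic name is hypothetical.

```java
// Wrap a schema-registry-aware serializer and substitute a stable fallback
// topic whenever null is passed, so nothing is registered under a
// "null-value" subject in the schema registry.
public class NullTopicGuard {
    // Simplified stand-in for org.apache.kafka.common.serialization.Serializer
    interface Serializer<T> {
        byte[] serialize(String topic, T data);
    }

    static <T> Serializer<T> withFallbackTopic(final Serializer<T> delegate, final String fallbackTopic) {
        return (topic, data) -> delegate.serialize(topic == null ? fallbackTopic : topic, data);
    }
}
```

Following the discussion above, the fallback would typically be the changelog (or source) topic name whose schema matches the serialized value.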
@mjsax I'm not sure I can fully follow the suggestion of using the changelog topic vs. the repartition topic here: are you suggesting to do it universally, or just for this case? If it is the latter, it feels a bit awkward due to inconsistency with other source KTable cases, where we just follow the SourceNode / RecordDeserializer path to deserialize using the source topic; if it is the former, that also has some drawbacks, since with today's topology generation not all source KTables will be materialized to a store, and hence they won't necessarily have a changelog topic.
I still feel that using the source topic name (i.e., in this case, the repartition topic) is admittedly exposed to SR but is philosophically the right thing to do, and we should consider fixing it in the serde APIs in the future. WDYT?
@guozhangwang I was just talking about the foreign-key case (not sure why you thought it might be anything else?). My understanding is the following: the contract is that we should pass into the serializer the name of the topic we want to write the data into. This contract breaks if we pass in the repartition topic name, because we write something different into the repartition topic.
You are right that the changelog topic might not exist; however, my personal take is that registering for a non-existing topic is a smaller violation of the contract than passing in the "wrong" repartition topic name. Note that the changelog topic name is conceptually the "right" topic name. However, this case would not happen very often anyway (compare the examples below).
Your comment triggered one more thought: the optimization framework could actually check for different cases, and if there is an upstream topic (either a changelog or source topic that has the same schema), we could actually use this name.
Some examples (does not cover all cases):
builder.table("table-topic").foreignKeyJoin(...)
For this case we need to materialize the base table (that is also the join-table), and the schema is registered on table-topic already, so we can pass in table-topic to avoid leaking anything.
builder.table("table-topic").filter(...).foreignKeyJoin(...)
For this case we materialize the derived table from the filter() and we get a proper filter-changelog-topic and we can pass this one.
builder.stream("stream-topic").groupBy().aggregate().foreignKeyJoin(...)
For this case, the agg result KTable is materialized and we can pass the agg-changelog-topic as name.
builder.stream("stream-topic").groupBy().aggregate().filter().foreignKeyJoin(...)
For this case, the agg result KTable is materialized and we can pass the agg-changelog-topic as name, because the filter() does not change the schema. Thus, even if the join-input KTable is not materialized, we can avoid to leak anything by "borrowing" the upstream changelog topic name of the filter input KTable.
builder.table("table-topic").mapValues(...).foreignKeyJoin(...)
For this case, we need to materialize the result of mapValues() and get a proper changelog topic for the join-input table.
builder.table("table-topic", Materialized.as("foo")).mapValues(...).foreignKeyJoin(...)
This might be a weird case, for which the base table is materialized, while the input join-table would not be materialized, and also the type changes via mapValues(). Hence, the table-topic schema is not the same as the join schema and we also don't have a changelog topic for the join-input KTable. We still use the changelog-topic name of the non-existent changelog topic (of the mapValues() result KTable).
As you can see, we can cover a large scope of cases for which we don't leak anything and can always use a topic name that contains data corresponding to the schema. Does this explain my thoughts?
I understand your reasoning now, but I still feel Streams should not try to fix it by piggy-backing on another topic that happens to have the same schema this serde is used for; rather, I'd prefer to use a non-existent dummy topic over an existing topic if we do not like repartition topics (again, I agree the repartition topic is not ideal, since we are, in fact, not sending bytes serialized in that way to that topic).
Hi @bellemare , Thanks for your PR! I'll review this as soon as I get the chance, and pay particular attention to the points you called out. -John
Thanks, @bellemare , yeah, it would be nice to see Jenkins give us at least 1 green build.
The java 8 build had just one failure, which I think is ok:
Likewise with the java 11+scala 2.13 build:
And the java 11+scala 2.12 build:
(JDK 11 & Scala 2.13) and (JDK 8, Scala 2.11) both failed on the same test, though it appears totally unrelated:
Hmm, it's almost like we're all just sitting around watching these builds... I agree, I don't think that failure is related, and I also don't think it's worthwhile to keep running the tests to see if that broken test passes next time.
Note, the failing test is known to be broken: https://issues.apache.org/jira/browse/KAFKA-8690
Test failures unrelated and local build passes, so merging this.
Merged #5527 into trunk.
Thanks @bellemare for the hard work and perseverance to get this done!
Yeah, congratulations, @bellemare for this awesome contribution!
Kudos! I’ve been following this feature for a year and am extremely excited to see it make it in.
Congratulations @bellemare , it's been a long KIP and journey :)
Congratulations @bellemare!!! And thanks for all the hard work!!! Also thanks to @Kaiserchen for his initial proposal and the many hours you invested in this KIP! Check out https://twitter.com/kafkastreams/status/1179974460167745536
Fixes compile error introduced by merging apache#5527
Thanks for all the help everyone, especially from @vvcephei. I couldn't have done it without you all. And thanks to Jan too for getting it started so long ago.
@bellemare Do we need to update the docs for the new feature? We should at least mention it in the upgrade guide. Would you like to update the wiki, too: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Join+Semantics
Besides the upgrade guide, I think we can also update the DSL section of the developer guide: https://kafka.apache.org/23/documentation/streams/developer-guide/dsl-api.html
@mjsax @guozhangwang I'll take a look at updating them. What/where is the upgrade guide?
I would maybe add one bullet point to https://github.com/apache/kafka/blob/trunk/docs/upgrade.html (as "notable changes") and also list it in https://github.com/apache/kafka/blob/trunk/docs/streams/upgrade-guide.html Not sure if both files already have a section for
@vvcephei @bellemare Do you know of any cases that would lead to duplicate updates being emitted by these joins? We first noticed this because we saw "Detected out-of-order KTable update" messages for topics emitted from some of these left joins. Looking at the updates, they were identical except for the timestamps. But it happens intermittently -- and not consistently even with the same input data. We switched to inner joins, and still saw them. Sometimes the updates had the same timestamp, but when they had different timestamps, one would have the timestamp from the corresponding left-table update, and the other would have the timestamp from the right-table update. So far I don't think this is causing bad data or any real problems. Just wanted to check if you had seen it before I keep digging. Thanks!
@thebearmayor I think this is fine: as we discussed in the KIP, duplicates may happen for the same join results, but they will happen intermittently, as you've observed. Since it is a KTable changelog stream, overwriting multiple times would not cause correctness issues. @vvcephei and @bellemare could chime in with more insights.
That also makes me think: should we make this an info-level entry rather than a warn, to not cause unnecessary panic, since it would not cause correctness issues anyway?
Interesting though -- mid to long term, I think we need to allow better handling of out-of-order data for
Maybe we should make the warning more flexible and only turn it on for
Hi, @mjsax, just to be clear I am getting the warning from a
Thanks for the clarification. Maybe we introduce more out-of-order records due to the round-trip via two repartition topics than we anticipated... But I am not 100% sure why -- each update to the left-hand side would send one message to the right-hand side and should receive zero or one response messages. An update to the right-hand side could send multiple messages to the left-hand side, however at most one per key. -- If we compute the join result timestamp as the maximum of both input timestamps, I don't see atm why we would introduce much out-of-order data. \cc @bellemare @vvcephei Thoughts?
@thebearmayor Do you have any other information on how common it is, or steps to reproduce? @mjsax I don't have any ideas off the top of my head, but I will take a look at the code again with this in mind...
@bellemare I'll tell you everything I can think of, most of which won't be relevant. I didn't mean to take up more of your time. I'll be offline until next week, and I mean to dig into it more then. I don't have any way yet to reproduce this in development. I've only seen it running in ec2 with 3 instances against large topics, with 16 partitions. I assume there is some timing component to the issue.
https://issues.apache.org/jira/browse/KAFKA-3705
Foreign Key Join:
Allows for a KTable to map its value to a given foreign key and join on another KTable keyed on that foreign key. Applies the joiner, then returns the tuples keyed on the original key. This supports updates from both sides of the join.
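The join semantics described above can be illustrated with a plain-Java sketch that has nothing to do with the Streams implementation: each left record extracts a foreign key from its value, looks up the right table by that key, and the joined result stays keyed on the left table's original key. All names here are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Toy model of foreign-key inner-join semantics over two in-memory "tables".
public class ForeignKeyJoinSemantics {
    public static <K, V, KO, VO, VR> Map<K, VR> join(
            final Map<K, V> left,
            final Map<KO, VO> right,
            final Function<V, KO> foreignKeyExtractor,
            final BiFunction<V, VO, VR> joiner) {
        final Map<K, VR> result = new HashMap<>();
        for (final Map.Entry<K, V> e : left.entrySet()) {
            final VO other = right.get(foreignKeyExtractor.apply(e.getValue()));
            if (other != null) { // inner-join semantics: drop non-matching rows
                result.put(e.getKey(), joiner.apply(e.getValue(), other));
            }
        }
        return result;
    }
}
```

The real implementation must additionally repartition by foreign key, react to updates on both sides, and resolve out-of-order arrivals, which is where the extra topology cost described below comes from.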
Design Philosophy:
The intent of this design was to build a totally encapsulated function that operates very similarly to the regular join function. No further work is required by the user to obtain their foreignKeyJoin results after calling the function. That being said, there is increased cost in some of the topology components, especially due to resolving out-of-order arrivals caused by foreign key changes. I would appreciate any and all feedback on this approach, as my understanding is that the Kafka Streams DSL should provide higher-level functionality without requiring users to know exactly what's going on under the hood.
Some points of note:
Requires an additional materialized State Store for the prefixScanning of the repartitioned CombinedKey events.
ReadOnlyKeyValueStore interface was modified to contain prefixScan. This requires that all implementations support this, but follows an existing precedent where some store functions are already stubbed out with exceptions.
Currently limited to Inner Join (can do more join logic in future - just limiting the focus of this KIP).
Application Reset does not seem to delete the new internal topics that I have added. (only tested with Kafka 1.0).
Only works with identical number of input partitions at the moment, though it may be possible to get it working with KTables of varying input partition count.
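The prefixScan mentioned in the first point relies on a simple property: if CombinedKeys sort with the foreign key first, all records for one foreign key are contiguous, so a range scan from the prefix up to the next possible prefix returns exactly that group. A self-contained sketch of the idea, with strings standing in for serialized CombinedKey bytes (not the actual store code):

```java
import java.util.Map;
import java.util.TreeMap;

// Prefix scan over a sorted map: the range [prefix, upperBound) covers
// every key that starts with the prefix, where upperBound is the prefix
// with its last character incremented.
public class PrefixScanSketch {
    public static Map<String, String> prefixScan(final TreeMap<String, String> store, final String prefix) {
        final String upperBound = prefix.substring(0, prefix.length() - 1)
                + (char) (prefix.charAt(prefix.length() - 1) + 1);
        return store.subMap(prefix, true, upperBound, false);
    }
}
```

This is also why caching has to be disabled on that store: a range scan against RocksDB alone would miss dirty entries sitting in the cache.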
Testing:
Testing is covered by two integration tests that exercise the foreign key join.
The first test exercises the out-of-order resolution and partitioning strategies by running three streams instances on three partitions. This demonstrates the scalability of the proposed solution.
Important: The second test (KTableKTableForeignKeyInnerJoinMultiIntegrationTest) attempts to join using a foreign key twice. This results in a NullPointerException regarding a missing task, and must be resolved before committing this.