KAFKA-6813: Remove deprecated APIs in KIP-182, Part I by guozhangwang · Pull Request #4919 · apache/kafka

guozhangwang · 2018-04-24T06:23:42Z

I'm breaking KAFKA-6813 into a couple of "smaller" PRs, and this is the first one. It focuses on:

1) Removing the deprecated APIs in KStream, KTable, KGroupedStream, KGroupedTable, SessionWindowedKStream, and TimeWindowedKStream.
2) Fixing a couple of overlooked bugs that I found while working on them:

2.a) In KTable.filter / mapValues without the additional parameter specifying the materialized store, we originally did not materialize the store. After KIP-182 we mistakenly diverged the semantics: KTable.mapValues still does not materialize, but KTable.filter always materializes.

2.b) In XXStream/Table.reduce/count, we used to reuse the serdes because their types are known in advance (for reduce, both the key and value types are the same as the input's; for count, the key type is the same and the value type is Long). This was accidentally lost in a past refactoring.

2.c) For XXStream / Table.aggregate we were forcing a cast from Serde<V> to Serde<VR>, even though the result value type is NOT known. This cast should not be applied; instead, users should be required to provide the value serde if they believe the default ones are not applicable.

2.d) Whenever we create a new MaterializedInternal, we effectively increment the suffix index for the store / processor-node names. However, in some places this MaterializedInternal is only used for validation, so the resulting processor-node / store suffixes are not consecutive (indices get skipped).
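To make 2.d) concrete, here is a minimal, library-free model of a shared name counter. `NameCounter`, its methods, and the prefixes are hypothetical and are not Kafka Streams internals; the only assumption carried over from the description is that every name request consumes one index, even when the name is created only for validation and then discarded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of 2.d): a shared counter hands out name suffixes.
// NameCounter and the prefixes below are illustrative, not Kafka Streams internals.
public class NameCounter {
    private int index = 0;

    // Each call consumes one index, even if the caller discards the name.
    public String newName(String prefix) {
        return prefix + String.format("%010d", index++);
    }

    // Buggy pattern: a store name is generated only for validation, then dropped.
    public static List<String> buildWithValidationOnlyName(NameCounter counter) {
        List<String> used = new ArrayList<>();
        used.add(counter.newName("KSTREAM-FILTER-"));
        counter.newName("STORE-"); // validation-only; silently consumes index 1
        used.add(counter.newName("KSTREAM-AGGREGATE-"));
        return used;
    }

    // Fixed pattern: names are generated only for nodes/stores that are actually added.
    public static List<String> buildWithoutValidationOnlyName(NameCounter counter) {
        List<String> used = new ArrayList<>();
        used.add(counter.newName("KSTREAM-FILTER-"));
        used.add(counter.newName("KSTREAM-AGGREGATE-"));
        return used;
    }

    public static void main(String[] args) {
        // prints [KSTREAM-FILTER-0000000000, KSTREAM-AGGREGATE-0000000002]
        System.out.println(buildWithValidationOnlyName(new NameCounter()));
        // prints [KSTREAM-FILTER-0000000000, KSTREAM-AGGREGATE-0000000001]
        System.out.println(buildWithoutValidationOnlyName(new NameCounter()));
    }
}
```

Because generated names feed into topology and store names, a gap like this is visible to users, which is why any change to when names are generated also has upgrade implications.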

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)


guozhangwang · 2018-04-25T20:12:05Z

        }
    };

+    final Initializer<V> reduceInitializer = new Initializer<V>() {


This is moved from TimeWindowedStreamImpl, just to be consistent with the other const functions.

guozhangwang · 2018-04-25T20:12:31Z

        final MaterializedInternal<K, V, KeyValueStore<Bytes, byte[]>> materializedInternal
-                = new MaterializedInternal<>(materialized, builder, REDUCE_NAME);
+                = new MaterializedInternal<>(materialized, builder, AGGREGATE_NAME);
+        if (materializedInternal.keySerde() == null) {


This is for 2.b), ditto elsewhere.

guozhangwang · 2018-04-25T20:14:08Z

-                                true);
+        final String name = builder.newProcessorName(FILTER_NAME);
+
+        // only materialize if the state store is queryable


This is for 2.a), ditto for mapValues.

guozhangwang · 2018-04-25T20:14:52Z

-        Objects.requireNonNull(initializer, "initializer can't be null");
-        Objects.requireNonNull(aggregator, "aggregator can't be null");
-        Objects.requireNonNull(sessionMerger, "sessionMerger can't be null");
-        return doAggregate(initializer, aggregator, sessionMerger, (Serde<T>) valSerde);


This is for 2.c), ditto elsewhere.
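Why the removed `(Serde<T>) valSerde` cast is a problem can be shown without Kafka at all. The sketch below uses hypothetical, simplified stand-ins (`Ser`, `StringSer`) rather than Kafka's `Serde`/`Serializer` types: an unchecked cast from a serializer of the input value type to one of the aggregate's result type compiles, but fails at runtime the first time a result value is serialized.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of 2.c). Ser and StringSer are simplified stand-ins,
// not Kafka's Serde/Serializer classes.
public class UncheckedSerdeCast {
    interface Ser<T> {
        byte[] serialize(T value);
    }

    static final class StringSer implements Ser<String> {
        @Override
        public byte[] serialize(String value) {
            return value.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Mirrors the removed pattern: the input value serde (V) is reused for the
    // aggregate's result type (VR) without any type check.
    @SuppressWarnings("unchecked")
    static <V, VR> Ser<VR> reuseInputSerde(Ser<V> inputSerde) {
        return (Ser<VR>) (Ser<?>) inputSerde;
    }

    // Returns true if serializing a Long aggregate result with the reused serde fails.
    static boolean failsForLongAggregate() {
        Ser<Long> resultSer = UncheckedSerdeCast.<String, Long>reuseInputSerde(new StringSer());
        try {
            resultSer.serialize(42L);
            return false;
        } catch (ClassCastException e) {
            // Thrown by the compiler-generated bridge method in StringSer.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("fails: " + failsForLongAggregate()); // fails: true
    }
}
```

The compiler only emits an unchecked warning here, so the mistake surfaces at runtime; requiring the user to supply the value serde for aggregate avoids it.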

guozhangwang · 2018-04-25T20:17:31Z

+            materializedInternal.withKeySerde(keySerde);
+        }
        if (materializedInternal.valueSerde() == null) {
-            materialized.withValueSerde(Serdes.Long());


This is for 2.d), ditto elsewhere.

guozhangwang · 2018-04-25T20:19:43Z

-                                                                AGGREGATE_NAME,
-                                                                windowStoreBuilder(storeName, serde),
-                                                                false);
+        return aggregate(initializer, aggregator, Materialized.<K, VR, WindowStore<Bytes, byte[]>>with(keySerde, null));


This is a meta explanation about the serde inheritance across multiple internal impl classes:

reduce: inherit the key and value serdes from the parent XXImpl class.

count: inherit the key serdes, enforce setting the Serdes.Long() for value serdes.

aggregate: inherit the key serdes, do not set the value serdes internally (the Materialized.with(keySerde, null) call in this hunk is for this case).
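The three rules above can be summarized as a small decision function. This is a hypothetical, library-free model (`SerdeInheritance`, `Op`, and `Resolved` are illustrative names, and serdes are represented by strings); it assumes, as in the count hunk above, that a serde explicitly set on the user's Materialized always takes precedence and that inheritance only fills in what is still null.

```java
// Hypothetical, library-free model of the serde inheritance rules listed above.
// Serdes are represented by name; null means "not set, fall back to the configured default".
// SerdeInheritance, Op and Resolved are illustrative, not Kafka Streams classes.
public class SerdeInheritance {

    public enum Op { REDUCE, COUNT, AGGREGATE }

    // Key and value serde chosen for the result store of a grouped aggregation.
    public static final class Resolved {
        public final String keySerde;
        public final String valueSerde;

        Resolved(String keySerde, String valueSerde) {
            this.keySerde = keySerde;
            this.valueSerde = valueSerde;
        }
    }

    // parentKey/parentValue: serdes known on the grouped stream/table (may be null).
    // userKey/userValue: serdes set on the user's Materialized (may be null).
    public static Resolved resolve(Op op, String parentKey, String parentValue,
                                   String userKey, String userValue) {
        // Explicit user settings win; otherwise every operator inherits the key serde.
        String key = userKey != null ? userKey : parentKey;
        String value = userValue;
        if (value == null) {
            switch (op) {
                case REDUCE:
                    value = parentValue;     // result has the same value type as the input
                    break;
                case COUNT:
                    value = "Serdes.Long()"; // count results are always Long
                    break;
                case AGGREGATE:
                    // Result type VR is unknown: leave unset instead of casting Serde<V> to Serde<VR>.
                    break;
            }
        }
        return new Resolved(key, value);
    }

    public static void main(String[] args) {
        Resolved r = resolve(Op.AGGREGATE, "Serdes.String()", "Serdes.String()", null, null);
        System.out.println(r.keySerde + " / " + r.valueSerde); // Serdes.String() / null
    }
}
```

For aggregate, a null value serde means the configured default value serde is used, which is exactly the case where users must pass their own serde if the default does not fit.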

guozhangwang · 2018-05-07T17:19:01Z

@mjsax @bbejeck @vvcephei this PR is ready for reviews now.


bbejeck

Just one minor comment, otherwise LGTM.

bbejeck · 2018-05-07T19:22:03Z

               .groupBy(MockMapper.selectKeyKeyValueMapper())
               .count();

+        System.out.print(builder.build().describe());


Is this intentional?

No :P Will remove it.

bbejeck · 2018-05-07T19:30:40Z

Failure is unrelated (exceeded rate limit).

Retest this please

mjsax

Pretty hard to review... Overall looks good. Couple of questions.

Also one meta comment about Serde inheritance: Isn't it inconsistent to do this for groupBy-aggregate only but not for other operators? For example a builder.stream("", Consumed.with(...)).filter.to("", Produced.with()) could forward the Serdes from the source via the filter to the sink making the Produced.with() unnecessary. I like Serde inheritance, but should we do it consistently instead of picking a "random" pair of operators that support it?

mjsax · 2018-05-07T20:20:41Z

     * ReadOnlyKeyValueStore<String,Long> localStore = streams.store(queryableStoreName, QueryableStoreTypes.<String, Long>keyValueStore());
     * String key = "some-key";
     * Long sumForKey = localStore.get(key); // key must be local (application state is shared over all running Kafka Streams instances)
     * }</pre>
     * For non-local keys, a custom RPC mechanism must be implemented using {@link KafkaStreams#allMetadata()} to
     * query the value of the key on a parallel running instance of your Kafka Streams application.
-     * <p>


Why did we remove this?

Hmm.. good point, I will add them back.

mjsax · 2018-05-07T20:22:23Z

     * The result is written into a local {@link KeyValueStore} (which is basically an ever-updating materialized view)
-     * provided by the given {@code storeSupplier}.
+     * that can be queried using the provided {@code queryableStoreName}.


Does this need to be updated?

mjsax · 2018-05-07T20:23:31Z

     * <pre>{@code
-     * KafkaStreams streams = ... // compute sum
-     * String queryableStoreName = storeSupplier.name();
+     * KafkaStreams streams = ... // some aggregation on value type double
     * ReadOnlyKeyValueStore<String,Long> localStore = streams.store(queryableStoreName, QueryableStoreTypes.<String, Long>keyValueStore());


In the example above,

String queryableStoreName = "storeName" // the queryableStoreName should be the name of the store as defined by the Materialized instance

was inserted. Should we do the same here?

mjsax · 2018-05-07T20:31:48Z

     * The key of the result record is the same as for both joining input records.
     * Furthermore, for each input record of both {@code KStream}s that does not satisfy the join predicate the provided
-     * {@link ValueJoiner} will be called with a {@code null} value for this/other stream, respectively.
+     * {@link ValueJoiner} will be called with a {@code null} value for the this/other stream, respectively.


mjsax · 2018-05-07T20:37:02Z

     * internal repartitioning topic in Kafka and write and re-read the data via this topic before the actual join.
     * The repartitioning topic will be named "${applicationId}-XXX-repartition", where "applicationId" is
-     * user-specified in {@link StreamsConfig} via parameter
+     * user-specified in {@link  StreamsConfig} via parameter


nit: double space

mjsax · 2018-05-07T20:40:41Z

        Objects.requireNonNull(materialized, "materialized can't be null");
        final MaterializedInternal<K, V, KeyValueStore<Bytes, byte[]>> materializedInternal
-                = new MaterializedInternal<>(materialized, builder, REDUCE_NAME);
+                = new MaterializedInternal<>(materialized, builder, AGGREGATE_NAME);


Why the change to AGGREGATE_NAME ?

This is a good one: while working on this class I realized that we mistakenly used REDUCE_NAME for the prefix, whereas previously we used AGGREGATE_NAME, and count and the other reduce call sites use AGGREGATE_NAME as well.

So technically I think this is the right fix, but it will result in different store names for users who are already on this version.

we mistakenly used REDUCE_NAME for the prefix

Why "mistakenly"? It's a reduce(...) operator. Since count() is a special case of aggregation, it seems fine to use AGGREGATE_NAME there?

There are three places where reduce is called: KGroupedStream, TimeWindowedKStream, and SessionWindowedKStream. What I've observed is that the first two use REDUCE_NAME as both the processor-name and store-name prefix, while the last one uses AGGREGATE_NAME for both. I thought the last one was right and the first two were not, but if it is the other way around I'm happy to change it in that direction instead (either way, there is a compatibility concern).

I agree about the compatibility concern: if we change the name, it might break the upgrade path. To me, REDUCE_NAME seems to be correct, and thus, if we really do align all of them (not sure if it's worth it), we should update SessionWindowedKStream#reduce().

Sounds good, will leave a note on the upgrade path as well.
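The compatibility concern can be illustrated with a small sketch of how a generated store name flows into a changelog topic name. The changelog topic pattern `<applicationId>-<storeName>-changelog` is the Kafka Streams convention; the prefix values (`KSTREAM-REDUCE-`, `KSTREAM-AGGREGATE-`) and the `prefix + "STATE-STORE-" + 10-digit index` store-name layout are assumptions made for illustration, and `StoreNameCompat` is a hypothetical class. This only matters for stores whose names are generated, i.e. when the user did not name the store explicitly.

```java
// Hypothetical sketch: changing the name prefix changes the generated store name,
// and therefore the changelog topic an upgraded instance would look for.
// Assumed (not taken from this PR): prefix values and the store-name layout below.
public class StoreNameCompat {

    // Assumed layout of an auto-generated store name.
    static String storeName(String prefix, int index) {
        return prefix + "STATE-STORE-" + String.format("%010d", index);
    }

    // Kafka Streams changelog topic naming convention.
    static String changelogTopic(String applicationId, String storeName) {
        return applicationId + "-" + storeName + "-changelog";
    }

    public static void main(String[] args) {
        String before = changelogTopic("my-app", storeName("KSTREAM-REDUCE-", 2));
        String after = changelogTopic("my-app", storeName("KSTREAM-AGGREGATE-", 2));
        System.out.println(before); // my-app-KSTREAM-REDUCE-STATE-STORE-0000000002-changelog
        System.out.println(after);  // my-app-KSTREAM-AGGREGATE-STATE-STORE-0000000002-changelog
        // Different names: the upgraded application would not reuse the existing store/changelog.
        System.out.println("compatible: " + before.equals(after)); // compatible: false
    }
}
```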

mjsax · 2018-05-07T20:49:43Z

+
+            builder.internalTopologyBuilder.addProcessor(name, processorSupplier, this.name);
+
+            return new KTableImpl<>(builder,


might be better to move the return after the if-then-else to unify code and just use a boolean variable for the last parameter?

Ack. Will try to do the same for mapValues as well.

mjsax · 2018-05-07T20:58:26Z

+        // only materialize if the state store is queryable
+        KTableProcessorSupplier<K, V, V> processorSupplier;
+
+        if (materialized != null && materialized.isQueryable()) {


As this is a private method, could we simplify it by never calling it with null?

I thought about that, but it would require the other callers to pass in a dummy Materialized, which I don't think is worth it for code cleanliness. I've refactored this part according to your comment above; LMK what you think.

mjsax · 2018-05-07T21:04:46Z


 /**
- * Similar to KStreamAggregationIntegrationTest but with dedupping enabled
+ * Similar to KStreamAggregationIntegrationTest but with de dupping enabled


nit: I think deduping is correct?

guozhangwang · 2018-05-07T21:46:46Z

@mjsax I understand this is a large PR... I've tried hard to split it into multiple smaller ones (there will probably be two other PRs of similar size), and the hope is that for this PR most of the removals are straightforward, except for the fixes mentioned in 2).

Regarding your question about inheritance: I did this for the following reasons:

a) Before KIP-182 we were actually already doing some serde inheritance, but only for count and reduce; when we added KIP-182 we mistakenly dropped some of it.
b) For aggregations, i.e. from a grouped stream / grouped table / time-windowed grouped stream, the only following operation is an aggregation, while for other operators like stream().to() we need to decide whether to inherit based on the actual operator, which is trickier. I do plan to enhance serde inheritance once we have removed the deprecated APIs (that was part of the post-KIP-182 plan anyway), but I want to separate that general enhancement from this smaller-scoped feature.
c) The Scala API actually relies on the fix above to make its strong typing strict, so I want to make sure this part makes it into the 2.0 release along with the Scala API.

mjsax · 2018-05-08T00:09:39Z

+ *  This class allows to access the {@link InternalTopologyBuilder} a {@link Topology} object.
+ *
+ */
+public class TopologyWrapper extends Topology {


Was this added on purpose in the last commit? Seems to be unused.

Not on purpose, it was added for a follow-up PR. Will revert for now.

mjsax

Feel free to merge. Please don't forget to update the docs -- can be done in follow up PR, too.


In #4919 we propagate the SerDes for each of these aggregation operators. As @guozhangwang mentioned in that PR:
```
reduce: inherit the key and value serdes from the parent XXImpl class.
count: inherit the key serdes, enforce setting the Serdes.Long() for value serdes.
aggregate: inherit the key serdes, do not set for value serdes internally.
```
Although this is fine for reduce and count, it is quite unsafe to call aggregate without a Materialized. In fact, I don't see why we would not provide a Materialized for aggregate, since the result type will always be different (otherwise one would use reduce) and the value Serde is simply not propagated. This has been discussed previously in a broader PR, but I believe for aggregate we could implicitly pass a Materialized the same way we pass a Joined, just to avoid the unsafe case. Then, if the user wants to specialize, they can provide their own Materialized. Reviewers: Debasish Ghosh <dghosh@acm.org>, Guozhang Wang <guozhang@confluent.io>

#4919 unintentionally changed the topology naming scheme. This change returns to the prior scheme. Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>


guozhangwang added 10 commits April 21, 2018 14:41

kstream and ktable deprecated stateful operators

56134c1

rebase from trunk

3d33190

remove deprecated apis in KGroupedTable and KGroupedStream

dfed53b

continue working on KStreamImpl and KTableImpl

0bd43c3

Merge branch 'trunk' of https://github.com/apache/kafka into K6813-Kip120-Kip182-deprecated

d7c90f6

continue working on KGroupedStreamImpl

c302aba

fix unit tests

81c05ca

fix two critical bugs

9592fc1

more fixes on unit tests

628069b

fixed all unit tests

a10e11c

guozhangwang changed the title ~~[WIP] KAFKA-6813: Remove deprecated APIs in KIP-120 and KIP-182~~ KAFKA-6813: Remove deprecated APIs in KIP-182, Part I Apr 25, 2018

a few minor cleanups

33fc44b

guozhangwang commented Apr 25, 2018


seglo added a commit to lightbend/kafka that referenced this pull request May 6, 2018

Key serdes no longer required with fixes in apache#4919

4b451d5

seglo mentioned this pull request May 6, 2018

MINOR: Build and code sample updates for Kafka Streams DSL for Scala #4949

Merged


rebase from trunk

46df4a1

fix findbugs

0d5364c

mjsax added the streams label May 7, 2018

Merge branch 'trunk' of https://github.com/apache/kafka into K6813-Kip120-Kip182-deprecated

eb9ab11

bbejeck approved these changes May 7, 2018


mjsax reviewed May 7, 2018


github comments

52d3f1f

mjsax reviewed May 8, 2018


guozhangwang added 2 commits May 7, 2018 17:14

remove TopologyWrapper

bb9d4fb

github comments

52b7257

mjsax approved these changes May 8, 2018


guozhangwang merged commit 2b5a594 into apache:trunk May 8, 2018

seglo mentioned this pull request May 9, 2018

Count fix and Type alias refactor in Streams Scala API #4966

Merged


joan38 mentioned this pull request May 22, 2018

KAFKA-6936: Implicit Materialized for aggregates #5066

Merged


vvcephei mentioned this pull request May 24, 2018

KAFKA-6813: return to double-counting for count topology names #5075

Merged


vvcephei mentioned this pull request Jun 4, 2018

MINOR: add regression tests for KTable mapValues and filter #5134

Merged


guozhangwang deleted the K6813-Kip120-Kip182-deprecated branch April 24, 2020 23:57

