Skip to content

KAFKA-6813: Remove deprecated APIs in KIP-182, Part I#4919

Merged
guozhangwang merged 17 commits intoapache:trunkfrom
guozhangwang:K6813-Kip120-Kip182-deprecated
May 8, 2018
Merged

KAFKA-6813: Remove deprecated APIs in KIP-182, Part I#4919
guozhangwang merged 17 commits intoapache:trunkfrom
guozhangwang:K6813-Kip120-Kip182-deprecated

Conversation

@guozhangwang
Copy link
Copy Markdown
Contributor

@guozhangwang guozhangwang commented Apr 24, 2018

I'm breaking KAFKA-6813 into a couple of "smaller" PRs and this is the first one. It focused on:

  1. Remove deprecated APIs in KStream, KTable, KGroupedStream, KGroupedTable, SessionWindowedKStream, TimeWindowedKStream.

  2. Also found a couple of overlooked bugs while working on them:

2.a) In KTable.filter / mapValues without the additional parameter indicating the materialized stores, originally we will not materialize the store. After KIP-182 we mistakenly diverge the semantics: for KTable.mapValues it is still the case, for KTable.filter we will always materialize.

2.b) In XXStream/Table.reduce/count, we used to try to reuse the serdes since their types are pre-known (for reduce it is the same types for both key / value, for count it is the same types for key, and Long for value). This was somehow lost in the past refactoring.

2.c) We are enforcing to cast a Serde<V> to Serde<VR> for XXStream / Table.aggregate, for which the returned value type is NOT known, such the enforced casting should not be applied and we should require users to provide us the value serde if they believe the default ones are not applicable.

2.d) Whenever we are creating a new MaterializedInternal we are effectively incrementing the suffix index for the store / processor-node names. However in some places this MaterializedInternal is only used for validation, so the resulted processor-node / store suffix is not monotonic.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@guozhangwang guozhangwang changed the title [WIP] KAFKA-6813: Remove deprecated APIs in KIP-120 and KIP-182 KAFKA-6813: Remove deprecated APIs in KIP-182, Part I Apr 25, 2018
}
};

final Initializer<V> reduceInitializer = new Initializer<V>() {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved from TimeWindowedStreamImpl, just to be consistent with the other const functions.

final MaterializedInternal<K, V, KeyValueStore<Bytes, byte[]>> materializedInternal
= new MaterializedInternal<>(materialized, builder, REDUCE_NAME);
= new MaterializedInternal<>(materialized, builder, AGGREGATE_NAME);
if (materializedInternal.keySerde() == null) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for 2.b), ditto elsewhere.

true);
final String name = builder.newProcessorName(FILTER_NAME);

// only materialize if the state store is queryable
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for 2.a), ditto for mapValues.

Objects.requireNonNull(initializer, "initializer can't be null");
Objects.requireNonNull(aggregator, "aggregator can't be null");
Objects.requireNonNull(sessionMerger, "sessionMerger can't be null");
return doAggregate(initializer, aggregator, sessionMerger, (Serde<T>) valSerde);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for 2.c), ditto elsewhere.

materializedInternal.withKeySerde(keySerde);
}
if (materializedInternal.valueSerde() == null) {
materialized.withValueSerde(Serdes.Long());
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for 2.d), ditto elsewhere.

AGGREGATE_NAME,
windowStoreBuilder(storeName, serde),
false);
return aggregate(initializer, aggregator, Materialized.<K, VR, WindowStore<Bytes, byte[]>>with(keySerde, null));
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a meta explanation about the serde inheritance across multiple internal impl classes:

  1. reduce: inherit the key and value serdes from the parent XXImpl class.
  2. count: inherit the key serdes, enforce setting the Serdes.Long() for value serdes.
  3. aggregate: inherit the key serdes, do not set for value serdes internally (line 92 here is for this case).

@guozhangwang
Copy link
Copy Markdown
Contributor Author

@mjsax @bbejeck @vvcephei this PR is ready for reviews now.

@mjsax mjsax added the streams label May 7, 2018
Copy link
Copy Markdown
Member

@bbejeck bbejeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one minor comment, otherwise LGTM.

.groupBy(MockMapper.selectKeyKeyValueMapper())
.count();

System.out.print(builder.build().describe());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentional?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No :P Will remove it.

@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented May 7, 2018

Failure unrelated, exceded rate limit.

Retest this please

Copy link
Copy Markdown
Member

@mjsax mjsax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty hard to review... Overall looks good. Couple of questions.

Also one meta comment about Serde inheritance: Isn't it inconsistent to do this for groupBy-aggregate only but not for other operators? For example a builder.stream("", Consumed.with(...)).filter.to("", Produced.with()) could forward the Serdes from the source via the filter to the sink making the Produced.with() unnecessary. I like Serde inheritance, but should we do it consistently instead of picking a "random" pair of operators that support it?

* ReadOnlyKeyValueStore<String,Long> localStore = streams.store(queryableStoreName, QueryableStoreTypes.<String, Long>keyValueStore());
* String key = "some-key";
* Long sumForKey = localStore.get(key); // key must be local (application state is shared over all running Kafka Streams instances)
* }</pre>
* For non-local keys, a custom RPC mechanism must be implemented using {@link KafkaStreams#allMetadata()} to
* query the value of the key on a parallel running instance of your Kafka Streams application.
* <p>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we remove this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. good point, I will add them back.

* The result is written into a local {@link KeyValueStore} (which is basically an ever-updating materialized view)
* provided by the given {@code storeSupplier}.
* that can be queried using the provided {@code queryableStoreName}.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this need to be updated?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack.

* <pre>{@code
* KafkaStreams streams = ... // compute sum
* String queryableStoreName = storeSupplier.name();
* KafkaStreams streams = ... // some aggregation on value type double
* ReadOnlyKeyValueStore<String,Long> localStore = streams.store(queryableStoreName, QueryableStoreTypes.<String, Long>keyValueStore());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the example above,

String queryableStoreName = "storeName" // the queryableStoreName should be the name of the store as defined by the Materialized instance

was inserted. should we do the same here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack.

* The key of the result record is the same as for both joining input records.
* Furthermore, for each input record of both {@code KStream}s that does not satisfy the join predicate the provided
* {@link ValueJoiner} will be called with a {@code null} value for this/other stream, respectively.
* {@link ValueJoiner} will be called with a {@code null} value for the this/other stream, respectively.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack.

* internal repartitioning topic in Kafka and write and re-read the data via this topic before the actual join.
* The repartitioning topic will be named "${applicationId}-XXX-repartition", where "applicationId" is
* user-specified in {@link StreamsConfig} via parameter
* user-specified in {@link StreamsConfig} via parameter
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: double space

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack.

Objects.requireNonNull(materialized, "materialized can't be null");
final MaterializedInternal<K, V, KeyValueStore<Bytes, byte[]>> materializedInternal
= new MaterializedInternal<>(materialized, builder, REDUCE_NAME);
= new MaterializedInternal<>(materialized, builder, AGGREGATE_NAME);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the change to AGGREGATE_NAME ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good one: while working on this class I realized that we mistakenly used REDUCE_NAME for the prefix, while before this we used AGGREGATE_NAME, and in count and other reduce places we use AGGREGATE_NAME as well.

So technically I think this is the right fix, but arguably it will result in different store names if users are on this version already.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we mistakenly used REDUCE_NAME for the prefix

Why "mistakenly" -- it's a reduce(...) operator? Because count() is an special case of aggregation() it seems to be fine to use AGGREGATE_NAME there?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three places where reduce were called: KGroupedStream, TimeWindowedKStream, and SessionWindowedKStream. What I've observed is that in the first two we use REDUCE_NAME as both processor name and store name prefixes, and in the latter we use AGGREGTE_NAME for both processor name and store name prefixes. I thought the latter was right and the former is not. But if people can correct me that it is the other way around I'm happy to change in the other way (either case, there is a compatibility concern).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree about the compatibility concern -- if we change the name, it might break the upgrade path. To me, REDUCE_NAME seems to be correct and thus, if we really change align all of them (not sure if it's worth it), we should update SessionWindowedKStream#reduce().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, will leave a note on the upgrade path as well.


builder.internalTopologyBuilder.addProcessor(name, processorSupplier, this.name);

return new KTableImpl<>(builder,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be better to move the return after the if-then-else to unify code and just use a boolean variable for the last parameter?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Will try to do the same for mapValues as well.

// only materialize if the state store is queryable
KTableProcessorSupplier<K, V, V> processorSupplier;

if (materialized != null && materialized.isQueryable()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is a private method, we could simplify be never calling with null?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but this would require the other callers to pass in a dummy Materialized which I think does not worth the code cleanness. I've refactored the part according to your comment above and LMK what do you think.


/**
* Similar to KStreamAggregationIntegrationTest but with dedupping enabled
* Similar to KStreamAggregationIntegrationTest but with de dupping enabled
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think deduping is correct?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack.

@guozhangwang
Copy link
Copy Markdown
Contributor Author

@mjsax I understand this is a large PR... I've tried hard to make it into multiple smaller ones (I'll probably have two other PRs at about similar sizes) while the hope is that for this PR, most of the removals are straight forward except the fixes mentioned in 2).

Regarding your question about inheritance: I did this for the following reasons:

a) Before KIP-182 we are actually doing sth. already about the inheritance, but only for count and reduce; when we add KIP-182 we mistakenly dropped some of those.
b) For aggregations, i.e. from grouped stream / grouped table / time windowed grouped stream, the only following operation is aggregation; while for other operators like stream().to() we need to decide whether to inherit based on the actual operator, which is more tricky. I do have plans to enhance the inheritance of serdes once we have removed the deprecation though (that was part of the plan for post-KIP182 anyways) but I want separate that general enhancement from this smaller scoped feature.
c) Scala API is actually relying on the fix above to make strong typing strict. So I want to make sure this part does make it to 2.0 release along with the Scala API.

* This class allows to access the {@link InternalTopologyBuilder} a {@link Topology} object.
*
*/
public class TopologyWrapper extends Topology {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this added on purpose in the last commit? Seems to be unused.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not on purpose, it was added for a follow-up PR. Will revert for now.

Copy link
Copy Markdown
Member

@mjsax mjsax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to merge. Please don't forget to update the docs -- can be done in follow up PR, too.

@guozhangwang guozhangwang merged commit 2b5a594 into apache:trunk May 8, 2018
guozhangwang pushed a commit that referenced this pull request Jun 1, 2018
)

In #4919 we propagate the SerDes for each of these aggregation operators.

As @guozhangwang mentioned in that PR:

```
reduce: inherit the key and value serdes from the parent XXImpl class.
count: inherit the key serdes, enforce setting the Serdes.Long() for value serdes.
aggregate: inherit the key serdes, do not set for value serdes internally.
```

Although it's all good for reduce and count, it is quiet unsafe to have aggregate without Materialized given. In fact I don't see why we would not give a Materialized for the aggregate since the result type will always be different (otherwise use reduce) and also the value Serde is simply not propagated.

This has been discussed previously in a broader PR before but I believe for aggregate we could pass implicitly a Materialized the same way we pass a Joined, just to avoid the stupid case. Then if the user wants to specialize, he can give his own Materialized.

Reviewers: Debasish Ghosh <dghosh@acm.org>, Guozhang Wang <guozhang@confluent.io>
guozhangwang pushed a commit that referenced this pull request Jun 4, 2018
#4919 unintentionally changed the topology naming scheme. This change returns to the prior scheme.

Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
ying-zheng pushed a commit to ying-zheng/kafka that referenced this pull request Jul 6, 2018
I'm breaking KAFKA-6813 into a couple of "smaller" PRs and this is the first one. It focused on:

Remove deprecated APIs in KStream, KTable, KGroupedStream, KGroupedTable, SessionWindowedKStream, TimeWindowedKStream.

Also found a couple of overlooked bugs while working on them:

2.a) In KTable.filter / mapValues without the additional parameter indicating the materialized stores, originally we will not materialize the store. After KIP-182 we mistakenly diverge the semantics: for KTable.mapValues it is still the case, for KTable.filter we will always materialize.

2.b) In XXStream/Table.reduce/count, we used to try to reuse the serdes since their types are pre-known (for reduce it is the same types for both key / value, for count it is the same types for key, and Long for value). This was somehow lost in the past refactoring.

2.c) We are enforcing to cast a Serde<V> to Serde<VR> for XXStream / Table.aggregate, for which the returned value type is NOT known, such the enforced casting should not be applied and we should require users to provide us the value serde if they believe the default ones are not applicable.

2.d) Whenever we are creating a new MaterializedInternal we are effectively incrementing the suffix index for the store / processor-node names. However in some places this MaterializedInternal is only used for validation, so the resulted processor-node / store suffix is not monotonic.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>
ying-zheng pushed a commit to ying-zheng/kafka that referenced this pull request Jul 6, 2018
…ache#5066)

In apache#4919 we propagate the SerDes for each of these aggregation operators.

As @guozhangwang mentioned in that PR:

```
reduce: inherit the key and value serdes from the parent XXImpl class.
count: inherit the key serdes, enforce setting the Serdes.Long() for value serdes.
aggregate: inherit the key serdes, do not set for value serdes internally.
```

Although it's all good for reduce and count, it is quiet unsafe to have aggregate without Materialized given. In fact I don't see why we would not give a Materialized for the aggregate since the result type will always be different (otherwise use reduce) and also the value Serde is simply not propagated.

This has been discussed previously in a broader PR before but I believe for aggregate we could pass implicitly a Materialized the same way we pass a Joined, just to avoid the stupid case. Then if the user wants to specialize, he can give his own Materialized.

Reviewers: Debasish Ghosh <dghosh@acm.org>, Guozhang Wang <guozhang@confluent.io>
ying-zheng pushed a commit to ying-zheng/kafka that referenced this pull request Jul 6, 2018
…e#5075)

apache#4919 unintentionally changed the topology naming scheme. This change returns to the prior scheme.

Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
@guozhangwang guozhangwang deleted the K6813-Kip120-Kip182-deprecated branch April 24, 2020 23:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants