
KAFKA-10003: Mark KStream.through() as deprecated #8679

Merged
mjsax merged 12 commits into apache:trunk from mjsax:kafka-10003-deprecate-through
May 22, 2020

Conversation

@mjsax
Member

@mjsax mjsax commented May 17, 2020

  • part of KIP-221

@mjsax mjsax added the streams label May 17, 2020
@mjsax
Member Author

mjsax commented May 17, 2020

Call for review @lkokhreidze @vvcephei

Also updates the Scala API...
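For readers unfamiliar with KIP-221, the change can be illustrated with a short migration sketch (illustrative only, not from the PR; topic names are hypothetical, and the kafka-streams dependency is assumed):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Repartitioned;

public class ThroughMigrationSketch {
    public static void main(final String[] args) {
        final StreamsBuilder builder = new StreamsBuilder();
        final KStream<String, String> input = builder.stream("input-topic");

        // Before (deprecated by this PR): the user must create and manage the topic.
        // final KStream<String, String> rekeyed = input.through("user-managed-topic");

        // After: Kafka Streams creates and manages the repartition topic itself.
        final KStream<String, String> rekeyed = input.repartition(
            Repartitioned.<String, String>with(Serdes.String(), Serdes.String())
                .withName("rekeyed")            // used as the repartition-topic name suffix
                .withNumberOfPartitions(4));    // optional; otherwise derived from upstream

        rekeyed.to("output-topic");
        builder.build();
    }
}
```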

</ol>
</td>
</tr>
<tr class="row-odd"><td><p class="first"><strong>Through</strong></p>
Member Author

The diff is weird because the part above repeats below. The actual deletes start here.

* If a {@link StreamPartitioner custom partitioner} has been
* {@link ProducerConfig#PARTITIONER_CLASS_CONFIG configured} via {@link StreamsConfig} or
* {@link KStream#through(String, Produced)}, or if the original {@link KTable}'s input
* {@link KStream#repartition(Repartitioned)}, or if the original {@link KTable}'s input
Member Author

Not sure if this update is necessary. This method is deprecated itself.

Contributor

Might as well make this update, since we may remove the methods at different times.

* from the auto-generated topic using default serializers, deserializers, and producer's {@link DefaultPartitioner}.
* The number of partitions is determined based on the upstream topics partition numbers.
* <p>
* This operation is similar to {@link #through(String)}, however, Kafka Streams manages the used topic automatically.
Member Author

@mjsax mjsax May 17, 2020

Not 100% sure if we should remove this now, or when we remove through()?

Contributor

I'd agree with removing it. I guess if you want to preserve it in some fashion, you could add the opposite statement to the through() documentation.

* {@link #transform(TransformerSupplier, String...)}), and no data redistribution happened afterwards (e.g., via
* {@link #through(String)}) an internal repartitioning topic will be created in Kafka.
* {@link #map(KeyValueMapper)}, {@link #flatMap(KeyValueMapper)} or
* {@link #transform(TransformerSupplier, String...)}) an internal repartitioning topic will be created in Kafka.
Member Author

Just simplifying this one.

import org.apache.kafka.streams.processor.internals.InternalTopicProperties;

class RepartitionedInternal<K, V> extends Repartitioned<K, V> {
public class RepartitionedInternal<K, V> extends Repartitioned<K, V> {
Member Author

Must be public to be visible in Scala

Contributor

It's worth noting that it only needs to be visible for the scala tests that verify the scala Repartitioned builder results in a correctly configured object. For the public API, we only convert a scala Repartitioned to a java Repartitioned.



@Test
public void shouldProcessViaRepartitionTopic() {
Member Author

Replicated the test for through() for repartition().

Contributor

Thanks!

stream = input.repartition();
} else {
input.to(INTERMEDIATE_USER_TOPIC);
stream = builder.stream(INTERMEDIATE_USER_TOPIC);
Member Author

We still need to test this, because topics using this pattern are still considered intermediate topics, and the --intermediate-topics flag in StreamsResetter is still useful and unchanged.

Contributor

I'm wondering if we should continue testing with through, to ensure it continues to work. WDYT?

Member Author

Well, through() is literally implemented as to() + stream()... But I can revert and add a suppress annotation, too.
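The equivalence described here can be sketched as follows (topic names are hypothetical, and the kafka-streams dependency is assumed):

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class ThroughEquivalenceSketch {
    public static void main(final String[] args) {
        final StreamsBuilder builder = new StreamsBuilder();
        final KStream<String, String> input = builder.stream("source-topic");

        // Deprecated one-liner:
        // final KStream<String, String> out = input.through("intermediate-topic");

        // ...is implemented as exactly this pair of calls:
        input.to("intermediate-topic");
        final KStream<String, String> out = builder.stream("intermediate-topic");

        out.to("sink-topic");
        builder.build();
    }
}
```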

Contributor

On re-reading, I realize I misunderstood the situation. I revert my comment 😬 .

if (throughTopic != null) {
output = input.through(throughTopic);
input.to(throughTopic);
output = builder.stream(throughTopic);
Member Author

Using to() and stream() is "simpler" as we clean up topics in-between (and thus avoid internal topics).

We could of course also use repartition().

private static final String TEST_ID = "reset-with-ssl-integration-test";

private static Map<String, Object> sslConfig;
private static final Map<String, Object> SSL_CONFIG;
Member Author

side cleanup

}

@Test
public void shouldNotAllowNullRepartitionedOnRepartition() {
Member Author

replicating test

assertEquals(((AbstractStream) stream1.repartition()).keySerde(), consumedInternal.keySerde());
assertEquals(((AbstractStream) stream1.repartition()).valueSerde(), consumedInternal.valueSerde());
assertEquals(((AbstractStream) stream1.repartition(Repartitioned.with(mySerde, mySerde))).keySerde(), mySerde);
assertEquals(((AbstractStream) stream1.repartition(Repartitioned.with(mySerde, mySerde))).valueSerde(), mySerde);
Member Author

replicating test cases

}

@Test
public void shouldUseRecordMetadataTimestampExtractorWithRepartition() {
Member Author

replicating test

}

@Test
public void shouldSendDataThroughRepartitionTopicUsingRepartitioned() {
Member Author

replicating test

if (withRepartitioning) {
final KStream<String, Integer> repartitionedData = data.through("repartition");
data.to("repartition");
final KStream<String, Integer> repartitionedData = builder.stream("repartition");
Member Author

As above. Avoid internal topics.

*
* //..
* val clicksPerRegion: KTable[String, Long] = //..
* val clicksPerRegion: KStream[String, Long] = //..
Member Author

There is no KTable#through() method.

Contributor

Oops...

}

"Create a Produced with timestampExtractor and resetPolicy" should "create a Consumed with Serdes, timestampExtractor and resetPolicy" in {
"Create a Produced with streamPartitioner" should "create a Produced with Serdes and streamPartitioner" in {
Member Author

Side cleanup (was originally copied from ConsumedTest but not updated correctly)

* @return A new [[Repartitioned]] instance configured with keySerde and valueSerde
* @see KStream#repartition(Repartitioned)
*/
def `with`[K, V](implicit keySerde: Serde[K], valueSerde: Serde[V]): RepartitionedJ[K, V] =
Member Author

I just named all methods `with` in alignment with the other Scala helper classes.

Also noticed that all helper classes only have static methods... Is that by design? Seems we are missing something here? If there is more than one optional parameter, it seems we should have non-static methods to allow method chaining? (Could be fixed in a follow-up PR.)

Contributor

We'd have to change from object to class or case class (which would have been my preference to begin with), since objects can only have static members.

Probably, this ship has sailed for now, and we should just keep doing what the other similar classes are doing. Since we've found so much wackiness in the Scala API since it was introduced, it might be a good idea to consider revamping the whole thing from scratch some day.

Contributor

@vvcephei vvcephei left a comment

Just found another set of zombie comments I meant to send some time in the past. I'll continue my review.

* @return a {@code KStream} that contains the exact same (and potentially repartitioned) records as this {@code KStream}
* @see #repartition()
* @see #repartition(Repartitioned)
* @deprecated use {@link #repartition()} instead
Contributor

It's nice for future reference when we also say when it became deprecated, such as "since 2.6".

Member Author

Not sure why? If I use 2.6, why do I care if it was deprecated in 2.4 or 2.2 or 2.6? It's deprecated in the version I use now. Why would I care about older versions?

Contributor

For one thing, it's nice for us, so we can easily tell when it's been deprecated "long enough" to remove. I can recall trudging through git history in the past to figure this out.

For users, maybe you don't care, but I personally find it nice when my libraries do this for me. It's just good bookkeeping, and it gives me some confidence that the maintainers are doing proper, tidy maintenance.

If it provides a "third party" supporting opinion, the Scala language designers thought this was important enough to build it in as a separate field of the "deprecated" annotation: https://docs.scala-lang.org/tour/annotations.html
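Java has a counterpart since JDK 9: the `@Deprecated` annotation gained `since` and `forRemoval` elements, alongside the long-standing `@deprecated` Javadoc tag. A small self-contained illustration (the method names are hypothetical, not from Kafka):

```java
public class DeprecationSketch {
    /**
     * @deprecated since 2.6; use {@link #repartitionStyle()} instead
     */
    @Deprecated(since = "2.6", forRemoval = true) // JDK 9+; mirrors the Javadoc tag
    public static String throughStyle() {
        // The deprecated method simply delegates to its replacement.
        return repartitionStyle();
    }

    public static String repartitionStyle() {
        return "ok";
    }

    public static void main(final String[] args) {
        System.out.println(throughStyle()); // prints "ok"
    }
}
```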



Contributor

@vvcephei vvcephei left a comment

Thanks, @mjsax ! Completed my full pass.

Comment thread docs/streams/developer-guide/dsl-api.html Outdated
Comment thread docs/streams/upgrade-guide.html Outdated
Comment thread docs/streams/upgrade-guide.html Outdated


}

@SuppressWarnings("deprecation") // specifically testing the deprecated variant
Contributor

This would be a case where I would advocate more strongly to deprecate this method, to avoid accidentally "hiding" the deprecation from callers.

Member Author

Well, but then we need to add more suppressions or deprecations upstream. Does not seem worth it for testing code.

Contributor

This is exactly the point!

Member Author

Updated


mjsax and others added 4 commits May 19, 2020 13:28
Co-authored-by: John Roesler <vvcephei@users.noreply.github.com>
Co-authored-by: John Roesler <vvcephei@users.noreply.github.com>
Co-authored-by: John Roesler <vvcephei@users.noreply.github.com>
…scala/kstream/KStream.scala

Co-authored-by: John Roesler <vvcephei@users.noreply.github.com>
Contributor

@vvcephei vvcephei left a comment

Thanks for the update, @mjsax ! I replied above to the threads, and had just two more new comments.

* @return a [[KStream]] that contains the exact same repartitioned records as this [[KStream]]
* @see `org.apache.kafka.streams.kstream.KStream#repartition`
*/
def repartition(implicit repartitioned: Repartitioned[K, V]): KStream[K, V] =
Contributor

I just noticed that we have no test for this operator (or for through). Should we add one?

Member Author

Not sure what we can/should test?

Contributor

We have previously had embarrassing bugs like, "It's not possible to write code that compiles using the Scala DSL". If we had had even the most trivial test written for those DSL methods, we would never have released those bugs. So, I'd just recommend creating any topology that contains this operator and maybe using the TopologyTestDriver to pipe a single record through it to ensure that it doesn't throw any runtime exceptions when you use it.
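The suggested smoke test could look roughly like this in the Java DSL (a sketch assuming the kafka-streams and kafka-streams-test-utils dependencies; the Scala-DSL version would be analogous):

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.TopologyTestDriver;

public class RepartitionSmokeTestSketch {
    public static void main(final String[] args) {
        final StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("in").repartition().to("out");

        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "repartition-smoke-test");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.StringSerde.class);
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.StringSerde.class);

        // Pipe a single record through to catch runtime failures in the operator.
        try (final TopologyTestDriver driver = new TopologyTestDriver(builder.build(), props)) {
            final TestInputTopic<String, String> in =
                driver.createInputTopic("in", new StringSerializer(), new StringSerializer());
            final TestOutputTopic<String, String> out =
                driver.createOutputTopic("out", new StringDeserializer(), new StringDeserializer());
            in.pipeInput("key", "value");
            // The same record should appear on the output topic, e.g.:
            // assertEquals(KeyValue.pair("key", "value"), out.readKeyValue());
        }
    }
}
```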

mjsax and others added 2 commits May 19, 2020 17:16
…scala/kstream/KStream.scala

Co-authored-by: John Roesler <vvcephei@users.noreply.github.com>
Contributor

@lkokhreidze lkokhreidze left a comment

Thanks for picking this up @mjsax , lgtm.

Contributor

@vvcephei vvcephei left a comment

Thanks for this thorough (and thoroughly awesome) PR, @mjsax !

I responded to the question about the scala API test, above; otherwise I'm +1.

@mjsax
Member Author

mjsax commented May 21, 2020

Added the test. Will merge after Jenkins passes.

* @return a {@code KStream} that contains the exact same (and potentially repartitioned) records as this {@code KStream}
* @see #repartition()
* @see #repartition(Repartitioned)
* @deprecated since 2.6; use #repartition(Repartitioned) instead
Member

Could we use {@link #repartition(Repartitioned)} ?

Contributor

@vvcephei vvcephei left a comment

Thanks so much, @mjsax !

@mjsax
Member Author

mjsax commented May 21, 2020

Java 8 passed.
Java 11 failed:

org.apache.kafka.streams.integration.EosBetaUpgradeIntegrationTest.shouldUpgradeFromEosAlphaToEosBeta[false]

Java 14 failed:

org.apache.kafka.streams.integration.EosBetaUpgradeIntegrationTest.shouldUpgradeFromEosAlphaToEosBeta[false]

@mjsax
Member Author

mjsax commented May 22, 2020

Java 8 and Java 11 passed.
Java 14 failed:

org.apache.kafka.streams.integration.EosBetaUpgradeIntegrationTest.shouldUpgradeFromEosAlphaToEosBeta[true]

@mjsax mjsax merged commit 27824ba into apache:trunk May 22, 2020
@mjsax mjsax deleted the kafka-10003-deprecate-through branch May 22, 2020 15:41
Kvicii pushed a commit to Kvicii/kafka that referenced this pull request May 24, 2020
* 'trunk' of github.com:apache/kafka:
  KAFKA-9888: Copy connector configs before passing to REST extensions (apache#8511)
  KAFKA-9931: Implement KIP-605 to expand support for Connect worker internal topic configurations (apache#8654)
  KAFKA-6145: Add unit tests for assignments of only stateless tasks (apache#8713)
  MINOR: Fix join group request timeout lower bound (apache#8702)
  MINOR: Improve security documentation for Kafka Streams apache#8710
  KAFKA-6145: KIP-441: Enforce Standby Task Stickiness (apache#8696)
  KAFKA-10003: Mark KStream.through() as deprecated and update Scala API (apache#8679)
@mjsax mjsax added the kip Requires or implements a KIP label Jun 12, 2020

Labels

kip Requires or implements a KIP streams


4 participants