KAFKA-6813: return to double-counting for count topology names by vvcephei · Pull Request #5075 · apache/kafka

vvcephei · 2018-05-24T19:26:06Z

#4919 unintentionally changed the topology naming scheme. This change returns to the prior scheme.

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

vvcephei · 2018-05-25T16:34:34Z

retest this, please (I missed the test results).

vvcephei · 2018-05-25T16:59:26Z

@guozhangwang @bbejeck @mjsax , Please review this.

mjsax

The change looks good to me.

We should put some tests in place though to avoid a similar regression in the future. There are existing test that should have caught this -- guess we need to extend those tests.

bbejeck

LGTM.

vvcephei · 2018-05-26T00:23:11Z

Hm. The test failure is correlated. I'll have to take a look on Tuesday.

guozhangwang · 2018-05-29T17:02:32Z

I think in #4919 we've changed a few other places that needs to be considered in this PR, namely:

In KTable.filter we will create an internal materialized store even if Materialized is not passed in, hence "burning" one number for that store;

vvcephei · 2018-05-30T23:06:15Z

retest this, please.

vvcephei · 2018-05-31T13:50:59Z

retest this, please.

vvcephei · 2018-05-31T14:02:10Z

Java8 passed, and Java10 failed some broker test.

@guozhangwang @bbejeck @mjsax ,

I've been over the code changes several times, and I think this is the only alteration we introduced before (45% confidence). Please let me know if you have some other scenarios you want me to test.

Otherwise, I think this is ready for reviews.

vvcephei · 2018-05-31T14:04:23Z

This is hideous, but it's the best I could think of to get a look at the name in Materialized (which is protected) without auto-setting it (which happens in MaterializedInternal's constructor.

Actually, I did have another thought, which was that MaterializedInternal could avoid setting the name in the constructor and instead provide a method to do it. We could pursue a strategy where all the names are assigned in the DSL classes, which would make it much easier to see when they are getting assigned.

But this seemed like a risky change, so I opted for the less invasive choice.

Yeah, I agree with you that it would be a substantial refactoring. IMHO better to start with a smaller change and we can re-visit the name generation down the road.

A similar idea but not introducing a new class: we can add a new constructor of MaterializedInternal that only takes one parameter Materialized and do not try to create internal names at all, then after the object is created with this constructor, its storeName() may return null.

I think it is similar to what you have here but with fewer LOCs :P Besides, we should add comment on that constructor that it will be removed once we do not do this silly thing any more.

I think if we go down that road, we may aw well complete the journey to my alternative proposal to add a method to MaterializedInternal to explicitly set the store name if there isn't one and remove that logic from the constructor. This can be implemented safely by adding the new constructor you proposed, and mechanically in lining the old one.

Basically, every call to new MaterializedInternal... would become:

mi = new MaterializedInternal(...); mi.generateStoreNameIfNotSet();

Then we could judiciously remove the second line in cases like this one where we do not want it.

@vvcephei sounds good.

vvcephei · 2018-05-31T14:12:51Z

There's a bunch of incidental cleanup in here, just resolving IDEA warnings (many are related to moving to java 8).

vvcephei · 2018-05-31T14:13:38Z

This is not required at the Java 8 level.

vvcephei · 2018-05-31T14:13:49Z

This is not required at the Java 8 level.

bbejeck

Just have a couple of general comments, but this looks good to me. Nice job on the added test cases.

bbejeck · 2018-05-31T15:07:00Z

This logic is repeated in a couple of places.
I'm wondering if we could change MaterializedPeek to take the InternalSteamsBuilder as an additional constructor param and have the logic inside the class, and this block of code could be replaced with new MaterializedPeek<>(materialized, builder).maybeIncrementTopologyCount() or something like that.

I get where you're coming from, but I think the duplication is justified in this case. Aside from the three examples in this diff, we interact with Materialized and MaterializedInteral in dozens of places. What we're doing here is very specific to a regression in these three methods.

I think that if we were to separate the regression patch from these locations, we risk having it misused elsewhere. If we did consolidate it, I would want to name it something like new CountAggregationRegression4919Patch(materialized, builder).maybeIncrementTopologyCount() to be sure it's used properly.

My experience is that deduplicating code often results in a net increase in complexity and is therefore not without cost. I'm not saying we should have copy-pasted code all over the place, just that we shouldn't universally apply a deduplication strategy. It seems like this is an example where it's better to just have the logic applied where it's needed.

bbejeck · 2018-05-31T15:10:16Z

Yeah, I agree with you that it would be a substantial refactoring. IMHO better to start with a smaller change and we can re-visit the name generation down the road.

guozhangwang

Some nit comments, otherwise LGTM.

guozhangwang · 2018-05-31T22:38:16Z

A similar idea but not introducing a new class: we can add a new constructor of MaterializedInternal that only takes one parameter Materialized and do not try to create internal names at all, then after the object is created with this constructor, its storeName() may return null.

I think it is similar to what you have here but with fewer LOCs :P Besides, we should add comment on that constructor that it will be removed once we do not do this silly thing any more.

guozhangwang · 2018-05-31T22:39:24Z

vvcephei · 2018-06-01T21:27:57Z

@guozhangwang @bbejeck @mjsax ,

Following the CR discussion, I have replaced the former MaterializedInternal constructor with one that does not mutate the storeName. For correctness, the following condition should hold:

in every place that previously called new MaterializedInternal(...), we will now see the following two lines: new MaterializedInternal(...); materializedInternal.generateStoreNameIfNeeded()

I have verified for myself that this is true, but please take a look on your own.

There are some places where we redundantly instantiate MaterializedInternal with a store name, and then immediately call generate-name-if-needed, but I intentionally left it so that we could more easily certify that this change is correct.

While making this change, I noticed that there is a pre-existing pattern for it, since several locations effectively set the keySerde or valueSerde, likewise "if needed".

This change allows us to get rid of MaterializedPeek, since we can equivalently instantiate a MaterializedInternal now. These three locations are the only time we instantiate a MI without then calling generate-store-name.

One final note: the constructor previously set queryable such that there was an invariant isQueryable() == true iff storeName() != null. Since we no longer set the storeName in the constructor, maintaining the internal field was less convenient, so I instead opted for an equivalent method body for isQueryable().

guozhangwang

One nit comment, otherwise lgtm.

Will merge the PR once it is addressed and jenkins passed.

guozhangwang · 2018-06-01T21:55:00Z

nit: indent

* remove MaterializePeek now that MI constructor doesn't mutate storeName

vvcephei · 2018-06-04T18:36:22Z

Turns out I misread the code around isQueryable. It's a good thing we have tests! I've updated the PR. Watching to see if/when the tests pass...

vvcephei · 2018-06-04T20:08:01Z

Java 8 passed.

Java 10 failure was unrelated:

java.lang.AssertionError: Expected an exception of type org.apache.kafka.common.errors.TimeoutException; got type org.apache.kafka.common.errors.SslAuthenticationException
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at kafka.api.AdminClientIntegrationTest.assertFutureExceptionTypeEquals(AdminClientIntegrationTest.scala:136)
	at kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts(AdminClientIntegrationTest.scala:1001)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.lang.Thread.run(Thread.java:844)

vvcephei · 2018-06-04T20:09:24Z

@guozhangwang , I think this is ready to merge, but I can run the tests again, if you want to see both builds pass.

guozhangwang · 2018-06-04T22:34:16Z

Thanks @vvcephei , merged to trunk.

…grained-acl-create-topics * apache-github/trunk: KAFKA-5588: Remove deprecated --new-consumer tools option (apache#5097) MINOR: Fix for the location of the trogdor.sh executable file in the documentation. (apache#5040) KAFKA-6997: Exclude test-sources.jar when $INCLUDE_TEST_JARS is FALSE MINOR: docs should point to latest version (apache#5132) KAFKA-6981: Move the error handling configuration properties into the ConnectorConfig and SinkConnectorConfig classes (KIP-298) [KAFKA-6730] Simplify State Store Recovery (apache#5013) MINOR: Rename package `internal` to `internals` for consistency (apache#5137) KAFKA-6704: InvalidStateStoreException from IQ when StreamThread closes store (apache#4801) MINOR: Add missing configs for resilience settings MINOR: Add regression tests for KTable mapValues and filter (apache#5134) KAFKA-6750: Add listener name to authentication context (KIP-282) (apache#4829) KAFKA-3665: Enable TLS hostname verification by default (KIP-294) (apache#4956) KAFKA-6938: Add documentation for accessing Headers on Kafka Streams Processor API (apache#5128) KAFKA-6813: return to double-counting for count topology names (apache#5075) KAFKA-5919; Adding checks on "version" field for tools using it MINOR: Remove deprecated KafkaStreams constructors in docs (apache#5118)

…refix * apache-github/trunk: KAFKA-6726: Fine Grained ACL for CreateTopics (KIP-277) (apache#4795) KAFKA-5588: Remove deprecated --new-consumer tools option (apache#5097) MINOR: Fix for the location of the trogdor.sh executable file in the documentation. (apache#5040) KAFKA-6997: Exclude test-sources.jar when $INCLUDE_TEST_JARS is FALSE MINOR: docs should point to latest version (apache#5132) KAFKA-6981: Move the error handling configuration properties into the ConnectorConfig and SinkConnectorConfig classes (KIP-298) [KAFKA-6730] Simplify State Store Recovery (apache#5013) MINOR: Rename package `internal` to `internals` for consistency (apache#5137) KAFKA-6704: InvalidStateStoreException from IQ when StreamThread closes store (apache#4801) MINOR: Add missing configs for resilience settings MINOR: Add regression tests for KTable mapValues and filter (apache#5134) KAFKA-6750: Add listener name to authentication context (KIP-282) (apache#4829) KAFKA-3665: Enable TLS hostname verification by default (KIP-294) (apache#4956) KAFKA-6938: Add documentation for accessing Headers on Kafka Streams Processor API (apache#5128) KAFKA-6813: return to double-counting for count topology names (apache#5075) KAFKA-5919; Adding checks on "version" field for tools using it MINOR: Remove deprecated KafkaStreams constructors in docs (apache#5118)

…e#5075) apache#4919 unintentionally changed the topology naming scheme. This change returns to the prior scheme. Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>

mjsax added the streams label May 25, 2018

mjsax reviewed May 25, 2018

View reviewed changes

vvcephei changed the title ~~KAFKA-6813: return to double-counting for count topo names~~ KAFKA-6813: return to double-counting for count topology names May 25, 2018

bbejeck approved these changes May 25, 2018

View reviewed changes

vvcephei commented May 31, 2018

View reviewed changes

bbejeck reviewed May 31, 2018

View reviewed changes

guozhangwang reviewed May 31, 2018

View reviewed changes

guozhangwang reviewed Jun 1, 2018

View reviewed changes

Comment thread streams/src/main/java/org/apache/kafka/streams/StreamsBuilder.java Outdated

Copy link
Copy Markdown

Contributor

guozhangwang Jun 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indent

vvcephei added 2 commits June 4, 2018 12:31

return to double-counting for count topo names

d2f735c

move name generation from MI constructor to method

2851a05

* remove MaterializePeek now that MI constructor doesn't mutate storeName

vvcephei mentioned this pull request Jun 4, 2018

MINOR: add regression tests for KTable mapValues and filter #5134

Merged

3 tasks

guozhangwang merged commit 6f9f365 into apache:trunk Jun 4, 2018

vvcephei deleted the fix-topology-numbering branch June 5, 2018 15:38

rhauch mentioned this pull request Jul 12, 2019

KAFKA-6605 fix NPE in Flatten when optional Struct is null - backport… #5706

Closed

3 tasks

Conversation

vvcephei commented May 24, 2018

Committer Checklist (excluded from commit message)

Uh oh!

vvcephei commented May 25, 2018

Uh oh!

vvcephei commented May 25, 2018

Uh oh!

mjsax left a comment

Choose a reason for hiding this comment

Uh oh!

bbejeck left a comment

Choose a reason for hiding this comment

Uh oh!

vvcephei commented May 26, 2018

Uh oh!

guozhangwang commented May 29, 2018

Uh oh!

vvcephei commented May 30, 2018

Uh oh!

vvcephei commented May 31, 2018

Uh oh!

vvcephei commented May 31, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bbejeck left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

guozhangwang left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vvcephei commented Jun 1, 2018

Uh oh!

guozhangwang left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vvcephei commented Jun 4, 2018

Uh oh!

vvcephei commented Jun 4, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vvcephei commented Jun 4, 2018

Uh oh!

guozhangwang commented Jun 4, 2018

Uh oh!

Reviewers

Assignees

Labels

guozhangwang left a comment •

edited

Loading

vvcephei commented Jun 4, 2018 •

edited

Loading