Skip to content

KAFKA-6813: return to double-counting for count topology names#5075

Merged
guozhangwang merged 2 commits intoapache:trunkfrom
vvcephei:fix-topology-numbering
Jun 4, 2018
Merged

KAFKA-6813: return to double-counting for count topology names#5075
guozhangwang merged 2 commits intoapache:trunkfrom
vvcephei:fix-topology-numbering

Conversation

@vvcephei
Copy link
Copy Markdown
Contributor

#4919 unintentionally changed the topology naming scheme. This change returns to the prior scheme.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@vvcephei
Copy link
Copy Markdown
Contributor Author

retest this, please (I missed the test results).

@vvcephei
Copy link
Copy Markdown
Contributor Author

@guozhangwang @bbejeck @mjsax , Please review this.

@mjsax mjsax added the streams label May 25, 2018
Copy link
Copy Markdown
Member

@mjsax mjsax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good to me.

We should put some tests in place though to avoid a similar regression in the future. There are existing test that should have caught this -- guess we need to extend those tests.

@vvcephei vvcephei changed the title KAFKA-6813: return to double-counting for count topo names KAFKA-6813: return to double-counting for count topology names May 25, 2018
Copy link
Copy Markdown
Member

@bbejeck bbejeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@vvcephei
Copy link
Copy Markdown
Contributor Author

Hm. The test failure is correlated. I'll have to take a look on Tuesday.

@guozhangwang
Copy link
Copy Markdown
Contributor

I think in #4919 we've changed a few other places that needs to be considered in this PR, namely:

  1. In KTable.filter we will create an internal materialized store even if Materialized is not passed in, hence "burning" one number for that store;

@vvcephei
Copy link
Copy Markdown
Contributor Author

retest this, please.

1 similar comment
@vvcephei
Copy link
Copy Markdown
Contributor Author

retest this, please.

@vvcephei
Copy link
Copy Markdown
Contributor Author

Java8 passed, and Java10 failed some broker test.

@guozhangwang @bbejeck @mjsax ,

I've been over the code changes several times, and I think this is the only alteration we introduced before (45% confidence). Please let me know if you have some other scenarios you want me to test.

Otherwise, I think this is ready for reviews.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hideous, but it's the best I could think of to get a look at the name in Materialized (which is protected) without auto-setting it (which happens in MaterializedInternal's constructor.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I did have another thought, which was that MaterializedInternal could avoid setting the name in the constructor and instead provide a method to do it. We could pursue a strategy where all the names are assigned in the DSL classes, which would make it much easier to see when they are getting assigned.

But this seemed like a risky change, so I opted for the less invasive choice.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree with you that it would be a substantial refactoring. IMHO better to start with a smaller change and we can re-visit the name generation down the road.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar idea but not introducing a new class: we can add a new constructor of MaterializedInternal that only takes one parameter Materialized and do not try to create internal names at all, then after the object is created with this constructor, its storeName() may return null.

I think it is similar to what you have here but with fewer LOCs :P Besides, we should add comment on that constructor that it will be removed once we do not do this silly thing any more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we go down that road, we may aw well complete the journey to my alternative proposal to add a method to MaterializedInternal to explicitly set the store name if there isn't one and remove that logic from the constructor. This can be implemented safely by adding the new constructor you proposed, and mechanically in lining the old one.

Basically, every call to new MaterializedInternal... would become:

mi = new MaterializedInternal(...);
mi.generateStoreNameIfNotSet();

Then we could judiciously remove the second line in cases like this one where we do not want it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vvcephei sounds good.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a bunch of incidental cleanup in here, just resolving IDEA warnings (many are related to moving to java 8).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not required at the Java 8 level.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not required at the Java 8 level.

Copy link
Copy Markdown
Member

@bbejeck bbejeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just have a couple of general comments, but this looks good to me. Nice job on the added test cases.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is repeated in a couple of places.
I'm wondering if we could change MaterializedPeek to take the InternalSteamsBuilder as an additional constructor param and have the logic inside the class, and this block of code could be replaced with new MaterializedPeek<>(materialized, builder).maybeIncrementTopologyCount() or something like that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get where you're coming from, but I think the duplication is justified in this case. Aside from the three examples in this diff, we interact with Materialized and MaterializedInteral in dozens of places. What we're doing here is very specific to a regression in these three methods.

I think that if we were to separate the regression patch from these locations, we risk having it misused elsewhere. If we did consolidate it, I would want to name it something like new CountAggregationRegression4919Patch(materialized, builder).maybeIncrementTopologyCount() to be sure it's used properly.

My experience is that deduplicating code often results in a net increase in complexity and is therefore not without cost. I'm not saying we should have copy-pasted code all over the place, just that we shouldn't universally apply a deduplication strategy. It seems like this is an example where it's better to just have the logic applied where it's needed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree with you that it would be a substantial refactoring. IMHO better to start with a smaller change and we can re-visit the name generation down the road.

Copy link
Copy Markdown
Contributor

@guozhangwang guozhangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nit comments, otherwise LGTM.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar idea but not introducing a new class: we can add a new constructor of MaterializedInternal that only takes one parameter Materialized and do not try to create internal names at all, then after the object is created with this constructor, its storeName() may return null.

I think it is similar to what you have here but with fewer LOCs :P Besides, we should add comment on that constructor that it will be removed once we do not do this silly thing any more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:)

@vvcephei
Copy link
Copy Markdown
Contributor Author

vvcephei commented Jun 1, 2018

@guozhangwang @bbejeck @mjsax ,

Following the CR discussion, I have replaced the former MaterializedInternal constructor with one that does not mutate the storeName. For correctness, the following condition should hold:

  • in every place that previously called new MaterializedInternal(...), we will now see the following two lines: new MaterializedInternal(...); materializedInternal.generateStoreNameIfNeeded()

I have verified for myself that this is true, but please take a look on your own.

There are some places where we redundantly instantiate MaterializedInternal with a store name, and then immediately call generate-name-if-needed, but I intentionally left it so that we could more easily certify that this change is correct.

While making this change, I noticed that there is a pre-existing pattern for it, since several locations effectively set the keySerde or valueSerde, likewise "if needed".

This change allows us to get rid of MaterializedPeek, since we can equivalently instantiate a MaterializedInternal now. These three locations are the only time we instantiate a MI without then calling generate-store-name.

One final note: the constructor previously set queryable such that there was an invariant isQueryable() == true iff storeName() != null. Since we no longer set the storeName in the constructor, maintaining the internal field was less convenient, so I instead opted for an equivalent method body for isQueryable().

Copy link
Copy Markdown
Contributor

@guozhangwang guozhangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit comment, otherwise lgtm.

Will merge the PR once it is addressed and jenkins passed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indent

vvcephei added 2 commits June 4, 2018 12:31
* remove MaterializePeek now that MI constructor doesn't mutate storeName
@vvcephei
Copy link
Copy Markdown
Contributor Author

vvcephei commented Jun 4, 2018

Turns out I misread the code around isQueryable. It's a good thing we have tests! I've updated the PR. Watching to see if/when the tests pass...

@vvcephei
Copy link
Copy Markdown
Contributor Author

vvcephei commented Jun 4, 2018

Java 8 passed.

Java 10 failure was unrelated:

java.lang.AssertionError: Expected an exception of type org.apache.kafka.common.errors.TimeoutException; got type org.apache.kafka.common.errors.SslAuthenticationException
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at kafka.api.AdminClientIntegrationTest.assertFutureExceptionTypeEquals(AdminClientIntegrationTest.scala:136)
	at kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts(AdminClientIntegrationTest.scala:1001)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.lang.Thread.run(Thread.java:844)

@vvcephei
Copy link
Copy Markdown
Contributor Author

vvcephei commented Jun 4, 2018

@guozhangwang , I think this is ready to merge, but I can run the tests again, if you want to see both builds pass.

@guozhangwang
Copy link
Copy Markdown
Contributor

Thanks @vvcephei , merged to trunk.

@vvcephei vvcephei deleted the fix-topology-numbering branch June 5, 2018 15:38
ijuma added a commit to edoardocomar/kafka that referenced this pull request Jun 6, 2018
…grained-acl-create-topics

* apache-github/trunk:
  KAFKA-5588: Remove deprecated --new-consumer tools option (apache#5097)
  MINOR: Fix for the location of the trogdor.sh executable file in the documentation. (apache#5040)
  KAFKA-6997: Exclude test-sources.jar when $INCLUDE_TEST_JARS is FALSE
  MINOR: docs should point to latest version (apache#5132)
  KAFKA-6981: Move the error handling configuration properties into the ConnectorConfig and SinkConnectorConfig classes (KIP-298)
  [KAFKA-6730] Simplify State Store Recovery (apache#5013)
  MINOR: Rename package `internal` to `internals` for consistency (apache#5137)
  KAFKA-6704: InvalidStateStoreException from IQ when StreamThread closes store (apache#4801)
  MINOR: Add missing configs for resilience settings
  MINOR: Add regression tests for KTable mapValues and filter (apache#5134)
  KAFKA-6750: Add listener name to authentication context (KIP-282) (apache#4829)
  KAFKA-3665: Enable TLS hostname verification by default (KIP-294) (apache#4956)
  KAFKA-6938: Add documentation for accessing Headers on Kafka Streams Processor API (apache#5128)
  KAFKA-6813: return to double-counting for count topology names (apache#5075)
  KAFKA-5919; Adding checks on "version" field for tools using it
  MINOR: Remove deprecated KafkaStreams constructors in docs (apache#5118)
ijuma added a commit to big-andy-coates/kafka that referenced this pull request Jun 6, 2018
…refix

* apache-github/trunk:
  KAFKA-6726: Fine Grained ACL for CreateTopics (KIP-277) (apache#4795)
  KAFKA-5588: Remove deprecated --new-consumer tools option (apache#5097)
  MINOR: Fix for the location of the trogdor.sh executable file in the documentation. (apache#5040)
  KAFKA-6997: Exclude test-sources.jar when $INCLUDE_TEST_JARS is FALSE
  MINOR: docs should point to latest version (apache#5132)
  KAFKA-6981: Move the error handling configuration properties into the ConnectorConfig and SinkConnectorConfig classes (KIP-298)
  [KAFKA-6730] Simplify State Store Recovery (apache#5013)
  MINOR: Rename package `internal` to `internals` for consistency (apache#5137)
  KAFKA-6704: InvalidStateStoreException from IQ when StreamThread closes store (apache#4801)
  MINOR: Add missing configs for resilience settings
  MINOR: Add regression tests for KTable mapValues and filter (apache#5134)
  KAFKA-6750: Add listener name to authentication context (KIP-282) (apache#4829)
  KAFKA-3665: Enable TLS hostname verification by default (KIP-294) (apache#4956)
  KAFKA-6938: Add documentation for accessing Headers on Kafka Streams Processor API (apache#5128)
  KAFKA-6813: return to double-counting for count topology names (apache#5075)
  KAFKA-5919; Adding checks on "version" field for tools using it
  MINOR: Remove deprecated KafkaStreams constructors in docs (apache#5118)
ying-zheng pushed a commit to ying-zheng/kafka that referenced this pull request Jul 6, 2018
…e#5075)

apache#4919 unintentionally changed the topology naming scheme. This change returns to the prior scheme.

Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants