KAFKA-6813: return to double-counting for count topology names#5075
KAFKA-6813: return to double-counting for count topology names#5075guozhangwang merged 2 commits intoapache:trunkfrom vvcephei:fix-topology-numbering
Conversation
|
retest this, please (I missed the test results). |
|
@guozhangwang @bbejeck @mjsax , Please review this. |
mjsax
left a comment
There was a problem hiding this comment.
The change looks good to me.
We should put some tests in place though to avoid a similar regression in the future. There are existing test that should have caught this -- guess we need to extend those tests.
|
Hm. The test failure is correlated. I'll have to take a look on Tuesday. |
|
I think in #4919 we've changed a few other places that needs to be considered in this PR, namely:
|
|
retest this, please. |
1 similar comment
|
retest this, please. |
|
Java8 passed, and Java10 failed some broker test. @guozhangwang @bbejeck @mjsax , I've been over the code changes several times, and I think this is the only alteration we introduced before (45% confidence). Please let me know if you have some other scenarios you want me to test. Otherwise, I think this is ready for reviews. |
There was a problem hiding this comment.
This is hideous, but it's the best I could think of to get a look at the name in Materialized (which is protected) without auto-setting it (which happens in MaterializedInternal's constructor.
There was a problem hiding this comment.
Actually, I did have another thought, which was that MaterializedInternal could avoid setting the name in the constructor and instead provide a method to do it. We could pursue a strategy where all the names are assigned in the DSL classes, which would make it much easier to see when they are getting assigned.
But this seemed like a risky change, so I opted for the less invasive choice.
There was a problem hiding this comment.
Yeah, I agree with you that it would be a substantial refactoring. IMHO better to start with a smaller change and we can re-visit the name generation down the road.
There was a problem hiding this comment.
A similar idea but not introducing a new class: we can add a new constructor of MaterializedInternal that only takes one parameter Materialized and do not try to create internal names at all, then after the object is created with this constructor, its storeName() may return null.
I think it is similar to what you have here but with fewer LOCs :P Besides, we should add comment on that constructor that it will be removed once we do not do this silly thing any more.
There was a problem hiding this comment.
I think if we go down that road, we may aw well complete the journey to my alternative proposal to add a method to MaterializedInternal to explicitly set the store name if there isn't one and remove that logic from the constructor. This can be implemented safely by adding the new constructor you proposed, and mechanically in lining the old one.
Basically, every call to new MaterializedInternal... would become:
mi = new MaterializedInternal(...);
mi.generateStoreNameIfNotSet();
Then we could judiciously remove the second line in cases like this one where we do not want it.
There was a problem hiding this comment.
There's a bunch of incidental cleanup in here, just resolving IDEA warnings (many are related to moving to java 8).
There was a problem hiding this comment.
This is not required at the Java 8 level.
There was a problem hiding this comment.
This is not required at the Java 8 level.
bbejeck
left a comment
There was a problem hiding this comment.
Just have a couple of general comments, but this looks good to me. Nice job on the added test cases.
There was a problem hiding this comment.
This logic is repeated in a couple of places.
I'm wondering if we could change MaterializedPeek to take the InternalSteamsBuilder as an additional constructor param and have the logic inside the class, and this block of code could be replaced with new MaterializedPeek<>(materialized, builder).maybeIncrementTopologyCount() or something like that.
There was a problem hiding this comment.
I get where you're coming from, but I think the duplication is justified in this case. Aside from the three examples in this diff, we interact with Materialized and MaterializedInteral in dozens of places. What we're doing here is very specific to a regression in these three methods.
I think that if we were to separate the regression patch from these locations, we risk having it misused elsewhere. If we did consolidate it, I would want to name it something like new CountAggregationRegression4919Patch(materialized, builder).maybeIncrementTopologyCount() to be sure it's used properly.
My experience is that deduplicating code often results in a net increase in complexity and is therefore not without cost. I'm not saying we should have copy-pasted code all over the place, just that we shouldn't universally apply a deduplication strategy. It seems like this is an example where it's better to just have the logic applied where it's needed.
There was a problem hiding this comment.
Yeah, I agree with you that it would be a substantial refactoring. IMHO better to start with a smaller change and we can re-visit the name generation down the road.
There was a problem hiding this comment.
A similar idea but not introducing a new class: we can add a new constructor of MaterializedInternal that only takes one parameter Materialized and do not try to create internal names at all, then after the object is created with this constructor, its storeName() may return null.
I think it is similar to what you have here but with fewer LOCs :P Besides, we should add comment on that constructor that it will be removed once we do not do this silly thing any more.
|
@guozhangwang @bbejeck @mjsax , Following the CR discussion, I have replaced the former MaterializedInternal constructor with one that does not mutate the storeName. For correctness, the following condition should hold:
I have verified for myself that this is true, but please take a look on your own. There are some places where we redundantly instantiate MaterializedInternal with a store name, and then immediately call generate-name-if-needed, but I intentionally left it so that we could more easily certify that this change is correct. While making this change, I noticed that there is a pre-existing pattern for it, since several locations effectively set the This change allows us to get rid of MaterializedPeek, since we can equivalently instantiate a MaterializedInternal now. These three locations are the only time we instantiate a MI without then calling generate-store-name. One final note: the constructor previously set |
guozhangwang
left a comment
There was a problem hiding this comment.
One nit comment, otherwise lgtm.
Will merge the PR once it is addressed and jenkins passed.
* remove MaterializePeek now that MI constructor doesn't mutate storeName
|
Turns out I misread the code around |
|
Java 8 passed. Java 10 failure was unrelated: |
|
@guozhangwang , I think this is ready to merge, but I can run the tests again, if you want to see both builds pass. |
|
Thanks @vvcephei , merged to trunk. |
…grained-acl-create-topics * apache-github/trunk: KAFKA-5588: Remove deprecated --new-consumer tools option (apache#5097) MINOR: Fix for the location of the trogdor.sh executable file in the documentation. (apache#5040) KAFKA-6997: Exclude test-sources.jar when $INCLUDE_TEST_JARS is FALSE MINOR: docs should point to latest version (apache#5132) KAFKA-6981: Move the error handling configuration properties into the ConnectorConfig and SinkConnectorConfig classes (KIP-298) [KAFKA-6730] Simplify State Store Recovery (apache#5013) MINOR: Rename package `internal` to `internals` for consistency (apache#5137) KAFKA-6704: InvalidStateStoreException from IQ when StreamThread closes store (apache#4801) MINOR: Add missing configs for resilience settings MINOR: Add regression tests for KTable mapValues and filter (apache#5134) KAFKA-6750: Add listener name to authentication context (KIP-282) (apache#4829) KAFKA-3665: Enable TLS hostname verification by default (KIP-294) (apache#4956) KAFKA-6938: Add documentation for accessing Headers on Kafka Streams Processor API (apache#5128) KAFKA-6813: return to double-counting for count topology names (apache#5075) KAFKA-5919; Adding checks on "version" field for tools using it MINOR: Remove deprecated KafkaStreams constructors in docs (apache#5118)
…refix * apache-github/trunk: KAFKA-6726: Fine Grained ACL for CreateTopics (KIP-277) (apache#4795) KAFKA-5588: Remove deprecated --new-consumer tools option (apache#5097) MINOR: Fix for the location of the trogdor.sh executable file in the documentation. (apache#5040) KAFKA-6997: Exclude test-sources.jar when $INCLUDE_TEST_JARS is FALSE MINOR: docs should point to latest version (apache#5132) KAFKA-6981: Move the error handling configuration properties into the ConnectorConfig and SinkConnectorConfig classes (KIP-298) [KAFKA-6730] Simplify State Store Recovery (apache#5013) MINOR: Rename package `internal` to `internals` for consistency (apache#5137) KAFKA-6704: InvalidStateStoreException from IQ when StreamThread closes store (apache#4801) MINOR: Add missing configs for resilience settings MINOR: Add regression tests for KTable mapValues and filter (apache#5134) KAFKA-6750: Add listener name to authentication context (KIP-282) (apache#4829) KAFKA-3665: Enable TLS hostname verification by default (KIP-294) (apache#4956) KAFKA-6938: Add documentation for accessing Headers on Kafka Streams Processor API (apache#5128) KAFKA-6813: return to double-counting for count topology names (apache#5075) KAFKA-5919; Adding checks on "version" field for tools using it MINOR: Remove deprecated KafkaStreams constructors in docs (apache#5118)
…e#5075) apache#4919 unintentionally changed the topology naming scheme. This change returns to the prior scheme. Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
#4919 unintentionally changed the topology naming scheme. This change returns to the prior scheme.
Committer Checklist (excluded from commit message)