MINOR: log warning when topology override for cache size is non-zero #11959

Merged
ableegoldman merged 8 commits into apache:trunk from ableegoldman:MINOR-log-warning-if-nonzero-override-for-cache-size-config
Mar 30, 2022

Conversation

@ableegoldman
Member

Since the topology-level cache size config only controls whether we disable the caching layer entirely for that topology, setting it to anything other than 0 has no effect. The actual cache memory is still just split evenly between the threads, and shared by all topologies.

It's possible we'll want to change this in the future, but for now we should log a warning so that users who try to set this override to some nonzero value are made aware that it has no effect.
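A minimal sketch of the intended behavior (class and method names here are illustrative, not the actual `TopologyConfig` code):

```java
// Illustrative sketch only; not the actual Kafka Streams implementation.
public class TopologyCacheOverrideCheck {

    // Per the PR description: a topology-level override of 0 disables the
    // caching layer for that topology; any non-zero override has no effect,
    // since the cache memory is still split evenly between the threads and
    // shared by all topologies.
    public static String describeOverride(final long overrideBytes) {
        if (overrideBytes == 0L) {
            return "caching disabled for this topology";
        }
        return "WARN: non-zero topology-level cache size override is ignored";
    }
}
```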

Also includes some minor refactoring, plus a fix for an off-by-one bug introduced in #11796

@ableegoldman
Member Author

Hey @vamossagar12 can you take a look at this? Also cc @guozhangwang @wcarlson5

@vamossagar12
Contributor

@ableegoldman, I did a quick pass. Looks good to me.

Member

@showuon showuon left a comment


Overall LGTM. Left some minor comments. Thank you.

Comment thread: streams/src/main/java/org/apache/kafka/streams/TopologyConfig.java, lines -1050 to +1053

-            resizeThreadCacheAndBufferMemory(numLiveThreads + 1);
+            resizeThreadCacheAndBufferMemory(numLiveThreads);
             log.info("Adding StreamThread-{}, there are now {} threads with cache size/max buffer size values as {} per thread.",
-                     threadIdx, numLiveThreads + 1, getThreadCacheAndBufferMemoryString());
+                     threadIdx, numLiveThreads, getThreadCacheAndBufferMemoryString());
Member


I think we should add tests for it, to make sure after adding a thread, the cache size and buffer memory is set as what we expected.
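The check being requested boils down to the even-split invariant: after a thread is added, each thread's share is the total divided by the new live-thread count. A minimal standalone sketch (illustrative names, not the actual integration test, which inspects the log output):

```java
// Standalone sketch of the even-split invariant the suggested test would
// verify; names are illustrative, not the actual Kafka Streams test code.
public class CacheResizeSketch {

    // After adding a thread, numLiveThreads already includes the new thread,
    // so the per-thread share is total / numLiveThreads. (Dividing by
    // numLiveThreads + 1 here was the off-by-one this PR fixes.)
    public static long perThreadBytes(final long totalBytes, final int numLiveThreads) {
        return totalBytes / numLiveThreads;
    }
}
```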

Member


That test is to test thread removal, right? We should have similar tests for thread addition; with those in place we should be able to catch this issue.

Contributor


Ok, yeah, there are 2 more tests for threadReplacement which seem to be printing out an extra thread count. But I agree, a test for thread addition would be good.

                 CACHE_MAX_BYTES_BUFFERING_CONFIG);
-        } else if (isTopologyOverride(STATESTORE_CACHE_MAX_BYTES_CONFIG, topologyOverrides)) {
-            cacheSize = getLong(STATESTORE_CACHE_MAX_BYTES_CONFIG);
+        final boolean stateStoreCacheMaxBytesOverridden = isTopologyOverride(STATESTORE_CACHE_MAX_BYTES_CONFIG, topologyOverrides);
Contributor


@ableegoldman, another thing: my initial logic was similar to what you have below (without the boolean variables stateStoreCacheMaxBytesOverridden and cacheMaxBytesBufferingOverridden), but I had to change it to the above one because I was getting:

[2022-03-29T06:53:02.757Z] [ant:checkstyle] [ERROR] /home/jenkins/jenkins-agent/workspace/Kafka_kafka-pr_PR-11959/streams/src/main/java/org/apache/kafka/streams/TopologyConfig.java:127:5: NPath Complexity is 960 (max allowed is 500). [NPathComplexity]

Looks like this PR suffers from the same problem.

Member Author


Ah :/

I think this is a case where it's reasonable to suppress this checkstyle exception, as the NPath complexity is sort of "artificially" complex, i.e. the code is easy to follow; it just has many branches because it goes through each config.
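One way to do that, assuming the build's checkstyle configuration honors `@SuppressWarnings` (via `SuppressWarningsHolder`/`SuppressWarningsFilter`), is an annotation on the offending method; the method body below is a placeholder, not the real config-resolution logic:

```java
public class TopologyConfigSketch {

    // Placeholder standing in for config-resolution code with many per-config
    // branches: suppressing NPathComplexity keeps checkstyle happy without
    // restructuring code that is easy to follow.
    @SuppressWarnings("checkstyle:NPathComplexity")
    public static int resolveConfigs(final boolean a, final boolean b, final boolean c) {
        int resolved = 0;
        if (a) resolved++;
        if (b) resolved++;
        if (c) resolved++;
        return resolved;
    }
}
```

Alternatively, the rule can be excluded for the file in the project's checkstyle suppressions file.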

Contributor


Ok, yeah, we can suppress it, but after reading the error I also felt it has too many branches. Anyways, it's fine I guess.

@ableegoldman
Member Author

@showuon addressed your comment and added test coverage for the cache & buffer size after adding a thread; please give a +1 if all looks good

Contributor

@wcarlson5 wcarlson5 left a comment


I don't have anything else to add tbh

Member

@showuon showuon left a comment


Thanks for the update and the tests. LGTM!


    for (final String log : appender.getMessages()) {
        // after we replace the thread there should be two remaining threads with 5 bytes each
        if (log.endsWith("Adding StreamThread-3, there are now 3 threads with cache size/max buffer size values as 3/178956970 per thread.")) {
Member Author


Apparently there was already a test for the cache being sized correctly after a thread replacement/addition; I'm guessing it was updated with the incorrect values in the PR that introduced the off-by-one bug. The changed cache size value probably should have been a red flag, not to mention the comment above which explicitly says it should be 5 bytes per thread 🙂
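For concreteness, assuming a 10-byte test cache (as the "5 bytes each" comment implies) and a 536870912-byte (512 MiB) total buffer (consistent with the 178956970 figure in the asserted log line), the per-thread values work out as follows; this arithmetic is my own illustration, not taken from the test itself:

```java
public class ThreadCacheMath {
    public static void main(final String[] args) {
        final long totalCache = 10L;          // assumed test cache size, per the "5 bytes each" comment
        final long totalBuffer = 536870912L;  // 512 MiB, matching the asserted log line

        // Correct: after the replacement there are 2 threads, so 10 / 2 = 5 bytes each.
        System.out.println(totalCache / 2);   // 5
        // The buggy assertion instead expected 3 threads:
        System.out.println(totalCache / 3);   // 3
        System.out.println(totalBuffer / 3);  // 178956970
    }
}
```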

Contributor


yeah the comment is off.

Member Author


No actually I meant that the comment was correct -- the test was just verifying incorrect results (after the thread replacement there should be 2 threads with 5MB of cache, as it says). But no worries

}

@Test
public void shouldResizeMaxBufferAfterThreadReplacement() throws InterruptedException {
Member Author


This test seems to be identical to the one above, except that it uses the default cache size and a custom input buffer size. However, both tests still validate both sizes in the log message, so this doesn't need to be a separate test; we can just cover both in the original test.

(The total time to run streams tests has been getting a bit out of hand lately, so we should have a concrete reason to split something out into a second test like this, especially the more "heavy" integration tests)

@ableegoldman
Member Author

All test failures are in Connect, so unrelated. Going to merge

@ableegoldman ableegoldman merged commit 1317f3f into apache:trunk Mar 30, 2022
@ableegoldman
Member Author

Merged to trunk

@showuon
Member

showuon commented Mar 31, 2022

Thanks for fixing the tests, @ableegoldman !

mjsax added a commit to mjsax/kafka that referenced this pull request Jul 6, 2022