KAFKA-10754: fix flaky tests by waiting for Kafka Streams to be in RUNNING state before assert (#9629)
Conversation
Forgot to put kafkaStreams1 into the try-with-resources block here.
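A minimal sketch of the fix, using a hypothetical FakeStreams stand-in rather than the real KafkaStreams client (which also implements AutoCloseable): declaring both instances in the same try-with-resources header guarantees each one is closed even if an assertion in the test body throws.

```java
public class TryResourceExample {
    // Stand-in for KafkaStreams, which implements AutoCloseable.
    static class FakeStreams implements AutoCloseable {
        final String name;
        boolean closed = false;
        FakeStreams(String name) { this.name = name; }
        @Override public void close() {
            closed = true;
            System.out.println(name + " closed");
        }
    }

    public static void main(String[] args) {
        // Both clients belong in one try-with-resources header so each is
        // closed even when the test body fails.
        try (FakeStreams kafkaStreams = new FakeStreams("kafkaStreams");
             FakeStreams kafkaStreams1 = new FakeStreams("kafkaStreams1")) {
            System.out.println("test body runs with " + kafkaStreams.name
                    + " and " + kafkaStreams1.name);
        }
    }
}
```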
The start is async, and we didn't wait for it.
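A simplified sketch of the idea, with a plain polling loop as a stand-in for Kafka's TestUtils.waitForCondition and a simulated async start (the real test waits on KafkaStreams.state() reaching RUNNING):

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

public class WaitForState {
    // Simplified stand-in for TestUtils.waitForCondition: poll until the
    // condition holds or the timeout elapses.
    static boolean waitForCondition(BooleanSupplier condition, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) return true;
            Thread.sleep(10);
        }
        return condition.getAsBoolean();
    }

    public static void main(String[] args) throws Exception {
        AtomicReference<String> state = new AtomicReference<>("CREATED");
        // Simulate the async start(): the state flips to RUNNING a bit later.
        new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
            state.set("RUNNING");
        }).start();
        // Without this wait, asserting on a later state transition races
        // against the async start and the test flakes.
        boolean running = waitForCondition(() -> "RUNNING".equals(state.get()), 5000);
        System.out.println(running ? "reached RUNNING" : "timed out");
    }
}
```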
@ableegoldman @wcarlson5, could you help review this PR? Thanks.
We should throw a specific kind of exception, not a generic Exception.
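For illustration only: the class below is a local stand-in (its name mirrors Kafka's StreamsNotStartedException, but it is not the real one). A narrowly typed exception lets callers and test assertions match on the exact failure instead of catching a bare Exception.

```java
public class SpecificExceptionDemo {
    // Local stand-in for a narrowly typed failure; illustrative only.
    static class StreamsNotStartedException extends IllegalStateException {
        StreamsNotStartedException(String message) { super(message); }
    }

    static void requireRunning(boolean running) {
        if (!running) {
            // A specific type lets callers catch exactly this failure.
            throw new StreamsNotStartedException("KafkaStreams is not running");
        }
    }

    public static void main(String[] args) {
        try {
            requireRunning(false);
        } catch (StreamsNotStartedException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```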
waitForApplicationState(Collections.singletonList(kafkaStreams), KafkaStreams.State.ERROR, Duration.ofSeconds(15));
TestUtils.waitForCondition(flag::get, "Handler was called");
waitForApplicationState(Collections.singletonList(kafkaStreams), KafkaStreams.State.ERROR, DEFAULT_DURATION);
We should wait for the uncaughtExceptionHandler to be called before waiting for the streams state change.
The order is not really that important here; either way works.
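A toy sketch of why the order does not matter (all names here are hypothetical, no Kafka involved): once the crashing thread has run, both conditions are true and stay true, so the two waits succeed in either order.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

public class WaitOrdering {
    // Block until the condition holds, failing after a 5s deadline.
    static void await(BooleanSupplier condition, String message)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + 5000;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) throw new AssertionError(message);
            Thread.sleep(10);
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicBoolean handlerCalled = new AtomicBoolean(false);
        AtomicReference<String> state = new AtomicReference<>("RUNNING");
        // Simulate a crashing stream thread: the handler fires, then the
        // client transitions to ERROR; both facts are permanent afterwards.
        new Thread(() -> {
            handlerCalled.set(true);
            state.set("ERROR");
        }).start();
        // These two waits succeed in either order.
        await(handlerCalled::get, "Handler was called");
        await(() -> "ERROR".equals(state.get()), "State is ERROR");
        System.out.println("both conditions met");
    }
}
```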
Hey @showuon, thanks for the quick fix! I notice that …
@ableegoldman, thanks for pointing it out. After investigation, I found I had defaulted this test to 1 stream thread, since other tests set the expected thread number before testing. The fix is in this commit: e6d39f6. Thank you.
All tests passed, yeah!
  /**
-  * Set the handler invoked when an {@link StreamsConfig#NUM_STREAM_THREADS_CONFIG internal thread}
+  * Set the handler invoked when an {@link StreamsConfig#NUM_STREAM_THREADS_CONFIG} internal thread
I think this was actually correct as it was (and ditto for the above). One alternative suggestion:
-  * Set the handler invoked when an {@link StreamsConfig#NUM_STREAM_THREADS_CONFIG} internal thread
+  * Set the handler invoked when an internal {@link StreamsConfig#NUM_STREAM_THREADS_CONFIG stream thread}
  mkEntry(StreamsConfig.APPLICATION_ID_CONFIG, appId),
  mkEntry(StreamsConfig.STATE_DIR_CONFIG, TestUtils.tempDirectory().getPath()),
- mkEntry(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2),
+ mkEntry(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 1),
Hey @wcarlson5, can you take a look at this? If we change the default number of threads to 1, will we be reducing test coverage or no longer testing the correct thing?
FWIW I think for tests where the number of threads doesn't matter, we should default to 1. But I'm not sure which tests do/do not rely on using multiple stream threads
Yes, both the old handler test and the close-client test should have 2 threads. We need to ensure that after a rebalance the old handler has attempted to process the record twice and the client shut down only once. We cannot be sure of that with only one thread.
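A toy model of the two-thread argument (purely illustrative, no Kafka involved; the method name run is hypothetical): with two crashing threads, the handler fires once per thread while the shutdown is triggered exactly once.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class TwoThreadHandler {
    // Returns "<handler calls>,<shutdown triggers>" for the given thread count.
    static String run(int numThreads) throws InterruptedException {
        AtomicInteger handlerCalls = new AtomicInteger();
        AtomicInteger shutdowns = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(numThreads);
        for (int i = 0; i < numThreads; i++) {
            new Thread(() -> {
                // Every crashing thread invokes the handler once...
                handlerCalls.incrementAndGet();
                // ...but only the first one wins the race to trigger shutdown.
                shutdowns.compareAndSet(0, 1);
                done.countDown();
            }).start();
        }
        done.await();
        return handlerCalls.get() + "," + shutdowns.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // With two threads we can observe two handler calls but one shutdown,
        // which a single-threaded test could never distinguish.
        System.out.println(run(2));
    }
}
```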
waitForApplicationState(Collections.singletonList(kafkaStreams), KafkaStreams.State.ERROR, Duration.ofSeconds(15));
TestUtils.waitForCondition(flag::get, "Handler was called");
assertThat(processorValueCollector.size(), equalTo(2));
@wcarlson5 for example, this test probably should have multiple threads, right?
wcarlson5
left a comment
@showuon Mostly these are good changes, though @ableegoldman is right: changing the default to one thread will reduce our coverage. The only reason the shutdown-application test uses one thread is to ensure the shutdown signal still gets propagated when the last thread is dying.
@ableegoldman @wcarlson5, thanks for the comments. Now I know why it sets the default threads to 2. So, to make the test more reliable, I'll do:
This should make this test more reliable. What do you think?
These changes LGTM. WDYT @ableegoldman? Thanks for the PR!
ableegoldman
left a comment
Thanks for the updates, this LGTM. Looks like all tests passed this time around.
Merged to trunk
The flaky test happens because we didn't wait for the streams to become RUNNING before verifying the transition to the ERROR state. This fix explicitly waits for the streams to reach the RUNNING state. Also, I found we didn't put the 2nd stream into the try-with-resources block, so it wasn't closed after the tests. In addition, I also fixed some logic errors in the tests.