KAFKA-12194: use stateListener to catch each state change by showuon · Pull Request #9888 · apache/kafka

showuon · 2021-01-14T09:10:55Z

The tests are flaky because we used waitForApplicationState to wait for a state. waitForApplicationState polls the current stream state, so it can miss state changes that happen between two polls.
Ex:

state: created -> running
check state: running

state: running -> rebalancing
state: rebalancing -> running
check state: running <-- which is not what we expected (we expected to be rebalancing, but missed)

I use a StateListener to record the new state of every state change, so when we verify a specific state, we can always find it if it occurred. This PR also includes a small refactor.
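To illustrate the difference, here is a minimal, self-contained sketch. It does not use the Kafka Streams library: `State`, `StateListener`, and `FakeClient` are hypothetical stand-ins that only mimic the shape of `KafkaStreams.State` and `KafkaStreams.StateListener#onChange(newState, oldState)`. The point is that a poll after the fact sees only the latest state, while a listener that appends to a list keeps every state that was entered.

```java
import java.util.ArrayList;
import java.util.List;

class PollingVersusListener {
    // Hypothetical stand-in for KafkaStreams.State (only the states used here).
    enum State { CREATED, REBALANCING, RUNNING }

    // Hypothetical stand-in with the same callback shape as KafkaStreams.StateListener.
    interface StateListener {
        void onChange(State newState, State oldState);
    }

    // Hypothetical client that notifies its listener on every state change.
    static class FakeClient {
        private State state = State.CREATED;
        private StateListener listener = (newState, oldState) -> { };

        void setStateListener(final StateListener listener) {
            this.listener = listener;
        }

        State state() {
            return state;
        }

        void transitionTo(final State newState) {
            final State oldState = state;
            state = newState;
            listener.onChange(newState, oldState);
        }
    }

    // Registers a recording listener, replays the transitions from the
    // description above, and returns everything the listener observed.
    static List<State> replay(final FakeClient client) {
        final List<State> history = new ArrayList<>();
        client.setStateListener((newState, oldState) -> history.add(newState));

        client.transitionTo(State.RUNNING);
        // Both of these transitions happen between two polls.
        client.transitionTo(State.REBALANCING);
        client.transitionTo(State.RUNNING);
        return history;
    }

    public static void main(final String[] args) {
        final FakeClient client = new FakeClient();
        final List<State> history = replay(client);

        // A poll at this point only sees the latest state.
        System.out.println("polled state: " + client.state());
        // The listener recorded every state that was entered.
        System.out.println("recorded history: " + history);
    }
}
```

Running the sketch prints `polled state: RUNNING` and `recorded history: [RUNNING, REBALANCING, RUNNING]`, so the REBALANCING step is visible only in the recorded history.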

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

showuon · 2021-01-14T09:13:16Z

@wcarlson5 @ableegoldman @cadonna , could you help review this PR? Thanks.

chia7712 · 2021-01-14T09:24:13Z

oh, we are working at same issue again (#9887) :(

cadonna

@showuon Thank you for the PR! This is very important!

These are my comments.

cadonna · 2021-01-14T10:48:33Z

-        try (final KafkaStreams kafkaStreams = new KafkaStreams(builder.build(), properties)) {
-            StreamsTestUtils.startKafkaStreamsAndWaitForRunningState(kafkaStreams);


Why replace the try-with-resources with an explicit close()? This did not make the test flaky, as far as I can see. Can't we add the Streams state listener as the first statement in the try-with-resources block?

Yes, you're right. I just thought there was some duplicated code and wanted to clean it up. But you're right, it may not be better. Fixed.
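A rough sketch of the shape discussed above, using a hypothetical `FakeClient` instead of the real `KafkaStreams` class so that it runs on its own. The assumption modelled here is that the listener has to be registered before the client is started, which is why it is the first statement in the block; close() is still handled by the try-with-resources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ListenerInsideTryWithResources {
    // Hypothetical stand-in for KafkaStreams.State (only the states used here).
    enum State { CREATED, RUNNING, NOT_RUNNING }

    // Hypothetical closeable client that only accepts a listener before start().
    static class FakeClient implements AutoCloseable {
        private State state = State.CREATED;
        private Consumer<State> listener = newState -> { };

        void setStateListener(final Consumer<State> listener) {
            if (state != State.CREATED) {
                throw new IllegalStateException("listener must be set before start()");
            }
            this.listener = listener;
        }

        void start() {
            transitionTo(State.RUNNING);
        }

        @Override
        public void close() {
            transitionTo(State.NOT_RUNNING);
        }

        private void transitionTo(final State newState) {
            state = newState;
            listener.accept(newState);
        }
    }

    static List<State> runOnce() {
        final List<State> stateTransitionHistory = new ArrayList<>();
        try (final FakeClient client = new FakeClient()) {
            // First statement: register the listener before the client starts.
            client.setStateListener(stateTransitionHistory::add);
            client.start();
        } // close() runs here, even if the body throws
        return stateTransitionHistory;
    }

    public static void main(final String[] args) {
        System.out.println(runOnce());
    }
}
```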

cadonna · 2021-01-14T11:12:50Z

    @Rule
    public TestName testName = new TestName();

+    private final List<KafkaStreams.State> stateToTransitions = new ArrayList<>();


I think stateHistory or stateTransitionHistory would be a more meaningful name for this variable.

cadonna · 2021-01-14T11:19:32Z

+        waitForStateTransition(KafkaStreams.State.REBALANCING);
+        waitForStateTransition(KafkaStreams.State.RUNNING);


I think it would be better to wait until the Kafka Streams client is in state RUNNING and then verify that the history of state transitions after adding the stream thread is first REBALANCING and then RUNNING. Currently, the order is not verified as far as I can see.

Good suggestion! Added a hasStateTransition method to verify that. Thanks.

showuon · 2021-01-14T14:48:03Z

@cadonna , thanks for the comments. I've updated in this commit: c7218be. Thank you.

wcarlson5 · 2021-01-14T18:09:19Z

@showuon These changes look good. Thanks for shoring up these tests

cadonna

@showuon Thank you for the updates!

Here is my feedback.

cadonna · 2021-01-14T19:37:04Z

-            waitForApplicationState(Collections.singletonList(kafkaStreams), KafkaStreams.State.RUNNING, DEFAULT_DURATION);
+
+            waitForStateTransition(KafkaStreams.State.RUNNING);
+            assertTrue(hasStateTransition(KafkaStreams.State.REBALANCING, KafkaStreams.State.RUNNING));


We normally use assertThat() in new and refactored code. Please also change the other occurrences.

Suggested change

-            assertTrue(hasStateTransition(KafkaStreams.State.REBALANCING, KafkaStreams.State.RUNNING));
+            assertThat(hasStateTransition(KafkaStreams.State.REBALANCING, KafkaStreams.State.RUNNING), is(true));

OK, Updated. Thanks.

cadonna · 2021-01-14T20:32:41Z

+        // should have at least 2 states in history
+        if (stateTransitionHistory.size() < 2) {
+            return false;
+        }
+
+        for (int i = 0; i < stateTransitionHistory.size() - 1; i++) {
+            if (stateTransitionHistory.get(i).equals(before) && stateTransitionHistory.get(i + 1).equals(after)) {
+                return true;
+            }
+        }
+        return false;


Why do we need a for-loop here? Wouldn't it suffice to verify the last two elements of the history and check if those two elements are a REBALANCING followed by a RUNNING?

I used a for loop because I thought there could be other state changes after RUNNING, e.g. DEAD. But after your question, I think that if that happened, the test should fail as well. So checking the last 2 elements is good. Updated. Thanks.
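The two variants discussed here can be compared side by side in a small standalone sketch. The enum and both method names are hypothetical and only stand in for `KafkaStreams.State` and the helper in the test. The scan accepts a history in which the client moved on to an error state after RUNNING, while the last-two-elements check rejects it.

```java
import java.util.Arrays;
import java.util.List;

class LastTransitionCheck {
    // Hypothetical stand-in for KafkaStreams.State (only the states used here).
    enum State { REBALANCING, RUNNING, ERROR }

    // Loop variant: true if "before" is directly followed by "after" anywhere in the history.
    static boolean containsTransition(final List<State> history, final State before, final State after) {
        for (int i = 0; i < history.size() - 1; i++) {
            if (history.get(i) == before && history.get(i + 1) == after) {
                return true;
            }
        }
        return false;
    }

    // Last-two variant: true only if the most recent transition was "before" -> "after".
    static boolean lastTransitionWas(final List<State> history, final State before, final State after) {
        final int size = history.size();
        return size >= 2
            && history.get(size - 2) == before
            && history.get(size - 1) == after;
    }

    public static void main(final String[] args) {
        final List<State> failed = Arrays.asList(State.REBALANCING, State.RUNNING, State.ERROR);

        // The scan still finds REBALANCING -> RUNNING although the client ended in ERROR.
        System.out.println(containsTransition(failed, State.REBALANCING, State.RUNNING)); // true
        // The last-two check fails, which is what the test should report.
        System.out.println(lastTransitionWas(failed, State.REBALANCING, State.RUNNING)); // false
    }
}
```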

cadonna · 2021-01-14T20:53:05Z

+    private void waitForStateTransition(final KafkaStreams.State expected) throws InterruptedException {
+        waitForCondition(
+            () -> !stateTransitionHistory.isEmpty() && stateTransitionHistory.contains(expected),
+            DEFAULT_DURATION.toMillis(),
+            () -> String.format("Client did not change to the %s state in time. Observed new state transitions: %s",
+                expected, stateTransitionHistory)
+        );
+    }


Couldn't we simply wait for the current state to become RUNNING?

Suggested change

-    private void waitForStateTransition(final KafkaStreams.State expected) throws InterruptedException {
-        waitForCondition(
-            () -> !stateTransitionHistory.isEmpty() && stateTransitionHistory.contains(expected),
-            DEFAULT_DURATION.toMillis(),
-            () -> String.format("Client did not change to the %s state in time. Observed new state transitions: %s",
-                expected, stateTransitionHistory)
-        );
-    }
+    private void waitForRunning() throws Exception {
+        waitForCondition(
+            () -> kafkaStreams.state() == KafkaStreams.State.RUNNING,
+            DEFAULT_DURATION.toMillis(),
+            () -> String.format("Client did not transit to state %s in %d seconds", expected, DEFAULT_DURATION.toMillis() / 1000)
+        );
+    }

We can't just check whether the current state is RUNNING because after we add/remove threads, the state won't change immediately. That is, if we check whether the state is RUNNING right after adding/removing threads, the check will pass even though the rebalance has not happened yet, which will cause the test to fail. So I still use stateTransitionHistory to check the state, and I also check that the last state in the history is RUNNING. That should be better.
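A standalone sketch of this argument, under stated assumptions: `current`, `history`, `transitionTo`, and the small `waitForCondition` helper are hypothetical stand-ins for the client state, the recorded listener history, and the test utility; none of them are the real Kafka classes. Right after a thread is added, the current state is still RUNNING, so a check on the current state passes too early, whereas a condition on the (cleared) history only passes once REBALANCING and then RUNNING have actually been observed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BooleanSupplier;

class WaitOnHistorySketch {
    // Hypothetical stand-in for KafkaStreams.State (only the states used here).
    enum State { REBALANCING, RUNNING }

    // Written by the simulated stream thread, read by the test thread.
    static final List<State> history = new CopyOnWriteArrayList<>();
    static volatile State current = State.RUNNING;

    static void transitionTo(final State newState) {
        current = newState;
        history.add(newState);
    }

    // True only if the last recorded transition was REBALANCING -> RUNNING.
    static boolean rebalancedAndRunning() {
        final List<State> snapshot = new ArrayList<>(history);
        final int size = snapshot.size();
        return size >= 2
            && snapshot.get(size - 2) == State.REBALANCING
            && snapshot.get(size - 1) == State.RUNNING;
    }

    // Minimal polling helper; returns whether the condition held before the timeout.
    static boolean waitForCondition(final BooleanSupplier condition, final long timeoutMs)
            throws InterruptedException {
        final long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(10);
        }
        return condition.getAsBoolean();
    }

    public static void main(final String[] args) throws InterruptedException {
        // A thread was "added", but the rebalance has not started yet.
        System.out.println("current state is RUNNING: " + (current == State.RUNNING));
        System.out.println("history shows REBALANCING -> RUNNING: " + rebalancedAndRunning());

        // The rebalance happens later, on another thread.
        final Thread rebalance = new Thread(() -> {
            transitionTo(State.REBALANCING);
            transitionTo(State.RUNNING);
        });
        rebalance.start();

        System.out.println("after waiting on the history: "
            + waitForCondition(WaitOnHistorySketch::rebalancedAndRunning, 5_000));
        rebalance.join();
    }
}
```

The first check already reports RUNNING before any rebalance has taken place, which is the early pass described above; only the history-based condition distinguishes "still running" from "rebalanced and running again".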

showuon · 2021-01-15T02:35:05Z

The test will fail; I will work on it later. Don't review yet. Thanks.

showuon · 2021-01-15T07:43:59Z

@cadonna , thanks for the comments. I've updated in this commit: 46898d9. Thanks.

cadonna

@showuon LGTM!

I just left some minor comments!

Call for committer review: @ableegoldman

cadonna · 2021-01-15T09:46:35Z

+        if (historySize >= 2 && stateTransitionHistory.get(historySize - 2).equals(before) &&
+            stateTransitionHistory.get(historySize - 1).equals(after)) {
+            return true;
+        }


nit: just to better visually separate condition from if-block

Suggested change

-        if (historySize >= 2 && stateTransitionHistory.get(historySize - 2).equals(before) &&
-            stateTransitionHistory.get(historySize - 1).equals(after)) {
-            return true;
-        }
+        if (historySize >= 2 && stateTransitionHistory.get(historySize - 2).equals(before) &&
+            stateTransitionHistory.get(historySize - 1).equals(after)) {
+            return true;
+        }

cadonna · 2021-01-15T09:47:49Z

    }

+    private void addStreamStateChangeListener(final KafkaStreams kafkaStreams) {
+        // we store each new state in state transition so that we won't miss any state change


Could you please remove this comment? I do not think it is needed. The code is clear enough.

cadonna · 2021-01-15T09:48:55Z

+    // verify if state change from "before" state into "after" state
+    private boolean hasStateTransition(final KafkaStreams.State before, final KafkaStreams.State after) {
+        final int historySize = stateTransitionHistory.size();
+        // should have at least 2 states in history


I think this comment is also not needed. Could you remove it?

cadonna · 2021-01-15T09:57:53Z

+        );
+    }
+
+    // verify if state change from "before" state into "after" state


This comment seems incomplete, but I would remove it anyway. Sorry that I am a bit picky about inline comments, but inline comments tend to lie after a while, when the code they describe changes but the comments do not. I would rather focus on giving meaningful names to variables and methods. For example, I would rename this method to lastStateTransitionFromRebalancingToRunning(), remove the arguments, and hard-code the states.

Classic Bruno, can't sneak any inline comments past him :P

cadonna · 2021-01-15T10:02:53Z

+            waitForRunning();
+            assertThat(hasStateTransition(KafkaStreams.State.REBALANCING, KafkaStreams.State.RUNNING), is(true));


What do you think of combining these two checks into one and calling it waitForTransitionFromRebalancingToRunning()? They are always used together.

showuon · 2021-01-18T03:44:37Z

@cadonna @ableegoldman , Thanks for the comments. I've updated in this commit: 4161096. Thanks.

showuon · 2021-01-18T13:36:57Z

All tests passed.

chia7712

@showuon Thanks for this nice fix and refactor. left some minor comments. Please take a look :)

chia7712 · 2021-01-18T13:42:55Z

            oldThreadCount = kafkaStreams.localThreadsMetadata().size();
+            stateTransitionHistory.clear();

+            // remove a thread


unnecessary comment

chia7712 · 2021-01-18T13:44:12Z

-            waitForApplicationState(Collections.singletonList(kafkaStreams), KafkaStreams.State.REBALANCING, DEFAULT_DURATION);
-            waitForApplicationState(Collections.singletonList(kafkaStreams), KafkaStreams.State.RUNNING, DEFAULT_DURATION);
+
+            assertThat(waitForTransitionFromRebalancingToRunning(), is(true));


It seems to me the method waitForTransitionFromRebalancingToRunning can do the assert as well, because we always call assertThat(waitForTransitionFromRebalancingToRunning(), is(true)) in this test.

Good idea. Updated.

chia7712 · 2021-01-18T13:45:17Z

+
    @After
    public void teardown() throws IOException {
+        stateTransitionHistory.clear();


This is unnecessary as JUnit always creates a new instance of the test class for each test case.

Good suggestion!

chia7712 · 2021-01-18T13:46:42Z

+
+            stateTransitionHistory.clear();

+            // add a new thread again


unnecessary comment


showuon · 2021-01-19T01:41:21Z

@chia7712 , thanks for the comments. I've updated in this commit: 2c399c4. Thank you.

chia7712 · 2021-01-19T03:58:18Z

@showuon Thanks for updating code. +1 again.

guozhangwang · 2021-01-19T05:32:29Z

LGTM too. @chia7712 please feel free to merge and cherry-pick.

chia7712 · 2021-01-19T05:42:15Z

merge to trunk only as this issue happens only in 2.8.

KAFKA-12194: use stateListener to catch each state change

22a5f2f

chia7712 mentioned this pull request Jan 14, 2021

KAFKA-12195 Fix synchronization issue happening in KafkaStreams #9887

Merged

3 tasks

cadonna reviewed Jan 14, 2021


showuon force-pushed the KAFKA-12194 branch from e14f13f to 694d217 Compare January 14, 2021 14:39

KAFKA-12194: address reviewer's comment to refactor the tests

c7218be

showuon force-pushed the KAFKA-12194 branch from 694d217 to c7218be Compare January 14, 2021 14:43

cadonna reviewed Jan 14, 2021


chia7712 mentioned this pull request Jan 15, 2021

KAFKA-12203 Migrate connect:mirror-client module to JUnit 5 #9889

Merged

3 tasks

KAFKA-12194: address reviewer's comment to refactor the tests

46898d9

showuon force-pushed the KAFKA-12194 branch from 5b2c2e2 to 46898d9 Compare January 15, 2021 07:19

cadonna approved these changes Jan 15, 2021


KAFKA-12194: address reviewe'r comments to refactor codes

4161096

showuon mentioned this pull request Jan 18, 2021

KAFKA-12211: don't change perm for base/state dir when no persistent store #9904

Merged

3 tasks

chia7712 mentioned this pull request Jan 18, 2021

KAFKA-7341 Migrate core module to JUnit 5 #9855

Merged

3 tasks

showuon mentioned this pull request Jan 18, 2021

KAFKA-8460: produce records with current timestamp #9877

Merged

3 tasks

chia7712 approved these changes Jan 18, 2021


KAFKA-12194: make the waitForTransitionFromRebalancingToRunning asser…

7f4236f

…t inside

showuon force-pushed the KAFKA-12194 branch from 2c399c4 to 7f4236f Compare January 19, 2021 01:40

chia7712 merged commit 277c437 into apache:trunk Jan 19, 2021

