Skip to content

KAFKA-12194: use stateListener to catch each state change#9888

Merged
chia7712 merged 5 commits intoapache:trunkfrom
showuon:KAFKA-12194
Jan 19, 2021
Merged

KAFKA-12194: use stateListener to catch each state change#9888
chia7712 merged 5 commits intoapache:trunkfrom
showuon:KAFKA-12194

Conversation

@showuon
Copy link
Copy Markdown
Member

@showuon showuon commented Jan 14, 2021

The tests are flaky because we used the waitForApplicationState to wait for a state. waitForApplicationState is using poll to check the current stream state, which might miss some state changes.
Ex:

state: created -> running
check state: running
state: running -> rebalancing
state: rebalancing -> running
check state: running <-- which is not what we expected (we expected to be rebalancing, but missed)

I use StateListener to keep the new state of each state change. So when we verify a specific state, we can always find it if existed. Also have small refactor.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@showuon
Copy link
Copy Markdown
Member Author

showuon commented Jan 14, 2021

@wcarlson5 @ableegoldman @cadonna , could you help review this PR? Thanks.

@chia7712
Copy link
Copy Markdown
Member

oh, we are working at same issue again (#9887) :(

Copy link
Copy Markdown
Member

@cadonna cadonna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@showuon Thank you for the PR! This is very important!

This are my comments.

Comment on lines -100 to -101
try (final KafkaStreams kafkaStreams = new KafkaStreams(builder.build(), properties)) {
StreamsTestUtils.startKafkaStreamsAndWaitForRunningState(kafkaStreams);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why replacing the try-with-resources with an explicit close()? This did not make the test flaky, as far as I can see. Can't we add the Streams state listener as the first statement in the try-with-resources block?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. I just thought there are some duplicate codes and want to clean them up. But you're right, maybe not better. Fixed.

@Rule
public TestName testName = new TestName();

private final List<KafkaStreams.State> stateToTransitions = new ArrayList<>();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think stateHistory or stateTransitionHistory would be a more meaningful name for this variable.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree~

Comment on lines +151 to +152
waitForStateTransition(KafkaStreams.State.REBALANCING);
waitForStateTransition(KafkaStreams.State.RUNNING);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to wait until the Kafka Streams client id in state RUNNING and then verify if the history of the states transitions after adding the stream thread is first REBALANCING and then RUNNING. Currently, the order is not verified as far as I can see.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion! Added a hasStateTransition method to verify that. Thanks.

@showuon
Copy link
Copy Markdown
Member Author

showuon commented Jan 14, 2021

@cadonna , thanks for the comments. I've updated in this commit: c7218be. Thank you.

@wcarlson5
Copy link
Copy Markdown
Contributor

@showuon These changes look good. Thanks for shoring up these tests

Copy link
Copy Markdown
Member

@cadonna cadonna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@showuon Thank you for the updates!

Here my feedback

waitForApplicationState(Collections.singletonList(kafkaStreams), KafkaStreams.State.RUNNING, DEFAULT_DURATION);

waitForStateTransition(KafkaStreams.State.RUNNING);
assertTrue(hasStateTransition(KafkaStreams.State.REBALANCING, KafkaStreams.State.RUNNING));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We normally use assertThat() in new and refactored code. Please also change the other occurrences.

Suggested change
assertTrue(hasStateTransition(KafkaStreams.State.REBALANCING, KafkaStreams.State.RUNNING));
assertThat(hasStateTransition(KafkaStreams.State.REBALANCING, KafkaStreams.State.RUNNING), is(true));

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, Updated. Thanks.

Comment on lines +123 to +133
// should have at least 2 states in history
if (stateTransitionHistory.size() < 2) {
return false;
}

for (int i = 0; i < stateTransitionHistory.size() - 1; i++) {
if (stateTransitionHistory.get(i).equals(before) && stateTransitionHistory.get(i + 1).equals(after)) {
return true;
}
}
return false;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a for-loop here? Wouldn't it suffice to verify the last two elements of the history and check if those two elements are a REBALANCING followed by a RUNNING?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use for loop is because I think there could be cases that there are some other state changes after RUNNING, ex: DEAD. But after your question, I think if that happened, the test should also fail as well. So, check the last 2 elements is good. Updated. Thanks.

Comment on lines +112 to +119
private void waitForStateTransition(final KafkaStreams.State expected) throws InterruptedException {
waitForCondition(
() -> !stateTransitionHistory.isEmpty() && stateTransitionHistory.contains(expected),
DEFAULT_DURATION.toMillis(),
() -> String.format("Client did not change to the %s state in time. Observed new state transitions: %s",
expected, stateTransitionHistory)
);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we simply wait for the current state to become RUNNING?

Suggested change
private void waitForStateTransition(final KafkaStreams.State expected) throws InterruptedException {
waitForCondition(
() -> !stateTransitionHistory.isEmpty() && stateTransitionHistory.contains(expected),
DEFAULT_DURATION.toMillis(),
() -> String.format("Client did not change to the %s state in time. Observed new state transitions: %s",
expected, stateTransitionHistory)
);
}
private void waitForRunning() throws Exception {
waitForCondition(
() -> kafkaStreams.state() == KafkaStreams.State.RUNNING,
DEFAULT_DURATION.toMillis(),
() -> String.format("Client did not transit to state %s in %d seconds", expected, DEFAULT_DURATION.toMillis() / 1000)
);
}

Copy link
Copy Markdown
Member Author

@showuon showuon Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't just check the current state to become RUNNING because after we add/remove threads, the state won't change immediately. That is, if we check if the state is RUNNING after adding/removing threads, the check will pass, but the rebalance is not happening, yet, which will cause the test fail. So I still use stateTransitionHistory to check the state, and also, I checked the last state of the history to see if it is RUNNING. That should be better.

@showuon
Copy link
Copy Markdown
Member Author

showuon commented Jan 15, 2021

Test will fail, will work it later. Don't review yet. Thanks.

@showuon
Copy link
Copy Markdown
Member Author

showuon commented Jan 15, 2021

@cadonna , thanks for the comments. I've updated in this commit: 46898d9. Thanks.

Copy link
Copy Markdown
Member

@cadonna cadonna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@showuon LGTM!

I just left some minor comments!

Call for committer review: @ableegoldman

Comment on lines +126 to +129
if (historySize >= 2 && stateTransitionHistory.get(historySize - 2).equals(before) &&
stateTransitionHistory.get(historySize - 1).equals(after)) {
return true;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just to better visually separate condition from if-block

Suggested change
if (historySize >= 2 && stateTransitionHistory.get(historySize - 2).equals(before) &&
stateTransitionHistory.get(historySize - 1).equals(after)) {
return true;
}
if (historySize >= 2 && stateTransitionHistory.get(historySize - 2).equals(before) &&
stateTransitionHistory.get(historySize - 1).equals(after)) {
return true;
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

}

private void addStreamStateChangeListener(final KafkaStreams kafkaStreams) {
// we store each new state in state transition so that we won't miss any state change
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please remove this comment? I do not think it is needed. The code is clear enough.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

// verify if state change from "before" state into "after" state
private boolean hasStateTransition(final KafkaStreams.State before, final KafkaStreams.State after) {
final int historySize = stateTransitionHistory.size();
// should have at least 2 states in history
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is also not needed. Could you remove it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

);
}

// verify if state change from "before" state into "after" state
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems incomplete. But I would also remove it. Sorry that I am a bit picky about inline comments, but inline comment tend to lie after a while when the code they should describe changes but the comments do not. I would rather focus on giving meaning names to variables and methods. For example, I would rename this method to lastStateTransitionFromRebalancingToRunning(), remove the argumetns, and hard code the states.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Classic Bruno, can't sneak any inline comments past him :P

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Comment on lines +161 to +162
waitForRunning();
assertThat(hasStateTransition(KafkaStreams.State.REBALANCING, KafkaStreams.State.RUNNING), is(true));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of combining these two checks to one and call it waitForTransitionFromRebalancingToRunning(). They are always used together.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

@showuon
Copy link
Copy Markdown
Member Author

showuon commented Jan 18, 2021

@cadonna @ableegoldman , Thanks for the comments. I've updated in this commit: 4161096. Thanks.

@showuon
Copy link
Copy Markdown
Member Author

showuon commented Jan 18, 2021

All tests passed.

Copy link
Copy Markdown
Member

@chia7712 chia7712 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@showuon Thanks for this nice fix and refactor. left some minor comments. Please take a look :)

oldThreadCount = kafkaStreams.localThreadsMetadata().size();
stateTransitionHistory.clear();

// remove a thread
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary comment

waitForApplicationState(Collections.singletonList(kafkaStreams), KafkaStreams.State.REBALANCING, DEFAULT_DURATION);
waitForApplicationState(Collections.singletonList(kafkaStreams), KafkaStreams.State.RUNNING, DEFAULT_DURATION);

assertThat(waitForTransitionFromRebalancingToRunning(), is(true));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me the method waitForTransitionFromRebalancingToRunning can do the assert as well because we always call assertThat(waitForTransitionFromRebalancingToRunning(), is(true) in this test.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Updated.


@After
public void teardown() throws IOException {
stateTransitionHistory.clear();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessary as junit always create a new test class for each test case.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion!


stateTransitionHistory.clear();

// add a new thread again
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary comment

@showuon
Copy link
Copy Markdown
Member Author

showuon commented Jan 19, 2021

@chia7712 , thanks for the comments. I've updated in this commit: 2c399c4. Thank you.

@chia7712
Copy link
Copy Markdown
Member

@showuon Thanks for updating code. +1 again.

@guozhangwang
Copy link
Copy Markdown
Contributor

LGTM too. @chia7712 please feel free to merge and cherry-pick.

@chia7712 chia7712 merged commit 277c437 into apache:trunk Jan 19, 2021
@chia7712
Copy link
Copy Markdown
Member

merge to trunk only as this issue happens only in 2.8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants