Skip to content

KAFKA-8878: Fix flaky test AssignedStreamsTasksTest#shouldCloseCleanl…#7302

Merged
bbejeck merged 1 commit intoapache:trunkfrom
cpettitt-confluent:kafka-8878
Sep 10, 2019
Merged

KAFKA-8878: Fix flaky test AssignedStreamsTasksTest#shouldCloseCleanl…#7302
bbejeck merged 1 commit intoapache:trunkfrom
cpettitt-confluent:kafka-8878

Conversation

@cpettitt-confluent
Copy link
Copy Markdown
Contributor

…yWithSuspendedTaskAndEOS

The previous approach to testing KAFKA-8412 was to look at the logs and
determine if an error occurred during close. There was no direct way to
detect than an exception occurred because the exception was eaten in
AssignedTasks.close. In the PR for that ticket (#7207) it was
acknowledged that this was a brittle way to test for the exception. We
now see occasional failures because an unrelated ERROR level log entry
is made while closing the task.

This change eliminates the brittle log checking by rethrowing any time
an exception occurs in close, even when a subsequent unclean close
succeeds. This has the potential benefit of uncovering other supressed
exceptions down the road.

I've verified that even with us rethrowing on closeUnclean that all
tests pass.

More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.

Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

…yWithSuspendedTaskAndEOS

The previous approach to testing KAFKA-8412 was to look at the logs and
determine if an error occurred during close. There was no direct way to
detect than an exception occurred because the exception was eaten in
`AssignedTasks.close`. In the PR for that ticket (apache#7207) it was
acknowledged that this was a brittle way to test for the exception. We
now see occasional failures because an unrelated ERROR level log entry
is made while closing the task.

This change eliminates the brittle log checking by rethrowing any time
an exception occurs in close, even when a subsequent unclean close
succeeds. This has the potential benefit of uncovering other supressed
exceptions down the road.

I've verified that even with us rethrowing on `closeUnclean` that all
tests pass.
@cpettitt-confluent
Copy link
Copy Markdown
Contributor Author

We may want to also backport this to 2.3, 2.2, and 2.1, since they all got the original fix in #7207 and may suffer from flaky test results as well.

@cpettitt-confluent
Copy link
Copy Markdown
Contributor Author

It's worth noting that raising the exception from AssignedTasks.close as in this patch does not impact close for non assigned suspended tasks (e.g. in AssignedTasks.closeNonAssignedSuspendedTasks).

@mjsax mjsax added streams tests Test fixes (including flaky tests) labels Sep 5, 2019
Copy link
Copy Markdown
Member

@mjsax mjsax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@cpettitt-confluent
Copy link
Copy Markdown
Contributor Author

The one test failure on JDK 8 / Scala 2.11 is:

org.apache.kafka.connect.integration.ConnectWorkerIntegrationTest.testAddAndRemoveWorker

It's a flaky test : https://issues.apache.org/jira/browse/KAFKA-8690.

@cpettitt-confluent
Copy link
Copy Markdown
Contributor Author

@mjsax do I need to do anything else on my side to get this merged?

Copy link
Copy Markdown
Member

@bbejeck bbejeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cpettitt-confluent thanks for the patch. LGTM.

@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented Sep 10, 2019

Java 8/2.11 failed test results already cleared out.

retest this please.

@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented Sep 10, 2019

waiting for green build to merge

@cpettitt-confluent
Copy link
Copy Markdown
Contributor Author

Thanks for the review @bbejeck. If it helps at all, I captured the test failure in a comment above. It was due to this flaky test: https://issues.apache.org/jira/browse/KAFKA-8690. Given the high false positive rate, I've been copying errors into comments. Not sure if there is a better way to be managing these :).

@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented Sep 10, 2019

on one PR build both Java 11 builds passed and Java 8 failed #7302 (comment)

this PR build Java 8 passed so all builds passed at one point, merging this.

@bbejeck bbejeck merged commit 18246e5 into apache:trunk Sep 10, 2019
@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented Sep 10, 2019

Merged #7302 into trunk.

@cpettitt-confluent
Copy link
Copy Markdown
Contributor Author

Thanks Bill!

@cpettitt-confluent cpettitt-confluent deleted the kafka-8878 branch September 10, 2019 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

streams tests Test fixes (including flaky tests)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants