Skip to content

KAFKA-7938: Fix test flakiness in DeleteConsumerGroupsTest#6306

Closed
gwenshap wants to merge 1 commit intoapache:2.2from
gwenshap:KAFKA-7938
Closed

KAFKA-7938: Fix test flakiness in DeleteConsumerGroupsTest#6306
gwenshap wants to merge 1 commit intoapache:2.2from
gwenshap:KAFKA-7938

Conversation

@gwenshap
Copy link
Copy Markdown
Contributor

This attempts to fix KAFKA-7938 and KAFKA-7946.

I removed two tests:

  • testDeleteWithShortInitialization basically didn't check the result and therefore always passed
  • testDeleteCmdWithShortInitialization has no way to enforce that initialization is indeed short, and therefore sometimes FAILED because the group would be created before the CMD tried to delete it.

I thought the tests had limited value relative to the effort of figuring out a way to make the timing work.

I also fixed testDeleteCmdNonEmptyGroup and testDeleteNonEmptyGroup so they will validate that the group both exists and is non-empty before starting the test itself. I also added some extra information for future debugging sessions :)

I ran the tests LOTS of times to validate, but with flaky tests, it is hard to tell :)

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@gwenshap
Copy link
Copy Markdown
Contributor Author

Forgot to mention:

  1. The PR is against 2.2 because the goal is to unblock that release by fixing flaky tests. I can do a separate PR against trunk.

  2. @vahidhashemian can you confirm the important (or lack thereof) of the tests I'm removing?

@gwenshap
Copy link
Copy Markdown
Contributor Author

cc @mjsax

@gwenshap gwenshap changed the title Kafka 7938: Fix test flakiness in DeleteConsumerGroupsTest KAFKA-7938: Fix test flakiness in DeleteConsumerGroupsTest Feb 22, 2019
@gwenshap
Copy link
Copy Markdown
Contributor Author

I heard you like flaky tests, so I put some flaky test failures in your flaky test fixes...

I think the failures are unrelated :)

Copy link
Copy Markdown
Contributor

@vahidhashemian vahidhashemian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gwenshap for the PR. I'm fine with removing those flaky "short initialization" tests. They make more sense for describing groups anyway (where they were originally added).

I left a couple of minor comments inline.

}, "The group did not initialize as expected.", maxRetries = 3)

val result = service.deleteGroups()
println(result)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unintentional line here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, quite intentional :) Those failures are difficult to reproduce, so more info in the console output of the test will help us next time.

I can add a comment if this is confusing.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this prints the output even when the test passes. If we want the output on failures only maybe that output can be added to the assertion error messages that follow. Unless there is another reason to have the output in either scenario?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that makes sense. Will send the improvement tomorrow morning :)


TestUtils.waitUntilTrue(() => {
service.listGroups().contains(group)
service.collectGroupMembers(false)._2.get.size == 1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for improving this check. Do you think it's worth doing the same in a few other unit tests in this file?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went over the other tests, and none of them required a non-empty group specifically. Did I miss something?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, sorry my mistake.

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Feb 22, 2019

The PR is against 2.2 because the goal is to unblock that release by fixing flaky tests. I can do a separate PR against trunk.

Why not to the PR against trunk and cherry-pick to 2.2 ?

@gwenshap
Copy link
Copy Markdown
Contributor Author

The PR is against 2.2 because the goal is to unblock that release by fixing flaky tests. I can do a separate PR against trunk.

Why not to the PR against trunk and cherry-pick to 2.2 ?

will do.

@gwenshap
Copy link
Copy Markdown
Contributor Author

Replaced by #6312 which is against trunk.

@gwenshap gwenshap closed this Feb 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants