Skip to content

KAFKA-12398: Fix flaky test ConsumerBounceTest.testClose#10243

Merged
dajac merged 3 commits intoapache:trunkfrom
dengziming:KAFKA-12398-flaky-testclose
Mar 13, 2021
Merged

KAFKA-12398: Fix flaky test ConsumerBounceTest.testClose#10243
dajac merged 3 commits intoapache:trunkfrom
dengziming:KAFKA-12398-flaky-testclose

Conversation

@dengziming
Copy link
Copy Markdown
Member

More detailed description of your change
The test fails some times as follow:
Pasted Graphic

We'd better use TestUtils.waitUntilTrue instead of waiting for 1 second because sometimes 1 second is too long and sometimes is too short.

Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Comment on lines 486 to 487
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the ConsumerRebalanceListener be called if we don't poll?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply, the onPartitionsAssigned is only called in AbstractCoordinator.onJoinComplete, we should call poll to invoke join onJoinComplete.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. So, if assignSemaphore is not released after poll is executed, it will never be released, right? Should we include poll within waitUntilTrue?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That make sense, I tried to print the time cost by consumer.poll,

    println("start:" + System.currentTimeMillis() )
    consumer.poll(time.Duration.ofSeconds(3L))
    println("end:" + System.currentTimeMillis() )

and I found that poll will cost 3 seconds since no data is consumed before timeout.

start:1614844034207
end:1614844037213

We can save almost 3 seconds after moving poll to waitUntilTrue .

@dengziming dengziming force-pushed the KAFKA-12398-flaky-testclose branch 2 times, most recently from 6095b52 to daf4b85 Compare March 4, 2021 07:56
@dengziming dengziming requested a review from dajac March 11, 2021 07:28
Copy link
Copy Markdown
Member

@dajac dajac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank for the update. Have you tried to repeatedly run the test to verify that it resolve the issue?

Comment thread core/src/test/scala/integration/kafka/api/ConsumerBounceTest.scala Outdated
@dengziming dengziming force-pushed the KAFKA-12398-flaky-testclose branch from daf4b85 to b977a82 Compare March 13, 2021 01:25
@dengziming
Copy link
Copy Markdown
Member Author

Thank for the update. Have you tried to repeatedly run the test to verify that it resolve the issue?

@dajac Of course, I tried many times to verify, only when I set the waitTimeMs<6000 it fails occasionally.

Copy link
Copy Markdown
Member

@dajac dajac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the path!

@dajac dajac merged commit 9edc828 into apache:trunk Mar 13, 2021
ijuma added a commit to confluentinc/kafka that referenced this pull request Mar 17, 2021
Conflicts:
* Jenkinsfile: `install` -> `publishToMavenLocal`, drop ARM build and
other changes that don't make sense for Confluent's version of
`Jenkinsfile`.
* build.gradle: keep Confluent changes for automatic skipping signing
for specific version patterns (upstream only does it if the version ends
with `SNAPSHOT`).

Commits:
* apache-github/trunk: (59 commits)
  MINOR: Remove redundant allows in import-control.xml (apache#10339)
  MINOR: remove some specifying types in tool command (apache#10329)
  KAFKA-12455: Fix OffsetValidationTest.test_broker_rolling_bounce failure with Raft (apache#10322)
  MINOR: Add toString to various Kafka Metrics classes (apache#10330)
  KAFKA-12330; FetchSessionCache may cause starvation for partitions when FetchResponse is full (apache#10318)
  KAFKA-12427: Don't update connection idle time for muted connections (apache#10267)
  MINOR; Various code cleanups (apache#10319)
  HOTFIX: timeout issue in removeStreamThread() (apache#10321)
  revert stream logging level back to ERROR (apache#10320)
  KAFKA-12352: Make sure all rejoin group and reset state has a reason (apache#10232)
  KAFKA-10348: Share client channel between forwarding and auto creation manager (apache#10135)
  MINOR: Update year in NOTICE (apache#10308)
  KAFKA-12398: Fix flaky test `ConsumerBounceTest.testClose` (apache#10243)
  MINOR: Remove redundant inheritance from FilteringJmxReporter #onMetricRemoved (apache#10303)
  KAFKA-12462: proceed with task revocation in case of thread in PENDING_SHUTDOWN (apache#10311)
  KAFKA-12460; Do not allow raft truncation below high watermark (apache#10310)
  MINOR: Log project, gradle, java and scala versions at the start of the build (apache#10307)
  KAFKA-10357: Add missing repartition topic validation (apache#10305)
  MINOR: Improve error message in MirrorConnectorsIntegrationBaseTest (apache#10268)
  MINOR: Add missing unit tests for Mirror Connect (apache#10192)
  ...
@dengziming dengziming deleted the KAFKA-12398-flaky-testclose branch March 17, 2022 13:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants