KAFKA-12398: Fix flaky test ConsumerBounceTest.testClose#10243
KAFKA-12398: Fix flaky test ConsumerBounceTest.testClose#10243dajac merged 3 commits intoapache:trunkfrom
ConsumerBounceTest.testClose#10243Conversation
There was a problem hiding this comment.
Could the ConsumerRebalanceListener be called if we don't poll?
There was a problem hiding this comment.
Sorry for the late reply, the onPartitionsAssigned is only called in AbstractCoordinator.onJoinComplete, we should call poll to invoke join onJoinComplete.
There was a problem hiding this comment.
Make sense. So, if assignSemaphore is not released after poll is executed, it will never be released, right? Should we include poll within waitUntilTrue?
There was a problem hiding this comment.
That make sense, I tried to print the time cost by consumer.poll,
println("start:" + System.currentTimeMillis() )
consumer.poll(time.Duration.ofSeconds(3L))
println("end:" + System.currentTimeMillis() )
and I found that poll will cost 3 seconds since no data is consumed before timeout.
start:1614844034207
end:1614844037213
We can save almost 3 seconds after moving poll to waitUntilTrue .
6095b52 to
daf4b85
Compare
dajac
left a comment
There was a problem hiding this comment.
Thank for the update. Have you tried to repeatedly run the test to verify that it resolve the issue?
daf4b85 to
b977a82
Compare
@dajac Of course, I tried many times to verify, only when I set the waitTimeMs<6000 it fails occasionally. |
Conflicts: * Jenkinsfile: `install` -> `publishToMavenLocal`, drop ARM build and other changes that don't make sense for Confluent's version of `Jenkinsfile`. * build.gradle: keep Confluent changes for automatic skipping signing for specific version patterns (upstream only does it if the version ends with `SNAPSHOT`). Commits: * apache-github/trunk: (59 commits) MINOR: Remove redundant allows in import-control.xml (apache#10339) MINOR: remove some specifying types in tool command (apache#10329) KAFKA-12455: Fix OffsetValidationTest.test_broker_rolling_bounce failure with Raft (apache#10322) MINOR: Add toString to various Kafka Metrics classes (apache#10330) KAFKA-12330; FetchSessionCache may cause starvation for partitions when FetchResponse is full (apache#10318) KAFKA-12427: Don't update connection idle time for muted connections (apache#10267) MINOR; Various code cleanups (apache#10319) HOTFIX: timeout issue in removeStreamThread() (apache#10321) revert stream logging level back to ERROR (apache#10320) KAFKA-12352: Make sure all rejoin group and reset state has a reason (apache#10232) KAFKA-10348: Share client channel between forwarding and auto creation manager (apache#10135) MINOR: Update year in NOTICE (apache#10308) KAFKA-12398: Fix flaky test `ConsumerBounceTest.testClose` (apache#10243) MINOR: Remove redundant inheritance from FilteringJmxReporter #onMetricRemoved (apache#10303) KAFKA-12462: proceed with task revocation in case of thread in PENDING_SHUTDOWN (apache#10311) KAFKA-12460; Do not allow raft truncation below high watermark (apache#10310) MINOR: Log project, gradle, java and scala versions at the start of the build (apache#10307) KAFKA-10357: Add missing repartition topic validation (apache#10305) MINOR: Improve error message in MirrorConnectorsIntegrationBaseTest (apache#10268) MINOR: Add missing unit tests for Mirror Connect (apache#10192) ...
More detailed description of your change

The test fails some times as follow:
We'd better use
TestUtils.waitUntilTrueinstead of waiting for 1 second because sometimes 1 second is too long and sometimes is too short.Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.
Committer Checklist (excluded from commit message)