Skip to content

MINOR: Fix ListOffsetsRequestTest.testResponseIncludesLeaderEpoch#13747

Merged
dajac merged 1 commit intoapache:trunkfrom
dajac:minor-fix-testResponseIncludesLeaderEpoch
May 24, 2023
Merged

MINOR: Fix ListOffsetsRequestTest.testResponseIncludesLeaderEpoch#13747
dajac merged 1 commit intoapache:trunkfrom
dajac:minor-fix-testResponseIncludesLeaderEpoch

Conversation

@dajac
Copy link
Copy Markdown
Member

@dajac dajac commented May 23, 2023

Fix flaky ListOffsetsRequestTest.testResponseIncludesLeaderEpoch. It specifically fixes failures seen here.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

val secondLeaderId = TestUtils.awaitLeaderChange(servers, partition, firstLeaderId)
// make sure high watermark of new leader has caught up
TestUtils.waitUntilTrue(() => sendRequest(secondLeaderId, 0L, -1).errorCode() != Errors.OFFSET_NOT_AVAILABLE.code(),
TestUtils.waitUntilTrue(() => sendRequest(secondLeaderId, ListOffsetsRequest.LATEST_TIMESTAMP, -1).errorCode != Errors.OFFSET_NOT_AVAILABLE.code,
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OFFSET_NOT_AVAILABLE is only returned when LATEST_TIMESTAMP or MAX_TIMESTAMP are used.

Copy link
Copy Markdown
Member

@showuon showuon May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if my understanding is correct, the test failure is because we are using the the OFFSET_NOT_AVAILABLE to identify if the new leader has caught up, but the OFFSET_NOT_AVAILABLE error only throws when requesting LATEST_TIMESTAMP or a timestamp beyond the last fetchable offset. Since we were using a small number of timestamp, this wait will pass immediately, and cause the below assertion failure with OFFSET_NOT_AVAILABLE sometimes.

Is this correct? If so, this change LGTM! Thanks.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. This is correct.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this!

val secondLeaderId = TestUtils.awaitLeaderChange(servers, partition, firstLeaderId)
// make sure high watermark of new leader has caught up
TestUtils.waitUntilTrue(() => sendRequest(secondLeaderId, 0L, -1).errorCode() != Errors.OFFSET_NOT_AVAILABLE.code(),
TestUtils.waitUntilTrue(() => sendRequest(secondLeaderId, ListOffsetsRequest.LATEST_TIMESTAMP, -1).errorCode != Errors.OFFSET_NOT_AVAILABLE.code,
Copy link
Copy Markdown
Member

@showuon showuon May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if my understanding is correct, the test failure is because we are using the the OFFSET_NOT_AVAILABLE to identify if the new leader has caught up, but the OFFSET_NOT_AVAILABLE error only throws when requesting LATEST_TIMESTAMP or a timestamp beyond the last fetchable offset. Since we were using a small number of timestamp, this wait will pass immediately, and cause the below assertion failure with OFFSET_NOT_AVAILABLE sometimes.

Is this correct? If so, this change LGTM! Thanks.

@dajac
Copy link
Copy Markdown
Member Author

dajac commented May 24, 2023

@showuon Thanks for your review. Yes, your understanding is correct. This check never worked in fact. The part which is not clear is that this test fails all the time in CI since we merged #13535 but I can't really see a reason. It seems that propagating the HWM is slower than it used to be.

@dajac dajac merged commit 1c5cf4e into apache:trunk May 24, 2023
@dajac dajac deleted the minor-fix-testResponseIncludesLeaderEpoch branch May 24, 2023 08:25
@satishd
Copy link
Copy Markdown
Member

satishd commented May 25, 2023

Thanks @dajac for fixing this intermittent failure.

divijvaidya pushed a commit that referenced this pull request Jul 10, 2023
…3747)

Fix flaky ListOffsetsRequestTest.testResponseIncludesLeaderEpoch. It specifically fixes failures seen [here](https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-13535/33/tests/).

Reviewers: Luke Chen <showuon@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants