
KAFKA-12384: stabilize ListOffsetsRequestTest#testResponseIncludesLeaderEpoch#10389

Merged
chia7712 merged 8 commits into apache:trunk from chia7712:MINOR-10389
Apr 7, 2021

Conversation

@chia7712
Member

@chia7712 chia7712 commented Mar 23, 2021

issue: https://issues.apache.org/jira/browse/KAFKA-12384
The root cause is that we don't wait for the new leader to sync the high watermark with the followers, so sending a request to get offsets can encounter an OFFSET_NOT_AVAILABLE error.
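The general shape of such a fix is to poll until the leader has caught up instead of asserting immediately, in the style of Kafka's `TestUtils.waitUntilTrue`. A minimal Java sketch of that pattern (class and method names are illustrative, not the actual patch):

```java
import java.util.function.BooleanSupplier;

public final class WaitUtil {
    /**
     * Polls {@code condition} every {@code pollIntervalMs} until it returns
     * true or {@code timeoutMs} elapses. Returns whether it became true.
     */
    public static boolean waitUntilTrue(BooleanSupplier condition,
                                        long timeoutMs,
                                        long pollIntervalMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (condition.getAsBoolean()) return true;
            if (System.currentTimeMillis() >= deadline) return false;
            Thread.sleep(pollIntervalMs);
        }
    }
}
```

In the test this would guard the offset lookup, e.g. waiting until the new leader's high watermark reaches the expected end offset before sending the ListOffsets request.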

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@chia7712 chia7712 requested a review from dajac March 23, 2021 17:20
Member

@dengziming dengziming left a comment

This is nice. Should we trigger the Jenkins build multiple times to verify that the flaky test is fixed?

Comment thread core/src/test/scala/unit/kafka/server/ListOffsetsRequestTest.scala Outdated
@chia7712
Member Author

@dengziming thanks for your review!

Should we trigger the Jenkins build multiple times to verify that the flaky test is fixed?

Sure. I also looped ListOffsetsRequestTest 300 times locally; all passed.

@dajac
Member

dajac commented Mar 24, 2021

@chia7712 Have you been able to reproduce the flaky test locally? I tried a few times and I never could...

@chia7712
Member Author

chia7712 commented Mar 24, 2021

Have you been able to reproduce the flaky test locally? I tried a few times and I never could...

Yep. It requires a "slow" machine to make the sync between leader and followers slow. I opened 8 containers to loop that test locally, and that reproduces the error reliably.

(Screenshot: 2021-03-24, 3:41 PM)

@dajac
Member

dajac commented Mar 24, 2021

@chia7712 awesome!

@chia7712
Member Author

Merged trunk to trigger QA again.

@chia7712
Member Author

Unrelated failure. Will merge trunk to trigger QA again.

val partitionToLeader = TestUtils.createTopic(zkClient, topic, numPartitions = 1, replicationFactor = 3, servers)
val topicConfig = new Properties
// make sure we won't lose data when force-removing leader
topicConfig.setProperty(TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, "2")
Member

Is this required? Even with min isr 1, we still require all in sync replicas to ack for acks=all. How many brokers do we have in the test?

Member Author

IIRC, acks=all means "all in-sync replicas" (rather than "all replicas") have to receive the record. In other words, a produce request can complete after only one replica has received the record if min ISR is one. When we shut down the server with min ISR=1, the successfully acked record may NOT have been synced to the other brokers (there are 3 brokers in total). In short, the data could be lost after we shut down the current leader.
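The semantics being debated here can be stated mechanically. A simplified Java sketch of the model (illustrative helper names, not Kafka API; it ignores ISR membership changes between ack and shutdown): an acks=all write is acknowledged once every current ISR member has it, and the broker only accepts the write when the ISR is at least `min.insync.replicas` large.

```java
// Simplified model of acks=all durability, for illustration only.
public final class IsrSafety {
    /**
     * An acks=all-acked record lives on every replica that was in the ISR
     * at ack time. It survives losing the leader alone iff at least one
     * non-leader replica was in sync then.
     */
    public static boolean ackedRecordSurvivesLeaderLoss(int isrSizeAtAck) {
        return isrSizeAtAck >= 2;
    }

    /** The broker rejects acks=all writes when the ISR is below min ISR. */
    public static boolean writeAccepted(int isrSize, int minInsyncReplicas) {
        return isrSize >= minInsyncReplicas;
    }
}
```

So with replication factor 3 and min ISR 1, a write acked while the ISR had shrunk to just the leader could be lost on leader shutdown; raising min ISR to 2 makes the broker reject writes in that state instead.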

Member

Yeah, but if you have 3 brokers and you shutdown one of them, you should still have 2 brokers in the ISR. Are you saying that we are shrinking the ISR to 1 in this test? That would be unexpected.

Member Author

Are you saying that we are shrinking the ISR to 1 in this test? That would be unexpected.

No, I did not observe such an incident. Maybe I misread the log earlier, and my assumption does not actually happen (I looped it 1000 times). Will revert this change.

@chia7712
Member Author

@ijuma any suggestions? This test is still flaky :(

Member

@showuon showuon left a comment

@chia7712 , thanks for the fix. LGTM. Please also update the status in KAFKA-12384. Thank you.

@chia7712
Member Author

Please also update the status in KAFKA-12384. Thank you.

Oh, sorry that I did not notice the existing ticket. Will update the Jira :)

@chia7712
Member Author

chia7712 commented Apr 6, 2021

Unrelated error. Merging trunk to trigger QA again.

@chia7712 chia7712 changed the title MINOR: stabilize ListOffsetsRequestTest#testResponseIncludesLeaderEpoch KAFKA-12384: stabilize ListOffsetsRequestTest#testResponseIncludesLeaderEpoch Apr 6, 2021
@ijuma
Member

ijuma commented Apr 6, 2021

@chia7712 Is the test still flaky with the latest changes?

@chia7712
Member Author

chia7712 commented Apr 6, 2021

Is the test still flaky with the latest changes?

Yep. I looped this patch 100 times and all passed.

Member

@ijuma ijuma left a comment

LGTM, thanks!

@chia7712 chia7712 merged commit 174f0f9 into apache:trunk Apr 7, 2021
ijuma added a commit to ijuma/kafka that referenced this pull request Apr 7, 2021
* apache-github/trunk:
  KAFKA-10769 Remove JoinGroupRequest#containsValidPattern as it is dup… (apache#9851)
  KAFKA-12384: stabilize ListOffsetsRequestTest#testResponseIncludesLeaderEpoch (apache#10389)
  KAFKA-5146: remove Connect dependency from Streams module (apache#10131)
@chia7712 chia7712 deleted the MINOR-10389 branch March 25, 2024 15:21