KAFKA-8460: reduce the record size and increase the delay time #9775

Closed
showuon wants to merge 3 commits into apache:trunk from showuon:KAFKA-8460

Conversation

showuon (Member) commented Dec 22, 2020

Looking into this flaky test, the error messages are:

Timed out before consuming expected 1350 records. The number consumed was 1230.

https://ci-builds.apache.org/job/Kafka/job/kafka-trunk-jdk8/303/testReport/kafka.api/PlaintextConsumerTest/testLowMaxFetchSizeForRequestAndPartition/

Timed out before consuming expected 1350 records. The number consumed was 1200.

https://ci-builds.apache.org/job/Kafka/job/kafka-trunk-jdk8/305/testReport/kafka.api/PlaintextConsumerTest/testLowMaxFetchSizeForRequestAndPartition/

Timed out before consuming expected 1350 records. The number consumed was 1215.

https://ci-builds.apache.org/job/Kafka/job/kafka-trunk-jdk8/305/testReport/junit/kafka.api/PlaintextConsumerTest/testLowMaxFetchSizeForRequestAndPartition/

As we can see, the number of records consumed varies from run to run and is always close to 1350. After checking the test, I found it is expected to be slow because it verifies that we can consume from all partitions when fetch.max.bytes and max.partition.fetch.bytes are low. So I think the test has no bug; it just needs more time.
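
For context, here is a minimal sketch of the consumer settings this scenario exercises (the broker address, group id, and byte values below are illustrative assumptions, not the test's actual constants). With both limits this low, each fetch response carries only a little data per partition, so draining every partition is inherently slow:

    import java.util.Properties
    import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
    import org.apache.kafka.common.serialization.ByteArrayDeserializer

    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // illustrative
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "low-fetch-size-test")     // illustrative
    // The two limits under test: the cap on a whole fetch response and the cap
    // on the data returned for any single partition. Low values force the
    // consumer into many small round trips.
    props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, "500")
    props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "100")

    val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](
      props, new ByteArrayDeserializer, new ByteArrayDeserializer)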

What I did:

  1. Reduce the number of records per partition (from 15 to 10). This should speed up the test while still exercising the original scenario.
  2. Increase the timeout (from 60 to 90 seconds).

Hope this makes the test more reliable!

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

showuon (Member, Author) commented Dec 22, 2020

@chia7712, could you help review this small PR? Thanks.

chia7712 (Member) left a comment

@showuon Thanks for your patch. Please take a look at the following comments. Thanks!

// We produce 10 records for each topic partition. There are 3 topics with 30 partitions each,
// so the total producerRecords size should be 10 * 3 * 30 = 900.
val producerRecords = partitions.flatMap(sendRecords(producer, numRecords = 10, _))
val consumerRecords = consumeRecords(consumer, producerRecords.size, waitTimeMs = 90 * 1000)
chia7712 (Member):
Personally, 90 seconds is too long for a test case. Can't reducing the produce size alone resolve this issue?

showuon (Member, Author):

Agreed! I increased it to 90 seconds just in case. I think reducing the record count alone is good enough.

showuon (Member, Author):

Reverted back to 60 seconds now.

-  maxPollRecords: Int = Int.MaxValue): ArrayBuffer[ConsumerRecord[K, V]] = {
+  maxPollRecords: Int = Int.MaxValue,
+  waitTimeMs: Int = 60000): ArrayBuffer[ConsumerRecord[K, V]] = {
   val records = new ArrayBuffer[ConsumerRecord[K, V]]
chia7712 (Member):

Could you add an initial size? This buffer collects all returned records, so the default capacity is too small for this case.
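
A minimal sketch of what the suggested change might look like (assuming the helper's numRecords parameter is in scope; the full signature is not shown here):

    // Pre-size the buffer to the expected record count so it does not need to
    // grow and copy repeatedly while accumulating every returned record.
    val records = new ArrayBuffer[ConsumerRecord[K, V]](numRecords)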

showuon (Member, Author):

Nice catch! Updated. Thanks.

showuon (Member, Author) commented Dec 22, 2020

@chia7712, I was too optimistic. I saw only 7xx records consumed in recent builds:

Timed out before consuming expected 1350 records. The number consumed was 720.

https://ci-builds.apache.org/job/Kafka/job/kafka-trunk-jdk15/353/testReport/junit/kafka.api/PlaintextConsumerTest/testLowMaxFetchSizeForRequestAndPartition/
https://ci-builds.apache.org/job/Kafka/job/kafka-trunk-jdk11/333/testReport/junit/kafka.api/PlaintextConsumerTest/testLowMaxFetchSizeForRequestAndPartition/

I don't know how slow the CI system can get, so I reduced it to 5 records per partition, for a total of 450 records. FYI.
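
Under the same topic layout (3 topics, 30 partitions each), that change would look roughly like this in the test (a sketch based on the snippet quoted above):

    // 5 records per partition: 5 * 3 * 30 = 450 records in total.
    val producerRecords = partitions.flatMap(sendRecords(producer, numRecords = 5, _))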

showuon (Member, Author) commented Dec 24, 2020

@chia7712, the test failed in my PR builds:

Timed out before consuming expected 450 records. The number consumed was 325

It only consumed 325 records within 60 seconds! So slow! Do you think I should reduce the record count further?

showuon (Member, Author) commented Dec 30, 2020

Monitoring recent test failures, I think reducing the records to 450 should be good enough. What do you think? @chia7712

org.scalatest.exceptions.TestFailedException: Timed out before consuming expected 1350 records. The number consumed was 1275.
org.scalatest.exceptions.TestFailedException: Timed out before consuming expected 1350 records. The number consumed was 1005.

https://ci-builds.apache.org/job/Kafka/job/kafka-trunk-jdk11/342/testReport/junit/kafka.api/PlaintextConsumerTest/testLowMaxFetchSizeForRequestAndPartition/

chia7712 (Member):

@showuon Is there a potential bug that can slow down the consumer in this test case? Or is the slowness just caused by a busy Jenkins?

showuon closed this Jan 13, 2021
showuon deleted the KAFKA-8460 branch January 13, 2021 09:54
showuon (Member, Author) commented Jan 14, 2021

@chia7712, after further investigation, I found the test itself has some issues that cause the flakiness; it is not related to the record count. I opened another PR, #9877, to address it. Thanks.
