KAFKA-9538: Flaky test: testResetOffsetsExportImportPlan #6561
hachikuji merged 2 commits into apache:trunk
retest this please

@huxihx This PR shows conflicts. Can you rebase the PR to resolve them? @hachikuji @cmccabe This test fails very often -- can we bump up the priority of reviewing this PR? Does the proposed fix make sense? I am not familiar with this test.

https://issues.apache.org/jira/browse/KAFKA-8211
Reduced the offset-commit interval from 5 seconds to 1 second, hoping that consumer#committed returns the offset more quickly. Also enriched the output message for the exceptional case.

Occasionally, the group is not yet inactive when the offsets are reset. In that case an empty offset is returned, which fails the assertion. We should ensure the group is inactive before running
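
The idea of waiting for the group to become inactive before resetting offsets can be sketched as a small polling helper. This is only an illustration under stated assumptions: `waitUntil`, `WaitForInactiveGroup`, and the `groupInactive` condition are hypothetical names, the condition is a stand-in that flips after 50 ms, and none of this is taken from the PR or from Kafka's test utilities.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

public class WaitForInactiveGroup {

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition was met, false on timeout.
     */
    public static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                return false;
            }
            Thread.sleep(pollInterval.toMillis());
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for "the group has no active members":
        // it becomes true roughly 50 ms after start. A real test would
        // query the group state instead (assumption, not shown here).
        long start = System.nanoTime();
        BooleanSupplier groupInactive =
            () -> System.nanoTime() - start > Duration.ofMillis(50).toNanos();

        boolean inactive = waitUntil(groupInactive, Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println(inactive ? "group inactive, safe to reset offsets" : "timed out waiting");
    }
}
```

Polling with a deadline keeps the test deterministic in the common case while still failing with a clear signal if the group never settles, rather than failing later on an empty offset.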
@@ -416,6 +416,9 @@ class ResetConsumerGroupOffsetTest extends ConsumerGroupCommandTest {
     produceConsumeAndShutdown(topic = topic1, group = group1, totalMessages = 100, numConsumers = 2)
     produceConsumeAndShutdown(topic = topic2, group = group2, totalMessages = 100, numConsumers = 5)
This test case doesn't really need 5 consumers. Can we change both of these to use a single consumer?

ok to test

The recent increase in the flakiness of one of the offset reset tests (KAFKA-9538) traces back to #7941. After investigation, we found that following this patch, the consumer was sending an additional metadata request prior to performing the group assignment. This slight timing difference was enough to trigger the test failures.

The problem turned out to be due to a bug in `SubscriptionState.groupSubscribe`, which no longer counted the local subscription when determining if there were new topics to fetch metadata for. Hence the extra metadata update. This patch restores the old logic.

Without the fix, we saw 30-50% test failures locally. With it, I could no longer reproduce the failure. However, #6561 is probably still needed to improve the resilience of this test.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
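
The effect described in the commit message can be modelled with a simplified sketch. This is not the Kafka source: `GroupSubscribeSketch` and both method names are hypothetical, and the "ignoring local" variant is only one plausible way the regression could have behaved. The point is that a check which leaves out the local subscription reports new topics, and so triggers a metadata update, even when the group subscribes to exactly what the consumer already subscribed to.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class GroupSubscribeSketch {
    // Topics this consumer subscribed to locally.
    private final Set<String> subscription = new HashSet<>();
    // Topics subscribed to by the whole group (tracked by the group leader).
    private Set<String> groupSubscription = new HashSet<>();

    public void subscribe(Collection<String> topics) {
        subscription.clear();
        subscription.addAll(topics);
    }

    /**
     * Hypothetical model of the regression: only the previous group
     * subscription is consulted, so the local subscription is not counted.
     * Returns true if a metadata update would be requested.
     */
    public boolean groupSubscribeIgnoringLocal(Collection<String> topics) {
        Set<String> previous = groupSubscription;
        groupSubscription = new HashSet<>(topics);
        return !previous.containsAll(groupSubscription);
    }

    /**
     * Model of the restored logic: topics already in the local
     * subscription do not count as new.
     */
    public boolean groupSubscribeCountingLocal(Collection<String> topics) {
        groupSubscription = new HashSet<>(topics);
        return !subscription.containsAll(groupSubscription);
    }

    public static void main(String[] args) {
        GroupSubscribeSketch buggy = new GroupSubscribeSketch();
        buggy.subscribe(Arrays.asList("topic1"));
        // Extra metadata update although topic1 is already known locally.
        System.out.println(buggy.groupSubscribeIgnoringLocal(Arrays.asList("topic1"))); // true

        GroupSubscribeSketch fixed = new GroupSubscribeSketch();
        fixed.subscribe(Arrays.asList("topic1"));
        // No new topics, so no extra metadata update.
        System.out.println(fixed.groupSubscribeCountingLocal(Arrays.asList("topic1"))); // false
    }
}
```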

hachikuji left a comment
LGTM. Thanks for the patch!

ok to test

retest this please

Having a hard time getting a build here. I verified the fix works locally, so I will go ahead and merge.