Skip to content

MINOR: Fix unnecessary metadata fetch before group assignment#8095

Merged
hachikuji merged 2 commits intoapache:trunkfrom
hachikuji:fix-flaky-reset-test
Feb 12, 2020
Merged

MINOR: Fix unnecessary metadata fetch before group assignment#8095
hachikuji merged 2 commits intoapache:trunkfrom
hachikuji:fix-flaky-reset-test

Conversation

@hachikuji
Copy link
Copy Markdown
Contributor

@hachikuji hachikuji commented Feb 12, 2020

I tracked the recent increase in the flakiness of one of the offset reset tests (KAFKA-9538) back to #7941. After investigation, I found that following this patch, the consumer was sending an additional metadata request prior to performing the group assignment. This slight timing difference was enough to trigger the test failures. The problem turned out to be due to a bug in SubscriptionState.groupSubscribe, which no longer counted the local subscription when determining if there were new topics to fetch metadata for. Hence the extra metadata update.

Without the fix, I saw 30-50% test failures locally. With it, I could no longer reproduce the failure. However, #6561 is probably still needed to improve the resilience of this test.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Copy link
Copy Markdown
Contributor

@rajinisivaram rajinisivaram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hachikuji Thanks for the PR. Left one minor comment, apart from that LGTM

throw new IllegalStateException(SUBSCRIPTION_EXCEPTION_MESSAGE);
groupSubscription = new HashSet<>(groupSubscription);
return groupSubscription.addAll(topics);
groupSubscription = new HashSet<>(topics);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were accumulating topics earlier, but now we are replacing groupSubscription. I guess that is fine since we are passing in all the topics. May be worth updating the comment and also verifying that we are no longer accumulating in the unit test.

@hachikuji hachikuji merged commit 0a5dec0 into apache:trunk Feb 12, 2020
hachikuji added a commit that referenced this pull request Feb 12, 2020
The recent increase in the flakiness of one of the offset reset tests (KAFKA-9538) traces back to #7941. After investigation, we found that following this patch, the consumer was sending an additional metadata request prior to performing the group assignment. This slight timing difference was enough to trigger the test failures. The problem turned out to be due to a bug in `SubscriptionState.groupSubscribe`, which no longer counted the local subscription when determining if there were new topics to fetch metadata for. Hence the extra metadata update. This patch restores the old logic.

Without the fix, we saw 30-50% test failures locally. With it, I could no longer reproduce the failure. However, #6561 is probably still needed to improve the resilience of this test.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
rite2nikhil pushed a commit to confluentinc/kafka that referenced this pull request Feb 12, 2020
…#8095)

The recent increase in the flakiness of one of the offset reset tests (KAFKA-9538) traces back to apache#7941. After investigation, we found that following this patch, the consumer was sending an additional metadata request prior to performing the group assignment. This slight timing difference was enough to trigger the test failures. The problem turned out to be due to a bug in `SubscriptionState.groupSubscribe`, which no longer counted the local subscription when determining if there were new topics to fetch metadata for. Hence the extra metadata update. This patch restores the old logic.

Without the fix, we saw 30-50% test failures locally. With it, I could no longer reproduce the failure. However, apache#6561 is probably still needed to improve the resilience of this test.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
hachikuji added a commit that referenced this pull request Feb 12, 2020
The recent increase in the flakiness of one of the offset reset tests (KAFKA-9538) traces back to #7941. After investigation, we found that following this patch, the consumer was sending an additional metadata request prior to performing the group assignment. This slight timing difference was enough to trigger the test failures. The problem turned out to be due to a bug in `SubscriptionState.groupSubscribe`, which no longer counted the local subscription when determining if there were new topics to fetch metadata for. Hence the extra metadata update. This patch restores the old logic.

Without the fix, we saw 30-50% test failures locally. With it, I could no longer reproduce the failure. However, #6561 is probably still needed to improve the resilience of this test.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
hachikuji added a commit that referenced this pull request Feb 12, 2020
The recent increase in the flakiness of one of the offset reset tests (KAFKA-9538) traces back to #7941. After investigation, we found that following this patch, the consumer was sending an additional metadata request prior to performing the group assignment. This slight timing difference was enough to trigger the test failures. The problem turned out to be due to a bug in `SubscriptionState.groupSubscribe`, which no longer counted the local subscription when determining if there were new topics to fetch metadata for. Hence the extra metadata update. This patch restores the old logic.

Without the fix, we saw 30-50% test failures locally. With it, I could no longer reproduce the failure. However, #6561 is probably still needed to improve the resilience of this test.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
stanislavkozlovski pushed a commit to stanislavkozlovski/kafka that referenced this pull request Feb 18, 2020
…#8095)

The recent increase in the flakiness of one of the offset reset tests (KAFKA-9538) traces back to apache#7941. After investigation, we found that following this patch, the consumer was sending an additional metadata request prior to performing the group assignment. This slight timing difference was enough to trigger the test failures. The problem turned out to be due to a bug in `SubscriptionState.groupSubscribe`, which no longer counted the local subscription when determining if there were new topics to fetch metadata for. Hence the extra metadata update. This patch restores the old logic.

Without the fix, we saw 30-50% test failures locally. With it, I could no longer reproduce the failure. However, apache#6561 is probably still needed to improve the resilience of this test.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
qq619618919 pushed a commit to qq619618919/kafka that referenced this pull request May 12, 2020
…#8095)

The recent increase in the flakiness of one of the offset reset tests (KAFKA-9538) traces back to apache#7941. After investigation, we found that following this patch, the consumer was sending an additional metadata request prior to performing the group assignment. This slight timing difference was enough to trigger the test failures. The problem turned out to be due to a bug in `SubscriptionState.groupSubscribe`, which no longer counted the local subscription when determining if there were new topics to fetch metadata for. Hence the extra metadata update. This patch restores the old logic.

Without the fix, we saw 30-50% test failures locally. With it, I could no longer reproduce the failure. However, apache#6561 is probably still needed to improve the resilience of this test.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants