Skip to content

KAFKA-10048: Possible data gap for a consumer after a failover when u…#8730

Merged
mimaison merged 1 commit intoapache:trunkfrom
asdaraujo:trunk
Oct 2, 2020
Merged

KAFKA-10048: Possible data gap for a consumer after a failover when u…#8730
mimaison merged 1 commit intoapache:trunkfrom
asdaraujo:trunk

Conversation

@asdaraujo
Copy link
Copy Markdown
Contributor

@asdaraujo asdaraujo commented May 27, 2020

Changed the MM2 checkpoint mirror task to ensure it replicates consumer offsets even when they are equal to zero to avoid issues with consumers after failovers.

Modified the test case to cover the possible scenario of consumer gap, as described on KAFKA-10048.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Copy link
Copy Markdown
Contributor

@ryannedolan ryannedolan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for discovering, verifying, and fixing this issue! Change makes sense to me.

Requested change: add deadline to consumeAllMessages() -- otherwise lgtm.

@viktorsomogyi
Copy link
Copy Markdown
Contributor

@ijuma would you please review this or suggest who is the right committer to help with this?

@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jul 27, 2020

@mimaison, would you be able to review this?

@mimaison
Copy link
Copy Markdown
Member

Sure I'll take a look this week

Copy link
Copy Markdown
Member

@mimaison mimaison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, this is indeed a problem if some partitions don't have records before the failover.

I've left a few comments. Overall I find the test pretty hard to follow. I wonder if more comments or some refactoring would help.

@asdaraujo
Copy link
Copy Markdown
Contributor Author

@mimaison Thanks for the feedback. I've refactored the tests. Could you please give it another review.

@mimaison
Copy link
Copy Markdown
Member

mimaison commented Sep 2, 2020

Thanks @asdaraujo, I'll try to take another look later on this week.

Can you rebase on trunk? to pick up a change (241e144) that is now necessary to run the CI tests

@asdaraujo
Copy link
Copy Markdown
Contributor Author

Thanks, @mimaison . I've rebased it.

Copy link
Copy Markdown
Member

@mimaison mimaison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update.
Considering the code change (a single line), the test is really complicated. Other existing tests already cover MM2 use cases. The new test could only focus on the difference of behaviour caused by the fix.

@asdaraujo asdaraujo force-pushed the trunk branch 2 times, most recently from edb9ead to 0ee597e Compare September 29, 2020 02:06
@asdaraujo asdaraujo requested a review from mimaison September 29, 2020 14:50
Copy link
Copy Markdown
Member

@mimaison mimaison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. I think we're almost there! I've left a couple of comments

…sing MM2

Ensure that the MM2 checkpoint mirror task replicates consumer offsets even when they are
zero to avoid issues with consumers after failovers.

Author: Andre Araujo <asdaraujo@gmail.com>
@mimaison mimaison merged commit 6b95f1e into apache:trunk Oct 2, 2020
@ning2008wisc
Copy link
Copy Markdown
Contributor

Hi @mimaison I will rebase my pr #9224 on top of this one over this weekend and hope you may have time soon to review it as your next one. Thanks

javierfreire pushed a commit to javierfreire/kafka that referenced this pull request Oct 8, 2020
…sing MM2 (apache#8730)

Ensure that the MM2 checkpoint mirror task replicates consumer offsets even when they are
zero to avoid issues with consumers after failovers.

Author: Andre Araujo <asdaraujo@gmail.com>

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ryanne Dolan <ryannedolan@gmail.com>, Edoardo Comar <ecomar@uk.ibm.com>, heritamas
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants