Skip to content

MINOR: Fix partition loading checks in GroupCoordinator#4788

Merged
hachikuji merged 2 commits intoapache:trunkfrom
hachikuji:minor-fix-group-loading-error
Apr 2, 2018
Merged

MINOR: Fix partition loading checks in GroupCoordinator#4788
hachikuji merged 2 commits intoapache:trunkfrom
hachikuji:minor-fix-group-loading-error

Conversation

@hachikuji
Copy link
Copy Markdown
Contributor

In the group coordinator, we currently we check whether the partition is owned before checking whether it is loading. Since loading is a prerequisite for partition ownership, it means that it is not actually possible to see the COORDINATOR_LOAD_IN_PROGRESS error. The impact is mostly harmless: while loading the group, the client may send unnecessary FindCoordinator requests to rediscover the coordinator. I've fixed the bug and restructured the code to enable testing.

In the process of fixing this bug, I made the following minor changes:

  1. We now verify valid groupId in all request handlers.
  2. Currently if the coordinator is loading when a SyncGroup is received, we'll return NOT_COORDINATOR. I've changed this to return REBALANCE_IN_PROGRESS since the rebalance state will have been lost on coordinator failover. This effectively forces the consumer to rejoin the group, which seems preferable over unnecessarily rediscovering the coordinator.
  3. I added a check for the COORDINATOR_LOAD_IN_PROGRESS handler in SyncGroup. Although we do not currently return this error, it seems reasonable that we might want to some day, so it seems better to get the check in now.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@hachikuji hachikuji requested a review from guozhangwang March 28, 2018 16:57
@hachikuji hachikuji changed the title MINOR: Fix coordinator loading checks in GroupCoordinator MINOR: Fix partition loading checks in GroupCoordinator Mar 28, 2018
Copy link
Copy Markdown
Contributor

@guozhangwang guozhangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a couple minor comments. Please feel free to merge after that.

groupManager.getGroup(groupId) match {
responseCallback: SyncCallback): Unit = {
validateGroup(groupId) match {
case Some(error) if error == Errors.COORDINATOR_LOAD_IN_PROGRESS =>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are handling COORDINATOR_LOAD_IN_PROGRESS on the client side now, could we just return this error code than translating it to REBALANCE_IN_PROGRESS?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to do that, but it would break older clients. Maybe we can leave a note somewhere to do this in the next protocol bump? I guess we could leave out the client handling of the error until then. What do you think?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

case Some(error) => responseCallback(Array.empty, error)

case None =>
groupManager.getGroup(groupId) match {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indention below seems not aligned?

@hachikuji hachikuji merged commit 8662a02 into apache:trunk Apr 2, 2018
hachikuji added a commit that referenced this pull request Apr 11, 2018
…pId (#4851)

We had a regression in #4788 which caused the offset commit/fetch/describe APIs to fail if the groupId was empty. This should be allowed for backwards compatibility. Additionally, I have modified DeleteGroups to allow removal of the empty group, which was missed in the initial implementation. I've added a test case to ensure that we do not miss this again in the future.

Reviewers: Ismael Juma <ismael@juma.me.uk>
koqizhao pushed a commit to koqizhao/kafka that referenced this pull request Apr 12, 2018
…pId (apache#4851)

We had a regression in apache#4788 which caused the offset commit/fetch/describe APIs to fail if the groupId was empty. This should be allowed for backwards compatibility. Additionally, I have modified DeleteGroups to allow removal of the empty group, which was missed in the initial implementation. I've added a test case to ensure that we do not miss this again in the future.

Reviewers: Ismael Juma <ismael@juma.me.uk>
koqizhao pushed a commit to koqizhao/kafka that referenced this pull request Apr 12, 2018
…pId (apache#4851)

We had a regression in apache#4788 which caused the offset commit/fetch/describe APIs to fail if the groupId was empty. This should be allowed for backwards compatibility. Additionally, I have modified DeleteGroups to allow removal of the empty group, which was missed in the initial implementation. I've added a test case to ensure that we do not miss this again in the future.

Reviewers: Ismael Juma <ismael@juma.me.uk>
koqizhao pushed a commit to koqizhao/kafka that referenced this pull request Apr 12, 2018
…pId (apache#4851)

We had a regression in apache#4788 which caused the offset commit/fetch/describe APIs to fail if the groupId was empty. This should be allowed for backwards compatibility. Additionally, I have modified DeleteGroups to allow removal of the empty group, which was missed in the initial implementation. I've added a test case to ensure that we do not miss this again in the future.

Reviewers: Ismael Juma <ismael@juma.me.uk>
ying-zheng pushed a commit to ying-zheng/kafka that referenced this pull request Jul 6, 2018
…pId (apache#4851)

We had a regression in apache#4788 which caused the offset commit/fetch/describe APIs to fail if the groupId was empty. This should be allowed for backwards compatibility. Additionally, I have modified DeleteGroups to allow removal of the empty group, which was missed in the initial implementation. I've added a test case to ensure that we do not miss this again in the future.

Reviewers: Ismael Juma <ismael@juma.me.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants