Skip to content

KAFKA-7194; Fix buffer underflow if onJoinComplete is retried after failure#5417

Merged
rajinisivaram merged 2 commits intoapache:trunkfrom
hachikuji:KAFKA-7194
Jul 24, 2018
Merged

KAFKA-7194; Fix buffer underflow if onJoinComplete is retried after failure#5417
rajinisivaram merged 2 commits intoapache:trunkfrom
hachikuji:KAFKA-7194

Conversation

@hachikuji
Copy link
Copy Markdown
Contributor

An untimely wakeup can cause ConsumerCoordinator.onJoinComplete to throw a WakeupException before completion. On the next poll(), it will be retried, but this leads to an underflow error because the buffer containing the assignment data will already have been advanced. The solution is to duplicate the buffer passed to onJoinComplete.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@hachikuji hachikuji force-pushed the KAFKA-7194 branch 2 times, most recently from 1b85e29 to fb028f1 Compare July 23, 2018 23:18
Copy link
Copy Markdown
Contributor

@kkonstantine kkonstantine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hachikuji ! Makes perfect sense.
LGTM!

// if there are any metadata changes affecting any of the consumed partitions (whether or not this
// instance is subscribed to the topics).
this.metadata.setTopics(subscriptions.groupSubscription());
if (!client.ensureFreshMetadata(Long.MAX_VALUE)) throw new TimeoutException();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hachikuji any brief comment about why this is not needed any more?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were a couple reasons. Primarily it seemed like an unnecessary optimization which added some complicated failures to think about. We have already updated the internal assignment state at this point, but if this call raises an exception, then the user may see this updated state prior to having their onPartitionsAssigned callback invoked. I was also happy to get rid of one of the final cases of indefinite blocking in the consumer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great! Thanks

Copy link
Copy Markdown
Contributor

@rajinisivaram rajinisivaram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hachikuji Thanks for the PR, LGTM. Merging to trunk and 2.0.

@rajinisivaram rajinisivaram merged commit c83ecf4 into apache:trunk Jul 24, 2018
rajinisivaram pushed a commit that referenced this pull request Jul 24, 2018
…ailure (#5417)

An untimely wakeup can cause ConsumerCoordinator.onJoinComplete to throw a WakeupException before completion. On the next poll(), it will be retried, but this leads to an underflow error because the buffer containing the assignment data will already have been advanced. The solution is to duplicate the buffer passed to onJoinComplete.

Reviewers: Konstantine Karantasis <konstantine@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
Copy link
Copy Markdown
Contributor

@guozhangwang guozhangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants