Skip to content

KAFKA-6768; Transactional producer may hang in close with pending requests#4842

Merged
hachikuji merged 1 commit intoapache:trunkfrom
hachikuji:KAFKA-6768
Apr 9, 2018
Merged

KAFKA-6768; Transactional producer may hang in close with pending requests#4842
hachikuji merged 1 commit intoapache:trunkfrom
hachikuji:KAFKA-6768

Conversation

@hachikuji
Copy link
Copy Markdown
Contributor

This patch fixes an edge case in producer shutdown which prevents close() from completing due to a pending request which will never be sent due to shutdown initiation. I have added a test case which reproduces the scenario.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@hachikuji
Copy link
Copy Markdown
Contributor Author

cc @huxihx Looks like we missed this in #4563.


AbstractRequest.Builder<?> requestBuilder = nextRequestHandler.requestBuilder();
while (running) {
while (!forceClose) {
Copy link
Copy Markdown
Contributor Author

@hachikuji hachikuji Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem was basically that initiateClose set running to true, which prevented the transactional request from being sent. That in turn kept batched records stuck in the accumulator and prevented shutdown from completing.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (for documentation): initiateClose would set running to false.

Copy link
Copy Markdown
Contributor

@apurvam apurvam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Copy link
Copy Markdown
Member

@ijuma ijuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hachikuji
Copy link
Copy Markdown
Contributor Author

For the record, I was able to reproduce the failure in 4/32 runs of our transaction system tests. With this patch, I did not see any failures in 32 runs.

@hachikuji hachikuji merged commit 0a8f35b into apache:trunk Apr 9, 2018
ying-zheng pushed a commit to ying-zheng/kafka that referenced this pull request Jul 6, 2018
…uests (apache#4842)

This patch fixes an edge case in producer shutdown which prevents `close()` from completing due to a pending request which will never be sent due to shutdown initiation. I have added a test case which reproduces the scenario.

Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants