Skip to content

KAFKA-7534: Error in flush calling close may prevent underlying store from closing#5833

Merged
mjsax merged 2 commits intoapache:trunkfrom
bbejeck:KAFKA-7534_flush_error_may_not_allow_proper_close
Oct 26, 2018
Merged

KAFKA-7534: Error in flush calling close may prevent underlying store from closing#5833
mjsax merged 2 commits intoapache:trunkfrom
bbejeck:KAFKA-7534_flush_error_may_not_allow_proper_close

Conversation

@bbejeck
Copy link
Copy Markdown
Member

@bbejeck bbejeck commented Oct 24, 2018

Calling the CachingKeyValueStore#close() method first calls CachingKeyValueStore.flush(). If there is an exception thrown during the flush call, the underlying store is not closed. Subsequently, another task can't open the RocksDB store and receives a No locks available exception.

I added a unit test that fails with the proposed fix.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@bbejeck
Copy link
Copy Markdown
Member Author

bbejeck commented Oct 24, 2018

ping @mjsax @vvcephei for reviews

try {
flush();
} finally {
underlying.close();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a nested try-catch to make use cache.close() is called, too? Not sure how critical it is to ensure that the cache is closed?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had that, but I took it out for the same reason thinking it's not important to ensure the cache is closed. But thinking about it some more, it doesn't hurt anything, so I'll put it back.

@bbejeck
Copy link
Copy Markdown
Member Author

bbejeck commented Oct 24, 2018

updated this

@bbejeck
Copy link
Copy Markdown
Member Author

bbejeck commented Oct 24, 2018

Java 11 build timed out.

retest this please

@bbejeck
Copy link
Copy Markdown
Member Author

bbejeck commented Oct 25, 2018

failure unrelated

retest this please

@vvcephei
Copy link
Copy Markdown
Contributor

LGTM. Thanks, @bbejeck .

@bbejeck
Copy link
Copy Markdown
Member Author

bbejeck commented Oct 25, 2018

retest this please

@bbejeck
Copy link
Copy Markdown
Member Author

bbejeck commented Oct 25, 2018

@omkreddy can we please get this PR into 2.0.1?

@bbejeck
Copy link
Copy Markdown
Member Author

bbejeck commented Oct 26, 2018

Both builds timed out after 3 hours

retest this please

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Oct 26, 2018

Java11: https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/293/testReport/junit/kafka.log/LogCleanerIntegrationTest/testMarksPartitionsAsOfflineAndPopulatesUncleanableMetrics/

kafka.log.LogCleanerIntegrationTest.testMarksPartitionsAsOfflineAndPopulatesUncleanableMetrics failed with

org.junit.runners.model.TestTimedOutException: test timed out after 15000 milliseconds

Merging. Java8 is green. \cc @vvcephei

@mjsax mjsax merged commit 57502a6 into apache:trunk Oct 26, 2018
mjsax pushed a commit that referenced this pull request Oct 26, 2018
… from closing (#5833)

Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>
mjsax pushed a commit that referenced this pull request Oct 26, 2018
… from closing (#5833)

Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>
mjsax pushed a commit that referenced this pull request Oct 26, 2018
… from closing (#5833)

Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>
mjsax pushed a commit that referenced this pull request Oct 26, 2018
… from closing (#5833)

Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>
@mjsax
Copy link
Copy Markdown
Member

mjsax commented Oct 26, 2018

Merged to trunk and cherry-picked to 2.1, 2.0, 1.1, and 1.0 branch. Will mark with fixed version 2.2.0 for now -- if new RCs are rolled, we should add other fixed versions accordingly.

pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
… from closing (apache#5833)

Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>
@bbejeck bbejeck deleted the KAFKA-7534_flush_error_may_not_allow_proper_close branch July 10, 2024 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants