Skip to content

KAFKA-10221: Backport fix for KAFKA-9603 to 2.5#8987

Merged
vvcephei merged 2 commits intoapache:2.5from
cadonna:AK9603-fix_standbys_segmented_stores_2.5
Jul 8, 2020
Merged

KAFKA-10221: Backport fix for KAFKA-9603 to 2.5#8987
vvcephei merged 2 commits intoapache:2.5from
cadonna:AK9603-fix_standbys_segmented_stores_2.5

Conversation

@cadonna
Copy link
Copy Markdown
Member

@cadonna cadonna commented Jul 6, 2020

KAFKA-9603 reports that the number of open files keeps increasing
in RocksDB. The reason is that bulk loading is turned on but
never turned off in segmented state stores for standby tasks.

This bug was fixed in 2.6 through PR #8661 by using code that is not present in 2.5.
So cherry-picking was not possible.

This PR backports the fix to 2.5.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

KAFKA-9603 reports that the number of open files keeps increasing
in RocksDB. The reason is that bulk loading is turned on but
never turned off in segmented state stores for standby tasks.

This bug was fixed in 2.6 by using code that is not present in 2.5.

This PR backports the fix to 2.5.
@cadonna
Copy link
Copy Markdown
Member Author

cadonna commented Jul 6, 2020

Call for review: @vvcephei @ableegoldman @guozhangwang @mjsax @abbccdda

@vvcephei
Copy link
Copy Markdown
Contributor

vvcephei commented Jul 7, 2020

Test this please

Copy link
Copy Markdown
Contributor

@vvcephei vvcephei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, @cadonna !

I just have one request, because I had to read the one-line fix about 10 times to understand how it works. Elegant! :)

// have the local RocksDB instance for the segment. In this case, toggleDBForBulkLoading
// will only close the database and open it again with bulk loading enabled.
if (!bulkLoadSegments.contains(segment)) {
if (!bulkLoadSegments.contains(segment) && context instanceof ProcessorContextImpl) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah, this is subtle. IIUC, the fix works by asserting that we should only enable bulk loading if the provided context is a ProcessorContextImpl, which is the kind of context that is only provided when adding the store to an active task.

This seems correct to me, and although it's very subtle, it also seems ok as a patch for an older codebase that won't need to be maintained much. But maybe we can have a comment, or an internal method for the check, like

Suggested change
if (!bulkLoadSegments.contains(segment) && context instanceof ProcessorContextImpl) {
if (!bulkLoadSegments.contains(segment) && isStoreForActiveTask(context)) {

so that it'll be more obvious what's going on here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Will do!

@vvcephei
Copy link
Copy Markdown
Contributor

vvcephei commented Jul 7, 2020

Retest this please

@vvcephei
Copy link
Copy Markdown
Contributor

vvcephei commented Jul 7, 2020

Test this please

1 similar comment
@vvcephei
Copy link
Copy Markdown
Contributor

vvcephei commented Jul 7, 2020

Test this please

@vvcephei
Copy link
Copy Markdown
Contributor

vvcephei commented Jul 7, 2020

Retest this please

@mjsax mjsax added the streams label Jul 7, 2020
@vvcephei
Copy link
Copy Markdown
Contributor

vvcephei commented Jul 8, 2020

The test failure was unrelated.
kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup

Thanks for the backport, @cadonna ! I'll go ahead and merge now.

@vvcephei vvcephei merged commit 17f9f3a into apache:2.5 Jul 8, 2020
@cadonna cadonna deleted the AK9603-fix_standbys_segmented_stores_2.5 branch July 29, 2020 08:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants