Skip to content

KAFKA-8637: WriteBatch objects leak off-heap memory#7050

Merged
guozhangwang merged 2 commits intoapache:trunkfrom
ableegoldman:rocksdb-leak
Jul 15, 2019
Merged

KAFKA-8637: WriteBatch objects leak off-heap memory#7050
guozhangwang merged 2 commits intoapache:trunkfrom
ableegoldman:rocksdb-leak

Conversation

@ableegoldman
Copy link
Copy Markdown
Member

@ableegoldman ableegoldman commented Jul 8, 2019

Should be cherry-picked back to 2.3 (picked from 2.2 to 2.1 in 7077 )

@ableegoldman
Copy link
Copy Markdown
Member Author

}

dbAccessor.close();
db.close();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment and maybe link to RocksDB docs to clarify the correct order of close() calls? Otherwise, we might forget about it and reorder accidentally...

Also: this may affect more versions compared to the memory leak? Should we extract it into it's own PR to easily backport?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking again at the docs https://github.com/facebook/rocksdb/wiki/RocksJava-Basics I am still not sure if the order is now completely correct.
The doc says ColumnFamilyHandle > RocksDB > DBOptions > ColumnFamilyOptions
but the code does ColumnFamilyHandle > ColumnFamilyOptions > DBOptions > RocksDB due to userSpecifiedOptions.
Additionally, only some are set to null afterwards, does this have a special reason?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is moving the db.close() to before userSpecifiedOptions.close() so aren't we now closing the db in the right order? That said, the docs do suggest we should flip the order in which DBOptions & ColumnFamilyOptions are closed in RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapter#close (which, side note, is an obscenely long name...)

But Matthias is right, this should be split into a separate PR (and well documented) -- will open another one

Also, I think on this branch everything is set to null -- but if you were looking at the 2.2 branch, yeah I did forget to set filter to null. We were rushing to get that in before 2.2.1 was cut 😄

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for splitting out #7076 @ableegoldman

Please review the new PR, too, @pkleindl

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.
My question regarding the null setting was about the DBOptions & ColumnFamilyOptions in RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapter, why they are not set to null too.
As for the long name, it's probably a german language thing ;-)

final S segment = entry.getKey();
final WriteBatch batch = entry.getValue();
segment.write(batch);
batch.close();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test this this?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really sure what a good way to test this would be...thoughts?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be tricky. We would need to intercept the call to new WriteBatch within getWriteBatches() using power-mock and later verify if close() was called on the mock. Might not be worth it...

@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented Jul 10, 2019

retest this please

2 similar comments
@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented Jul 11, 2019

retest this please

@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented Jul 11, 2019

retest this please

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Jul 12, 2019

Java11 2.12 timed out. Java11 2.13 and Java8 failed with flaky tests.

Retest this please.

@ableegoldman
Copy link
Copy Markdown
Member Author

Java 8 passed, Java11 2.13 failed with flaky tests. retest this, please

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Jul 15, 2019

Java8 and 11/2.12 passed. 11/2.13 failed (test result not available any longer).

Retest this please.

@guozhangwang guozhangwang merged commit 8cefe97 into apache:trunk Jul 15, 2019
guozhangwang pushed a commit that referenced this pull request Jul 15, 2019
Should be cherry-picked back to 2.3 (picked from 2.2 to 2.1 in 7077 )

Reviewers: pkleindl <44436474+pkleindl@users.noreply.github.com>, Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
@guozhangwang
Copy link
Copy Markdown
Contributor

Cherry-picked to 2.3 as well.

ijuma pushed a commit to confluentinc/kafka that referenced this pull request Jul 16, 2019
Should be cherry-picked back to 2.3 (picked from 2.2 to 2.1 in 7077 )

Reviewers: pkleindl <44436474+pkleindl@users.noreply.github.com>, Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
ijuma added a commit to confluentinc/kafka that referenced this pull request Jul 20, 2019
* apache-github/2.3:
  MINOR: Update documentation for enabling optimizations (apache#7099)
  MINOR: Remove stale streams producer retry default docs. (apache#6844)
  KAFKA-8635; Skip client poll in Sender loop when no request is sent (apache#7085)
  KAFKA-8615: Change to track partition time breaks TimestampExtractor (apache#7054)
  KAFKA-8670; Fix exception for kafka-topics.sh --describe without --topic mentioned (apache#7094)
  KAFKA-8602: Separate PR for 2.3 branch (apache#7092)
  KAFKA-8530; Check for topic authorization errors in OffsetFetch response (apache#6928)
  KAFKA-8662; Fix producer metadata error handling and consumer manual assignment (apache#7086)
  KAFKA-8637: WriteBatch objects leak off-heap memory (apache#7050)
  KAFKA-8620: fix NPE due to race condition during shutdown while rebalancing (apache#7021)
  HOT FIX: close RocksDB objects in correct order (apache#7076)
  KAFKA-7157: Fix handling of nulls in TimestampConverter (apache#7070)
  KAFKA-6605: Fix NPE in Flatten when optional Struct is null (apache#5705)
  Fixes apache#8198 KStreams testing docs use non-existent method pipe (apache#6678)
  KAFKA-5998: fix checkpointableOffsets handling (apache#7030)
  KAFKA-8653; Default rebalance timeout to session timeout for JoinGroup v0 (apache#7072)
  KAFKA-8591; WorkerConfigTransformer NPE on connector configuration reloading (apache#6991)
  MINOR: add upgrade text (apache#7013)
  Bump version to 2.3.1-SNAPSHOT
xiowu0 pushed a commit to linkedin/kafka that referenced this pull request Aug 22, 2019
…ap memory (apache#7050)

TICKET = KAFKA-8637
LI_DESCRIPTION =
EXIT_CRITERIA = HASH [a32918b]
ORIGINAL_DESCRIPTION =

Should be cherry-picked back to 2.3 (picked from 2.2 to 2.1 in 7077 )

Reviewers: pkleindl <44436474+pkleindl@users.noreply.github.com>, Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
(cherry picked from commit a32918b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants