
[Improvement] DataSegment intern improvement (reduce 60% memory consume on coordinator)#8165

Closed
pzhdfy wants to merge 3 commits into apache:master from pzhdfy:datasegment_intern

Conversation

@pzhdfy
Contributor

@pzhdfy pzhdfy commented Jul 26, 2019

Description

In our Druid cluster, we have 10 million active segments, and we set a load rule with 2 replicas.
We found that the coordinator consumed 50GB of memory and ran into GC problems.

We dumped the JVM heap and analyzed the memory, and found about 30 million DataSegment objects.
This is because each segment generates 3 DataSegment objects:
one from polling the DB in SQLMetadataSegmentManager,
and the other two from the ZooKeeper announcements (2 replicas) in BatchServerInventoryView.

These three DataSegment objects are usually identical,
so we can use
Interner<DataSegment> DATA_SEGMENT_INTERNER = Interners.newWeakInterner();
to deduplicate them.

1. When polling from the DB or reading from a znode, use DATA_SEGMENT_INTERNER to deduplicate.
2. When polling from the DB, always update loadSpec in the DataSegment; this is useful for deep storage migration.
3. When reading from a znode, skip interning segments from realtime nodes, because segments from realtime nodes are short-lived and have incorrect size, dimensions, and loadSpec.

With this improvement, memory consumption in the coordinator drops to 20GB.
It also helps on the broker, dropping from 35GB to 18GB.
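The deduplication step above can be sketched with a minimal interner. This is a simplified illustration, not the actual patch: a plain ConcurrentHashMap stands in for Guava's Interners.newWeakInterner() (a real weak interner additionally lets segments that are no longer referenced be garbage-collected), and MinimalSegment is a hypothetical stand-in for Druid's DataSegment, keying equality on the segment identifier.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SegmentInternDemo {
    // Hypothetical stand-in for Druid's DataSegment; equality is based on the segment id.
    static final class MinimalSegment {
        final String id;
        final Map<String, Object> loadSpec;

        MinimalSegment(String id, Map<String, Object> loadSpec) {
            this.id = id;
            this.loadSpec = loadSpec;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MinimalSegment && ((MinimalSegment) o).id.equals(id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }

    // Strong interner as a simplified stand-in for Guava's Interners.newWeakInterner():
    // equal segments collapse to one canonical instance.
    static final ConcurrentHashMap<MinimalSegment, MinimalSegment> INTERNER = new ConcurrentHashMap<>();

    static MinimalSegment intern(MinimalSegment segment) {
        MinimalSegment existing = INTERNER.putIfAbsent(segment, segment);
        return existing != null ? existing : segment;
    }

    public static void main(String[] args) {
        // The same segment arrives twice: once from the DB poll, once from a ZK announcement.
        MinimalSegment fromDb = new MinimalSegment("seg_2019-07-26", Map.of("type", "hdfs"));
        MinimalSegment fromZk = new MinimalSegment("seg_2019-07-26", Map.of("type", "hdfs"));

        // Two equal-but-distinct objects collapse to a single canonical instance.
        System.out.println(intern(fromDb) == intern(fromZk)); // true
    }
}
```

With one canonical instance per segment, the 3x duplication described above (one DB copy plus one per replica announcement) reduces to a single object per segment, which is where the reported memory savings come from.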

@leventov leventov self-requested a review July 26, 2019 16:24
private static final Interner<String> STRING_INTERNER = Interners.newWeakInterner();
private static final Interner<List<String>> DIMENSIONS_INTERNER = Interners.newWeakInterner();
private static final Interner<List<String>> METRICS_INTERNER = Interners.newWeakInterner();
private static final Interner<DataSegment> DATA_SEGMENT_INTERNER = Interners.newWeakInterner();
Member


This humongous Map with millions of weak references would be a problem for GC in itself: see #6357 for context.

You should adopt the design with BatchServerInventoryView and SQLMetadataSegmentManager probing into each other's memory, similarly to what is explained here: #7395 (comment)

-    private final Map<String, Object> loadSpec;
     private final List<String> dimensions;
     private final List<String> metrics;
+    private volatile Map<String, Object> loadSpec;
Member


Making DataSegment non-immutable was ruled out in this discussion: #7571. Please read it in full. I proposed a solution here: #7571 (comment). Please check if you can implement it in this PR.

{
  DataSegment result = DATA_SEGMENT_INTERNER.intern(dataSegment);
  if (updateLoadSpec) {
    result.dimensions = dataSegment.dimensions;
Member


I think DataSegment shouldn't become mutable. However, it would be nice if you would solve this problem in this PR: #6358.

@drcrallen
Contributor

Some history on interning and data segments for any archeologists out there: #3238
and its downfall #3286

@stale

stale Bot commented Oct 6, 2019

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale Bot added the stale label Oct 6, 2019
@stale

stale Bot commented Nov 3, 2019

This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
