Skip to content

DataSegment memory optimizations#5094

Merged
gianm merged 18 commits intoapache:masterfrom
metamx:fat-data-segments
Dec 12, 2017
Merged

DataSegment memory optimizations#5094
gianm merged 18 commits intoapache:masterfrom
metamx:fat-data-segments

Conversation

@leventov
Copy link
Copy Markdown
Member

  • Deduplicated DataSegment contents (loadSpec's keys, dimensions and metrics lists as a whole) more aggressively;
  • Used ArrayMap instead of default LinkedHashMap for DataSegment.loadSpec, because they have only 3 entries on average;
  • Pruned DataSegment.loadSpec on Brokers

…trics lists as a whole) more aggressively; use ArrayMap instead of default LinkedHashMap for DataSegment.loadSpec, because they have only 3 entries on average; prune DataSegment.loadSpec on brokers
@leventov
Copy link
Copy Markdown
Member Author

Is it expected that we have two almost identical classes called "DataSegmentTest" in different modules?

@himanshug
Copy link
Copy Markdown
Contributor

himanshug commented Nov 16, 2017

Is it expected that we have two almost identical classes called "DataSegmentTest" in different modules?

If they are testing different things and can't be merged into single class, then its fine. Or else, please merge them.

Pruned DataSegment.loadSpec on Brokers

Can we just drop them from announcements made from historicals/realtime-processes instead ? I think @nishantmonu51 already made a change at some point to optionally drop metric/dimension names.

@leventov
Copy link
Copy Markdown
Member Author

@himanshug could you point to a PR?

@himanshug
Copy link
Copy Markdown
Contributor

#2784

@leventov
Copy link
Copy Markdown
Member Author

@himanshug thanks. I think it's different because it doesn't affect the presence of the data on coordinator endpoints. And it's a more decoupled approach.

@himanshug
Copy link
Copy Markdown
Contributor

There are endpoints at coordinator ( https://github.com/druid-io/druid/blob/master/server/src/main/java/io/druid/server/http/MetadataResource.java ) to get source of truth details of a Segment information from DB.
So, I'm not sure if there is any benefit in keeping that information even in Coordinator memory.

In fact I would change even the default behavior to not publish dimension, metrics, loadSpecs information in announcements so that all users get that.

does anyone else have an opinion ? @gianm @drcrallen @nishantmonu51

@himanshug
Copy link
Copy Markdown
Contributor

it was decided in dev-sync today to go with current state of this PR wherein historicals announce the loadSpec and they are dropped by broker only.

@leventov
Copy link
Copy Markdown
Member Author

@himanshug @gianm could you please take a look?

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Nov 27, 2017

@leventov i'll take a look now

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, couple minor comments.

return interner.intern(input);
}
};
private static final Interner<String> stringInterner = Interners.newWeakInterner();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well CAPITALIZE these if you are renaming them anyway.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

private static final Interner<List<String>> dimensionsInterner = Interners.newWeakInterner();
private static final Interner<List<String>> metricsInterner = Interners.newWeakInterner();
private static final Function<String, String> internFun = Interners.asFunction(stringInterner);
private static final Map<String, Object> PRUNED_LOAD_SPEC = ImmutableMap.of(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any APIs on the Broker that return DataSegment objects? If so, this needs a mention in release notes at least.

private static final Interner<String> stringInterner = Interners.newWeakInterner();
private static final Interner<List<String>> dimensionsInterner = Interners.newWeakInterner();
private static final Interner<List<String>> metricsInterner = Interners.newWeakInterner();
private static final Map<String, Object> PRUNED_LOAD_SPEC = ImmutableMap.of(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any APIs on the broker that return DataSegment objects? If so, this needs at least a mention on the release notes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would also suggest adding a note to the docs for any relevant methods, something like "This method does not return loadSpecs for segments; if you need those then you can use the Coordinator API XXX."

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, as far as I can tell. It's exposed only in MetadataResource and DatasourcesResource, which are present only on Coordinator node.

@leventov
Copy link
Copy Markdown
Member Author

leventov commented Dec 8, 2017

@gianm do you have more comments on this PR?

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Dec 12, 2017

@leventov LGTM; sorry, am a few days behind on github email at the moment.

@gianm gianm merged commit 64848c7 into apache:master Dec 12, 2017
@leventov leventov deleted the fat-data-segments branch December 18, 2017 14:31
@jon-wei jon-wei added this to the 0.12.0 milestone Jan 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants