DataSegment memory optimizations#5094
Conversation
…trics lists as a whole) more aggressively; use ArrayMap instead of default LinkedHashMap for DataSegment.loadSpec, because they have only 3 entries on average; prune DataSegment.loadSpec on brokers
|
Is it expected that we have two almost identical classes called "DataSegmentTest" in different modules? |
If they are testing different things and can't be merged into single class, then its fine. Or else, please merge them.
Can we just drop them from announcements made from historicals/realtime-processes instead ? I think @nishantmonu51 already made a change at some point to optionally drop metric/dimension names. |
|
@himanshug could you point to a PR? |
|
@himanshug thanks. I think it's different because it doesn't affect the presence of the data on coordinator endpoints. And it's a more decoupled approach. |
|
There are endpoints at coordinator ( https://github.com/druid-io/druid/blob/master/server/src/main/java/io/druid/server/http/MetadataResource.java ) to get source of truth details of a Segment information from DB. In fact I would change even the default behavior to not publish dimension, metrics, loadSpecs information in announcements so that all users get that. does anyone else have an opinion ? @gianm @drcrallen @nishantmonu51 |
|
it was decided in dev-sync today to go with current state of this PR wherein historicals announce the loadSpec and they are dropped by broker only. |
…rename TestHelper.getJsonMapper() to makeJsonMapper()
|
@himanshug @gianm could you please take a look? |
|
@leventov i'll take a look now |
gianm
left a comment
There was a problem hiding this comment.
Generally LGTM, couple minor comments.
| return interner.intern(input); | ||
| } | ||
| }; | ||
| private static final Interner<String> stringInterner = Interners.newWeakInterner(); |
There was a problem hiding this comment.
Might as well CAPITALIZE these if you are renaming them anyway.
| private static final Interner<List<String>> dimensionsInterner = Interners.newWeakInterner(); | ||
| private static final Interner<List<String>> metricsInterner = Interners.newWeakInterner(); | ||
| private static final Function<String, String> internFun = Interners.asFunction(stringInterner); | ||
| private static final Map<String, Object> PRUNED_LOAD_SPEC = ImmutableMap.of( |
There was a problem hiding this comment.
Are there any APIs on the Broker that return DataSegment objects? If so, this needs a mention in release notes at least.
| private static final Interner<String> stringInterner = Interners.newWeakInterner(); | ||
| private static final Interner<List<String>> dimensionsInterner = Interners.newWeakInterner(); | ||
| private static final Interner<List<String>> metricsInterner = Interners.newWeakInterner(); | ||
| private static final Map<String, Object> PRUNED_LOAD_SPEC = ImmutableMap.of( |
There was a problem hiding this comment.
Are there any APIs on the broker that return DataSegment objects? If so, this needs at least a mention on the release notes.
There was a problem hiding this comment.
Would also suggest adding a note to the docs for any relevant methods, something like "This method does not return loadSpecs for segments; if you need those then you can use the Coordinator API XXX."
There was a problem hiding this comment.
No, as far as I can tell. It's exposed only in MetadataResource and DatasourcesResource, which are present only on Coordinator node.
|
@gianm do you have more comments on this PR? |
|
@leventov LGTM; sorry, am a few days behind on github email at the moment. |
DataSegmentcontents (loadSpec's keys, dimensions and metrics lists as a whole) more aggressively;ArrayMapinstead of defaultLinkedHashMapfor DataSegment.loadSpec, because they have only 3 entries on average;