Improve heap usage for IncrementalIndex #2228

Merged
fjy merged 1 commit into apache:master from metamx:incremental-index-mem2 on Jan 13, 2016

@nishantmonu51
Member

With the current code, OnheapIncrementalIndex creates a new ColumnSelectorFactory object (24 bytes) and a new ColumnSelector object (24 bytes) for every aggregator in every Druid row.

This means an overhead of 48 bytes × the number of aggregators per row, which becomes significant as the number of aggregators increases. For example, for 1M rows with 20 aggregators each, it comes to roughly 960 MB.
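As a quick check of the arithmetic, the sketch below multiplies out the figures quoted above. The class and method names are made up for illustration; only the 24-byte object sizes, the 20 aggregators, and the 1M rows come from this description.

```java
// Back-of-the-envelope estimate of the per-row selector overhead.
// SelectorOverheadEstimate is a hypothetical name, not a Druid class.
final class SelectorOverheadEstimate {
    // Object sizes quoted in the PR description, in bytes.
    static final long FACTORY_BYTES = 24;
    static final long SELECTOR_BYTES = 24;

    // One factory and one selector are allocated per aggregator per row.
    static long overheadBytes(long rows, int aggregatorsPerRow) {
        return (FACTORY_BYTES + SELECTOR_BYTES) * aggregatorsPerRow * rows;
    }

    public static void main(String[] args) {
        long bytes = overheadBytes(1_000_000L, 20);
        System.out.println(bytes + " bytes"); // prints "960000000 bytes"
    }
}
```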

This PR aims to remove this overhead by caching the selector objects, so that the ColumnSelectorFactory and ColumnSelector instances are reused across rows.
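A minimal sketch of the caching idea, not the code in this PR: `LongSelector`, `CachingSelectorFactory`, and the thread-local row holder are hypothetical names, and `computeIfAbsent` assumes Java 8 or later. The factory hands out one selector per column and reuses it for every row, and the selector reads from whichever row is current.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a column selector that reads a long metric.
interface LongSelector {
    long get();
}

// Hypothetical factory that creates each selector once and reuses it for every row.
final class CachingSelectorFactory {
    // One cached selector per column name, shared by all rows.
    private final ConcurrentHashMap<String, LongSelector> selectors = new ConcurrentHashMap<>();
    // The row currently being aggregated by the calling thread (an assumption of this sketch).
    private final ThreadLocal<Map<String, Long>> currentRow = new ThreadLocal<>();

    void setCurrentRow(Map<String, Long> row) {
        currentRow.set(row);
    }

    LongSelector makeLongSelector(String column) {
        // The selector is created on first use only; later calls return the cached object.
        return selectors.computeIfAbsent(
            column, c -> () -> currentRow.get().getOrDefault(c, 0L));
    }

    int cachedSelectorCount() {
        return selectors.size();
    }

    // Drop the cached selectors when the index is closed.
    void close() {
        selectors.clear();
        currentRow.remove();
    }

    public static void main(String[] args) {
        CachingSelectorFactory factory = new CachingSelectorFactory();
        long sum = 0;
        for (long value = 1; value <= 3; value++) {
            factory.setCurrentRow(Collections.singletonMap("clicks", value));
            sum += factory.makeLongSelector("clicks").get();
        }
        System.out.println("sum=" + sum + ", cached selectors=" + factory.cachedSelectorCount());
    }
}
```

With this shape, the number of selector objects grows with the number of distinct columns rather than with the number of rows.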

To measure the impact on heap usage for aggregators, I created an IncrementalIndex with 1M rows, each with 20 longSum aggregators and 1 dimension. Overall heap size dropped from 1.9 GB to 1 GB (~50% improvement).

The actual improvement in index size will vary with the number of aggregators and dimensions in the IncrementalIndex.

@fjy
Contributor

fjy commented Jan 7, 2016

👍 I think this looks cool and it is crazy we missed this optimization :P

@jon-wei
Contributor

jon-wei commented Jan 7, 2016

👍 looks good to me

Member

Do those need to be concurrent maps? I don't think ColumnSelectors are thread safe, but I also don't think we ever share ColumnSelectorFactory instances across multiple threads.

Contributor

How about using Guava's Cache, which also supports various options, including an expiration policy?

Member Author

It can be accessed by multiple threads in the case of groupBy queries.
I didn't use Guava Cache because we don't need expiration policies here: if we expired entries, we would end up creating multiple selector objects.

Member

@nishantmonu51 Currently the ColumnSelectorFactory.makeXXXXColumnSelector methods, such as this one, are not thread safe; this may be an oversight in the groupBy query engine. The resulting ColumnSelectors may be thread safe, but the methods that create them are not. We may want to investigate what needs to be done there.

Doesn't need to be for this PR, but I think the guarantees provided by the different methods are not clear. Maybe @cheddar has some input?

Member Author

For groupBy, I was referring specifically to the case where multiple threads add rows to the same IncrementalIndex, so the add method of IncrementalIndex needs to be thread safe. Traversing the rows of a segment with an XXXColumnSelector uses a single thread and doesn't need these guarantees.
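To illustrate the concurrent-add case, here is a rough sketch under stated assumptions rather than Druid's implementation: `SharedSelectorIndex` and its methods are hypothetical, and keeping the current row value in a `ThreadLocal` is an assumption about how a shared selector could read per-thread state. Several threads add rows through one cached selector, and each should read back only its own values.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongSupplier;

// Hypothetical index that caches one selector per column and is written by many threads.
final class SharedSelectorIndex {
    private final ConcurrentHashMap<String, LongSupplier> selectors = new ConcurrentHashMap<>();
    // Each writer thread sees only the value of the row it is currently adding.
    private final ThreadLocal<Long> currentValue = new ThreadLocal<>();

    LongSupplier selectorFor(String column) {
        // computeIfAbsent is atomic, so concurrent callers share one selector per column.
        return selectors.computeIfAbsent(column, c -> () -> currentValue.get());
    }

    // Adds one "row" and returns what the shared selector read for it.
    long add(String column, long value) {
        currentValue.set(value);
        try {
            return selectorFor(column).getAsLong();
        } finally {
            currentValue.remove();
        }
    }

    // Returns true if every thread read back its own values through a single cached selector.
    static boolean runConcurrentAdds(int threads, int rowsPerThread) {
        SharedSelectorIndex index = new SharedSelectorIndex();
        Set<LongSupplier> seen = ConcurrentHashMap.newKeySet();
        AtomicBoolean consistent = new AtomicBoolean(true);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final long base = t * 1_000_000L;
            pool.execute(() -> {
                for (int i = 0; i < rowsPerThread; i++) {
                    seen.add(index.selectorFor("metric"));
                    if (index.add("metric", base + i) != base + i) {
                        consistent.set(false);
                    }
                }
            });
        }
        pool.shutdown();
        try {
            boolean finished = pool.awaitTermination(30, TimeUnit.SECONDS);
            return finished && consistent.get() && seen.size() == 1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("consistent=" + runConcurrentAdds(4, 10_000));
    }
}
```

A plain `HashMap` in place of the concurrent map would not be safe under this access pattern, which is the trade-off being weighed in this thread.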

Member Author

@cheddar Any suggestions on whether we need to make it thread safe?

nishantmonu51 force-pushed the incremental-index-mem2 branch from 4e7fdf4 to b9dec22 on January 8, 2016 06:54
nishantmonu51 added this to the 0.9.0 milestone on Jan 12, 2016
…in each row

clear selectors on close.

Add comments about thread safety.
nishantmonu51 force-pushed the incremental-index-mem2 branch from b9dec22 to 4863e2c on January 12, 2016 19:16
@nishantmonu51
Member Author

@xvrl Added code comments about thread safety.

@xvrl
Member

xvrl commented Jan 12, 2016

👍 looks good to me

@drcrallen
Contributor

Checking the thread-safety aspect, as I'm familiar with this part of the code. Give me a min.

@fjy
Contributor

fjy commented Jan 13, 2016

👍

@fjy
Contributor

fjy commented Jan 13, 2016

again :D

@drcrallen
Contributor

The comment is correct that the implementation of adding to facts needs to be thread safe. The column selector implementation provided to cache the column selectors looks correct for the column selectors used.

fjy added a commit that referenced this pull request Jan 13, 2016
Improve heap usage for IncrementalIndex
fjy merged commit 4c014c1 into apache:master on Jan 13, 2016
drcrallen deleted the incremental-index-mem2 branch on January 13, 2016 22:48
@fjy
Contributor

fjy commented Jan 13, 2016

@drcrallen missed your comment, sorry!

fjy mentioned this pull request on Feb 5, 2016
fjy added the Improvement label on Feb 5, 2016

6 participants