
Do persist IncrementalIndex in another thread in IndexGeneratorReducer#2149

Merged
drcrallen merged 1 commit into apache:master from binlijin:master on Jan 21, 2016

Conversation

@binlijin
Contributor

Currently in IndexGeneratorReducer, the reducer builds a small incremental index and persists it, repeating until there are no more rows, and finally merges the persisted indexes.
This patch adds new background threads to do the small incremental index persists. Using this feature causes a notable increase in memory pressure and CPU usage, but will make the job finish more quickly.

Contributor


Most places in code would use Futures, and then call Futures.allAsList(futures).get(1, TimeUnit.HOURS) or similar.

The catch with that approach is you need to be able to have the incremental index garbage collected, so you have to eliminate hard references to the incremental index in the future.
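A minimal JDK-only sketch of that pattern (the review comment refers to Guava's Futures.allAsList; here plain java.util.concurrent futures stand in for it, and buildIndex/persist are hypothetical stand-ins for the real IncrementalIndex work):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PersistFutures
{
  public static void main(String[] args) throws Exception
  {
    ExecutorService persistExec = Executors.newFixedThreadPool(2);
    List<Future<?>> futures = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
      long[] index = buildIndex(i); // stand-in for building an IncrementalIndex
      futures.add(persistExec.submit(() -> persist(index)));
      // No hard reference to "index" is retained in this scope, so once the
      // persist task completes, the index can be garbage collected.
    }
    for (Future<?> f : futures) {
      f.get(1, TimeUnit.HOURS); // analog of Futures.allAsList(futures).get(1, TimeUnit.HOURS)
    }
    persistExec.shutdown();
    System.out.println("persisted " + futures.size() + " indexes");
  }

  private static long[] buildIndex(int seed)
  {
    return new long[]{seed, seed + 1};
  }

  private static void persist(long[] index)
  {
    // The real code would write the index to disk; here we just read it.
    long sum = 0;
    for (long v : index) {
      sum += v;
    }
  }
}
```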

@drcrallen
Contributor

This PR has significant memory pressure changes in the reducer and changes the default behavior.

With this PR the JVM now holds onto 2 incremental index objects and the persist objects at the same time (instead of 1 incremental index object and the persist objects). This is a notable increase in memory pressure and should not be enabled by default.

To get around such constraints, the executor service can default to the sameThreadExecutorService and use a blocking service with a set backpressure size as an option.

Such an option could be represented by "io.druid.index.persist.background.count" or similar, which defaults to 0. In the case of 0, the sameThreadExecutorService can be used; in the case of > 0, the executor service with a blocking queue could have its capacity set to the config value. A value < 0 is an error.
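A sketch of that config-driven choice using only java.util.concurrent (names are illustrative; the real code would go through Druid's Execs utilities, and Guava's sameThreadExecutor would fill the 0 case):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PersistExecs
{
  // Hypothetical factory for "io.druid.index.persist.background.count".
  public static Executor create(int backgroundCount)
  {
    if (backgroundCount < 0) {
      throw new IllegalArgumentException("background persist count must be >= 0");
    }
    if (backgroundCount == 0) {
      // Same-thread execution: persists run inline in the reducer thread,
      // preserving the pre-PR default behavior.
      return Runnable::run;
    }
    // A bounded queue sized to the config value provides backpressure;
    // CallerRunsPolicy makes the submitting thread persist inline when the
    // queue is full, rather than queueing without bound or dropping work.
    return new ThreadPoolExecutor(
        backgroundCount,
        backgroundCount,
        0L,
        TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(backgroundCount),
        new ThreadPoolExecutor.CallerRunsPolicy()
    );
  }

  public static void main(String[] args)
  {
    Executor inline = create(0);
    inline.execute(() -> System.out.println("ran inline"));
    try {
      create(-1);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```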

There are use cases where this can be very handy, but this PR needs some major JVM heap pressure benchmarks before such behavior can be turned on by default.

@binlijin
Contributor Author

Yes, memory usage will increase, so maybe we can decrease the rowFlushBoundary.

@binlijin binlijin closed this Dec 29, 2015
@binlijin binlijin reopened this Dec 29, 2015
@gianm
Contributor

gianm commented Dec 29, 2015

Agree with @drcrallen, would be good to have this as an option that is off by default for reasons of increased memory pressure and increased cpu usage (2 threads instead of 1).

@binlijin
Contributor Author

@drcrallen @gianm
Yes, with the latest patch this feature is an option and turned off by default.
And I do not know why Travis fails...

@fjy
Contributor

fjy commented Dec 29, 2015

@binlijin can you pull from master and merge into this PR? it'll help with the failing travis-ci checks

@binlijin binlijin closed this Dec 30, 2015
@binlijin binlijin reopened this Dec 30, 2015
@binlijin
Contributor Author

@fjy Yes, it is ok now.

Contributor


Thread.currentThread().interrupt() to reset the interrupted flag status?
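This suggestion is the standard pattern: catching InterruptedException clears the thread's interrupted flag, so a catch block that cannot rethrow should restore it for callers further up the stack. A self-contained sketch:

```java
public class InterruptDemo
{
  public static void main(String[] args) throws Exception
  {
    Thread worker = new Thread(() -> {
      try {
        Thread.sleep(60_000); // stand-in for waiting on a persist to finish
      } catch (InterruptedException e) {
        // Restore the flag so code further up the stack can still observe
        // that an interrupt was requested.
        Thread.currentThread().interrupt();
      }
      System.out.println("interrupted flag: " + Thread.currentThread().isInterrupted());
    });
    worker.start();
    worker.interrupt();
    worker.join();
  }
}
```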

Contributor


Can you use com.google.common.base.Preconditions?
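For reference, Preconditions.checkArgument collapses an if/throw into one line. Since Guava is not on the classpath in this sketch, a minimal stand-in with the same shape illustrates the intent (the variable name and message are hypothetical; Guava's real template syntax uses %s placeholders):

```java
public class PreconditionsDemo
{
  // Minimal stand-in for com.google.common.base.Preconditions.checkArgument.
  static void checkArgument(boolean expression, String template, Object... args)
  {
    if (!expression) {
      throw new IllegalArgumentException(String.format(template, args));
    }
  }

  public static void main(String[] args)
  {
    int numBackgroundPersistThreads = -1; // hypothetical bad config value
    try {
      checkArgument(
          numBackgroundPersistThreads >= 0,
          "numBackgroundPersistThreads[%s] cannot be negative",
          numBackgroundPersistThreads
      );
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```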

Contributor Author


done

@binlijin binlijin closed this Jan 6, 2016
@binlijin binlijin reopened this Jan 6, 2016
@binlijin binlijin closed this Jan 6, 2016
@binlijin binlijin reopened this Jan 6, 2016
@binlijin
Contributor Author

@drcrallen @gianm @himanshug what about now?

Contributor


The number of new background threads to use for incremental persists. Using this feature causes a notable increase in memory pressure and cpu usage, but will make the job finish more quickly. If changing from the default of 0 (use current thread for persists), we recommend setting it to 1.
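Illustrative only: assuming the knob lands in the Hadoop tuningConfig (the exact property name at merge time may differ; later Druid docs call it numBackgroundPersistThreads), enabling the recommended single background persist thread might look like:

```json
"tuningConfig" : {
  "type" : "hadoop",
  "numBackgroundPersistThreads" : 1
}
```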

@himanshug
Contributor



Can you use Execs.newBlockingSingleThreaded(..) instead?

Contributor Author


Execs.newBlockingSingleThreaded(..) only has one background thread to persist the incremental index, so I have not used it.

Contributor


ok

@binlijin binlijin closed this Jan 14, 2016
@binlijin binlijin reopened this Jan 14, 2016
@binlijin
Contributor Author

What else can I do for the merge? We use this feature in our Hadoop build job.

@binlijin binlijin closed this Jan 19, 2016
@binlijin binlijin reopened this Jan 19, 2016
@binlijin binlijin closed this Jan 20, 2016
@xvrl
Member

xvrl commented Jan 20, 2016

@binlijin can we put a description in the PR that explains what problem this is solving?

@binlijin
Contributor Author

rebase

@binlijin binlijin reopened this Jan 20, 2016
@fjy
Contributor

fjy commented Jan 20, 2016

@binlijin Can you update the description of the problem being solved with this PR?

@binlijin
Contributor Author

@xvrl @fjy updated the description.

@nishantmonu51
Member

👍, this looks good to me.
@drcrallen: I think this can be merged, if you don't have any outstanding comments?

@fjy fjy added this to the 0.9.0 milestone Jan 20, 2016
drcrallen added a commit that referenced this pull request Jan 21, 2016
Do persist IncrementalIndex in another thread in IndexGeneratorReducer
@drcrallen drcrallen merged commit 2a69a58 into apache:master Jan 21, 2016
@drcrallen
Contributor

👍

Member


numBackgroundPersistThreads would be more consistent with our other properties, such as druid.processing.numThreads. We should try to keep property naming consistent.

Contributor Author


@xvrl, this has been merged; file a new PR to rename the properties?
