Do persist IncrementalIndex in another thread in IndexGeneratorReducer#2149
drcrallen merged 1 commit into apache:master from binlijin:master
Most places in code would use Futures, and then call Futures.allAsList(futures).get(1, TimeUnit.HOURS) or similar.
The catch with that approach is you need to be able to have the incremental index garbage collected, so you have to eliminate hard references to the incremental index in the future.
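As a rough illustration of the two points above (wait on all persist futures with a bound, and keep the finished future from pinning the index), here is a minimal sketch. It uses only `java.util.concurrent` rather than Guava's `Futures.allAsList`, and the names `PersistFutures`, `persistTask`, and `awaitAll` are hypothetical, as is the `List<String>` standing in for an incremental index. It is one possible way to drop the hard reference, not the approach taken in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper; not part of Druid.
final class PersistFutures {
  private PersistFutures() {}

  // The task reaches the index only through a clearable holder. Once the task
  // has run, the holder is empty, so the index can be garbage collected even
  // while the Future itself is still referenced by the caller.
  static Callable<Integer> persistTask(AtomicReference<List<String>> holder) {
    return () -> {
      List<String> index = holder.getAndSet(null); // take ownership and clear the holder
      if (index == null) {
        throw new IllegalStateException("index was already persisted");
      }
      // Stand-in for writing the index to disk; only a small summary is returned.
      return index.size();
    };
  }

  // Waits for every future, sharing one overall deadline across all of them.
  static <T> List<T> awaitAll(List<Future<T>> futures, long timeout, TimeUnit unit)
      throws InterruptedException, ExecutionException, TimeoutException {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    List<T> results = new ArrayList<>(futures.size());
    for (Future<T> future : futures) {
      long remaining = Math.max(0L, deadline - System.nanoTime());
      results.add(future.get(remaining, TimeUnit.NANOSECONDS));
    }
    return results;
  }
}
```

A caller would submit `persistTask(holder)` to its executor, collect the returned futures in a list, and call `awaitAll(futures, 1, TimeUnit.HOURS)` before merging.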
|
This PR significantly changes memory pressure in the reducer and changes the default behavior. With this PR the JVM holds onto 2 incremental index objects and the persist objects at the same time (instead of 1 incremental index object and the persist objects). This is a notable increase in memory pressure and should not be enabled by default. To get around such constraints, the executor service can default to the sameThreadExecutorService, with a blocking service with a set backpressure size available as an option. Such an option could be represented by "io.druid.index.persist.background.count" or similar, defaulting to 0. With a value of 0, the sameThreadExecutorService is used; with a value > 0, the executor service uses a blocking queue whose capacity is set to the config value; a value < 0 is an error. There are use cases where this can be very handy, but this PR needs some major JVM heap pressure benchmarks before such behavior can be turned on by default. |
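The three cases described above (0, > 0, < 0) could be sketched roughly as follows. This is an illustration using only the JDK, not the code in this PR: `PersistExecutors`, `forBackgroundCount`, and `SameThreadExecutorService` are hypothetical names, and reading "capacity set to the config value" as one worker thread plus a bounded queue of that size is an assumption.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.AbstractExecutorService;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory; not part of Druid.
final class PersistExecutors {
  private PersistExecutors() {}

  static ExecutorService forBackgroundCount(int count) {
    if (count < 0) {
      throw new IllegalArgumentException("persist background count must be >= 0, got " + count);
    }
    if (count == 0) {
      // Default: persist on the calling thread, so memory behavior is unchanged.
      return new SameThreadExecutorService();
    }
    // When the bounded queue is full, block the submitting thread instead of
    // failing, so the reducer cannot build indexes faster than they are persisted.
    RejectedExecutionHandler blockWhenFull = (task, pool) -> {
      if (pool.isShutdown()) {
        throw new RejectedExecutionException("executor is shut down");
      }
      try {
        pool.getQueue().put(task);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RejectedExecutionException("interrupted while waiting for queue space", e);
      }
    };
    return new ThreadPoolExecutor(
        1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<Runnable>(count), blockWhenFull);
  }

  // Minimal executor that runs each task inline on the submitting thread.
  static final class SameThreadExecutorService extends AbstractExecutorService {
    private volatile boolean shutdown;

    @Override public void execute(Runnable command) {
      if (shutdown) {
        throw new RejectedExecutionException("executor is shut down");
      }
      command.run();
    }
    @Override public void shutdown() { shutdown = true; }
    @Override public List<Runnable> shutdownNow() { shutdown = true; return Collections.emptyList(); }
    @Override public boolean isShutdown() { return shutdown; }
    @Override public boolean isTerminated() { return shutdown; }
    @Override public boolean awaitTermination(long timeout, TimeUnit unit) { return shutdown; }
  }
}
```

With this shape, the reducer code can always call `submit(...)` and the config value alone decides whether the persist runs inline or in the background.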
|
Yes, memory usage will increase, so maybe we can decrease the rowFlushBoundary. |
|
Agree with @drcrallen, would be good to have this as an option that is off by default for reasons of increased memory pressure and increased cpu usage (2 threads instead of 1). |
|
@drcrallen @gianm |
|
@binlijin can you pull from master and merge into this PR? It'll help with the failing travis-ci checks. |
|
@fjy Yes, it is ok now. |
Call Thread.currentThread().interrupt() to reset the interrupted flag status?
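For context, the pattern being suggested is to restore the interrupt flag after catching `InterruptedException`, since catching it clears the flag. A minimal sketch, with the hypothetical names `InterruptAware` and `awaitQuietly` standing in for wherever the reducer waits on its persists:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not part of Druid.
final class InterruptAware {
  private InterruptAware() {}

  // Waits on the latch; if interrupted, restores the interrupt flag so that
  // callers further up the stack can still observe the interruption.
  static boolean awaitQuietly(CountDownLatch latch, long timeout, TimeUnit unit) {
    try {
      return latch.await(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // catching the exception cleared the flag; set it again
      return false;
    }
  }
}
```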
Can you use com.google.common.base.Preconditions?
|
@drcrallen @gianm @himanshug what about now? |
The number of new background threads to use for incremental persists. Using this feature causes a notable increase in memory pressure and cpu usage, but will make the job finish more quickly. If changing from the default of 0 (use current thread for persists), we recommend setting it to 1.
Can you use Execs.newBlockingSingleThreaded(..) instead?
Execs.newBlockingSingleThreaded(..) provides only one background thread to persist the incremental index, so I have not used it.
|
What else can I do to get this merged? We use this feature in our Hadoop build job. |
|
@binlijin can we put a description in the PR that explains what problem this is solving? |
|
rebase |
|
@binlijin Can you update the description of the problem being solved with this PR? |
|
👍, this looks good to me. |
|
👍 |
numBackgroundPersistThreads would be more consistent with our other properties, such as druid.processing.numThreads. We should try to keep property naming consistent.
@xvrl, this has been merged, file a new PR to rename the properties?
Currently, in IndexGeneratorReducer, the reducer builds a small incremental index and persists it, repeating until there are no more rows, and finally merges the persisted indexes.
This patch adds new background threads to persist the small incremental indexes. Using this feature causes a notable increase in memory pressure and CPU usage, but will make the job finish more quickly.
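The build, persist, and merge flow described above can be pictured with a toy sketch. Everything here is a stand-in chosen for illustration: `ReducerSketch` is a hypothetical class, a sorted `List<String>` plays the role of a persisted index, and the real reducer works with IncrementalIndex objects and files on disk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical toy model of the reducer loop; not Druid code.
final class ReducerSketch {
  private ReducerSketch() {}

  static List<String> reduce(Iterable<String> rows, int rowFlushBoundary, ExecutorService persistExec)
      throws InterruptedException, ExecutionException {
    List<Future<List<String>>> persisted = new ArrayList<>();
    List<String> index = new ArrayList<>();
    for (String row : rows) {
      index.add(row);
      if (index.size() >= rowFlushBoundary) {
        // Hand the full index to the persist executor and keep reading rows
        // into a fresh one while the persist runs.
        persisted.add(persistExec.submit(persistTask(index)));
        index = new ArrayList<>();
      }
    }
    if (!index.isEmpty()) {
      persisted.add(persistExec.submit(persistTask(index)));
    }
    // Wait for every persist, then merge the persisted pieces.
    List<String> merged = new ArrayList<>();
    for (Future<List<String>> future : persisted) {
      merged.addAll(future.get());
    }
    Collections.sort(merged);
    return merged;
  }

  // Stand-in for persisting: produces a sorted copy of the in-memory rows.
  private static Callable<List<String>> persistTask(List<String> index) {
    return () -> {
      List<String> copy = new ArrayList<>(index);
      Collections.sort(copy);
      return copy;
    };
  }
}
```

While a persist is in flight, both the index being persisted and the one being filled are alive, which is the extra memory pressure discussed in the review.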