Use thread priorities. (aka set `nice` values for background-like tasks) #984
Conversation
I think we need to test this PR a bit more to understand if the possible benefits justify the code complexity. I also think we should make sure that persists will not be starved out with these changes.
I did simple tests, and when running a local firehose (so just firing off events as fast as they can be parsed from a TPCH file), the realtime index task runs in about 20% LESS TIME (good) to get to the first persist-n-merge when the persist and merge thread pools are LOW priority as compared to NORM. Executing any queries on the nodes tends to have no measurable impact on the total ingestion time on my local machine (<50% cpu usage for ingest, persist, merge, and query all combined). I did not wait for the persist-n-merge to complete. The reason is that, looking at the thread breakout, the persist and realtime index task (plumber) are the ones that run concurrently, with the merge task running all by its lonesome. This is on my Mac though, so mileage on Linux may vary since the schedulers are not exactly the same.
can we add a link to the documentation explaining this parameter
if this is some weird undocumented behavior in the vm that we rely on, I would prefer we simply put a note in the documentation about this and let users set the parameter themselves in the forking task runner vm parameter.
https://docs.oracle.com/cd/E15289_01/doc.40/e15062/optionxx.htm#BABGBFHF
Without setting the command line flags the behavior is disabled at the JVM level
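Note that the JVM flags only control whether the value is propagated to the OS scheduler; at the Java level, `Thread.setPriority` always records the requested priority on the `Thread` object regardless of flags. A minimal illustration (hypothetical demo class, not part of this PR):

```java
public class PriorityFlagDemo {
    // Setting a priority always updates the Java-level value; whether the
    // OS scheduler actually honors it depends on -XX:+UseThreadPriorities
    // and -XX:ThreadPriorityPolicy being set on the JVM command line.
    public static int requestedPriority(int priority) {
        Thread t = new Thread(() -> {});
        t.setPriority(priority);
        return t.getPriority();
    }

    public static void main(String[] args) {
        System.out.println("recorded priority: " + requestedPriority(Thread.MIN_PRIORITY));
    }
}
```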
I think it would greatly simplify the code if we removed this method and just defaulted to null, only setting the priority if it is actually set to something, rather than assuming the default is normal.
the default IS normal.
And the code would actually be more complex since you couldn't have the cases with no priority call the cases with priority with NULL values
the default is technically the priority of the creating thread.
Even if in our case this might be normal all the time, I think it will make things less cluttered to not define a default everywhere and leave it null.
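The point about the default being the creating thread's priority can be verified directly: a new `Thread` inherits its priority from whichever thread constructed it, not from a fixed `NORM_PRIORITY`. A small demonstration (hypothetical demo class, not part of this PR):

```java
public class DefaultPriorityDemo {
    // Shows that a new thread's default priority is inherited from the
    // creating thread rather than being fixed at NORM_PRIORITY.
    public static int childPriorityWhenParentHas(int parentPriority) throws InterruptedException {
        final int[] observed = new int[1];
        // The parent thread creates a child thread and records the
        // child's default (uninitialized) priority.
        Thread parent = new Thread(() -> observed[0] = new Thread(() -> {}).getPriority());
        parent.setPriority(parentPriority);
        parent.start();
        parent.join();
        return observed[0];
    }

    public static void main(String[] args) throws InterruptedException {
        // A child created from a MIN_PRIORITY parent starts at MIN_PRIORITY.
        System.out.println(childPriorityWhenParentHas(Thread.MIN_PRIORITY));
    }
}
```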
@nishantmonu51 I updated this PR to use your Task context
With the changes in #1679,
there will be two context values, "priority" and "taskPriority", which look similar but have different semantics.
Can we rename this to taskThreadPriority to be more explicit?
renaming to backgroundThreadPriority would also be fine.
We should also add docs for the new configs.
This PR needs to be rebased.
@thedrow yes, and I'm supposed to have a meeting with @himanshug and some other folks on task priority.
Then I'll worry about rebasing against current master.
@drcrallen Any news on this one? It should be useful if merged.
@nishantmonu51 how would you feel about leaving it as an advanced feature for now?
👍
Conflicts were trivial, all around extra config options.
docs and good default values required
Defaults revert to prior behavior and do not set thread priority.
* Defaults the thread priority to java.lang.Thread.NORM_PRIORITY in io.druid.indexing.common.task.AbstractTask
* Each exec service has its own task factory which is assigned a priority for spawned tasks. Therefore each priority class has a unique exec service
* Added priority to tasks as taskPriority in the task context. <0 means low, 0 means take default, >0 means high. It is up to any particular implementation to determine how to handle these numbers
* Add options to ForkingTaskRunner
* Add "-XX:+UseThreadPriorities" default option
* Add "-XX:ThreadPriorityPolicy=42" default option
* AbstractTask - Removed unneeded @JsonIgnore on priority
* Added priority to RealtimePlumber executors. All sub-executors (non-query runners) get Thread.MIN_PRIORITY
* Add persistThreadPriority and mergeThreadPriority to realtime tuning config
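The per-exec-service factory idea in the list above could look roughly like this (an illustrative sketch; the class and names are hypothetical, not the actual Druid implementation):

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a thread factory that stamps a fixed priority onto every
// thread it spawns, so each executor service (one per priority class)
// gets its own factory.
public class PriorityThreadFactory implements ThreadFactory {
    private final int priority;
    private final String nameFormat;
    private final AtomicLong count = new AtomicLong(0);

    public PriorityThreadFactory(int priority, String nameFormat) {
        this.priority = priority;
        this.nameFormat = nameFormat;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, String.format(nameFormat, count.getAndIncrement()));
        t.setDaemon(true);
        // e.g. Thread.MIN_PRIORITY for background-like persist/merge pools
        t.setPriority(priority);
        return t;
    }
}
```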
👍
Use thread priorities. (aka set `nice` values for background-like tasks)
Adds `taskPriority` as a task context parameter. <0 means low, 0 means take default, >0 means high. It is up to any particular implementation to determine how to handle these numbers.

Adds `-XX:+UseThreadPriorities` and `-XX:ThreadPriorityPolicy=42` as default options.

Behavior on Linux:

* Running with `-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=1` with root privileges enables this.
* Running with `-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=1` WITHOUT root privileges does NOT enable this, and will give the warning `Java HotSpot(TM) 64-Bit Server VM warning: -XX:ThreadPriorityPolicy requires root privilege on Linux`.
* Running with `-XX:+UseThreadPriorities` alone does NOT enable this.
* Running with `-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42` enables this REGARDLESS of root privileges.

This PR is to hopefully help when a task is running on a node that also needs to service queries (notably a realtime node doing persisting and merging).
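One plausible way an implementation might interpret the <0 / 0 / >0 convention when spawning threads (a hypothetical sketch; as noted above, the actual mapping is up to each implementation):

```java
public class TaskPriorityMapper {
    // Hypothetical mapping of the taskPriority context value onto JVM
    // thread priorities: negative => MIN_PRIORITY, zero => NORM_PRIORITY
    // (take the default), positive => MAX_PRIORITY.
    public static int toThreadPriority(int taskPriority) {
        if (taskPriority < 0) {
            return Thread.MIN_PRIORITY;
        }
        if (taskPriority > 0) {
            return Thread.MAX_PRIORITY;
        }
        return Thread.NORM_PRIORITY;
    }
}
```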
The impact on query time can be seen in the image below.

As you can see, the maximum query time, and the query time during the long merge process, seem unaffected by this patch. While the persists are accumulating before the merge, the query time linearly increases (assumed proportional to the data intake, though I have nothing to back up that assumption). The max value reached before the persist-n-merge portion is also too close between the two runs to support any claim. What is obvious, however, is that the first part is completed notably faster.
It is worth noting that CGroups override nice values, so this setting should not interfere with CGroup resources for co-tenant workloads.