Background
Today Dask decides where to place a task based on an "earliest start time" heuristic. For a given task it tries to find the worker on which it will start the soonest. It does this by computing two values:
- The amount of work currently on the worker (currently measured by occupancy)
- The amount of time it would take to transfer all dependencies not on the worker to that worker
This is what the `worker_objective` function computes.
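Roughly, the objective looks something like the sketch below. The attribute names (`occupancy`, `dependencies`, `who_has`, `nbytes`, `bandwidth`) are stand-ins for scheduler state, not the exact signature of `worker_objective`:

```python
def estimated_start_time(ts, ws, bandwidth):
    """Sketch of the earliest-start-time objective for task ``ts`` on worker ``ws``.

    ``ts.dependencies`` is the set of dependency tasks, each with ``nbytes``
    and ``who_has`` (the workers already holding its data); ``ws.occupancy``
    is the seconds of work already assigned to the worker.
    """
    # Time to transfer every dependency the worker does not already hold
    transfer_bytes = sum(
        dep.nbytes for dep in ts.dependencies if ws not in dep.who_has
    )
    transfer_time = transfer_bytes / bandwidth

    # Time spent waiting behind the work already on the worker
    return ws.occupancy + transfer_time
```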
Problem
However, this is incorrect if the task we're considering has higher priority than some of the tasks already on that worker. In that case the worker's full occupancy overstates the wait, because this task gets to cut in line.
In general we could count all of the work that has higher priority than this task, but that might be somewhat expensive, especially when there are lots of tasks on a worker (which is common). This might be the kind of thing where Cython can save us, but even then I'm not sure.
Proposed solutions
Let's write down a few possible solutions:
1. Brute force: we could look at all of the tasks in `ws.processing` and count up the occupancy of the tasks with higher priority
2. Ignore: we could ignore occupancy altogether, and just let work stealing take charge
3. Middle ground: we could randomly take a few tasks in `ws.processing` (maybe four?) and see where we stand among them. If we're worse than all of them then we take the full brunt of occupancy, if we're better than all of them then we take 0%, if we're in the middle then we take 50%, and so on (see the sketch after this list)
4. Fancy: we maintain some sort of t-digest per worker. This seems extreme, but we would only need to track something like three quantile values for this to work well most of the time
5. Less fancy: maybe we track min/max/mean and blend between them?
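As a concrete illustration of option 3, a rough sketch might look like the following. This is a hypothetical helper, not code from `distributed`, and it assumes that smaller priority tuples run earlier:

```python
import random

def occupancy_fraction(ts, ws, k=4):
    """Estimate the fraction of ``ws.occupancy`` that task ``ts`` would wait behind.

    Samples up to ``k`` tasks already assigned to the worker and counts how
    many of them have higher priority (smaller priority tuple) than ``ts``.
    """
    processing = list(ws.processing)
    if not processing:
        return 0.0
    sample = random.sample(processing, min(k, len(processing)))
    n_ahead = sum(other.priority < ts.priority for other in sample)
    # Better than every sampled task -> 0.0, worse than all -> 1.0, mixed -> in between
    return n_ahead / len(sample)
```

The placement objective would then charge `ws.occupancy * occupancy_fraction(ts, ws)` rather than the full occupancy, at the cost of a `list` copy and a `random.sample` call per candidate worker.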
Options 3 and 5 seem like the most probable. Each has some concerns:
- Sampling: I'm not sure how best to get these items. `iter(seq(...))` is ordered these days, and so not a great sample. `random.sample` is somewhat expensive: `%timeit random.sample(list(d), 2)` takes 8µs for me for a dict with 1000 items.
- min/max/mean: our priorities are hierarchical, and so a mean (or any quantile) is a little wonky.
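For option 5, the coarsest workable version only compares against the tracked extremes, since tuple-valued priorities don't average cleanly. Something like this hypothetical helper, where `prio_min` and `prio_max` would be maintained per worker:

```python
def blended_fraction(priority, prio_min, prio_max):
    """Three-way blend of occupancy based only on per-worker priority extremes."""
    if priority <= prio_min:   # higher priority than everything already queued
        return 0.0
    if priority >= prio_max:   # lower priority than everything already queued
        return 1.0
    return 0.5                 # somewhere in between; tuples rule out finer interpolation
```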
Importance
This is, I suspect, especially important in workloads where we have lots of rootish tasks, which are common. The variance among all of those tasks can easily swamp the signal that tasks should stay where their data is.