
Co-assign neighboring tasks to neighboring workers #4892

@mrocklin


Edit: This proposal is now (mostly) defunct. See the discussion below for an updated proposal.

Sometimes we assign closely related tasks to different machines, which causes unnecessary communication. The simplest case is a pair of sibling tasks:

   c
  / \
 /   \
a     b

Because neither a nor b has dependencies, they are assigned to workers somewhat randomly. However, because c requires both of them, one or the other will have to move across the network. This excess communication can bog down the system.

This has come up several times, notably in #2602, ocean-transport/coiled_collaboration#3, and #4864. There are a few different things we can do to address these issues. This issue covers one part of a solution: assigning sibling tasks to the same worker.

Straightforward (but slow) solution

@fjetter provided a solution in #4864 (comment), which involves iterating up through all dependents of a task, and then back down through all of those tasks' dependencies, to find the workers that already hold related tasks. This is too expensive in the general case (we have to traverse k-squared links, where k is the number of dependencies/dependents), but it might be acceptable in a restricted case.
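A minimal sketch of that up-then-down traversal, to make the k-squared cost concrete. The `Task` class, its `worker`/`dependencies`/`dependents` fields, and the helpers `add_edge` and `sibling_workers` are hypothetical stand-ins for illustration, not the scheduler's real task-state API:

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False keeps identity hashing so tasks can live in sets
class Task:
    """Hypothetical stand-in for a scheduler task record."""
    name: str
    worker: Optional[str] = None  # worker the task is assigned to, if any
    dependencies: set = field(default_factory=set)
    dependents: set = field(default_factory=set)


def add_edge(dep: Task, task: Task) -> None:
    """Record that `task` depends on `dep`."""
    task.dependencies.add(dep)
    dep.dependents.add(task)


def sibling_workers(ts: Task) -> Counter:
    """Tally, per worker, the sibling links of `ts` that point at assigned tasks.

    Walk up to each dependent, then back down to that dependent's other
    dependencies: O(k^2) for k dependents with k dependencies each.
    """
    counts = Counter()
    for dependent in ts.dependents:             # up: k dependents
        for sibling in dependent.dependencies:  # down: k dependencies each
            if sibling is not ts and sibling.worker is not None:
                counts[sibling.worker] += 1
    return counts


# The a/b/c example from the diagram: a is already on worker "w1".
a, b, c = Task("a", worker="w1"), Task("b"), Task("c")
add_edge(a, c)
add_edge(b, c)
print(sibling_workers(b).most_common(1))  # [('w1', 1)]
```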

Restricted case

In particular, we seem to be interested in the following conditions:

  1. The task has no dependencies (otherwise we would schedule it wherever those are)
  2. The task has very few dependents, possibly only one
  3. That dependent has very few dependencies, maybe fewer than five (or ten?)
  4. There are many more tasks than workers, so we're comfortable losing some concurrency here
  5. The task's output is sizeable, i.e. something we would not want to move, relative to its computation time
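The five conditions can be folded into a single predicate. This is a sketch under stated assumptions: the `Task` fields (`nbytes`, `duration`), every threshold, and the network `bandwidth` are hypothetical placeholders, and condition 5 is modeled as "estimated transfer time exceeds estimated compute time":

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity hashing so tasks can be stored in sets
class Task:
    """Hypothetical task record; field names are illustrative only."""
    name: str
    nbytes: int = 0        # estimated output size in bytes (assumed available)
    duration: float = 0.0  # estimated compute time in seconds (assumed available)
    dependencies: set = field(default_factory=set)
    dependents: set = field(default_factory=set)


def add_edge(dep: Task, task: Task) -> None:
    """Record that `task` depends on `dep`."""
    task.dependencies.add(dep)
    dep.dependents.add(task)


def in_restricted_case(
    ts: Task,
    n_tasks: int,
    n_workers: int,
    *,
    max_dependents: int = 1,        # condition 2 (guess)
    max_dependent_deps: int = 5,    # condition 3 (guess)
    tasks_per_worker: int = 10,     # condition 4 (guess)
    bandwidth: float = 100e6,       # bytes/second, assumed network bandwidth
) -> bool:
    # 1. No dependencies; otherwise we would schedule near them anyway.
    if ts.dependencies:
        return False
    # 2. At least one, but very few, dependents.
    if not ts.dependents or len(ts.dependents) > max_dependents:
        return False
    # 3. Each dependent has at most `max_dependent_deps` dependencies.
    if any(len(d.dependencies) > max_dependent_deps for d in ts.dependents):
        return False
    # 4. Many more tasks than workers, so losing some concurrency is acceptable.
    if n_tasks < tasks_per_worker * n_workers:
        return False
    # 5. Moving the output would cost more than computing it.
    return ts.nbytes / bandwidth > ts.duration
```

Example: a root task producing 1 GB in 0.1 s qualifies on a 10-worker cluster with 1000 tasks, since moving it would take an estimated 10 s at the assumed bandwidth.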

I suspect there is an order in which we could check these conditions that is fast in all the cases that matter. Additionally, if we have several dependencies to check, we might just accept the location of the very first dependency that is already assigned and skip checking the rest.
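One possible ordering, sketched in Python: run the O(1) global and per-task checks first so that most tasks exit immediately, then walk the dependent's dependencies and take the first sibling that already has a worker. `Task`, `add_edge`, `co_assign_worker`, and the thresholds are all hypothetical, condition 2 is simplified to "exactly one dependent", and the size check (condition 5) is omitted for brevity:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Task:
    """Hypothetical task record, for illustration only."""
    name: str
    worker: Optional[str] = None
    # Lists (not sets) so "first assigned sibling" is deterministic.
    dependencies: list = field(default_factory=list)
    dependents: list = field(default_factory=list)


def add_edge(dep: Task, task: Task) -> None:
    """Record that `task` depends on `dep`."""
    task.dependencies.append(dep)
    dep.dependents.append(task)


def co_assign_worker(
    ts: Task,
    n_tasks: int,
    n_workers: int,
    max_siblings: int = 5,
    tasks_per_worker: int = 10,
) -> Optional[str]:
    """Return a worker already holding a sibling of `ts`, or None."""
    # O(1) global check: enough tasks per worker to spare some concurrency?
    if n_tasks < tasks_per_worker * n_workers:
        return None
    # O(1) local checks: a root task with a single dependent.
    if ts.dependencies or len(ts.dependents) != 1:
        return None
    (dependent,) = ts.dependents
    # O(1) length check before walking the siblings.
    if len(dependent.dependencies) > max_siblings:
        return None
    # Early exit: accept the first sibling that is already assigned.
    for sibling in dependent.dependencies:
        if sibling is not ts and sibling.worker is not None:
            return sibling.worker
    return None
```

If a worker is returned, the scheduler would place the task there; otherwise it falls back to its normal placement. Stopping at the first assigned sibling trades a possibly better choice (for example, the worker holding the most siblings) for a bounded cost.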
