[Question] who runs the scheduler? #1

@justheuristic

First of all, thanks for the paper!
It was very intriguing to view model parallelism as an optimization problem in itself.

I wonder how such scheduling would work in a fully decentralized system.
Naively, you could run it concurrently on all nodes in hope that they find the same solution.

However, this naive option may be difficult to implement in geographically distributed networks: if nodes observe slightly different network bandwidth, or if they take network measurements at different times, they may end up with different solutions.

Is there a way to guarantee that such a network stays consistent?
I mean, you can always elect a "leader" or let nodes vote on the solution, but perhaps there is a more natural way to approach this.
What would you suggest?
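
To make the concern concrete, here is a minimal Python sketch of the "agree on inputs first" idea. It is not the paper's scheduler: `toy_schedule` is a hypothetical stand-in that just ranks links by bandwidth, and `agree_on_bandwidth` and `elect_leader` are names I made up. It also assumes every node eventually receives the same set of reports, which is itself a non-trivial assumption in a decentralized network.

```python
from statistics import median

def toy_schedule(bandwidth):
    """Hypothetical stand-in for a real scheduler: rank links by
    bandwidth (highest first), breaking ties by link name."""
    return sorted(bandwidth, key=lambda link: (-bandwidth[link], link))

def agree_on_bandwidth(reports):
    """Combine per-node measurements into one shared view by taking
    the per-link median. Assumes all nodes see the same `reports`."""
    links = sorted({link for r in reports.values() for link in r})
    return {
        link: median(r[link] for r in reports.values() if link in r)
        for link in links
    }

def elect_leader(node_ids):
    """Deterministic 'leader' choice: the smallest node id."""
    return min(node_ids)

# Each node measured the same two links at slightly different times.
reports = {
    "node-a": {"a-b": 100.0, "b-c": 98.0},
    "node-b": {"a-b": 97.0, "b-c": 99.0},
    "node-c": {"a-b": 101.0, "b-c": 99.5},
}

# Running the scheduler on local measurements can disagree across nodes.
local_plans = {node: toy_schedule(m) for node, m in reports.items()}

# Running it on the agreed view gives every node the same plan.
shared_view = agree_on_bandwidth(reports)
shared_plan = toy_schedule(shared_view)
leader = elect_leader(reports)
```

With these numbers, `node-b` ranks the links differently from the other two nodes when it uses its own measurements, while all nodes get the same plan from `shared_view`. The open question is whether there is something cheaper than exchanging all reports or trusting `leader`.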

P.S. Another group that I'm in close contact with faced a similar issue in their paper, and they ended up with a heuristic load-balancing rule where nodes greedily switch pipeline stages. However, unlike your work, they do not prove that such a rule leads to optimal throughput.
