[Question] who runs the scheduler? #1

First of all, thanks for the paper!
It was very intriguing to view model parallelism as an optimization problem in itself.

I wonder how such scheduling would work in a fully decentralized system.
Naively, you could run the scheduler concurrently on all nodes in the hope that they all arrive at the same solution.

However, this naive option may be difficult to implement in geographically distributed networks: if nodes observe slightly different network bandwidth, or take their network measurements at different times, they may end up with different solutions.
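To make this concrete, here is a toy sketch. It is not the scheduler from the paper; it assumes each stage's throughput is the measured bandwidth of its assigned node divided by a per-stage cost, that pipeline throughput is set by the slowest stage, and that the solver brute-forces every assignment with a deterministic tie-break. The names (`best_schedule`, `stage_cost`, `view_a`, `view_b`) and the numbers are made up for illustration. Even in this simple model, a 1% difference in measured bandwidth is enough for two nodes to pick different schedules:

```python
from itertools import permutations


def pipeline_throughput(assignment, bandwidth, stage_cost):
    """Toy model: a pipeline runs at the speed of its slowest stage.

    assignment[s] is the node that runs stage s.
    """
    return min(bandwidth[node] / stage_cost[s] for s, node in enumerate(assignment))


def best_schedule(bandwidth, stage_cost):
    """Brute-force the node-to-stage assignment with the highest throughput.

    permutations() yields candidates in lexicographic order and max() returns
    the first maximal item, so ties always resolve to the same assignment:
    the result is a deterministic function of the measurements.
    """
    candidates = permutations(range(len(bandwidth)), len(stage_cost))
    return max(candidates, key=lambda a: pipeline_throughput(a, bandwidth, stage_cost))


stage_cost = [1.0, 2.0]   # hypothetical: stage 1 moves twice as much data as stage 0
view_a = [10.0, 10.1]     # node A's bandwidth measurements
view_b = [10.1, 10.0]     # node B measured at a slightly different time

print(best_schedule(view_a, stage_cost))  # (0, 1)
print(best_schedule(view_b, stage_cost))  # (1, 0)
```

The tie-break makes the solver itself deterministic; the disagreement comes entirely from the inputs.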

Is there a way to guarantee that all nodes in such a network arrive at a consistent schedule?
I mean, you can always elect a "leader" or let nodes vote on the solution, but perhaps there is a more natural way to approach this.
What would you suggest?
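For reference, the two standard options mentioned above might look roughly like this. This is a hypothetical sketch: `elect_leader`, `vote_on_schedule` and the node ids are made up, and schedules are assumed to be tuples so they can be counted and compared:

```python
from collections import Counter


def elect_leader(alive_nodes):
    """Naive leader election: the alive node with the smallest id wins,
    and everyone adopts the leader's schedule."""
    return min(alive_nodes)


def vote_on_schedule(proposals):
    """Plurality vote over proposed schedules.

    proposals maps node id -> proposed schedule (a tuple). Ties are broken
    by taking the smallest schedule, so nodes that see the same proposals
    pick the same winner.
    """
    counts = Counter(proposals.values())
    top = max(counts.values())
    return min(schedule for schedule, count in counts.items() if count == top)


proposals = {
    "node-a": (0, 1, 2),
    "node-b": (0, 1, 2),
    "node-c": (1, 0, 2),
}
print(vote_on_schedule(proposals))           # (0, 1, 2)
leader = elect_leader(proposals)             # iterating a dict yields its keys
print(leader, proposals[leader])             # node-a (0, 1, 2)
```

Both options only guarantee agreement if every node sees the same set of alive nodes or the same set of proposals, which is itself a consensus problem in a network with delays and failures.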

P.S. Another group I'm in close contact with faced a similar issue ([their paper](https://openreview.net/pdf?id=U1edbV4kNu_)) and ended up with a heuristic load-balancing rule where nodes greedily switch pipeline stages. However, unlike your work, they do not prove that such a rule leads to optimal throughput.
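For comparison, a greedy stage-switching rule could look something like the sketch below. This is not their exact rule: this hypothetical version assumes a stage's throughput is the sum of the speeds of the nodes serving it, and lets a node move to the current bottleneck stage only if that strictly raises the throughput of the slowest stage. Since every accepted move strictly increases pipeline throughput and there are finitely many assignments, the loop terminates, but nothing here guarantees the result is optimal:

```python
def stage_throughput(assignment, node_speed, num_stages):
    """Total throughput of each stage: the sum of the speeds of its nodes."""
    totals = [0.0] * num_stages
    for node, stage in assignment.items():
        totals[stage] += node_speed[node]
    return totals


def greedy_rebalance(assignment, node_speed, num_stages, max_rounds=100):
    """Hypothetical greedy rule: a node moves to the bottleneck stage if that
    strictly raises the throughput of the slowest stage."""
    assignment = dict(assignment)
    for _ in range(max_rounds):
        moved = False
        for node in sorted(assignment):  # fixed order keeps runs reproducible
            totals = stage_throughput(assignment, node_speed, num_stages)
            bottleneck = totals.index(min(totals))
            if assignment[node] == bottleneck:
                continue
            old_stage = assignment[node]
            assignment[node] = bottleneck
            new_totals = stage_throughput(assignment, node_speed, num_stages)
            if min(new_totals) > min(totals):
                moved = True
            else:
                assignment[node] = old_stage  # revert: the move did not help
        if not moved:
            break
    return assignment


node_speed = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}  # made-up node speeds
initial = {"a": 0, "b": 0, "c": 0, "d": 1}             # node -> pipeline stage
result = greedy_rebalance(initial, node_speed, num_stages=2)
print(result)  # {'a': 1, 'b': 0, 'c': 0, 'd': 1}
```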
