Single node launcher issue #713

@jeffra

Description

When running something like `deepspeed --include worker-1 test.py` from worker-0, we currently run test.py only on worker-0 instead of on worker-1. This is due to this line of code in our runner:

https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/launcher/runner.py#L307

We currently skip the pdsh launch whenever the number of workers is 1. However, we do not check that this single worker is the local machine the deepspeed launcher is being invoked from.
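
One possible shape for the missing check, as a rough sketch rather than the actual runner code: only skip the remote launch when the single selected worker is the machine the launcher runs on. The function names `local_host_aliases` and `needs_remote_launch` are hypothetical and do not exist in DeepSpeed, and matching hosts by name is an assumption (a real fix might need to compare resolved addresses instead).

```python
import socket
from typing import Iterable, Optional, Set


def local_host_aliases() -> Set[str]:
    """Names that plausibly refer to the machine running the launcher.

    Assumption: name matching is good enough for this sketch.
    """
    hostname = socket.gethostname()
    return {"localhost", "127.0.0.1", hostname, hostname.split(".")[0]}


def needs_remote_launch(active_hosts: Iterable[str],
                        local_aliases: Optional[Set[str]] = None) -> bool:
    """Return True if the job must be dispatched to other machines.

    Hypothetical helper: `active_hosts` stands in for the hosts left
    after filters such as --include have been applied.
    """
    hosts = list(active_hosts)
    if local_aliases is None:
        local_aliases = local_host_aliases()

    # More than one worker always needs a multi-node launch;
    # zero workers means there is nothing to dispatch.
    if len(hosts) != 1:
        return len(hosts) > 1

    # Exactly one worker: launch locally only if it is this machine.
    return hosts[0] not in local_aliases
```

With this check, `needs_remote_launch(["worker-1"], {"worker-0"})` is true, so the case from this issue would still go through the remote launcher, while a single local worker keeps the current local-only path.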

Labels

bug: Something isn't working