From 546c53a498f953f77f9312452b07ecb9011b4034 Mon Sep 17 00:00:00 2001
From: Stas Bekman
Date: Wed, 17 Mar 2021 19:46:10 -0700
Subject: [PATCH] [doc] launcher

As discussed in https://github.com/microsoft/DeepSpeed/issues/662 this PR modifies the doc:
* explains what to use instead of CUDA_VISIBLE_DEVICES
* puts the `--hostfile` cl arg in the correct place in the invocation script

Fixes: https://github.com/microsoft/DeepSpeed/issues/662
---
 docs/_tutorials/getting-started.md | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/docs/_tutorials/getting-started.md b/docs/_tutorials/getting-started.md
index 37f104f0739e..e12388aaf973 100644
--- a/docs/_tutorials/getting-started.md
+++ b/docs/_tutorials/getting-started.md
@@ -186,8 +186,8 @@ slots available.
 The following command launches a PyTorch training job across all available nodes and GPUs
 specified in `myhostfile`:
 ```bash
-deepspeed \
-  --deepspeed --deepspeed_config ds_config.json --hostfile=myhostfile
+deepspeed --hostfile=myhostfile \
+  --deepspeed --deepspeed_config ds_config.json
 ```
 
 Alternatively, DeepSpeed allows you to restrict distributed training of your model to a
@@ -264,3 +264,10 @@ not detected or passed in then DeepSpeed will query the number of GPUs on the
 local machine to discover the number of slots available. The `--include` and
 `--exclude` arguments work as normal, but the user should specify 'localhost'
 as the hostname.
+
+Also note that `CUDA_VISIBLE_DEVICES` can't be used with DeepSpeed to control
+which devices should be used. For example, to use only gpu1 of the current
+node, do:
+```bash
+deepspeed --include localhost:1 ...
+```