Open
Description
With a simple invocation of run_varuna:
$ python -m varuna.run_varuna --batch_size 3 --nstages 3 --chunk_size 1 varuan_test.py
The script has the following terminal output:
No apex!
No apex
['127.0.0.1']
ssh 127.0.0.1 echo "python -u -m varuna.launcher --ngpus_per_server 4 --node_rank 0 --nservers 1 --master_addr 127.0.0.1 --nstages 3 --batch_size 3 --chunk_size 1 --code_dir /fsx/users/jamesreed/varuna varuan_test.py" > launch_varuna.sh; VARUNA_MANAGER_IP=10.200.30.184 VARUNA_MORPH_PORT=4200 VARUNA_HEARTBEAT_PORT=5000 bash launch_varuna.sh
The last line (I believe) indicates that run_varuna is trying to use ssh to issue a command on the host (in this case localhost). However, this host requires two-factor authentication on login. run_varuna launches this command (twice, seemingly) as a background process, ostensibly with stdin disconnected, so the interactive 2FA prompt cannot be completed. It would be good to have a way to issue commands to the host that is compatible with common security policies or is built on top of standard cluster management tools.
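To make the request concrete, here is a rough sketch of one possible approach: skip ssh entirely when the target is the local machine, and otherwise ask ssh to fail fast instead of waiting on a prompt nobody can answer. This is not Varuna's actual code. The names `is_local_host`, `build_launch_argv`, and `launch` are hypothetical, and I am assuming that the launcher builds a single shell command string per host, as the output above suggests.

```python
import socket
import subprocess

# Hypothetical helpers for illustration only; none of this is Varuna's API.
LOCAL_NAMES = {"localhost", "127.0.0.1", "::1"}


def is_local_host(host):
    """Return True if `host` appears to be the machine running this script."""
    return host in LOCAL_NAMES or host == socket.gethostname()


def build_launch_argv(host, command):
    """Build the argv used to run `command` on `host`."""
    if is_local_host(host):
        # No ssh hop, so no login (and no 2FA prompt) is involved.
        return ["bash", "-c", command]
    # BatchMode=yes tells ssh not to prompt for passwords or other input,
    # so a host that needs interactive login should fail instead of hanging.
    return ["ssh", "-o", "BatchMode=yes", host, command]


def launch(host, command):
    """Start `command` on `host` in the background with stdin disconnected."""
    return subprocess.Popen(
        build_launch_argv(host, command),
        stdin=subprocess.DEVNULL,
    )
```

With something like this, `launch("127.0.0.1", "bash launch_varuna.sh")` would run the script directly on the local host. Remote hosts would still need non-interactive authentication (for example keys or a cluster scheduler), which this sketch does not address.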