Skip to content

add sleep#7498

Merged
titu1994 merged 3 commits intor1.21.0from
geshen/add_sleep_to_exp_manager
Sep 24, 2023
Merged

add sleep#7498
titu1994 merged 3 commits intor1.21.0from
geshen/add_sleep_to_exp_manager

Conversation

@gshennvm
Copy link
Collaborator

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Fixes #7460. Exp manager now has a sleep argument to sleep the non rank 0 ranks.

PR Type:

  • New Feature
  • Bugfix
  • Documentation

Additional Information

Signed-off-by: Gerald Shen <geshen@nvidia.com>
Signed-off-by: Gerald Shen <geshen@nvidia.com>
Signed-off-by: Gerald Shen <geshen@nvidia.com>
titu1994
titu1994 previously approved these changes Sep 24, 2023
Copy link
Collaborator

@titu1994 titu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks !

Copy link
Collaborator

@ericharper ericharper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@titu1994 titu1994 merged commit b8e631f into r1.21.0 Sep 24, 2023
@titu1994 titu1994 deleted the geshen/add_sleep_to_exp_manager branch September 24, 2023 04:17
# Add lightning file logging to global_rank zero
add_filehandlers_to_pl_logger(log_dir / 'lightning_logs.txt', log_dir / 'nemo_error_log.txt')

elif trainer.num_devices * trainer.num_devices > 1:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it meant to be trainer.num_nodes * trainer.num_devices?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof. Yes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah thanks for catching that!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it's not a big deal though since if it's unlikely that someone runs with num_nodes > 1 and num_devices == 1 ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments