
feat: lots of fixes #17

Merged
terrykong merged 24 commits into main from head-node-colocate
Mar 21, 2025

Conversation

@terrykong
Collaborator

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Changelog

  • Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

Checklist when contributing

  • TBD

Additional Information

  • Related to # (issue)

@github-actions github-actions Bot added the Documentation and CI labels Mar 21, 2025
@terrykong terrykong force-pushed the head-node-colocate branch from 4506420 to f1114cf March 21, 2025 07:58
@terrykong terrykong changed the title from "wip: lots of fixes" to "feat: lots of fixes" Mar 21, 2025
@terrykong terrykong added the Run CICD label and removed the Run CICD, Documentation, and CI labels Mar 21, 2025
@github-actions github-actions Bot added the Documentation and CI labels Mar 21, 2025
Signed-off-by: Terry Kong <terryk@nvidia.com>
parthchadha previously approved these changes Mar 21, 2025
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong merged commit 20af897 into main Mar 21, 2025
5 checks passed
@terrykong terrykong deleted the head-node-colocate branch March 21, 2025 18:22
parthchadha pushed a commit that referenced this pull request Mar 21, 2025
- flatten hyperparams for tb no longer errors for lists (was an issue for schedulers)
- the submission script now overlaps the head on the first worker (no longer needs extra node just for head)
- fixes the CI to handle weird permissions issues
- added sphinx build and doctest to CI
- added functional tests to CI
- nuked an old example
- added docs for functional tests
- --no-container-mount-home
- fix a unit test that expected cuda to skip
- allow running unit tests on slurm head node with no gpu
- add a hermetic script to run functional tests

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
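
The first bullet above (flattening hyperparams for TensorBoard without erroring on lists) can be illustrated with a minimal sketch. This is not the code from this PR: the function name `flatten_hparams`, the `.` separator, and the index-based keys for list items are all assumptions made for illustration. The idea is to treat each list element as a child keyed by its position, so a list of scheduler configs flattens the same way a nested dict does.

```python
from typing import Any


def flatten_hparams(params: dict[str, Any], prefix: str = "", sep: str = ".") -> dict[str, Any]:
    """Flatten nested dicts and lists into a single-level dict (hypothetical helper)."""
    flat: dict[str, Any] = {}
    for key, value in params.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the dotted prefix along.
            flat.update(flatten_hparams(value, name, sep))
        elif isinstance(value, (list, tuple)):
            # Assumption: key each list element by its index instead of raising.
            indexed = {str(i): item for i, item in enumerate(value)}
            flat.update(flatten_hparams(indexed, name, sep))
        else:
            flat[name] = value
    return flat


# Illustrative config with a list of schedulers (values are made up).
config = {
    "lr": 1e-4,
    "scheduler": [
        {"name": "LinearLR", "total_iters": 50},
        {"name": "CosineAnnealingLR", "T_max": 1000},
    ],
}
flat = flatten_hparams(config)
```

With this sketch, the scheduler entries end up under keys such as `scheduler.0.name` and `scheduler.1.T_max`, which are plain scalars that a hyperparameter logger can accept.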
KiddoZhu pushed a commit that referenced this pull request May 6, 2025
- flatten hyperparams for tb no longer errors for lists (was an issue for schedulers)
- the submission script now overlaps the head on the first worker (no longer needs extra node just for head)
- fixes the CI to handle weird permissions issues
- added sphinx build and doctest to CI
- added functional tests to CI
- nuked an old example
- added docs for functional tests
- --no-container-mount-home
- fix a unit test that expected cuda to skip
- allow running unit tests on slurm head node with no gpu
- add a hermetic script to run functional tests

Signed-off-by: Terry Kong <terryk@nvidia.com>
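
The bullet about running unit tests on a Slurm head node with no GPU describes a common pattern: GPU-dependent tests are skipped rather than failed when no device is present. The sketch below is a rough stand-in using only the standard library, not the approach taken in this PR. The helper `gpu_available` is hypothetical, and checking for `nvidia-smi` on `PATH` is an assumed proxy for GPU detection; a real test suite would more likely ask its deep learning framework.

```python
import shutil
import unittest


def gpu_available() -> bool:
    # Assumption: the presence of `nvidia-smi` on PATH stands in for "a GPU is usable".
    return shutil.which("nvidia-smi") is not None


class TestExample(unittest.TestCase):
    def test_cpu_only(self):
        # Runs everywhere, including a head node without a GPU.
        self.assertEqual(sum([1, 2, 3]), 6)

    @unittest.skipUnless(gpu_available(), "requires a GPU")
    def test_needs_gpu(self):
        # Only runs when the GPU check passes; otherwise it is reported as skipped.
        self.assertTrue(gpu_available())


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestExample)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

On a machine without a GPU the run still succeeds, with the GPU test listed as skipped instead of failed.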
Labels

CI: Relating to CI
Documentation: Improvements or additions to documentation


3 participants