
feat: step scheduler section #59

Merged: akoumpa merged 6 commits into main from akoumparouli/feat_step_scheduler_section on Nov 18, 2025

akoumpa (Contributor) commented Nov 18, 2025

Introduces the step_scheduler section in the YAML configs.

step_scheduler:
  global_batch_size: 8 # global batch size across all ranks/nodes; replaces batch.batch_size_per_node, but is not semantically identical: step_scheduler.global_batch_size = batch.batch_size_per_node * num_nodes
  local_batch_size: 1 # per-GPU batch size, previously: data.dataloader.batch_size
  ckpt_every_steps: 1000 # how often (in steps) to save checkpoints, previously: logging.save_every
  num_epochs: 100 # number of training epochs, previously: training.num_epochs
  log_every: 2 # how often to log to the terminal, previously: logging.log_every
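The key mapping above could be applied to an older config roughly as follows. This is a minimal Python sketch, not code from this PR: the helper names `migrate_to_step_scheduler` and `_pop`, the nested-dict representation of the YAML, and passing `num_nodes` in explicitly are all assumptions. Only the old and new key names and the relation global_batch_size = batch_size_per_node * num_nodes come from the description.

```python
# Hypothetical migration helper (not part of the PR). Legacy key names are taken
# from the PR description; the dict-based config shape is an assumption.
from typing import Any


def _pop(cfg: dict[str, Any], dotted: str) -> Any:
    """Remove and return the value at a dotted path such as 'logging.log_every', or None."""
    *parents, leaf = dotted.split(".")
    node: Any = cfg
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return None  # path does not exist in this config
    return node.pop(leaf, None)


def migrate_to_step_scheduler(cfg: dict[str, Any], num_nodes: int) -> dict[str, Any]:
    """Move legacy keys into a step_scheduler section (mutates and returns cfg)."""
    sched: dict[str, Any] = {}
    per_node = _pop(cfg, "batch.batch_size_per_node")
    if per_node is not None:
        # Not a pure rename: the new key counts samples across ALL nodes.
        sched["global_batch_size"] = per_node * num_nodes
    # The remaining keys map one-to-one onto the new section.
    renames = {
        "local_batch_size": "data.dataloader.batch_size",
        "ckpt_every_steps": "logging.save_every",
        "num_epochs": "training.num_epochs",
        "log_every": "logging.log_every",
    }
    for new_key, old_path in renames.items():
        value = _pop(cfg, old_path)
        if value is not None:
            sched[new_key] = value
    cfg["step_scheduler"] = sched
    return cfg


legacy = {
    "batch": {"batch_size_per_node": 4},
    "data": {"dataloader": {"batch_size": 1}},
    "logging": {"save_every": 1000, "log_every": 2},
    "training": {"num_epochs": 100},
}
print(migrate_to_step_scheduler(legacy, num_nodes=2)["step_scheduler"])
# {'global_batch_size': 8, 'local_batch_size': 1, 'ckpt_every_steps': 1000, 'num_epochs': 100, 'log_every': 2}
```

With 4 samples per node on 2 nodes, the migrated global_batch_size is 8, which matches the example section above. The multiplication is the one place where a blind rename would silently change training behavior.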

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
copy-pr-bot (bot) commented Nov 18, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

akoumpa (Contributor, Author) commented Nov 18, 2025

/ok to test a625930

akoumpa (Contributor, Author) commented Nov 18, 2025

/ok to test 923c548

akoumpa (Contributor, Author) commented Nov 18, 2025

/ok to test cbb7d7b

akoumpa (Contributor, Author) commented Nov 18, 2025

/ok to test cbb7d7b

akoumpa merged commit ab47b52 into main on Nov 18, 2025
29 of 39 checks passed
lbliii pushed a commit that referenced this pull request Nov 19, 2025
* introduce step_scheduler section

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add step_scheduler section

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* lint

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* rm dead code

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
huvunvidia pushed a commit that referenced this pull request Feb 12, 2026
* introduce step_scheduler section

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add step_scheduler section

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* lint

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* rm dead code

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>