fix: Step-3.5-Flash layer_types mismatch and related recipe fixes (#1… by akoumpa · Pull Request #1936 · NVIDIA-NeMo/Automodel

akoumpa · 2026-04-21T05:24:28Z

…916)

fix: add tiktoken dep, patch Step-3.5-Flash layer_types mismatch, tune Qwen MoE recipes

- Add tiktoken to base deps for Moonlight's TikToken-based remote tokenizer.
- Retry AutoConfig.from_pretrained when upstream configs ship layer_types longer than num_hidden_layers (e.g. stepfun-ai/Step-3.5-Flash) by truncating layer_types in the raw config dict and rebuilding via the resolved config class (dynamic module or CONFIG_MAPPING).
- Bump qwen3_moe_30b_hellaswag hf_kl_threshold 1e-3 -> 1e-2 and qwen3_moe_30b_uccl_ep ep_size 16 -> 8.
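
The layer_types truncation described above can be illustrated with a minimal sketch. This is not the code from this PR: the helper name `truncate_layer_types` and the sample values are hypothetical, and the sketch covers only the dict-level step. In practice the raw dict would presumably come from something like `PretrainedConfig.get_config_dict`, and the patched dict would then be passed to the resolved config class.

```python
from copy import deepcopy
from typing import Any


def truncate_layer_types(config_dict: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: return a copy of the raw config dict in which
    layer_types has no more entries than num_hidden_layers."""
    fixed = deepcopy(config_dict)  # leave the caller's dict untouched
    layer_types = fixed.get("layer_types")
    num_layers = fixed.get("num_hidden_layers")
    # Only act when both fields are present and layer_types is too long.
    if isinstance(layer_types, list) and isinstance(num_layers, int):
        if len(layer_types) > num_layers:
            fixed["layer_types"] = layer_types[:num_layers]
    return fixed


# Illustrative values only; not taken from the real Step-3.5-Flash config.
raw = {
    "num_hidden_layers": 3,
    "layer_types": [
        "full_attention",
        "sliding_attention",
        "full_attention",
        "sliding_attention",
    ],
}
patched = truncate_layer_types(raw)
consistent = truncate_layer_types({"num_hidden_layers": 2, "layer_types": ["a", "b"]})
```

Working on a copy keeps the original dict available for error reporting if the rebuilt config still fails validation.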

Update uv lock
Apply suggestion from @claude[bot]


copy-pr-bot · 2026-04-21T05:24:32Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

akoumpa · 2026-04-21T05:24:40Z

/ok to test 5a82735

@claude

* fix: add tiktoken dep, patch Step-3.5-Flash layer_types mismatch, tune Qwen MoE recipes

- Add tiktoken to base deps for Moonlight's TikToken-based remote tokenizer.
- Retry AutoConfig.from_pretrained when upstream configs ship layer_types longer than num_hidden_layers (e.g. stepfun-ai/Step-3.5-Flash) by truncating layer_types in the raw config dict and rebuilding via the resolved config class (dynamic module or CONFIG_MAPPING).
- Bump qwen3_moe_30b_hellaswag hf_kl_threshold 1e-3 -> 1e-2 and qwen3_moe_30b_uccl_ep ep_size 16 -> 8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hemildesai <hemild@nvidia.com>

* Update uv lock

Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>

* Apply suggestion from @claude[bot]

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

---------

Signed-off-by: hemildesai <hemild@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Signed-off-by: hemildesai <hemild@nvidia.com>

hemildesai · 2026-04-21T16:06:33Z

/ok to test 0504afd

thomasdhc

Automation approval

copy-pr-bot Bot temporarily deployed to test April 21, 2026 05:25 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci April 21, 2026 05:25 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci April 21, 2026 05:52 Inactive

copy-pr-bot Bot had a problem deploying to nemo-ci April 21, 2026 06:14 Failure

hemildesai force-pushed the cherry-pick-1916-r0.4.0 branch from 5a82735 to 0504afd Compare April 21, 2026 16:06

copy-pr-bot Bot had a problem deploying to nemo-ci April 21, 2026 16:06 Failure

copy-pr-bot Bot temporarily deployed to nemo-ci April 21, 2026 16:06 Inactive

copy-pr-bot Bot temporarily deployed to test April 21, 2026 16:06 Inactive

hemildesai marked this pull request as ready for review April 21, 2026 16:07

hemildesai requested review from a team, HuiyingLi, ZhiyuLi-Nvidia, adil-a, hemildesai and pthombre as code owners April 21, 2026 16:07

copy-pr-bot Bot temporarily deployed to nemo-ci April 21, 2026 16:16 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci April 21, 2026 16:39 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci April 21, 2026 17:06 Inactive

copy-pr-bot Bot had a problem deploying to nemo-ci April 21, 2026 17:06 Failure

copy-pr-bot Bot temporarily deployed to nemo-ci April 21, 2026 17:06 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci April 21, 2026 20:12 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci April 21, 2026 20:21 Inactive

thomasdhc approved these changes Apr 21, 2026


akoumpa merged commit 5f24def into r0.4.0 Apr 21, 2026
87 of 91 checks passed

akoumpa deleted the cherry-pick-1916-r0.4.0 branch April 21, 2026 21:19
