fix: Step-3.5-Flash layer_types mismatch and related recipe fixes (#1… #1936

Merged: akoumpa merged 1 commit into r0.4.0 from cherry-pick-1916-r0.4.0 on Apr 21, 2026

@akoumpa akoumpa commented Apr 21, 2026

…916)

  • fix: add tiktoken dep, patch Step-3.5-Flash layer_types mismatch, tune Qwen MoE recipes
  • Add tiktoken to base deps for Moonlight's TikToken-based remote tokenizer.
  • Retry AutoConfig.from_pretrained when upstream configs ship layer_types longer than num_hidden_layers (e.g. stepfun-ai/Step-3.5-Flash) by truncating layer_types in the raw config dict and rebuilding via the resolved config class (dynamic module or CONFIG_MAPPING).
  • Bump qwen3_moe_30b_hellaswag hf_kl_threshold 1e-3 -> 1e-2 and qwen3_moe_30b_uccl_ep ep_size 16 -> 8.
  • Update uv lock

  • Apply suggestion from @claude[bot]
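
The truncation step described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: the helper name `truncate_layer_types` and the sample config values are hypothetical, and the follow-up step of rebuilding the config through the resolved config class is omitted.

```python
from copy import deepcopy
from typing import Any


def truncate_layer_types(config_dict: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of config_dict with layer_types cut to num_hidden_layers entries.

    Hypothetical helper: the name and signature are illustrative only.
    """
    fixed = deepcopy(config_dict)  # leave the caller's raw dict untouched
    layer_types = fixed.get("layer_types")
    num_layers = fixed.get("num_hidden_layers")
    # Only act when both fields are present and the list is too long.
    if isinstance(layer_types, list) and isinstance(num_layers, int):
        if len(layer_types) > num_layers:
            fixed["layer_types"] = layer_types[:num_layers]
    return fixed


# Made-up raw config: 3 layers declared, but 5 layer_types entries shipped.
raw = {
    "num_hidden_layers": 3,
    "layer_types": [
        "full_attention",
        "sliding_attention",
        "full_attention",
        "sliding_attention",
        "full_attention",
    ],
}
fixed = truncate_layer_types(raw)
print(len(raw["layer_types"]), len(fixed["layer_types"]))
```

Working on a copy of the raw dict keeps the original upstream values available for logging or a later comparison; configs whose `layer_types` already match `num_hidden_layers` pass through unchanged.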



copy-pr-bot Bot commented Apr 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

akoumpa commented Apr 21, 2026

/ok to test 5a82735

)

* fix: add tiktoken dep, patch Step-3.5-Flash layer_types mismatch, tune Qwen MoE recipes

- Add tiktoken to base deps for Moonlight's TikToken-based remote tokenizer.
- Retry AutoConfig.from_pretrained when upstream configs ship layer_types
  longer than num_hidden_layers (e.g. stepfun-ai/Step-3.5-Flash) by
  truncating layer_types in the raw config dict and rebuilding via
  the resolved config class (dynamic module or CONFIG_MAPPING).
- Bump qwen3_moe_30b_hellaswag hf_kl_threshold 1e-3 -> 1e-2 and
  qwen3_moe_30b_uccl_ep ep_size 16 -> 8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: hemildesai <hemild@nvidia.com>

* Update uv lock

Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>

* Apply suggestion from @claude[bot]

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

---------

Signed-off-by: hemildesai <hemild@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Signed-off-by: hemildesai <hemild@nvidia.com>

@hemildesai hemildesai force-pushed the cherry-pick-1916-r0.4.0 branch from 5a82735 to 0504afd on April 21, 2026 16:06

hemildesai commented

/ok to test 0504afd

@thomasdhc thomasdhc left a comment

Automation approval

@akoumpa akoumpa merged commit 5f24def into r0.4.0 Apr 21, 2026
87 of 91 checks passed
@akoumpa akoumpa deleted the cherry-pick-1916-r0.4.0 branch April 21, 2026 21:19