
Fix OLMoE routing and Mistral4 RoPE dimensions #45366

Closed
owwll wants to merge 1 commit into huggingface:main from owwll:copilot/fix-olmoe-mistral4-bugs

Conversation


@owwll owwll commented Apr 10, 2026

This PR addresses two separate issues:

  1. Fixes a bug in the Mistral4 RoPE dimension calculation.
    Mistral4RotaryEmbedding was using the full head_dim to compute the rotary dimension instead of respecting the partial_rotary_factor. It now uses int(head_dim * config.rope_parameters.get("partial_rotary_factor", 1.0)). A regression test covering this case has been added to test_modeling_mistral4.py.

  2. Fixes bugs in the OLMoE model.

    • The OlmoeAttention layer was initializing its normalization layers (q_norm and k_norm) with the wrong dimension. This has been fixed.
    • The OlmoeTopKRouter was returning softmax probabilities instead of raw logits, which is inconsistent with other router implementations. It now returns the raw logits.
    • New tests have been added to test_modeling_olmoe.py to validate these fixes.
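To make the first fix concrete, here is a minimal sketch of the corrected rotary-dimension calculation described above. The function name and the shape of the config dict are illustrative assumptions, not the actual Transformers code; only the formula int(head_dim * partial_rotary_factor) comes from the PR description.

```python
def rotary_dim(head_dim: int, rope_parameters: dict) -> int:
    """Compute how many of each head's dimensions receive rotary embeddings.

    When partial_rotary_factor < 1.0, only a fraction of head_dim is rotated;
    the factor defaults to 1.0 (full rotation) when the key is absent.
    Hypothetical helper mirroring the fix, not the real module code.
    """
    partial_rotary_factor = rope_parameters.get("partial_rotary_factor", 1.0)
    return int(head_dim * partial_rotary_factor)


# The bug was equivalent to always returning head_dim, i.e. ignoring the
# factor. With head_dim=128 and partial_rotary_factor=0.5, the corrected
# computation rotates 64 dimensions, not 128.
print(rotary_dim(128, {"partial_rotary_factor": 0.5}))
print(rotary_dim(128, {}))
```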

This contribution follows the project's contribution guidelines.
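For the router fix, the sketch below shows a minimal top-k router that returns raw logits rather than softmax probabilities, illustrating the convention the PR aligns OlmoeTopKRouter with. The class and its structure are assumptions for illustration, not the actual OLMoE implementation; returning logits lets downstream code (e.g. auxiliary load-balancing losses) apply softmax consistently.

```python
import torch

class TopKRouter(torch.nn.Module):
    """Hypothetical minimal MoE router; returns raw gate logits, not probabilities."""

    def __init__(self, hidden_size: int, num_experts: int, top_k: int):
        super().__init__()
        self.gate = torch.nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, hidden_states: torch.Tensor):
        # Raw logits: no softmax applied here (the bug was returning
        # softmax probabilities from the router itself).
        router_logits = self.gate(hidden_states)
        # Expert selection still uses the normalized distribution internally.
        routing_weights = torch.softmax(router_logits, dim=-1)
        _, selected_experts = torch.topk(routing_weights, self.top_k, dim=-1)
        return router_logits, selected_experts
```

Callers that need probabilities can softmax the returned logits themselves, which keeps the router's output interchangeable with other router implementations in the codebase.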

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: mistral4, olmoe

@owwll owwll closed this Apr 10, 2026
@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45366&sha=de9d2f
