Add Solar-Open Model #43244
Conversation
@oesni you can ping me when you think it's ready for review (assuming it's not yet because it's a draft)
It's ready for review! @vasqu
wonder if it's okay to add
vasqu left a comment:
Looks already super good. My main points are mostly about aligning the config with the current way we handle RoPE, plus adding a small dummy model for the tests - the 100B checkpoint is sadly too heavy for our CI 😢
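As a library-free sketch of why the reviewer asks for a tiny dummy model: CI only needs to exercise the architecture, so the test config uses deliberately small sizes. The class and field names below mirror common `transformers` config conventions but are illustrative stand-ins, not the actual `SolarOpenConfig`:

```python
from dataclasses import dataclass


@dataclass
class TinyTestConfig:
    # Deliberately tiny sizes so CI can build and run the model in seconds;
    # values are illustrative, not the real Solar-Open defaults.
    hidden_size: int = 32
    num_hidden_layers: int = 2
    num_attention_heads: int = 4
    intermediate_size: int = 64
    vocab_size: int = 99


def approx_param_count(cfg: TinyTestConfig) -> int:
    # Rough transformer parameter estimate: token embeddings plus
    # per-layer attention (4 square projections) and MLP weights.
    embed = cfg.vocab_size * cfg.hidden_size
    per_layer = 4 * cfg.hidden_size**2 + 2 * cfg.hidden_size * cfg.intermediate_size
    return embed + cfg.num_hidden_layers * per_layer


print(approx_param_count(TinyTestConfig()))  # tiny compared to a 100B checkpoint
```

The point is just the order of magnitude: a dummy config like this yields tens of thousands of parameters instead of 100B, which is what makes it CI-friendly.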
*This model was released on 2025-12-31 and added to Hugging Face Transformers on 2026-01-13.*
Just a reminder to keep track of this when we merge
It's now enforced on our CI; it will need `make fix-repo`, but that happens automatically then
Just checked why it failed, we should not add it there. You can run
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
vasqu left a comment:
Super small nits: let's move the test(s) under the causal LM tester, mostly the one test that checks whether `partial_rotary_factor` has the correct default
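A library-free sketch of the kind of default-value check being requested here. The class, the `0.5` default, and the `head_dim` value are assumed stand-ins for illustration, not the actual `SolarOpenConfig` API:

```python
from dataclasses import dataclass


@dataclass
class DummyConfig:
    # Stand-in for a model config; 0.5 is an assumed default for illustration.
    head_dim: int = 8
    partial_rotary_factor: float = 0.5


def test_partial_rotary_factor_default():
    cfg = DummyConfig()
    # The default must hold when the config is built with no overrides.
    assert cfg.partial_rotary_factor == 0.5
    # With a partial factor, rotary embeddings cover only part of each head.
    rotary_dim = int(cfg.head_dim * cfg.partial_rotary_factor)
    assert rotary_dim == 4


test_partial_rotary_factor_default()
```

In the real test suite this assertion would live in the model's causal LM tester class, so it runs alongside the other shared model tests.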
attention_bias (`bool`, *optional*, defaults to `False`):
    Whether to use a bias in the projection layers.
attention_dropout (`float`, *optional*, defaults to 0.0):
    The dropout ratio for the attention probabilities.
We usually only support extra branches/features when they are actually used within a model.
Yup, don't worry about the CI - it's been a bit flaky these past few days/weeks
run-slow: solar_open
This comment contains models: ["models/solar_open"]
CI Results: ✅ No failing test specific to this PR 🎉!
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, solar_open
Merging now 🤗 thanks for the contribution
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Thanks for the review! 🤗 @vasqu |
* feat: implement solar-open-100b
* feat: update modeling_solar_open.py
* feat: update solar-open config
* chore: apply style
* feat: remove _tied_weights_keys
* feat: update modeling code
* chore: remove speech_to_text_2 in modeling
* docs: solar_open model
* test: solar open model
* chore: re-convert modular
* fix: remove require_read_token
* Apply suggestion from @vasqu (Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>)
* chore: update license year -> 2026
* feat: add solar_open to tokenizer mapping
* chore: update license year
* test: remove _torch_compile_train_cls
* docs: update solar_open doc
* refactor: simplify SolarOpenDecoderLayer
* refactor: inherit Glm4MoeConfig class
* fix: handle head_dim properly
* chore: apply style
* fix: default parameters
* test: use tiny dummy model
* update expectations and switch to eager moe (no fluctuations per grouped_mm / batched_mm)
* chore: remove trust_remote_code (suggestion from @vasqu) (Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>)
* Update src/transformers/models/solar_open/modular_solar_open.py (Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>)
* chore: update config docstring
* chore: add partial_rotary_factor workaround comment
* test: check default config values in test_modeling_solar_open.py
* fix: config class interface
* docs: add SolarOpen to doctree
* docs: update dates
* Revert "feat: add solar_open to tokenizer mapping" (reverts commit 038b1c1)
* feat: remove unnecessary configs
* test: update SolarOpenConfig tests
* fix: attention_dropout issue on training
* Revert "feat: remove unnecessary configs" (reverts commit 9023688)
* Revert "fix: attention_dropout issue on training" (reverts commit 3c275dc)
* Revert "Revert "feat: remove unnecessary configs"" (reverts commit e6adcd9)
* Revert "Revert "fix: attention_dropout issue on training"" (reverts commit 573fa9a)
* feat: inherit attention from Llama
* fix: remove del for attention_bias and attention_dropout
* chore: convert solar_open
* fix date

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: vasqu <antonprogamer@gmail.com>
What does this PR do?
Implements the Solar-Open model.
Solar-Open is the open-weights MoE Solar LLM created by Upstage.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.