Updated Ben #43319

Merged
BenjaminBossan merged 1 commit into huggingface:peft-x-moes from BenjaminBossan:peft-x-moes-ben on Jan 16, 2026

Conversation

@BenjaminBossan (Member)

As discussed internally.

Still not finished, running into this error now:

MixtralForCausalLM LOAD REPORT from: peft-internal-testing/mixtral-pre-v5-lora
Key                                                              | Status   | Details                                                                                    
-----------------------------------------------------------------+----------+--------------------------------------------------------------------------------------------
model.layers.{0, 1}.mlp.experts.base_layer.lora_A.default.weight | MISMATCH | Reinit due to size mismatch ckpt: torch.Size([8, 16, 1024]) vs model:torch.Size([64, 7168])
model.layers.{0, 1}.mlp.experts.base_layer.lora_B.default.weight | MISMATCH | Reinit due to size mismatch ckpt: torch.Size([2, 7168, 16]) vs model:torch.Size([1024, 64])
model.layers.{0, 1}.mlp.experts.lora_B.default.weight            | MISMATCH | Reinit due to size mismatch ckpt: torch.Size([8, 1024, 8]) vs model:torch.Size([3584, 64]) 
model.layers.{0, 1}.mlp.experts.lora_A.default.weight            | MISMATCH | Reinit due to size mismatch ckpt: torch.Size([8, 8, 3584]) vs model:torch.Size([64, 1024]) 
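
For context, a hedged sketch (not the PR's conversion code) of why these shapes collide: the checkpoint stores per-expert LoRA factors as 3d tensors, while PEFT's LoRA layers wrap nn.Linear and hold 2d weights, which is why the "Getting closer (#43327)" commit further down flattens the LoRA weights for 3d MoE. All shapes below are illustrative.

```python
import torch

# Illustrative shapes only; this is not the conversion code from this PR.
num_experts, r, in_features = 8, 16, 1024

# Per-expert LoRA A factors, as a 3d MoE checkpoint stores them.
lora_A_3d = torch.randn(num_experts, r, in_features)

# A LoRA layer built around nn.Linear expects a single 2d weight, so the expert and
# rank dimensions have to be flattened into one leading dimension before loading.
lora_A_2d = lora_A_3d.reshape(num_experts * r, in_features)
print(lora_A_2d.shape)  # torch.Size([128, 1024])
```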

@ArthurZucker (Collaborator) left a comment

looks fairly good!

if isinstance(conversion, WeightRenaming):
    # strip "base_model.model" and add adapter name
    new_weight_conversions = [

Suggested change:
- new_weight_conversions = [
+ base_lora_conversion = [

conversion.source_patterns = new_source_patterns

pat = conversion.target_patterns[0]
pat = pat.replace("gate_up_proj", "base_layer").replace(".down_proj", "")

we should make this general, for any model -> use a similar mapping to what we have
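
A hypothetical sketch of the "general mapping" idea from the comment above; the dict contents and the helper name are illustrative, not code from this PR or from transformers.

```python
# Hypothetical: a shared mapping instead of hard-coded .replace() calls per model.
PEFT_TARGET_REWRITES = {
    "gate_up_proj": "base_layer",  # the fused gate/up projection is the LoRA base layer
    ".down_proj": "",              # drop the down projection suffix
}

def rewrite_target_pattern(pattern: str, rewrites: dict = PEFT_TARGET_REWRITES) -> str:
    """Apply every source -> target substitution to a single target pattern."""
    for old, new in rewrites.items():
        pattern = pattern.replace(old, new)
    return pattern

print(rewrite_target_pattern("model.layers.*.mlp.experts.gate_up_proj"))
# model.layers.*.mlp.experts.base_layer
```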

@BenjaminBossan merged commit 50e4f0e into huggingface:peft-x-moes on Jan 16, 2026
9 of 25 checks passed
@BenjaminBossan deleted the peft-x-moes-ben branch on January 16, 2026 at 12:30
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions (Contributor)

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43319&sha=32adc7

ArthurZucker added a commit that referenced this pull request Jan 24, 2026
* current changes

* finally!

* collection is giid

* what kinda works

* nit

* fix name

* small nits

* introduce loading info and config?

* try to remove some duplication

* trying to simplify its really not that hard is it?

* nit

* is this better?

* update

* fix

* better?

* small fix

* force change lora

* push

* up

* replace gate_up_

* push

* Updated Ben (#43319)

* Getting closer (#43327)

It was necessary to flatten the LoRA weights for 3d MoE, as LoRA always
expected 2d weights (being nn.Linear).

* style

* bring back eval()

* nits

* Revert "bring back eval()"

This reverts commit bcee589.

* fix quantizer

* fix

* fix key mapping not recognized

* fix kwargs shinannigans

* fix more kwargs passing

* up

* fix `use_safetensors=False` call?

* nits?

* properly pass use_safetensors=False

* fix

* style

* defaut factory

* style

* simplify

* fix custom adapter_state_dict

* small updates

* nit

* style

* Fix mixtral loading

* rank needed to be set to 2*r for concatenated gate up projection
  parameter so that PEFT allocates 2*r and matches the converted
  weights (using rank_pattern)

* the weights needed to be transposed to match the counter parts

* MoE in PEFT assumes (experts, in, out) but Mixtral MoE is transposed
  so we need to patch this assumption in PEFT for now

* Make style

* Fix error messages

* hardcode checking if .bin works

* fix another test

* fix regex renaming patterns

* nits

* help debug tests

* style

* Patch `update_layer` instead of `_get_in_out_features`

The latter does not exist in released PEFT versions and
therefore is not an ideal target for this PR :)

* Handle Qwen2 conversion similarly to mixtral

* updates, explicit, simplify

* style

* nit

* fix `httpx.LocalProtocolError: Illegal header value b'unknown/None; hf_hub/1.3.2; python/3.13.2; torch/2.9.1; transformers/5.0.0.dev0;`

* some of the last nits

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: nemo <git@ningu.net>
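
To make the "Fix mixtral loading" notes in the commit message above concrete, here is a hedged sketch of how rank_pattern can allocate 2*r for the concatenated gate/up projection; the module names and ranks are illustrative, not taken from this PR's tests.

```python
from peft import LoraConfig

r = 8
config = LoraConfig(
    r=r,
    target_modules=["q_proj", "v_proj", "experts"],
    # The gate and up projections are concatenated into a single parameter, so the
    # adapter on that module needs rank 2*r to line up with the converted weights.
    rank_pattern={"experts": 2 * r},
)
```

The same entry also notes that PEFT's MoE handling assumed an (experts, in, out) layout while Mixtral stores the transpose, which the PR patches via PEFT's update_layer rather than _get_in_out_features.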