
Fix PEFT x MoEs #43261

Merged
ArthurZucker merged 67 commits into main from peft-x-moes on Jan 24, 2026
Conversation

@ArthurZucker (Collaborator) commented Jan 13, 2026

What does this PR do?

Fixes #42491

This should serve as an example of how the weight loader can be reused in other projects.
The content will probably be upstreamed to PEFT!
Current status:
[image: current status]

What to expect:

  1. You have already loaded a transformers model.
  2. You want to load only the PEFT adapters:
    a. we check the weight conversion mapping
    b. if there are ops, we replace them with the mapped PEFT ops
    c. we collect lora_A and lora_B together and process them like so:
import torch

# `Concatenate` is the base conversion op from the transformers weight loader
# (assumed import path).
from transformers.core_model_loading import Concatenate


class PeftConcatenate(Concatenate):
    """Convert per-expert LoRA weights to merged (fused gate_up) MoE weights."""

    @torch.no_grad
    def convert(
        self,
        input_dict: dict[str, list[torch.Tensor]],
        source_patterns: list[str],
        target_patterns: list[str],
        **kwargs,
    ) -> dict[str, list[torch.Tensor]]:
        # Group the collected per-expert tensors; the ordering assumes the
        # gate (w1) projection pattern is collected before the up (w3) one.
        lora_a, lora_b = [], []
        for key, tensors in input_dict.items():
            if "lora_A" in key:
                lora_a.append(tensors)
            elif "lora_B" in key:
                lora_b.append(tensors)
        # lora_A: per expert, concatenate gate and up along the rank dimension,
        # so the fused gate_up projection ends up with rank 2 * r.
        lora_a_out = torch.stack(
            [torch.cat([gate, up], dim=0) for gate, up in zip(lora_a[0], lora_a[1])], dim=0
        )
        # lora_B: per expert, a block-diagonal matrix keeps the gate and up
        # halves of the fused output wired to their own rank-r block.
        lora_b_out = torch.stack(
            [torch.block_diag(gate, up) for gate, up in zip(lora_b[0], lora_b[1])], dim=0
        )
        return {
            target_patterns[0] + ".lora_A.weight": [lora_a_out],
            target_patterns[0] + ".lora_B.weight": [lora_b_out],
        }
    d. The fused gate_up lora_A/lora_B outputs are loaded into the model (see the usage sketch below).

Cf. the diagram (credits to @BenjaminBossan for the pic):
[image]
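For reference, a minimal usage sketch of the flow above. The Mixtral checkpoint id is real, but the adapter repo id is hypothetical and only for illustration:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

# Loading the adapter triggers the weight conversion mapping: per-expert
# lora_A/lora_B tensors are collected and fused before being loaded.
model.load_adapter("some-user/mixtral-lora")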

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker ArthurZucker marked this pull request as ready for review January 14, 2026 15:56
@BenjaminBossan (Member)

Just leaving some general comments on this PR:

  1. I wouldn't override __setattr__ in WeightTransform, as it's error-prone. Instead, change source_patterns and target_patterns to @property with an appropriate @setter and move the logic there (see the sketch after this list).
  2. The hotswap code path has been removed from load_adapter, it needs to be added back in.
  3. Similarly, there was a if peft_config.inference_mode: self.eval() call there that's now missing.
  4. As discussed, the currently hard-coded Mixtral conversion ops need to be moved to a mapping that is only called for Mixtral.
  5. We should add some sanity checks, e.g. if the expert layer is targeted and the PEFT adapter is not LoRA, raise a helpful error message.
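For point 1, a minimal sketch of the property-based approach; the list-wrapping shown is an assumed stand-in for whatever logic __setattr__ currently performs, not the actual WeightTransform internals:

class WeightTransform:
    """Sketch: validating @property setters instead of overriding __setattr__."""

    def __init__(self, source_patterns, target_patterns):
        # Both assignments go through the setters below.
        self.source_patterns = source_patterns
        self.target_patterns = target_patterns

    @property
    def source_patterns(self) -> list[str]:
        return self._source_patterns

    @source_patterns.setter
    def source_patterns(self, value) -> None:
        # Normalization/validation formerly in __setattr__ lives here.
        self._source_patterns = [value] if isinstance(value, str) else list(value)

    @property
    def target_patterns(self) -> list[str]:
        return self._target_patterns

    @target_patterns.setter
    def target_patterns(self, value) -> None:
        self._target_patterns = [value] if isinstance(value, str) else list(value)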

nemo and others added 7 commits January 23, 2026 16:12
* the rank needed to be set to 2*r for the concatenated gate_up projection
  parameter so that PEFT allocates 2*r and matches the converted
  weights (using rank_pattern; see the config sketch below)

* the weights needed to be transposed to match their counterparts

* MoE in PEFT assumes (experts, in, out), but the Mixtral MoE is transposed,
  so we need to patch this assumption in PEFT for now
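A sketch of what the rank fix from the first bullet looks like on the PEFT side; the module names are illustrative of the fused Mixtral expert layer, not necessarily the exact targets used in the PR:

from peft import LoraConfig

r = 8  # per-projection rank before fusion

config = LoraConfig(
    r=r,
    target_modules=["gate_up_proj", "down_proj"],
    # The fused gate_up projection stacks two rank-r adapters (gate + up),
    # so PEFT must allocate 2 * r for it to match the converted weights.
    rank_pattern={r".*experts\.gate_up_proj": 2 * r},
)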
@github-actions (Contributor)

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43261&sha=92d0fa

@ArthurZucker (Collaborator, Author) left a comment:
Ready!

@ArthurZucker ArthurZucker merged commit 3af2eb7 into main Jan 24, 2026
22 of 26 checks passed
@ArthurZucker ArthurZucker deleted the peft-x-moes branch January 24, 2026 10:06
githubnemo pushed a commit to githubnemo/peft that referenced this pull request Feb 27, 2026
Continuation of PR huggingface#2995.
Background: huggingface/transformers#42491 and huggingface/transformers#43261.

This change implements conversion operations for some existing
PEFT checkpoints, mainly dealing with the fusing of MoE layers in transformers v5.

The code added here is currently a copy of the code that exists in transformers;
the transformers copy is supposed to be gated as soon as PEFT v0.19 is released,
at which point transformers will use the code in this PR.

The copying makes testing a bit difficult since there's currently no routing
depending on the PEFT version in transformers. Older transformers versions, therefore,
need patching to forcefully use the PEFT implementation of the conversion.
As soon as the routing is implemented in transformers, we can conditionally
disable the patching (see the version-gate sketch below).
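The version routing described above could look something like this minimal sketch; the helper name is hypothetical and no such gate exists yet in either library:

from packaging import version

import peft

def peft_ships_conversion_ops() -> bool:
    """Hypothetical gate: True once the installed PEFT carries the conversion ops."""
    # PEFT v0.19 is the release expected to ship the upstreamed implementation.
    return version.parse(peft.__version__) >= version.parse("0.19.0")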

Development

Successfully merging this pull request may close these issues.

The LoRA model trained with qwen3_moe on hf4.x cannot be used on the current main branch (hf5.x).
