
n-to-1 kernel fusion via KernelConfig #45363

Draft

michaelbenayoun wants to merge 10 commits into huggingface:main from michaelbenayoun:fused_kernels

Conversation

@michaelbenayoun
Member

@michaelbenayoun michaelbenayoun commented Apr 10, 2026

What does this PR do?

This PR adds support for fusing multiple modules into a single kernel — the motivating case being fused RMSNorm+MLP kernels, but the API is generic.

What changed

  • FusedModuleBase, fuse_modules, unfuse_modules, register_fusion_patterns added to hub_kernels.py
  • KernelConfig now accepts tuple keys that trigger fusion before kernelization

Two ways to use it

Option A — inline (no model changes needed)

Embed the glob patterns directly in the KernelConfig key as (name, path) pairs:

KernelConfig({
    (
        ("RMSNorm", "model.layers.*.post_attention_layernorm"),
        ("MLP",     "model.layers.*.mlp"),
    ): "org/repo:RMSNormMLP",
})

Option B — via registry (model declares its patterns)

The model class declares where its fusable modules live:

class MyModel(PreTrainedModel):
    _kernel_fusion_patterns = {
        "RMSNormMLP": ["model.layers.*.post_attention_layernorm", "model.layers.*.mlp"],
    }

Or externally without touching the class:

register_fusion_patterns(MyModel, {"RMSNormMLP": [...]})
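
Spelled out with the patterns from the class-level declaration above (same values as in Option B, just written in full):

register_fusion_patterns(MyModel, {
    "RMSNormMLP": [
        "model.layers.*.post_attention_layernorm",
        "model.layers.*.mlp",
    ],
})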

Then the KernelConfig key only needs the module names; the paths are looked up in the registry:

KernelConfig({("RMSNorm", "MLP"): "org/repo:RMSNormMLP"})

Example

import copy
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer, KernelConfig
from transformers.integrations import unfuse_modules


model_id = "michaelbenayoun/qwen3-tiny-4kv-heads-4layers-random"
tokenizer = AutoTokenizer.from_pretrained(model_id)

kernel_config = KernelConfig({
    (
        ("RMSNorm", "model.layers.*.post_attention_layernorm"),
        ("MLP",     "model.layers.*.mlp"),
    ): "michaelbenayoun/dummy-rmsnorm-mlp:RMSNormMLP",
})

model = AutoModelForCausalLM.from_pretrained(model_id, use_kernels=True, kernel_config=kernel_config, device_map="cuda")

input_ids = tokenizer("Hello, how are you?", return_tensors="pt").input_ids.to(model.device)

original_model = copy.deepcopy(model)
unfuse_modules(original_model)
original_model.eval()

with torch.no_grad():
    fused_out = model(input_ids).logits
    original_out = original_model(input_ids).logits

print("Max diff fused vs original:", (fused_out - original_out).abs().max().item())

Question

How is this related to #45041, and does it serve the same purpose / needs?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread fused_qwen_example.py
Member Author

To be removed before merging. Temporary file.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3

@michaelbenayoun michaelbenayoun changed the title from "Fused kernels support" to "n-to-1 kernel fusion via KernelConfig" on Apr 13, 2026

Comment thread hub_kernels.py

_FUSION_PATTERNS_REGISTRY[model_class_or_instance] = patterns


class FusedModuleBase(nn.Module):
Member Author

@ArthurZucker the API is way simpler than #44979 because the whole purpose of fusion here is to replace the forward with a kernel, so we do not need all the complex machinery. All we need is a module whose forward can be replaced with the kernel forward.
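
To make that concrete, here is a minimal sketch of the idea. The class name, kernel signature, and gate/up/down projection layout are illustrative assumptions, not the PR's actual FusedModuleBase API:

import torch.nn as nn

class FusedRMSNormMLP(nn.Module):
    # Hypothetical fused module: it keeps references to the original
    # submodules (so their weights stay in the state dict) and routes
    # forward() through a single kernel call.
    def __init__(self, rmsnorm, mlp, kernel_fn):
        super().__init__()
        self.rmsnorm = rmsnorm
        self.mlp = mlp
        self.kernel_fn = kernel_fn  # assumed: loaded from a Hub kernel repo

    def forward(self, hidden_states):
        # One kernel launch replaces mlp(rmsnorm(hidden_states)).
        return self.kernel_fn(
            hidden_states,
            self.rmsnorm.weight,
            self.mlp.gate_proj.weight,
            self.mlp.up_proj.weight,
            self.mlp.down_proj.weight,
        )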

Collaborator

@ArthurZucker ArthurZucker left a comment

much simpler, much better! I like it 🤗 (not an in depth review as we have to discuss some stuff internally!)

Comment on lines +665 to +667
module.add_module(child_names[0], fused_instance)
for child_name in child_names[1:]:
    module.add_module(child_name, nn.Identity())
Collaborator

we're probably gonna have concerns with hooks, especially with TP, but potentially also with accelerate!
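
For context on the lines quoted above: the fused instance takes the first child's slot and every remaining child becomes an nn.Identity() pass-through, so the parent module's unchanged forward still composes correctly. A standalone illustration with simplified stand-ins (not the PR code):

import torch
from torch import nn

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.post_attention_layernorm = nn.LayerNorm(8)
        self.mlp = nn.Linear(8, 8)

    def forward(self, x):
        # Unchanged parent forward: after fusion, the first call does all
        # the work and the second is an nn.Identity() no-op.
        return self.mlp(self.post_attention_layernorm(x))

block = Block()
fused = nn.Sequential(nn.LayerNorm(8), nn.Linear(8, 8))  # stand-in for the fused module
block.add_module("post_attention_layernorm", fused)
block.add_module("mlp", nn.Identity())
out = block(torch.randn(2, 8))  # same call path as before fusion

Since hooks (and accelerate/TP wrappers) attach to specific module instances, swapping instances like this would drop them, which may be the concern raised here.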

