[PoC] HF exporters by IlyasMoutawwakil · Pull Request #41992 · huggingface/transformers

IlyasMoutawwakil · 2025-11-03T14:20:21Z

What does this PR do?

Edit: some PRs were opened taking pieces of this one, like #42697 and #42317 so now it's mostly about HfExporters 🤗

This is an attempt at standardizing native transformers support of an export backend (dynamo, onnx, executorch).
Motivation:

The dynamo backend is cool and fast but also much stricter than torchscript (for good reasons). For example, torchscript simply traces through data-dependent if statements with a warning, whereas dynamo tries to guard the control flow and fails a fair number of times (see all the if not torch.compiler.is_exporting() in this PR). This means that if we were to transition to dynamo export in optimum-onnx/optimum-intel, we would have to rewrite/patch entire modules to avoid these errors. This PR suggests adding a native component in Transformers that handles the export process and is fully tested with all models to catch these modeling problems early on. It also gives users a friendly API to experiment with exporting freshly added models which are not yet supported in optimum-onnx. optimum-onnx will build on top of this API and be the place for seamless and easy end-to-end export, handling all the extra steps like generating the inputs, dynamic axes, splitting models (encoder-decoder, vlms), handling inference, etc.

I started with the simplest models (encoders) then decoders (with pkv inputs/outputs) and now the integration works with almost all transformers models (including encoder-decoders and vlms) except a select few.

We simulate the generation loop by generating two tokens and capturing the model's inputs at each "stage". That gives us the processed inputs for prefill and decode.
For multimodal/multicomponent models, we run the model with prefill inputs and capture the inputs of each component, decomposing the prefill into vision encoder, mm projector, language model, etc.
Dynamic shapes can be passed by the user or generated automatically by creating a dict with Dim.AUTO hints and letting torch infer which axes are dynamic (simplifies dynamic export testing).
Cache is handled with a generic Pytree registration recipe that works on all cache classes.
For each backend (onnx for example) we use a "multi stage patching" approach, meaning we try to resolve as many export issues as possible without having to change anything in the modeling. The first stage patches torch ops to cover behaviours/patterns not supported by the exporter. The second stage manipulates the fx graph. The third stage manipulates the framework-specific IR (e.g. onnx ir).

import torch

from transformers import AutoModelForMaskedLM, AutoTokenizer
from transformers.exporters.exporter_onnx import OnnxConfig, OnnxExporter


model_id = "hf-internal-testing/tiny-random-BertForMaskedLM"
tokenizer = AutoTokenizer.from_pretrained(model_id)
sample_inputs = dict(tokenizer(["Hello, my dog is cute"] * 2, return_tensors="pt"))
bert = AutoModelForMaskedLM.from_pretrained(model_id)
exporter = OnnxExporter(export_config=OnnxConfig(dynamic=True))
onnx_bert = exporter.export(model=bert, sample_inputs=sample_inputs)

# testing with different sized inputs
new_input = dict(tokenizer("Hello, my cat is soooooooooooooo adorable!", return_tensors="pt"))
onnx_outputs = onnx_bert.call_reference(**new_input)  # uses numpy under the hood
ort_outputs = onnx_bert(**new_input)  # uses onnxruntime under the hood
torch.testing.assert_close(onnx_outputs[0], ort_outputs[0], rtol=1e-04, atol=1e-04)

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.exporters.exporter_onnx import OnnxConfig, OnnxExporter
from transformers.exporters.utils import decompose_prefill_decode


model_id = "hf-internal-testing/tiny-random-LlamaForCausalLM"
tokenizer = AutoTokenizer.from_pretrained(model_id)
llama = AutoModelForCausalLM.from_pretrained(model_id).eval()
sample_inputs = dict(tokenizer(["Hello, my dog is cute"] * 2, return_tensors="pt"))

# decompose into prefill and decode stages
stages = decompose_prefill_decode(llama, sample_inputs)

# export each stage
exporter = OnnxExporter(export_config=OnnxConfig(dynamic=True))
for name, model, inputs in stages:
    print(f"Exporting {name}...")
    exported = exporter.export(model=model, sample_inputs=inputs)
    exported.save(f"onnx_llama_{name}.onnx", external_data=True)
    print(f"  {name} exported successfully")
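The internals of decompose_prefill_decode are not shown in this description. As a rough sketch of the idea of generating two tokens and capturing the inputs of each stage, the example below uses a forward pre-hook on a hypothetical ToyLM with a simplified greedy loop. Every name in it is illustrative and it does not reproduce the PR's helper.

```python
import torch
from torch import nn


class ToyLM(nn.Module):
    """Hypothetical LM whose 'cache' is a running sum of token embeddings."""

    def __init__(self, vocab=32, dim=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab)

    def forward(self, input_ids, past=None):
        state = self.embed(input_ids).sum(dim=1, keepdim=True)  # (batch, 1, dim)
        if past is not None:
            state = state + past
        return self.head(state), state


def capture_stage_inputs(model, input_ids, num_tokens=2):
    """Run a tiny greedy loop and record the kwargs of each forward call."""
    captured = []

    def record(module, args, kwargs):
        captured.append(dict(kwargs))

    handle = model.register_forward_pre_hook(record, with_kwargs=True)
    try:
        past = None
        with torch.no_grad():
            for _ in range(num_tokens):
                logits, past = model(input_ids=input_ids, past=past)
                input_ids = logits.argmax(dim=-1)  # next token, shape (batch, 1)
    finally:
        handle.remove()
    # First call sees the full prompt (prefill), second sees one token + cache.
    return dict(zip(("prefill", "decode"), captured))


stages = capture_stage_inputs(ToyLM().eval(), torch.randint(0, 32, (2, 5)))
```

The two captured dicts have different signatures (no cache versus cache plus a single token), which is why each stage is exported separately.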

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev · 2025-11-03T14:35:39Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

IlyasMoutawwakil · 2025-11-05T11:36:10Z

Currently all models (except a select few) are tested and pass the tests successfully!

389 passed, 87 skipped, 413 warnings in 143.73s (0:02:23)

Skipped tests are either:

explicitly skipped with test_torch_exportable = False; this is for custom cache models and some MoEs (15).
failing with an informative error, torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode (67).
failing with a cryptic error, Expected cond to be True, but got False. (16).

github-actions · 2026-04-08T19:34:06Z

CI Results

Workflow Run ⚙️

Commit Info

Context	Commit	Description
RUN	c60b877d	workflow commit (merge commit)
PR	7f9382c9	branch commit (from PR)
main	380e3cc5	base commit (on `main`)

Model CI Report

❌ 6 new failed tests from this PR 😭

glm4v:
tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_get_video_features_attentions (✅ ⟹ ❌)
tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_get_video_features_hidden_states (✅ ⟹ ❌)
tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_get_video_features_output_0 (✅ ⟹ ❌)
tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_get_video_features_output_1 (✅ ⟹ ❌)
tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_get_video_features_output_2 (✅ ⟹ ❌)
tests/models/glm4v/test_modeling_glm4v.py::Glm4vIntegrationTest::test_small_model_integration_test_with_video (✅ ⟹ ❌)

github-actions · 2026-04-09T15:49:22Z

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=41992&sha=f30fa9

IlyasMoutawwakil · 2026-04-10T06:58:28Z

run-slow: dia, efficientloftr, ernie4_5_vl_moe, falcon_mamba, flava, glm46v, glm4v, glm4v_moe, glm_image, glm_moe_dsa

github-actions · 2026-04-10T06:59:43Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/dia", "models/efficientloftr", "models/ernie4_5_vl_moe", "models/falcon_mamba", "models/flava", "models/glm46v", "models/glm4v", "models/glm4v_moe", "models/glm_image", "models/glm_moe_dsa"]
quantizations: []

IlyasMoutawwakil · 2026-04-10T08:07:02Z

run-slow: dia, efficientloftr, ernie4_5_vl_moe, falcon_mamba, flava, glm46v, glm4v, glm4v_moe, glm_image, glm_moe_dsa, glm_ocr

github-actions · 2026-04-10T08:07:55Z

CI Results

Workflow Run ⚙️

Commit Info

Context	Commit	Description
RUN	a4365fea	workflow commit (merge commit)
PR	f30fa969	branch commit (from PR)
main	1e931b8f	base commit (on `main`)

⚠️ No test being reported (jobs are skipped or cancelled)!

github-actions · 2026-04-10T08:09:09Z

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/dia", "models/efficientloftr", "models/ernie4_5_vl_moe", "models/falcon_mamba", "models/flava", "models/glm46v", "models/glm4v", "models/glm4v_moe", "models/glm_image", "models/glm_moe_dsa", "models/glm_ocr"]
quantizations: []

github-actions · 2026-04-10T10:25:30Z

CI Results

Workflow Run ⚙️

Commit Info

Context	Commit	Description
RUN	49ec14f9	workflow commit (merge commit)
PR	e44c5aec	branch commit (from PR)
main	c585eeaa	base commit (on `main`)

⚠️ No test being reported (jobs are skipped or cancelled)!

github-actions · 2026-04-13T08:41:04Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: dia, efficientloftr, ernie4_5_vl_moe, esm, falcon_mamba, flava, glm46v, glm4v, glm4v_moe, glm_image, glm_moe_dsa

IlyasMoutawwakil added 6 commits October 24, 2025 14:51

initial poc

4721d30

support exporting causal models

448206d

fix cache recreation issue

46ef449

group utils

23d96d9

dynamic axis on a best effort basis

e37cf45

allow user to pass their own pkv

7552964

IlyasMoutawwakil marked this pull request as draft November 3, 2025 14:29

IlyasMoutawwakil and others added 15 commits November 4, 2025 07:49

Merge branch 'main' into hf-exporters

5857f10

misc

fba576d

cascading exports

c4a3a2d

add encoder decoder cache support

25904a1

add testing for dynamo exporter

3f95193

fix cases that are easy to fix

f07de57

disable torch export for some models using custom caches

7a9e3f7

fix more models

ba02172

solve issue in model return fake tensors

ba7b4b8

disable more models with custom caches

ad73271

fix biogpt

6b838d9

biogpt

8488793

style

41dda35

error on generative encoder decoders and process attention mask for c…

c157f03

…ausal LMs

prepare_cache_inputs_for_export helper method

6eaa9f1

IlyasMoutawwakil added 6 commits November 5, 2025 13:13

add comments about non-tested models

9c4afb5

style

cfa6977

fix bamba export

e58aca3

paligemma

d2184fe

deepseek and zamba

14ea0d2

skip reformer for its custom cache

a08b663

Merge branch 'main' into hf-exporters

e2e951d

xenova mentioned this pull request Apr 9, 2026

feat: make timesfm2_5 onnx export compatible #45233

Open

6 tasks

style and docs

e44c5ae

IlyasMoutawwakil force-pushed the hf-exporters branch from f30fa96 to e44c5ae Compare April 10, 2026 07:04

dict decomposition

1a7e790

remove deferred

677a90d

IlyasMoutawwakil and others added 13 commits April 10, 2026 12:29

executorch fixes

dab2d14

dynamo doctests passing *-*

6ec03ba

clean title

5eba3d5

update

c5c7020

normalize qwen omni

679eb1b

only pure functions

ff66d95

patch views for exeuctorch

f008002

better vision dynamic tensors pre computation

aef0409

audio functions

2ae78c7

Merge branch 'main' into hf-exporters

e3e66c6

fix

d8432a5

fix

234f7eb

define vision modeling utils

aa220b7

IlyasMoutawwakil mentioned this pull request Apr 20, 2026

Extract dynamic vision/audio tensors into standalone pure functions #45396

Open

6 tasks

evalstate mentioned this pull request Apr 29, 2026

Cumulative feature and defect updates from recent Transformers PRs evalstate/transformers#42

Open

[PoC] HF exporters #41992
IlyasMoutawwakil wants to merge 232 commits into main from hf-exporters


7 participants
