Add experimental support for transformers>=5.0 + min torch 2.8 #975
kevalmorabia97 merged 32 commits into main
Conversation
📝 Walkthrough
Sequence Diagram(s)sequenceDiagram
participant User
participant Script
participant HF as "HuggingFace.from_pretrained"
participant ModelOpt as "ModelOpt plugin (_restore_qtensor_wrappers)"
participant FS as "modelopt_state.pth (FS)"
User->>Script: run (optional --trust_remote_code)
Script->>HF: from_pretrained(..., dtype=..., trust_remote_code=...)
HF-->>Script: returns model instance
Script->>ModelOpt: patched hook invoked after instantiation
ModelOpt->>FS: check for modelopt_state.pth
FS-->>ModelOpt: q_tensor_state (if present)
ModelOpt->>ModelOpt: re-wrap weights preserving QTensorWrapper metadata
ModelOpt-->>Script: model with restored wrappers
Script-->>User: continue (quantize/export/generate)
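The restore flow in the diagram above can be sketched roughly as follows. This is a minimal, self-contained illustration, not ModelOpt's actual implementation: `QTensorWrapper` here is a stand-in class, a JSON file substitutes for the real `modelopt_state.pth`, and the function name only mirrors the `_restore_qtensor_wrappers` hook named in the diagram.

```python
import json
import os


class QTensorWrapper:
    """Illustrative stand-in: a weight paired with its quantizer metadata."""

    def __init__(self, data, metadata):
        self.data = data
        self.metadata = metadata


def restore_qtensor_wrappers(model_weights, checkpoint_dir):
    """Mimic the patched post-instantiation hook from the diagram.

    If a saved ModelOpt state file exists next to the checkpoint, re-wrap the
    matching weights so their quantization metadata survives from_pretrained().
    """
    # The real file is modelopt_state.pth; JSON keeps this sketch dependency-free.
    state_path = os.path.join(checkpoint_dir, "modelopt_state.json")
    if not os.path.isfile(state_path):
        return model_weights  # no saved quantizer state: nothing to restore
    with open(state_path) as f:
        q_tensor_state = json.load(f)
    # Re-wrap only the weights that have saved quantizer metadata.
    return {
        name: QTensorWrapper(w, q_tensor_state[name]) if name in q_tensor_state else w
        for name, w in model_weights.items()
    }
```

The key property is that the hook is a no-op when no state file is present, so unquantized checkpoints load unchanged.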
/ok to test 2b24815

/ok to test 1f0726e

/ok to test 48b426f
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #975      +/-   ##
==========================================
+ Coverage   71.65%   77.20%    +5.55%
==========================================
  Files         353      353
  Lines       40355    40416       +61
==========================================
+ Hits        28915    31204     +2289
+ Misses      11440     9212     -2228

☔ View full report in Codecov by Sentry.
2.11 was recently released and I wanted to make sure the transformers upgrade works with torch 2.11 as well, hence I enabled 2.11 in testing. Since at any time we only want to test and support the last 4 torch releases (~1 year of releases), I bumped the minimum to 2.8. We didn't actually need code changes though, so users will likely be fine on older torch for now.
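The policy described above (last ~4 torch releases tested, older versions tolerated since no code changes required 2.8) could be enforced as a soft runtime check. This is a hypothetical helper for illustration, not ModelOpt's actual version gating:

```python
def check_torch_version(version_str, minimum=(2, 8)):
    """Return a warning string if torch is older than the tested minimum, else None.

    Deliberately warns rather than fails: older torch falls outside the
    tested range but is not known to be broken.
    """
    major, minor = (int(x) for x in version_str.split(".")[:2])
    if (major, minor) < minimum:
        return (
            f"torch {version_str} is below the tested minimum "
            f"{minimum[0]}.{minimum[1]}; it may still work but is untested"
        )
    return None
```

Comparing `(major, minor)` tuples rather than raw strings avoids the classic `"2.11" < "2.8"` string-ordering bug.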
Does it include examples/windows samples? Also, please specify the reasons for removing certain dependencies from requirements.txt in the windows examples. Note that accuracy_benchmark can be run in a standalone virtual environment with a given onnx checkpoint, without setting up modelopt.
@vishalpandya1990 Since the windows examples don't have CI/CD tests, I just set If dependencies are already covered in
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
h-guo18
left a comment
Changes regarding speculative decoding LGTM
shengliangxu
left a comment
Approve to unblock
    if not hasattr(experts, "__iter__"):
        # transformers>=5.0: batched experts, no per-expert quantizers
        return
@kevalmorabia97 this could break some things -
We need to take a closer look
This fix is similar to another such recent fix here: https://github.com/NVIDIA/Model-Optimizer/pull/1136/changes
What should we do here? Some MoE tests were failing without this
Can we move this to modelopt/torch/quantization/plugins/huggingface.py - this method is called from there
@kevalmorabia97 can you please move this change to _is_sparse_moe_block? And rename _is_sparse_moe_block to _is_sparse_sequential_moe_block.
The reason is that our _QuantSparseMoe base class provides extra utilities for sparse sequential MoEs; this method should not be called for batched-gemm MoEs. We could also update the _QuantSparseMoe docstring to reflect that this base class is for sequential MoEs (i.e. each expert is implemented as a standalone module) in HF.
Fixed in 54dd176 and disabled some tests from test_sparse_sequential_moe.py for transformers>=5.0.
PTAL
Edwardf0t1
left a comment
LGTM, pls resolve conflicts - I will rebase my PR for ptq + export support #1187
What does this PR do?
- --warmup-ratio: float is deprecated in 5.x; we now change it to --warmup-steps: float | int, which works as a ratio if a float is passed, but only on 5.x. On 4.x, it will error out on a float and prompt the user to change back to --warmup-ratio or pass an int absolute step count.
- Add a workaround for TRT-LLM's import of deprecated transformers functions so the TRT-LLM based GPU unit tests work fine. Deployment for models still needs proper fixes directly in TRT-LLM, hence the llm/vlm PTQ example tests still run with transformers 4.57.

Testing
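The --warmup-steps behavior described above (float treated as a ratio only on transformers 5.x, with a clear error on 4.x) might be sketched as follows. This is a hypothetical helper whose names follow the PR description, not the PR's actual code:

```python
def resolve_warmup_steps(value, total_steps, transformers_major):
    """Interpret --warmup-steps: int = absolute step count; float = warmup ratio.

    Float ratios are only supported on transformers 5.x; on 4.x we fail fast
    with a message pointing the user back to --warmup-ratio.
    """
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("--warmup-steps must be an int or a float")
    if isinstance(value, int):
        return value  # absolute step count works on both 4.x and 5.x
    if transformers_major >= 5:
        return round(value * total_steps)  # float acts as a ratio on 5.x only
    raise ValueError(
        "float --warmup-steps is not supported on transformers 4.x; "
        "use --warmup-ratio or pass an int absolute step count"
    )
```

For example, `resolve_warmup_steps(0.1, 1000, 5)` yields 100 warmup steps, while the same call with major version 4 raises the guidance error instead.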
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).
Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, using torch.load(..., weights_only=True), avoiding pickle, etc.).