Add experimental support for transformers>=5.0 + min torch 2.8 #975
kevalmorabia97 merged 32 commits into main
Conversation
📝 Walkthrough
Sequence Diagram(s)sequenceDiagram
participant User
participant Script
participant HF as "HuggingFace.from_pretrained"
participant ModelOpt as "ModelOpt plugin (_restore_qtensor_wrappers)"
participant FS as "modelopt_state.pth (FS)"
User->>Script: run (optional --trust_remote_code)
Script->>HF: from_pretrained(..., dtype=..., trust_remote_code=...)
HF-->>Script: returns model instance
Script->>ModelOpt: patched hook invoked after instantiation
ModelOpt->>FS: check for modelopt_state.pth
FS-->>ModelOpt: q_tensor_state (if present)
ModelOpt->>ModelOpt: re-wrap weights preserving QTensorWrapper metadata
ModelOpt-->>Script: model with restored wrappers
Script-->>User: continue (quantize/export/generate)
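The restore flow in the diagram above can be sketched roughly as follows. This is a minimal, self-contained illustration, not ModelOpt's actual implementation: `QTensorWrapper` here is a stand-in class, a JSON file substitutes for the real `modelopt_state.pth`, and the function name only mirrors the `_restore_qtensor_wrappers` hook named in the diagram.

```python
import json
import os


class QTensorWrapper:
    """Illustrative stand-in: a weight paired with its quantizer metadata."""

    def __init__(self, data, metadata):
        self.data = data
        self.metadata = metadata


def restore_qtensor_wrappers(model_weights, checkpoint_dir):
    """Mimic the patched post-instantiation hook from the diagram.

    If a saved ModelOpt state file exists next to the checkpoint, re-wrap the
    matching weights so their quantization metadata survives from_pretrained().
    """
    # The real file is modelopt_state.pth; JSON keeps this sketch dependency-free.
    state_path = os.path.join(checkpoint_dir, "modelopt_state.json")
    if not os.path.isfile(state_path):
        return model_weights  # no saved quantizer state: nothing to restore
    with open(state_path) as f:
        q_tensor_state = json.load(f)
    # Re-wrap only the weights that have saved quantizer metadata.
    return {
        name: QTensorWrapper(w, q_tensor_state[name]) if name in q_tensor_state else w
        for name, w in model_weights.items()
    }
```

The key property is that the hook is a no-op when no state file is present, so unquantized checkpoints load unchanged.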
/ok to test 2b24815

/ok to test 1f0726e

/ok to test 48b426f
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #975      +/-   ##
==========================================
+ Coverage   71.65%   77.20%    +5.55%
==========================================
  Files         353      353
  Lines       40355    40416       +61
==========================================
+ Hits        28915    31204     +2289
+ Misses      11440     9212     -2228

☔ View full report in Codecov by Sentry.
2.11 was recently released and I wanted to make sure the transformers upgrade works with torch 2.11 as well, hence I enabled 2.11 in testing. Since at any time we only want to test and support the last 4 torch releases (~1 year of releases), I bumped the minimum to 2.8. We didn't actually need code changes though, so users will likely be fine on older torch for now.
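The policy described above (last ~4 torch releases tested, older versions tolerated since no code changes required 2.8) could be enforced as a soft runtime check. This is a hypothetical helper for illustration, not ModelOpt's actual version gating:

```python
def check_torch_version(version_str, minimum=(2, 8)):
    """Return a warning string if torch is older than the tested minimum, else None.

    Deliberately warns rather than fails: older torch falls outside the
    tested range but is not known to be broken.
    """
    major, minor = (int(x) for x in version_str.split(".")[:2])
    if (major, minor) < minimum:
        return (
            f"torch {version_str} is below the tested minimum "
            f"{minimum[0]}.{minimum[1]}; it may still work but is untested"
        )
    return None
```

Comparing `(major, minor)` tuples rather than raw strings avoids the classic `"2.11" < "2.8"` string-ordering bug.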
Does it include examples/windows samples? Also, please specify the reasons for removing certain dependencies from requirements.txt in the windows examples. Note that accuracy_benchmark can be run in a standalone virtual environment with a given onnx checkpoint, without setting up modelopt.
@vishalpandya1990 Since the windows examples don't have CI/CD tests, I just set If dependencies are already covered in
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
h-guo18
left a comment
Changes regarding speculative decoding LGTM
shengliangxu
left a comment
Approve to unblock
    if not hasattr(experts, "__iter__"):
        # transformers>=5.0: batched experts, no per-expert quantizers
        return
@kevalmorabia97 this could break some things -
We need to take a closer look
This fix is similar to another such recent fix here: https://github.com/NVIDIA/Model-Optimizer/pull/1136/changes
What should we do here? Some MoE tests were failing without this
Can we move this to modelopt/torch/quantization/plugins/huggingface.py - this method is called from there
@kevalmorabia97 can you please move this change to _is_sparse_moe_block? And rename _is_sparse_moe_block to _is_sparse_sequential_moe_block.
The reason is that our _QuantSparseMoe base class provides extra utilities for sparse sequential MoEs; this method should not be called for batched-gemm MoEs. We could also update the _QuantSparseMoe docstring to reflect that this base class is for sequential MoEs (i.e. each expert is implemented as a standalone module) in HF.
Fixed in 54dd176 and disabled some tests from test_sparse_sequential_moe.py for transformers>=5.0.
PTAL
Edwardf0t1
left a comment
LGTM, pls resolve conflicts - I will rebase my PR for ptq + export support #1187
What does this PR do?
- --warmup-ratio: float is deprecated in 5.x; we now change it to --warmup-steps: float | int, which works as a ratio if a float is passed, but only on 5.x. On 4.x, it will error out on a float and prompt the user to change back to --warmup-ratio or pass an int absolute step count.
- Add a workaround for TRT-LLM's import of deprecated transformers functions so the TRT-LLM based GPU unit tests work fine. Deployment for models still needs proper fixes directly in TRT-LLM, hence the llm/vlm PTQ example tests still run with transformers 4.57.

Testing
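The --warmup-steps behavior described above (float treated as a ratio only on transformers 5.x, with a clear error on 4.x) might be sketched as follows. This is a hypothetical helper whose names follow the PR description, not the PR's actual code:

```python
def resolve_warmup_steps(value, total_steps, transformers_major):
    """Interpret --warmup-steps: int = absolute step count; float = warmup ratio.

    Float ratios are only supported on transformers 5.x; on 4.x we fail fast
    with a message pointing the user back to --warmup-ratio.
    """
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("--warmup-steps must be an int or a float")
    if isinstance(value, int):
        return value  # absolute step count works on both 4.x and 5.x
    if transformers_major >= 5:
        return round(value * total_steps)  # float acts as a ratio on 5.x only
    raise ValueError(
        "float --warmup-steps is not supported on transformers 4.x; "
        "use --warmup-ratio or pass an int absolute step count"
    )
```

For example, `resolve_warmup_steps(0.1, 1000, 5)` yields 100 warmup steps, while the same call with major version 4 raises the guidance error instead.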
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).
Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, using torch.load(..., weights_only=True), avoiding pickle, etc.).