
🚨 [v5] Delete feature extractors used for vision #41174

Merged
zucchini-nlp merged 7 commits into huggingface:main from zucchini-nlp:delete-image-feat-extractors on Oct 1, 2025

Conversation

@zucchini-nlp (Member) commented Sep 26, 2025

What does this PR do?

As per title, let's clean up for v5.

These were supposed to be deleted anyway, and the deprecation warning has been logged for a long time, even before I joined. Feature extractors are now reserved for audio models only.
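
For anyone migrating, a minimal before/after sketch (ViT is used as an illustrative model; any vision model follows the same `FeatureExtractor` → `ImageProcessor` rename):

```python
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Before (v4): the vision feature extractor, long deprecated with a warning
# from transformers import ViTFeatureExtractor
# extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
# inputs = extractor(images=image, return_tensors="pt")

# After (v5): the ImageProcessor equivalent is a drop-in replacement
from transformers import ViTImageProcessor

image_processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
inputs = image_processor(images=image, return_tensors="pt")
```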

zucchini-nlp requested a review from gante on Sep 26, 2025
@molbap (Contributor) commented Sep 29, 2025

Nice! IIRC there were a few processing utils that also relied on the `feature_extractor` keyword.

@zucchini-nlp (Member, Author)

Yeah, the tests are failing; I'll do one more round of cleaning

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto

@zucchini-nlp (Member, Author)

Done

@gante (Contributor) left a comment

LGTM 🤗 🧹 🧹 🧹

(probably needs a few more deletions, looking at the CI issues)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

zucchini-nlp merged commit ae879f6 into huggingface:main on Oct 1, 2025
25 checks passed
vijayabhaskar-ev pushed a commit to vijayabhaskar-ev/transformers that referenced this pull request Oct 2, 2025
* bye bye
* remove from docs
* do not use feature extractor here
* fix docs
* do not delete it
* forgot these
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request Oct 4, 2025
zucchini-nlp mentioned this pull request on Oct 9, 2025
AhnJoonSung pushed a commit to AhnJoonSung/transformers that referenced this pull request Oct 12, 2025
ssaliceTT added a commit to tenstorrent/tt-xla that referenced this pull request Mar 18, 2026
### Ticket
N/A

### Problem description
Uplift the transformers library from `4.57.1` to `5.2.0` to broaden
model support and enable new models such as GLM-5 to run on our stack.
Transformers 5.x is a major version with several breaking changes that
required fixes across both tt-xla and tt-forge-models.

### What's changed

#### Transformers 5.x breaking changes and how we addressed them

**Flax/JAX backend removed (transformers 5.0, [PR #40760](huggingface/transformers#40760))**
All `FlaxXxx` model classes were removed from the library. As a result:
- All JAX tests backed by `FlaxPreTrainedModel` are now marked
`NOT_SUPPORTED_SKIP` (82 test entries updated in
`test_config_inference_single_device.yaml`). Affected model families:
albert, bart, beit, bert/masked_lm, longt5, mt5, t5, regnet, resnet,
vit, dinov2, bloom, clip, distilbert, electra, gpt_j, gpt_neo, gpt_sw3,
mistral, opt, roberta, roformer, squeezebert, wav2vec2, whisper, xglm,
xlm_roberta, marian_mt, mbart50, bigbird, pegasus,
vision_text_dual_encoder
- Removed `FlaxPreTrainedModel` from the `Model` type alias in
`types.py` and from `isinstance` checks and parameter handling in
`jax_model_tester.py` and `dynamic_jax_model_tester.py`
- Four mamba tensor-parallel test entries removed from
`test_config_inference_tensor_parallel.yaml` (Flax mamba model class was
removed)
- EasyDel-based JAX models (falcon, phi1, phi1_5, phi2, phi3, gpt2, qwen
2.5/coder/3, llama, whisper) remain functional and are pinned to
`transformers==4.57.1` via per-model `requirements.txt` in
tt-forge-models, since EasyDel itself requires the older transformers
API

**Legacy cache format removed (transformers 5.0–5.2, [PR #41378](huggingface/transformers#41378), [PR #43168](huggingface/transformers#43168))**
`to_legacy_cache()`, `from_legacy_cache()`, `get_usable_length()`, and
all deprecated `Cache` subclasses were removed. Changes made:
- Updated `kimi_k2/modeling_deepseek.py`: replaced
`DynamicCache.from_legacy_cache()` with a manual layer-by-layer
construction, replaced `to_legacy_cache()` with a manual tuple, and
replaced `get_usable_length()` with `get_seq_length()`
- Updated `kimi_k2/test_kimi_k2.py`: replaced tuple-indexed shard spec
keys (`args[3][0][0]`) with the new layer attribute API
(`args[3].layers[0].compressed_kv`), and added `lazy_initialization()`
calls for `StaticCache` layers

**Unified attention interface (transformers 5.x)**
Attention modules no longer return `attn_weights` when using the unified
SDPA/flash/eager dispatch path, and require `_attn_implementation` to be
set explicitly on the config. Updated Gemma and Mistral attention tests
to:
- Set `config._attn_implementation = "sdpa"` before constructing
attention modules
- Drop `attn_weights` from the return value of the inner attention call
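
A sketch of the test-side fix described above, using Gemma as an illustrative case (config sizes are placeholders):

```python
from transformers import GemmaConfig
from transformers.models.gemma.modeling_gemma import GemmaAttention

config = GemmaConfig(hidden_size=64, num_attention_heads=4, num_key_value_heads=2)
# transformers 5.x: the attention implementation must be set explicitly
# on the config before constructing the module
config._attn_implementation = "sdpa"

attn = GemmaAttention(config, layer_idx=0)
# On the SDPA path the inner attention call no longer yields usable
# attn_weights, so the tests drop them from the unpacked return value
```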

**`XXXFeatureExtractor` classes removed (transformers 5.0, [PR #41174](huggingface/transformers#41174))**
All legacy vision `FeatureExtractor` classes were replaced by
`ImageProcessor` equivalents. Updated in tt-forge-models:
- `detr`: `DetrFeatureExtractor` → `DetrImageProcessor`
- `maskformer`: `MaskFormerFeatureExtractor` →
`MaskFormerImageProcessor`
- `yolos_small`: `YolosFeatureExtractor` → `YolosImageProcessor`

**`encode_plus()` / `batch_encode_plus()` removed in favour of `__call__()` (transformers 5.0)**
The legacy tokenizer encoding methods were formally removed. Changes
made:
- tt-forge-models (`huggyllama`, `mistral`, `roberta`):
`tokenizer.encode_plus(...)` → `tokenizer(...)`
- `examples/pytorch/sdxl-pipeline.py`:
`tokenizer.batch_encode_plus(...)` → `tokenizer(...)`
- `tests/torch/models/llama3/test_llama_step_n300.py`:
`tokenizer.encode_plus(...)` → `tokenizer._encode_plus(...)` (private
method still present in 5.x as the internal implementation; should
ideally be `tokenizer(...)`)
- `tests/torch/quality/image_gen/sdxl/pipeline.py`: replaced the private
`tokenizer._encode_plus(...)` call (which broke in 5.x for list inputs
with `padding="max_length"`) with the public `tokenizer(...)` interface
with explicit `padding="max_length"`, `truncation=True`, and
`return_tensors="pt"`. The old code produced mismatched sequence lengths
for conditioned vs. unconditioned tokens, causing a `torch.cat` shape
mismatch error.
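
A sketch of the equivalent public-interface call used in the sdxl pipeline fix (the checkpoint name is an assumption):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-large-patch14")
prompts = ["a photo of an astronaut riding a horse", ""]  # conditioned + unconditioned

# 4.x: tokenizer.batch_encode_plus(prompts, ...) / tokenizer.encode_plus(text, ...)
# 5.x: __call__ handles single strings and lists alike
inputs = tokenizer(
    prompts,
    padding="max_length",
    max_length=tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)
# padding="max_length" keeps both sequences the same length, avoiding the
# torch.cat shape mismatch described above
```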

**`trust_remote_code` no longer needed for phi3 (transformers 5.x)**
The phi3 model was upstreamed into the official transformers library and
`trust_remote_code=True` is now unnecessary. Removed from
`AutoTokenizer.from_pretrained`, `AutoConfig.from_pretrained`, and
`model_kwargs` in the phi3 loader.
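
That is, roughly (the checkpoint name is an assumption; the loader may use a different one):

```python
from transformers import AutoConfig, AutoTokenizer

# 4.x: AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True) was needed
# 5.x: the in-library phi3 implementation is resolved automatically
ckpt = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
config = AutoConfig.from_pretrained(ckpt)
```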

**`torch.fx` support dropped (transformers 5.0, [PR #41683](huggingface/transformers#41683))**
`is_torch_fx_available()`, `is_torch_greater_or_equal_than_1_13`, and
all `torch.fx` tracing guards were removed. Updated:
- `deepseek_r1` (deepseekv2) loader in tt-forge-models
- `kimi_k2/modeling_deepseek.py`: removed `is_torch_fx_available` import
and the `_prepare_4d_causal_attention_mask` FX wrap block; replaced
`rope_scaling["type"]` dict access with `.get()` to guard against
missing keys in newer config formats
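
The guard amounts to something along these lines (the fallback key is an assumption based on newer config formats):

```python
rope_scaling = getattr(config, "rope_scaling", None) or {}
# Older configs stored rope_scaling["type"]; newer formats may use
# "rope_type" or omit the key, so direct indexing could raise KeyError
scaling_type = rope_scaling.get("type") or rope_scaling.get("rope_type")
```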

**VLM sub-module path changed (transformers 5.x, [PR #42156](huggingface/transformers#42156))**
Vision-language models no longer expose `model.language_model` directly
at the top level; it is now accessed via `model.model.language_model`.
Updated `mistral/pixtral` loader to add `_get_language_model()` and
`_get_vision_tower()` helpers that handle both paths when building shard
specs.
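
A sketch of the dual-path helper (the attribute probing is an assumption; the real loader may differ):

```python
def _get_language_model(model):
    # transformers 5.x nests the decoder one level deeper
    inner = getattr(model, "model", model)
    if hasattr(inner, "language_model"):
        return inner.language_model
    # transformers 4.x exposed it at the top level
    return model.language_model
```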

**`AutoProcessor` with `trust_remote_code` removed for custom processors (transformers 5.x)**
`AutoProcessor.from_pretrained(trust_remote_code=True)` no longer works
for models with custom processing classes not registered in the
transformers auto-mapping. Updated `openvla_oft` to explicitly
instantiate `PrismaticImageProcessor` and `PrismaticProcessor` from the
local `openvla/pytorch/src/` source.

**`tie_weights()` signature changed (transformers 5.x)**
`PreTrainedModel.tie_weights()` now passes through `**kwargs`. Updated
the `tie_weights` override in
`openvla/pytorch/src/modeling_prismatic.py` to accept and forward
`**kwargs` to avoid a `TypeError` on model init.
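
The fix is essentially a signature change on the override (shown as a method fragment; the real body lives in `modeling_prismatic.py`):

```python
def tie_weights(self, **kwargs):
    # transformers 5.x calls tie_weights(**kwargs); accepting and forwarding
    # the kwargs avoids a TypeError during model initialization
    return super().tie_weights(**kwargs)
```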

**`XLMRobertaSdpaSelfAttention` removed (transformers 5.x)**
The separate SDPA attention class was consolidated into the unified
attention dispatch. Rewrote `XLMRobertaSelfAttentionWithAdapters` in
`sentencizer/pytorch/src/adapter_utils.py` to conform to the new
`forward()` signature using `eager_attention_forward` from transformers.

**`HfFolder.get_token()` removed (huggingface_hub)**
`HfFolder` was removed in recent `huggingface_hub` versions. Updated
`sentencizer/pytorch/src/utils.py` to use `HfApi().token` instead.
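
That is:

```python
# 4.x-era huggingface_hub: token = HfFolder.get_token()
from huggingface_hub import HfApi

token = HfApi().token  # as used in the updated utils.py
```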

**mamba2 JAX loader removed**
`mamba2/causal_lm/jax` was removed as it was non-functional and
incompatible with the pinned EasyDel version used by other JAX models.

#### tt-xla infrastructure changes

- **`transformers` removed from `_JAX_PURGE_SKIP`**
(`tests/runner/requirements.py`): `transformers` was previously excluded
from the `sys.modules` purge that `RequirementsManager` performs after a
per-model pip install. This meant that when an EasyDel model installed
`transformers==4.57.1`, the venv's 5.2.0 stayed cached in memory and the
newly installed version was never visible to imports. Removing
`transformers` from the skip list (keeping only `flax`, which has
genuine module-level imports in JAX infra) ensures the installed version
is correctly used. All JAX infra files were audited to confirm none hold
module-level `transformers` references.
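
The purge boils down to something like this (a simplified sketch; the actual `RequirementsManager` logic may differ):

```python
import sys

_JAX_PURGE_SKIP = ("flax",)  # "transformers" is no longer on this list

def purge(package: str) -> None:
    """Drop a package's cached modules so the next import re-resolves it."""
    if package in _JAX_PURGE_SKIP:
        return
    for name in list(sys.modules):
        if name == package or name.startswith(package + "."):
            del sys.modules[name]

# After a per-model `pip install transformers==4.57.1`:
purge("transformers")  # the next `import transformers` now sees 4.57.1
```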

- **Sparse MLP router output fix**
(`python_package/tt_torch/sparse_mlp.py`): `GptOssTopKRouter` was
updated to return a 3-tuple `(router_logits, router_scores,
router_indices)` instead of 2. Updated all three MoE dispatch paths
(`SparseMLP`, `A2aSparseMLP`, `A2aSparseStackedMlp`) to unpack
accordingly and simplified the weighted-sum logic to use the compact
scores tensor directly, removing a workaround that used `torch.gather` /
one-hot einsum.
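
The dispatch-side change is essentially this unpack (a fragment; names mirror the description above):

```python
# 4.x-era tt_torch code unpacked a 2-tuple from the router:
# router_scores, router_indices = self.router(hidden_states)

# The updated GptOssTopKRouter returns a 3-tuple:
router_logits, router_scores, router_indices = self.router(hidden_states)
# router_scores is the compact per-token weight tensor, so the weighted sum
# consumes it directly (the torch.gather / one-hot einsum workaround is gone)
```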

- **Performance benchmark matrix**
(`.github/workflows/perf-bench-matrix.json`): Updated all PyTorch
benchmark entries from `transformers==4.57.1` to `transformers==5.2.0`.
The `resnet_jax` and `bge_m3_encode` entries are intentionally kept at
`transformers==4.57.1` — `FlaxResNetForImageClassification` was removed
in 5.x, and `FlagEmbedding` (used by bge_m3) is not yet compatible with
5.x.

- **LLM benchmark version check**
(`tests/benchmark/benchmarks/llm_benchmark.py`): Updated
`check_transformers_version()` to require exactly `5.2.0` instead of `<=
4.57.1`. Also removed the now-unnecessary `check_transformers_version()`
guard from `examples/pytorch/llama.py`.

- **Resnet codegen examples skipped**
(`tests/examples/test_examples.py`): Added XFAIL entries for
`jax/codegen/cpp/resnet.py` and `jax/codegen/python/resnet.py` since
`FlaxResNetModel` was removed in transformers 5.x.

- **`surya-ocr` unpinned** (`venv/requirements-dev.txt`): Removed the
`surya-ocr==0.17.0` version pin.

#### tt-forge models PR:
tenstorrent/tt-forge-models#529

### CI tests for reference:
Manual Release test:
https://github.com/tenstorrent/tt-xla/actions/runs/23179435697
Manual Manylinux release test:
https://github.com/tenstorrent/tt-xla/actions/runs/23179426382

### Checklist
- [x] Fix `gpt_oss` failure
- [x] Fix JAX-only CI workflows

---------

Co-authored-by: Vladimir Zeljkovic <vzeljkovic@tenstorrent.com>
ppadjinTT pushed a commit to tenstorrent/tt-xla that referenced this pull request Mar 31, 2026
vzeljkovicTT added a commit to tenstorrent/tt-xla that referenced this pull request Apr 14, 2026