
[SAM3-LiteText] Fix modular converter KeyError and torchvision soft dependency#72

Closed
JavierYepez wants to merge 49 commits into NielsRogge:add_sam_3_lite_text from JavierYepez:add_sam_3_lite_text

Conversation

@JavierYepez

What does this PR fix?

This PR resolves two bugs in the SAM3-LiteText integration introduced in huggingface#44320, along with two related cleanups.

  1. KeyError: '__init__' in the modular model converter
    `Sam3LiteTextVisionConfig` overrode `__init__` directly, which caused utils/modular_model_converter.py to crash with `KeyError: '__init__'` when it looked the method up in `original_modeling_methods`. Since configurations now use `@strict` dataclasses from huggingface_hub, initialization logic must go in `__post_init__` instead. The fix replaces the `__init__` override with a `__post_init__` method that initializes `backbone_config` and then delegates to `super().__post_init__()`.
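
Under `@strict` dataclass configs the pattern looks roughly like this — a minimal sketch with hypothetical class and field names, not the actual SAM3-LiteText code:

```python
from dataclasses import dataclass
from typing import Optional

# The dataclass machinery owns __init__, so per-class setup moves into
# __post_init__, which still delegates to the parent's hook.
@dataclass
class BaseConfig:
    def __post_init__(self):
        # Parent hook: validation, derived attributes, etc.
        pass

@dataclass
class VisionConfig(BaseConfig):
    backbone_config: Optional[dict] = None

    def __post_init__(self):
        # Initialization logic that previously lived in an __init__ override:
        if self.backbone_config is None:
            self.backbone_config = {"model_type": "vit"}  # illustrative default
        super().__post_init__()

cfg = VisionConfig()
print(cfg.backbone_config["model_type"])
```

Because the converter introspects generated methods rather than hand-written `__init__` overrides, this shape keeps it happy.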

  2. Hard torchvision import in the modeling file
    torchvision was unconditionally imported at the top of modeling_sam3_lite_text.py, causing an ImportError for users without it installed. The fix guards the import behind `is_torchvision_available()` and adds a `@requires(backends=("torch", "torchvision"))` decorator to `Sam3LiteTextPreTrainedModel` to surface a clear error message.
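
The guard follows the usual soft-dependency pattern. The sketch below mimics `is_torchvision_available()` with a local stand-in so it runs whether or not torchvision is installed; `run_nms` is purely illustrative:

```python
import importlib.util

# Stand-in for transformers.utils.is_torchvision_available(); the real helper
# also checks version constraints.
def is_torchvision_available() -> bool:
    return importlib.util.find_spec("torchvision") is not None

if is_torchvision_available():
    from torchvision.ops import nms  # only imported when torchvision exists

def run_nms(boxes, scores, iou_threshold):
    # Fail with an actionable message instead of a bare ImportError at import time.
    if not is_torchvision_available():
        raise ImportError(
            "This model requires torchvision; install it with `pip install torchvision`."
        )
    return nms(boxes, scores, iou_threshold)

# Importing this module never fails when torchvision is absent:
print(type(is_torchvision_available()).__name__)
```

The `@requires` decorator complements this by replacing the class with a dummy that raises the same kind of actionable error on instantiation.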

  3. Configuration class cleanup
    `Sam3LiteTextViTConfig` was rewritten to use the `@auto_docstring` and `@strict` decorators, replacing the large hand-written docstring block with the auto-generated one. Redundant `__init__` arguments that were mere pass-through wrappers around the parent SAM3 config were removed.

  4. `_checkpoint_conversion_mapping` → `base_model_prefix`
    The `_checkpoint_conversion_mapping` regex pattern in `Sam3LiteTextModel` (used to strip/add the `detector_model.` prefix) was replaced with the simpler `base_model_prefix = "detector_model"`, which is the idiomatic approach in Transformers.
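
A pure-Python illustration of why a single `base_model_prefix` suffices here (keys and values are made up): the loading code only ever needs to add or strip one fixed prefix on state-dict keys, so no regex mapping is required.

```python
base_model_prefix = "detector_model"

# Hypothetical bare-checkpoint keys.
bare_checkpoint = {
    "encoder.layer.0.weight": 1.0,
    "encoder.layer.0.bias": 0.0,
}

# Loading a bare checkpoint into the wrapper model: add the prefix.
with_prefix = {f"{base_model_prefix}.{k}": v for k, v in bare_checkpoint.items()}

# Loading a wrapper checkpoint into the bare model: strip it again.
stripped = {k.removeprefix(base_model_prefix + "."): v for k, v in with_prefix.items()}

print(sorted(with_prefix))
print(stripped == bare_checkpoint)
```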

Test commands

python -c "from transformers import Sam3LiteTextConfig; c = Sam3LiteTextConfig(); print(c)"
python utils/modular_model_converter.py --files src/transformers/models/sam3_lite_text/modular_sam3_lite_text.py
make style
make check-repo

Notes
AI assistance was used to draft parts of the description; all changes have been reviewed line by line.
This is a companion fix to huggingface#44320, not a duplicate — it addresses bugs found during review/integration.
Related CLI fix for add_new_model_like.py is tracked separately in huggingface#44334.

hmellor and others added 30 commits March 24, 2026 20:48
javierdejesusda and others added 19 commits March 27, 2026 09:20
@JavierYepez
Author

Hello @NielsRogge, I can't add you as a reviewer. Please let me know if there is anything else to do or fix.

@NielsRogge
Owner

Hi, any reason this PR was opened? I've already opened a PR for SAM3-LiteText here: huggingface#44320, and it looks like the git diff got messed up

@JavierYepez
Author

Hi, any reason this PR was opened? I've already opened a PR for SAM3-LiteText here: huggingface#44320, and it looks like the git diff got messed up

I opened this PR because running `python utils/modular_model_converter.py sam3_lite_text` didn't work and the quality tests failed.

I don't know why the git diff got messed up. I forked the transformers repo, created a branch, merged yours, and changed a few lines of code... Any idea how to fix the git diff?

@yonigozlan force-pushed the add_sam_3_lite_text branch from e5a5063 to 8f35675 on March 30, 2026 16:59
@JavierYepez
Author

JavierYepez commented Mar 31, 2026

I see this is no longer needed. Thank you @NielsRogge and @yonigozlan for adding Efficient SAM3.
