Add Deepseek-OCR-2 model#45075

Open
thisisiron wants to merge 68 commits into huggingface:main from thisisiron:add-deepseek_ocr2

Conversation

@thisisiron
Contributor

@thisisiron thisisiron commented Mar 27, 2026

What does this PR do?

Adds the DeepSeek-OCR2 model.

Reference

Who can review?

@zucchini-nlp @yonigozlan

@thisisiron thisisiron changed the title [WIP] Add Deepseek-OCR2 model Add Deepseek-OCR-2 model Mar 28, 2026
@thisisiron thisisiron mentioned this pull request Apr 2, 2026
2 tasks
@zucchini-nlp
Member

Hey @thisisiron , I have been quite busy with other models lately and this one slipped off my radar. Will be reviewing after Easter next week :)

@zucchini-nlp zucchini-nlp self-requested a review April 2, 2026 16:42
@thisisiron
Contributor Author

thisisiron commented Apr 15, 2026

Hi @zucchini-nlp

Summary of updates:

  • Config: removed the hidden_size/rms_norm_eps sync block on DeepseekOcr2VisionConfig
    • as a result, skipped test_get_image_features_output and test_eager_matches_batched_and_grouped_inference
  • Processor: removed the unsafe automatic <image> insertion and made images a required argument
  • Model:
    • replaced DeepseekOcr2Projector wrapper with a plain nn.Linear
    • moved view_separator scaling to _init_weights
  • ImageProcessorKwargs: added FIXME comment referencing the modular-converter issue (🚨🚨 Refactor Image Processors to support different backends #43514).
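The "images required" processor change above boils down to fail-fast validation. A minimal sketch of the pattern, with illustrative names rather than the actual DeepseekOcr2Processor code:

```python
# Illustrative sketch only: shows the validation pattern described above,
# not the real DeepseekOcr2Processor implementation.
class Ocr2ProcessorSketch:
    def __call__(self, text=None, images=None, **kwargs):
        # Fail fast instead of silently auto-inserting <image>
        # placeholder tokens into the prompt.
        if images is None:
            raise ValueError(
                "`images` is required; text-only calls are not supported."
            )
        return {"text": text, "images": images}
```

Making the argument mandatory turns a silent prompt mutation into an explicit error at call time, which is easier to debug than unexpectedly altered input ids.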

@zucchini-nlp
Member

Thanks a lot! I see the slow integration tests are also ✅, so requesting a review from core maintainers!

Contributor

@vasqu vasqu left a comment


Sorry for the delays! So many models concurrently :D The implementation is super nice already; this is more about details and syncing with main.

Comment thread docs/source/en/model_doc/deepseek_ocr2.md Outdated
Comment thread docs/source/en/model_doc/deepseek_ocr2.md
Comment thread docs/source/en/model_doc/deepseek_ocr2.md Outdated
Comment thread docs/source/en/model_doc/deepseek_ocr2.md Outdated
Comment thread src/transformers/models/auto/configuration_auto.py
Comment thread src/transformers/models/deepseek_ocr2/modular_deepseek_ocr2.py
Comment thread src/transformers/models/deepseek_ocr2/processing_deepseek_ocr2.py
Comment thread tests/models/deepseek_ocr2/test_modeling_deepseek_ocr2.py Outdated
Comment on lines +220 to +228
```python
@unittest.skip("hidden_size is on vision_config.encoder_config, not on vision_config.")
@parameterized.expand([True, False, None])
def test_get_image_features_output(self, return_dict: bool | None):
    pass

@unittest.skip("rms_norm_eps on vision_config.encoder_config is not reached by set_config_for_less_flaky_test.")
@parameterized.expand(TEST_EAGER_MATCHES_BATCHED_AND_GROUPED_INFERENCE_PARAMETERIZATION)
def test_eager_matches_batched_and_grouped_inference(self, name, dtype):
    pass
```
Contributor


Imo these are important enough that it would be nice to override them rather than skip.

Contributor Author


test_get_image_features_output: overridden via _image_features_prepare_config_and_inputs to set vision_config.hidden_size from encoder_config.

test_eager_matches_batched_and_grouped_inference: skip removed
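The skip-vs-override pattern being discussed can be sketched like this; the class and helper names below are illustrative stand-ins, not the real transformers test mixins:

```python
import unittest

class CommonImageFeatureTests(unittest.TestCase):
    """Stand-in for a shared test mixin that reads config["hidden_size"]."""

    def _image_features_prepare_config_and_inputs(self):
        # The generic mixin expects hidden_size at the top level of the config.
        return {"hidden_size": None}

class NestedConfigModelTests(CommonImageFeatureTests):
    # hidden_size really lives on a nested encoder_config, so instead of
    # skipping the common test we override the prepare helper.
    encoder_config = {"hidden_size": 32}

    def _image_features_prepare_config_and_inputs(self):
        config = super()._image_features_prepare_config_and_inputs()
        config["hidden_size"] = self.encoder_config["hidden_size"]
        return config

    def test_helper_exposes_nested_hidden_size(self):
        config = self._image_features_prepare_config_and_inputs()
        self.assertEqual(config["hidden_size"], 32)
```

Overriding the helper keeps the shared test running against the nested config value, whereas a skip would drop the coverage entirely.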

Comment thread tests/models/deepseek_ocr2/test_modeling_deepseek_ocr2.py
@thisisiron
Contributor Author

thisisiron commented Apr 25, 2026

Hi @vasqu, ready for review.

Threads to check:

  1. sam config vs backbone config - [link]
  2. interpolate_pos_encoding dtype - [link]
  3. test_get_image_features_output - [link]

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, deepseek_ocr2

@vasqu
Contributor

vasqu commented Apr 27, 2026

Will check it out tomorrow!! Thanks a lot for fixing things up 🤗

Contributor

@vasqu vasqu left a comment


Don't have much on my side anymore; it's close to merge. Let's fix up some last details, but nothing too big anymore.

Comment thread src/transformers/models/deepseek_ocr2/modular_deepseek_ocr2.py
Comment thread src/transformers/models/deepseek_ocr2/modular_deepseek_ocr2.py
```python
    "encoder_config": DeepseekOcr2EncoderConfig,
}

sam_config: dict | PreTrainedConfig | None = None
```
Contributor


No, imo it's fine to keep as is then. It's not perfect, but this model is not compatible with vision encoders other than SAM, so it's fine to be explicit in this case.

Comment thread src/transformers/models/deepseek_ocr2/modular_deepseek_ocr2.py
```python
        super().__init__(config)
        self.proj = DeepseekOcr2SamVisionProj(config)

    def interpolate_pos_encoding(self, pos_embed: torch.Tensor, target_size: int, dtype: torch.dtype) -> torch.Tensor:
```
Contributor


Is this maybe copied/adapted from some other model? If yes, let's add a comment referencing that model.
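For context on what interpolate_pos_encoding does: a learned grid of positional embeddings is resampled to match a new input resolution. The real implementation presumably works on 2-D torch tensors (typically via bicubic interpolation); the core idea reduces to something like this pure-Python 1-D sketch:

```python
def interpolate_pos_embed_1d(pos_embed, target_len):
    """Linearly resample a list of embedding vectors to target_len entries.

    Pure-Python illustration of resizing learned positional embeddings;
    real vision models do this in 2-D on GPU tensors, often bicubic.
    """
    src_len = len(pos_embed)
    if src_len == target_len or target_len <= 1:
        return list(pos_embed[:target_len])
    out = []
    for i in range(target_len):
        # Map the target index onto the source coordinate space.
        pos = i * (src_len - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, src_len - 1)
        frac = pos - lo
        out.append(
            [(1 - frac) * a + frac * b for a, b in zip(pos_embed[lo], pos_embed[hi])]
        )
    return out
```

Casting to the right dtype after resampling (as the signature's `dtype` argument suggests) avoids precision mismatches when the embeddings are kept in float32 but the model runs in half precision.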

Comment thread src/transformers/models/deepseek_ocr2/modular_deepseek_ocr2.py
Comment thread src/transformers/models/deepseek_ocr2/modular_deepseek_ocr2.py
Comment thread utils/check_config_attributes.py
Comment thread tests/models/deepseek_ocr2/test_modeling_deepseek_ocr2.py
```python
)
generate_ids = model.generate(**inputs, do_sample=False, max_new_tokens=20)
decoded = self.processor.decode(generate_ids[0, inputs["input_ids"].shape[1] :], skip_special_tokens=True)
self.assertTrue(decoded.startswith("R&D QUALITY IMPROVEMENT"))
```
Contributor


Imo this should check the full text. Can we use the Expectations() class? You fill in your CUDA capability, and I will adjust an entry for our CI.

Contributor


Same below
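The Expectations() pattern the reviewer suggests keys an expected output by device type and compute capability, with fallbacks. A simplified pure-Python stand-in for the idea (not the actual `transformers.testing_utils` API, and the strings here are placeholders, not the model's real full output):

```python
def pick_expectation(expectations, device, capability=None):
    """Pick an expected output for (device, capability), with fallbacks.

    Simplified illustration of device-keyed expectations: exact
    (device, capability) match first, then a device-wide entry,
    then a global default.
    """
    for key in ((device, capability), (device, None), (None, None)):
        if key in expectations:
            return expectations[key]
    raise KeyError(f"no expectation registered for {device!r}/{capability!r}")

# Placeholder strings; a real test would store each device's full
# expected transcription here.
EXPECTED_TEXTS = {
    ("cuda", 8): "R&D QUALITY IMPROVEMENT ...",  # e.g. an A100 CI runner
    (None, None): "R&D QUALITY IMPROVEMENT",     # generic fallback
}
```

Keying on capability lets slightly different greedy-decoding outputs per GPU generation coexist in one test instead of weakening the assertion to a prefix check.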


4 participants