Add Deepseek-OCR-2 model#45075

Open
thisisiron wants to merge 68 commits into huggingface:main from thisisiron:add-deepseek_ocr2

Conversation

@thisisiron
Contributor

@thisisiron thisisiron commented Mar 27, 2026

What does this PR do?

Adds the DeepSeek-OCR2 model.

Reference

Who can review?

@zucchini-nlp @yonigozlan

@thisisiron thisisiron changed the title [WIP] Add Deepseek-OCR2 model Add Deepseek-OCR-2 model Mar 28, 2026
@thisisiron thisisiron mentioned this pull request Apr 2, 2026
2 tasks
@zucchini-nlp
Member

Hey @thisisiron , I have been quite busy with other models lately and this one slipped off my radar. Will be reviewing after Easter next week :)

@zucchini-nlp zucchini-nlp self-requested a review April 2, 2026 16:42
@thisisiron
Contributor Author

thisisiron commented Apr 15, 2026

Hi @zucchini-nlp

Summary of updates:

  • Config: removed the hidden_size/rms_norm_eps sync block on DeepseekOcr2VisionConfig
    • as a result, skipped test_get_image_features_output and test_eager_matches_batched_and_grouped_inference
  • Processor: removed the unsafe automatic <image> insertion and made images a required argument
  • Model:
    • replaced DeepseekOcr2Projector wrapper with a plain nn.Linear
    • moved view_separator scaling to _init_weights
  • ImageProcessorKwargs: added FIXME comment referencing the modular-converter issue (🚨🚨 Refactor Image Processors to support different backends #43514).
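The "images required" processor change above boils down to fail-fast validation. A minimal sketch of the pattern, with illustrative names rather than the actual DeepseekOcr2Processor code:

```python
# Illustrative sketch only: shows the validation pattern described above,
# not the real DeepseekOcr2Processor implementation.
class Ocr2ProcessorSketch:
    def __call__(self, text=None, images=None, **kwargs):
        # Fail fast instead of silently auto-inserting <image>
        # placeholder tokens into the prompt.
        if images is None:
            raise ValueError(
                "`images` is required; text-only calls are not supported."
            )
        return {"text": text, "images": images}
```

Making the argument mandatory turns a silent prompt mutation into an explicit error at call time, which is easier to debug than unexpectedly altered input ids.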

@zucchini-nlp
Member

Thanks a lot! I see the slow integration tests are also ✅, so requesting a review from core maintainers!

Contributor

@vasqu vasqu left a comment


Sorry for the delays! So many models concurrently :D The implementation is super nice already; this is more about details and syncing with main.

Comment thread docs/source/en/model_doc/deepseek_ocr2.md Outdated
Comment thread docs/source/en/model_doc/deepseek_ocr2.md
Comment thread docs/source/en/model_doc/deepseek_ocr2.md Outdated
Comment thread docs/source/en/model_doc/deepseek_ocr2.md Outdated
Comment thread src/transformers/models/auto/configuration_auto.py
Comment thread src/transformers/models/deepseek_ocr2/modular_deepseek_ocr2.py
Comment thread src/transformers/models/deepseek_ocr2/processing_deepseek_ocr2.py
Comment thread tests/models/deepseek_ocr2/test_modeling_deepseek_ocr2.py Outdated
Comment on lines +220 to +228
```python
@unittest.skip("hidden_size is on vision_config.encoder_config, not on vision_config.")
@parameterized.expand([True, False, None])
def test_get_image_features_output(self, return_dict: bool | None):
    pass

@unittest.skip("rms_norm_eps on vision_config.encoder_config is not reached by set_config_for_less_flaky_test.")
@parameterized.expand(TEST_EAGER_MATCHES_BATCHED_AND_GROUPED_INFERENCE_PARAMETERIZATION)
def test_eager_matches_batched_and_grouped_inference(self, name, dtype):
    pass
```
Contributor


Imo these are important enough that it would be nice to override them rather than skip.

Contributor Author


test_get_image_features_output: overridden via _image_features_prepare_config_and_inputs to set vision_config.hidden_size from encoder_config.

test_eager_matches_batched_and_grouped_inference: skip removed
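The skip-vs-override pattern being discussed can be sketched like this; the class and helper names below are illustrative stand-ins, not the real transformers test mixins:

```python
import unittest

class CommonImageFeatureTests(unittest.TestCase):
    """Stand-in for a shared test mixin that reads config["hidden_size"]."""

    def _image_features_prepare_config_and_inputs(self):
        # The generic mixin expects hidden_size at the top level of the config.
        return {"hidden_size": None}

class NestedConfigModelTests(CommonImageFeatureTests):
    # hidden_size really lives on a nested encoder_config, so instead of
    # skipping the common test we override the prepare helper.
    encoder_config = {"hidden_size": 32}

    def _image_features_prepare_config_and_inputs(self):
        config = super()._image_features_prepare_config_and_inputs()
        config["hidden_size"] = self.encoder_config["hidden_size"]
        return config

    def test_helper_exposes_nested_hidden_size(self):
        config = self._image_features_prepare_config_and_inputs()
        self.assertEqual(config["hidden_size"], 32)
```

Overriding the helper keeps the shared test running against the nested config value, whereas a skip would drop the coverage entirely.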

Comment thread tests/models/deepseek_ocr2/test_modeling_deepseek_ocr2.py
@thisisiron
Contributor Author

thisisiron commented Apr 25, 2026

Hi @vasqu, ready for review.

Threads to check:

  1. sam config vs backbone config - [link]
  2. interpolate_pos_encoding dtype - [link]
  3. test_get_image_features_output - [link]

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, deepseek_ocr2

@vasqu
Contributor

vasqu commented Apr 27, 2026

Will check it out tomorrow!! Thanks a lot for fixing things up 🤗

Contributor

@vasqu vasqu left a comment


Don't have much on my side anymore; it's close to merge. Let's fix up some last details, but nothing too big anymore.

Comment thread src/transformers/models/deepseek_ocr2/modular_deepseek_ocr2.py
Comment thread src/transformers/models/deepseek_ocr2/modular_deepseek_ocr2.py
```python
    "encoder_config": DeepseekOcr2EncoderConfig,
}

sam_config: dict | PreTrainedConfig | None = None
```
Contributor


No, imo it's fine to keep as is then. It's not perfect, but this model is not compatible with vision encoders other than SAM, so it's fine to be explicit in this case.

Comment thread src/transformers/models/deepseek_ocr2/modular_deepseek_ocr2.py
```python
        super().__init__(config)
        self.proj = DeepseekOcr2SamVisionProj(config)

    def interpolate_pos_encoding(self, pos_embed: torch.Tensor, target_size: int, dtype: torch.dtype) -> torch.Tensor:
```
Contributor


Is this maybe copied/adapted from some other model? If yes, let's add a comment referencing that model.
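For context on what interpolate_pos_encoding does: a learned grid of positional embeddings is resampled to match a new input resolution. The real implementation presumably works on 2-D torch tensors (typically via bicubic interpolation); the core idea reduces to something like this pure-Python 1-D sketch:

```python
def interpolate_pos_embed_1d(pos_embed, target_len):
    """Linearly resample a list of embedding vectors to target_len entries.

    Pure-Python illustration of resizing learned positional embeddings;
    real vision models do this in 2-D on GPU tensors, often bicubic.
    """
    src_len = len(pos_embed)
    if src_len == target_len or target_len <= 1:
        return list(pos_embed[:target_len])
    out = []
    for i in range(target_len):
        # Map the target index onto the source coordinate space.
        pos = i * (src_len - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, src_len - 1)
        frac = pos - lo
        out.append(
            [(1 - frac) * a + frac * b for a, b in zip(pos_embed[lo], pos_embed[hi])]
        )
    return out
```

Casting to the right dtype after resampling (as the signature's `dtype` argument suggests) avoids precision mismatches when the embeddings are kept in float32 but the model runs in half precision.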

Comment thread src/transformers/models/deepseek_ocr2/modular_deepseek_ocr2.py
Comment thread src/transformers/models/deepseek_ocr2/modular_deepseek_ocr2.py
Comment thread utils/check_config_attributes.py
Comment thread tests/models/deepseek_ocr2/test_modeling_deepseek_ocr2.py
```python
)
generate_ids = model.generate(**inputs, do_sample=False, max_new_tokens=20)
decoded = self.processor.decode(generate_ids[0, inputs["input_ids"].shape[1] :], skip_special_tokens=True)
self.assertTrue(decoded.startswith("R&D QUALITY IMPROVEMENT"))
```
Contributor


Imo this should check the full text. Can we use the Expectations() class? You fill in your CUDA capability, and I will adjust an entry for our CI.

Contributor


Same below
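The Expectations() pattern the reviewer suggests keys an expected output by device type and compute capability, with fallbacks. A simplified pure-Python stand-in for the idea (not the actual `transformers.testing_utils` API, and the strings here are placeholders, not the model's real full output):

```python
def pick_expectation(expectations, device, capability=None):
    """Pick an expected output for (device, capability), with fallbacks.

    Simplified illustration of device-keyed expectations: exact
    (device, capability) match first, then a device-wide entry,
    then a global default.
    """
    for key in ((device, capability), (device, None), (None, None)):
        if key in expectations:
            return expectations[key]
    raise KeyError(f"no expectation registered for {device!r}/{capability!r}")

# Placeholder strings; a real test would store each device's full
# expected transcription here.
EXPECTED_TEXTS = {
    ("cuda", 8): "R&D QUALITY IMPROVEMENT ...",  # e.g. an A100 CI runner
    (None, None): "R&D QUALITY IMPROVEMENT",     # generic fallback
}
```

Keying on capability lets slightly different greedy-decoding outputs per GPU generation coexist in one test instead of weakening the assertion to a prefix check.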


4 participants