Add Deepseek-OCR-2 model#45075
Hey @thisisiron, I have been quite busy with other models lately and this one slipped off my radar. Will be reviewing after Easter next week :)
Summary of updates:
Thanks a lot, I see the slow integration tests are also ✅, so requesting a review from core maintainers!
vasqu left a comment
Sorry for the delays! So many models concurrently :D The implementation is super nice already; this review is more about details and syncing with main.
    @unittest.skip("hidden_size is on vision_config.encoder_config, not on vision_config.")
    @parameterized.expand([True, False, None])
    def test_get_image_features_output(self, return_dict: bool | None):
        pass

    @unittest.skip("rms_norm_eps on vision_config.encoder_config is not reached by set_config_for_less_flaky_test.")
    @parameterized.expand(TEST_EAGER_MATCHES_BATCHED_AND_GROUPED_INFERENCE_PARAMETERIZATION)
    def test_eager_matches_batched_and_grouped_inference(self, name, dtype):
        pass
Imo these tests are important enough that it would be nice to override them rather than skip them.
test_get_image_features_output: overridden via _image_features_prepare_config_and_inputs to set vision_config.hidden_size from encoder_config.
test_eager_matches_batched_and_grouped_inference: skip removed.
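The override described in this reply can be sketched roughly as follows. This is a minimal illustration using `SimpleNamespace` stand-ins rather than the real config classes, and the helper name `mirror_encoder_hidden_size` is hypothetical; only the attribute names `vision_config`, `encoder_config`, and `hidden_size` come from the thread.

```python
from types import SimpleNamespace


def mirror_encoder_hidden_size(config):
    """Expose encoder_config.hidden_size directly on vision_config.

    Hypothetical helper: shared tests that read vision_config.hidden_size
    can then run without knowing about the nested encoder_config.
    """
    vision_config = config.vision_config
    vision_config.hidden_size = vision_config.encoder_config.hidden_size
    return config


# Stand-in config with an assumed shape (not the real DeepseekOcr2 classes).
config = SimpleNamespace(
    vision_config=SimpleNamespace(encoder_config=SimpleNamespace(hidden_size=32))
)
config = mirror_encoder_hidden_size(config)
print(config.vision_config.hidden_size)
```

In the PR itself this logic is said to live in `_image_features_prepare_config_and_inputs`; how that hook is wired into the test class is not shown in the thread.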
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
…nsformers into add-deepseek_ocr2
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, deepseek_ocr2
Will check it out tomorrow!! Thanks a lot for fixing things up 🤗
vasqu left a comment
I don't have much left on my side, and this is close to merge. Let's fix up some last details, but nothing too big anymore.
        "encoder_config": DeepseekOcr2EncoderConfig,
    }

    sam_config: dict | PreTrainedConfig | None = None
No, imo it's fine to keep as is then. It's not perfect, but this model is not compatible with vision encoders other than SAM anyway, so being explicit is fine in this case.
        super().__init__(config)
        self.proj = DeepseekOcr2SamVisionProj(config)

    def interpolate_pos_encoding(self, pos_embed: torch.Tensor, target_size: int, dtype: torch.dtype) -> torch.Tensor:
Is this maybe copied or adapted from some other model? If yes, let's add a comment that references it.
        )
        generate_ids = model.generate(**inputs, do_sample=False, max_new_tokens=20)
        decoded = self.processor.decode(generate_ids[0, inputs["input_ids"].shape[1] :], skip_special_tokens=True)
        self.assertTrue(decoded.startswith("R&D QUALITY IMPROVEMENT"))
Imo this should compare against the full decoded text. Can we use the Expectations() class? You fill in the entry for your CUDA capability, and I will adjust an entry for our CI.
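The idea behind this request, keeping one full expected string per device and comparing the whole decoded text, can be sketched as below. This is a simplified stdlib stand-in, not the `Expectations` class mentioned in the comment, whose actual interface may differ. The names `pick_expectation`, `EXPECTED_TEXTS`, and `check_decoded` are hypothetical, and the strings are placeholders because the full decoded outputs do not appear in the thread.

```python
def pick_expectation(expectations, device_type, major):
    """Return the most specific entry for (device_type, major).

    Lookup order: exact match, then (device_type, None), then (None, None).
    """
    for key in ((device_type, major), (device_type, None), (None, None)):
        if key in expectations:
            return expectations[key]
    raise KeyError(f"no expectation for {(device_type, major)}")


# Placeholder values only: the real full texts would be filled in per device.
EXPECTED_TEXTS = {
    ("cuda", 8): "<full decoded text on the contributor's GPU>",
    (None, None): "<default full decoded text>",
}


def check_decoded(decoded, device_type, major):
    """Compare the whole decoded string, not just a prefix."""
    expected = pick_expectation(EXPECTED_TEXTS, device_type, major)
    assert decoded == expected, f"{decoded!r} != {expected!r}"
```

Compared with `startswith`, an exact match also catches regressions that only appear later in the generated text, while the per-device entries absorb small numerical differences between hardware.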
What does this PR do?
Adds the DeepSeek-OCR2 model.
Reference
Who can review?
@zucchini-nlp @yonigozlan