
[Model] Add PP-OCRv5_server_rec Model Support #43795

Closed
liu-jiaxuan wants to merge 10 commits into huggingface:main from liu-jiaxuan:feat/pp_ocrv5_server_rec

Conversation

@liu-jiaxuan (Contributor)

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, pp_ocrv5_server_det, pp_ocrv5_server_rec

@zhang-prog (Contributor)

@vasqu
PTAL.
btw:
The relationship between mobile_rec and server_rec is similar to that of server_det and mobile_det.
Perhaps we can merge this PR first. Once it’s merged, I will submit the PR for mobile_rec.

@github-actions (Contributor)

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43795&sha=72497f

@vasqu (Contributor) left a comment:

Leaving my initial comments, I think this model is more standard at its core so we can support a bit more features 🤗

I nudged where I could, but sometimes I jumped around because I noticed a few details later; let me know if anything is unclear.

image_std = IMAGENET_STANDARD_STD
size = {"height": 48, "width": 320}
pad_size = {"height": 48, "width": 320}
do_resize = True

Noticing it just now, though it slipped through in the previous model additions too: we should just set do_convert_rgb = True. Then we don't need the workarounds in the other models or the explicit conversion in the examples.


Talking about lines like

            # BGR to RGB conversion
            stacked_images = stacked_images[:, [2, 1, 0], :, :]
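For illustration, the flip being discussed is just an integer reorder of the channel axis; with do_convert_rgb = True handled up front by the base image processor, this step (sketched below in NumPy, not the PR's actual code) would disappear from the model-specific processors and examples:

```python
import numpy as np

# Illustration only: reorder the channel axis of a batched (B, C, H, W)
# tensor from BGR to RGB via integer indexing.
def bgr_to_rgb(stacked_images: np.ndarray) -> np.ndarray:
    return stacked_images[:, [2, 1, 0], :, :]

bgr = np.zeros((1, 3, 2, 2))
bgr[0, 0] = 1.0          # put all energy in the blue channel
rgb = bgr_to_rgb(bgr)    # blue now lives in the last channel slot
```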

Comment on lines +182 to +184
if width > max_width:
    max_width = width
    max_height = height

Suggested change:

    if width > max_width:
        max_width = width
        max_height = height

becomes

    if width > max_width:
        max_width = width
    if height > max_height:
        max_height = height


We need the width and height of the widest image. The maximum height doesn’t matter.
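That logic can be sketched with the intent spelled out in a comment (a hypothetical free function over (width, height) pairs, not the PR's method):

```python
def widest_image_size(sizes: list[tuple[int, int]]) -> tuple[int, int]:
    """Return (max_width, height_of_the_widest_image) for a batch.

    Both values are updated together on purpose: we want the height that
    belongs to the widest image, not the overall maximum height, so there
    is deliberately no separate `if height > max_height` branch.
    """
    max_width, max_height = 0, 0
    for width, height in sizes:
        if width > max_width:
            max_width = width
            max_height = height
    return max_width, max_height
```

For example, widest_image_size([(320, 48), (400, 32), (100, 64)]) returns (400, 32) even though 64 is the tallest height in the batch.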


Can we add a small comment here then? It looks a bit odd from the outside.


        return BatchFeature(data={"pixel_values": processed_images}, tensor_type=return_tensors)

    def get_target_size(self, images):

Let's add a small docstring and typing here.
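One hypothetical shape the requested docstring and typing could take, written as a free function over (width, height) pairs; the PR's actual signature and resize policy may differ. The height 48 and max width 320 defaults are taken from the processor config quoted earlier in the review:

```python
import math

def get_target_size(
    sizes: list[tuple[int, int]], height: int = 48, max_width: int = 320
) -> tuple[int, int]:
    """Compute the common (height, width) target for a batch of text-line crops.

    Each image is assumed to be scaled to `height` while preserving aspect
    ratio; the batch target width is the largest resulting width, capped at
    `max_width`.

    Args:
        sizes: original (width, height) of each image in the batch.
        height: fixed target height for recognition inputs.
        max_width: upper bound on the padded batch width.

    Returns:
        The (height, width) every image in the batch is resized/padded to.
    """
    widest = max(math.ceil(w * height / h) for w, h in sizes)
    return height, min(widest, max_width)
```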

Comment on lines +201 to +204
    def post_process_text_recognition(
        self,
        outputs,
    ) -> tuple[list[str], list[float]]:

Same here, let's add a docstring + typing.
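A hedged sketch of what the requested docstring and typing could look like, using a generic greedy CTC decode in NumPy; the PR's actual torch implementation (excerpted later in this review) may differ in details such as blank handling:

```python
import numpy as np

def post_process_text_recognition(
    logits: np.ndarray, charset: str, blank_id: int = 0
) -> tuple[list[str], list[float]]:
    """Decode recognition logits into text strings and confidence scores.

    Args:
        logits: float array of shape (batch, seq_len, vocab_size).
        charset: recognizable characters, indexed from 1 (index `blank_id`
            is reserved for the CTC blank token).
        blank_id: index of the blank token.

    Returns:
        (texts, scores): the decoded string and mean per-character score
        for each item in the batch.
    """
    preds_idx = logits.argmax(-1)
    preds_prob = logits.max(-1)
    texts, scores = [], []
    for idx_seq, prob_seq in zip(preds_idx, preds_prob):
        chars, probs, prev = [], [], blank_id
        for i, p in zip(idx_seq, prob_seq):
            # Greedy CTC decode: drop blanks and collapse repeated indices.
            if i != blank_id and i != prev:
                chars.append(charset[i - 1])
                probs.append(float(p))
            prev = i
        texts.append("".join(chars))
        scores.append(float(np.mean(probs)) if probs else 0.0)
    return texts, scores
```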


We need tests for the image processor

        preds_prob, preds_idx = logits.max(dim=-1)
        results = []
        for idx in range(batch_size):
            selection = torch.ones(len(preds_idx[idx]), dtype=torch.bool, device=preds_idx.device)

A torch-backend guard is needed here (see the other models; I did the same there).
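The guard pattern being referred to can be sketched as follows; this is a simplified stand-in for requires_backends from transformers.utils, not the library's actual implementation (which also handles multiple backends and richer error messages):

```python
import importlib.util

def requires_backends(obj, backend: str) -> None:
    """Fail fast with a clear error when an optional backend is missing
    (simplified stand-in for transformers.utils.requires_backends)."""
    if importlib.util.find_spec(backend) is None:
        raise ImportError(f"{type(obj).__name__} requires the `{backend}` library.")
```

Calling it at the top of a torch-only method turns a confusing NameError deep in the code into an actionable import error at the call site.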

class PPOCRV5ServerRecModelTest(ModelTesterMixin, PipelineTesterMixin, unittest.TestCase):
    all_model_classes = (PPOCRV5ServerRecForTextRecognition,) if is_torch_available() else ()
    pipeline_model_mapping = (
        {"image-feature-extraction": PPOCRV5ServerRecForTextRecognition} if is_torch_available() else {}

We could add a pipeline example then in the docs?


I’m looking for a text recognition pipeline. What is that task called in transformers?


I just thought so because you added "image-feature-extraction" to the pipeline mapping. I guess we don't have one at the moment, tbh.


cc @yonigozlan @zucchini-nlp if you know of any maybe

Comment on lines +150 to +180
    @unittest.skip(reason="PPOCRV5ServerRec does not use inputs_embeds")
    def test_inputs_embeds(self):
        pass

    @unittest.skip(reason="PPOCRV5ServerRec does not use inputs_embeds")
    def test_inputs_embeds_matches_input_ids(self):
        pass

    @unittest.skip(reason="PPOCRV5ServerRec does not support input and output embeddings")
    def test_model_get_set_embeddings(self):
        pass

    @unittest.skip(reason="PPOCRV5ServerRec does not support input and output embeddings")
    def test_model_common_attributes(self):
        pass

    @unittest.skip(reason="Feed forward chunking is not implemented")
    def test_feed_forward_chunking(self):
        pass

    @unittest.skip(reason="PPOCRV5ServerRec does not support attention")
    def test_retain_grad_hidden_states_attentions(self):
        pass

    @unittest.skip(reason="PPOCRV5ServerRec does not support attention")
    def test_attention_outputs(self):
        pass

    @unittest.skip(reason="PPOCRV5ServerRec does not support train")
    def test_problem_types(self):
        pass

Can you double check if we really need all these skips?

@zhang-prog (Contributor)

I fully refactored server_rec and found it’s reusable for mobile_rec, so I’ve combined them both into #44808

@vasqu (Contributor)

vasqu commented Mar 18, 2026

Let's close this PR then?

@vasqu (Contributor)

vasqu commented Mar 18, 2026

Closing in favor of #44808

@vasqu closed this Mar 18, 2026
3 participants