
[Model] Add PP-OCRv5_server_rec Model Support #43795

Closed
liu-jiaxuan wants to merge 10 commits into huggingface:main from liu-jiaxuan:feat/pp_ocrv5_server_rec

Conversation

@liu-jiaxuan (Contributor)

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, pp_ocrv5_server_det, pp_ocrv5_server_rec

@zhang-prog (Contributor)

@vasqu
PTAL.
btw:
The relationship between mobile_rec and server_rec is similar to that of server_det and mobile_det.
Perhaps we can merge this PR first. Once it’s merged, I will submit the PR for mobile_rec.

@github-actions (Contributor)

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43795&sha=72497f

@vasqu (Contributor) left a comment:

Leaving my initial comments, I think this model is more standard at its core so we can support a bit more features 🤗

I nudged where I could, but sometimes I jumped around because I noticed a few details later; let me know if anything is unclear.

image_std = IMAGENET_STANDARD_STD
size = {"height": 48, "width": 320}
pad_size = {"height": 48, "width": 320}
do_resize = True

Noticing it just now, though it slipped through in the previous model additions too: we should just set do_convert_rgb = True. Then we don't need the workarounds in the other models or the explicit conversion in the examples.


Talking about lines like

            # BGR to RGB conversion
            stacked_images = stacked_images[:, [2, 1, 0], :, :]
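For illustration, the flip being discussed is just an integer reorder of the channel axis; with do_convert_rgb = True handled up front by the base image processor, this step (sketched below in NumPy, not the PR's actual code) would disappear from the model-specific processors and examples:

```python
import numpy as np

# Illustration only: reorder the channel axis of a batched (B, C, H, W)
# tensor from BGR to RGB via integer indexing.
def bgr_to_rgb(stacked_images: np.ndarray) -> np.ndarray:
    return stacked_images[:, [2, 1, 0], :, :]

bgr = np.zeros((1, 3, 2, 2))
bgr[0, 0] = 1.0          # put all energy in the blue channel
rgb = bgr_to_rgb(bgr)    # blue now lives in the last channel slot
```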

Comment on lines +182 to +184
if width > max_width:
    max_width = width
    max_height = height

Suggested change:

    if width > max_width:
        max_width = width
        max_height = height

becomes

    if width > max_width:
        max_width = width
    if height > max_height:
        max_height = height


We need the width and height of the widest image. The maximum height doesn’t matter.
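That logic can be sketched with the intent spelled out in a comment (a hypothetical free function over (width, height) pairs, not the PR's method):

```python
def widest_image_size(sizes: list[tuple[int, int]]) -> tuple[int, int]:
    """Return (max_width, height_of_the_widest_image) for a batch.

    Both values are updated together on purpose: we want the height that
    belongs to the widest image, not the overall maximum height, so there
    is deliberately no separate `if height > max_height` branch.
    """
    max_width, max_height = 0, 0
    for width, height in sizes:
        if width > max_width:
            max_width = width
            max_height = height
    return max_width, max_height
```

For example, widest_image_size([(320, 48), (400, 32), (100, 64)]) returns (400, 32) even though 64 is the tallest height in the batch.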


Can we add a small comment here then? It looks a bit odd from the outside.


        return BatchFeature(data={"pixel_values": processed_images}, tensor_type=return_tensors)

    def get_target_size(self, images):

Let's add a small docstring and typing here.
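One hypothetical shape the requested docstring and typing could take, written as a free function over (width, height) pairs; the PR's actual signature and resize policy may differ. The height 48 and max width 320 defaults are taken from the processor config quoted earlier in the review:

```python
import math

def get_target_size(
    sizes: list[tuple[int, int]], height: int = 48, max_width: int = 320
) -> tuple[int, int]:
    """Compute the common (height, width) target for a batch of text-line crops.

    Each image is assumed to be scaled to `height` while preserving aspect
    ratio; the batch target width is the largest resulting width, capped at
    `max_width`.

    Args:
        sizes: original (width, height) of each image in the batch.
        height: fixed target height for recognition inputs.
        max_width: upper bound on the padded batch width.

    Returns:
        The (height, width) every image in the batch is resized/padded to.
    """
    widest = max(math.ceil(w * height / h) for w, h in sizes)
    return height, min(widest, max_width)
```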

Comment on lines +201 to +204
    def post_process_text_recognition(
        self,
        outputs,
    ) -> tuple[list[str], list[float]]:

Same here, let's add a docstring + typing.
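A hedged sketch of what the requested docstring and typing could look like, using a generic greedy CTC decode in NumPy; the PR's actual torch implementation (excerpted later in this review) may differ in details such as blank handling:

```python
import numpy as np

def post_process_text_recognition(
    logits: np.ndarray, charset: str, blank_id: int = 0
) -> tuple[list[str], list[float]]:
    """Decode recognition logits into text strings and confidence scores.

    Args:
        logits: float array of shape (batch, seq_len, vocab_size).
        charset: recognizable characters, indexed from 1 (index `blank_id`
            is reserved for the CTC blank token).
        blank_id: index of the blank token.

    Returns:
        (texts, scores): the decoded string and mean per-character score
        for each item in the batch.
    """
    preds_idx = logits.argmax(-1)
    preds_prob = logits.max(-1)
    texts, scores = [], []
    for idx_seq, prob_seq in zip(preds_idx, preds_prob):
        chars, probs, prev = [], [], blank_id
        for i, p in zip(idx_seq, prob_seq):
            # Greedy CTC decode: drop blanks and collapse repeated indices.
            if i != blank_id and i != prev:
                chars.append(charset[i - 1])
                probs.append(float(p))
            prev = i
        texts.append("".join(chars))
        scores.append(float(np.mean(probs)) if probs else 0.0)
    return texts, scores
```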


We need tests for the image processor

        preds_prob, preds_idx = logits.max(dim=-1)
        results = []
        for idx in range(batch_size):
            selection = torch.ones(len(preds_idx[idx]), dtype=torch.bool, device=preds_idx.device)

A torch-backend guard is needed here (see the other models; I did the same there).
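The guard pattern being referred to can be sketched as follows; this is a simplified stand-in for requires_backends from transformers.utils, not the library's actual implementation (which also handles multiple backends and richer error messages):

```python
import importlib.util

def requires_backends(obj, backend: str) -> None:
    """Fail fast with a clear error when an optional backend is missing
    (simplified stand-in for transformers.utils.requires_backends)."""
    if importlib.util.find_spec(backend) is None:
        raise ImportError(f"{type(obj).__name__} requires the `{backend}` library.")
```

Calling it at the top of a torch-only method turns a confusing NameError deep in the code into an actionable import error at the call site.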

class PPOCRV5ServerRecModelTest(ModelTesterMixin, PipelineTesterMixin, unittest.TestCase):
    all_model_classes = (PPOCRV5ServerRecForTextRecognition,) if is_torch_available() else ()
    pipeline_model_mapping = (
        {"image-feature-extraction": PPOCRV5ServerRecForTextRecognition} if is_torch_available() else {}

We could add a pipeline example then in the docs?


I’m looking for a text recognition pipeline. What is that task called in transformers?


I just thought so because you added "image-feature-extraction" to the pipeline mapping. I guess we don't have one at the moment, tbh.


cc @yonigozlan @zucchini-nlp if you know of any maybe

Comment on lines +150 to +180
    @unittest.skip(reason="PPOCRV5ServerRec does not use inputs_embeds")
    def test_inputs_embeds(self):
        pass

    @unittest.skip(reason="PPOCRV5ServerRec does not use inputs_embeds")
    def test_inputs_embeds_matches_input_ids(self):
        pass

    @unittest.skip(reason="PPOCRV5ServerRec does not support input and output embeddings")
    def test_model_get_set_embeddings(self):
        pass

    @unittest.skip(reason="PPOCRV5ServerRec does not support input and output embeddings")
    def test_model_common_attributes(self):
        pass

    @unittest.skip(reason="Feed forward chunking is not implemented")
    def test_feed_forward_chunking(self):
        pass

    @unittest.skip(reason="PPOCRV5ServerRec does not support attention")
    def test_retain_grad_hidden_states_attentions(self):
        pass

    @unittest.skip(reason="PPOCRV5ServerRec does not support attention")
    def test_attention_outputs(self):
        pass

    @unittest.skip(reason="PPOCRV5ServerRec does not support train")
    def test_problem_types(self):
        pass

Can you double check if we really need all these skips?

@zhang-prog (Contributor)

I fully refactored server_rec and found it’s reusable for mobile_rec, so I’ve combined them both into #44808

@vasqu (Contributor)

vasqu commented Mar 18, 2026

Let's close this PR then?

@vasqu (Contributor)

vasqu commented Mar 18, 2026

Closing in favor of #44808

@vasqu closed this Mar 18, 2026
3 participants