[Model] Add PP-OCRV5_mobile_rec Model Support #43793

Closed
liu-jiaxuan wants to merge 5 commits into huggingface:main from liu-jiaxuan:feat/pp_ocrv5_mobile_rec

Conversation

@liu-jiaxuan (Contributor)

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@yonigozlan (Member) left a comment

Hello @liu-jiaxuan! Thanks for opening this PR; however, there is quite a bit to change here to fit the standards of the Transformers library.

The biggest issue is that you've written everything from scratch without inheriting from existing models. The modular file should maximize inheritance. Even if this is a novel architecture (especially the Conv modules part, which might not exist elsewhere in the library), components like MLP blocks, attention, and layer norms should use standard library patterns by inheriting from an existing model's module in modular.
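
For illustration only, here is a minimal sketch of what maximizing inheritance in the modular file could look like. The parent classes below (LlamaMLP, LlamaAttention) are placeholders, not a recommendation; the actual parents should be whichever existing models are closest to this architecture.

```python
# modular_pp_ocrv5_mobile_rec.py -- hypothetical sketch, not the actual PR code.
# The parent classes are illustrative placeholders; pick the closest existing model.
from transformers.models.llama.modeling_llama import LlamaAttention, LlamaMLP


class PPOCRV5MobileRecMLP(LlamaMLP):
    # Inherit the standard MLP block unchanged; the modular converter expands
    # this into modeling_pp_ocrv5_mobile_rec.py at build time.
    pass


class PPOCRV5MobileRecAttention(LlamaAttention):
    # Only override what genuinely differs from the parent implementation.
    pass
```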

The novel modules that can't be inherited through modular should also follow library standards in terms of naming, formatting, structure, and good practices (a "PPOCRV5MobileRec" prefix for all module names, weight names standardized with other similar modules in the library, no single-letter variables, type hints, docstrings when args are not standard or obvious, never use "eval()", etc.), and the model should support as many Transformers features as possible, such as the attention interface through flags on PreTrainedModel (_supports_attention_backend, _supports_sdpa, _supports_flash_attn, etc.).
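
As a hedged example of the flags being referred to, the sketch below declares them on the PreTrainedModel subclass; the class name and flag values are assumptions and depend on what the model actually supports once the attention modules use the standard interface.

```python
# Hypothetical sketch: declaring attention-backend support on the
# PreTrainedModel subclass (config_class, docstrings, etc. omitted here).
from transformers import PreTrainedModel


class PPOCRV5MobileRecPreTrainedModel(PreTrainedModel):
    base_model_prefix = "model"
    _supports_sdpa = True
    _supports_flash_attn = True
    _supports_attention_backend = True
```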

Some other big things wrong or missing:

  • We shouldn't have a cv2 dependency in image processors: the "slow" processor should use PIL/numpy functions, and the fast one torch/torchvision.
  • Weight initialization shouldn't be scattered in individual module constructors but centralized in _init_weights() on the PreTrainedModel class, using the transformers "init" module (see the sketch after this list).
  • Attention modules are standardized across models in the transformers library, so using modular for attention modules is a must.
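
A minimal sketch of the centralized initialization pattern described above; the module types handled and the initializer_range attribute are assumptions, and the actual distributions must match how the original weights were initialized.

```python
# Hypothetical sketch of centralized weight initialization in _init_weights().
import torch.nn as nn
from transformers import PreTrainedModel


class PPOCRV5MobileRecPreTrainedModel(PreTrainedModel):
    def _init_weights(self, module: nn.Module) -> None:
        std = getattr(self.config, "initializer_range", 0.02)  # assumed config field
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            module.weight.data.normal_(mean=0.0, std=std)
            if module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.LayerNorm):
            module.weight.data.fill_(1.0)
            module.bias.data.zero_()
```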

Before we go deeper in reviewing this new model addition (and the other PaddlePaddle ones opened recently that are very similar), please have a good look at how other models are implemented in the library. Notably, you can have a look at the recently merged PP-DocLayoutV3 PR (here's its modular file).
We also have resources to learn more about how to contribute a new model and how to use modular: Contributing a new model, using modular.

Also, as the multiple PaddlePaddle models that currently have new model addition PRs open seem to be quite similar, I'd recommend focusing on one (the simplest) for now; then we'll be able to leverage modular to easily add the other models.

Happy to answer any questions you may have!

(Outdated comment thread on src/transformers/models/pp_ocrv5_mobile_rec/modular_pp_ocrv5_mobile_rec.py)
@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, pp_ocrv5_mobile_rec

@liu-jiaxuan (Contributor, Author)

Hello @yonigozlan, thank you very much for your detailed review and valuable guidance!
We have revised the three models (pp_ocrv5_mobile_rec, pp_ocrv5_server_rec, and slanext) to address the issues you mentioned.

Specifically, we have implemented the following improvements:

  1. Removed cv2 dependency: We replaced cv2 in image preprocessing and switched to numpy-based image processing. However, since all three models are designed for text or table recognition tasks and are highly sensitive to pixel-level perturbations in images, replacing the cv2 operations has had some impact on model accuracy.
  2. Centralized weight initialization: We moved all weight initialization to the _init_weights() method of the PreTrainedModel class, and removed the initialization code from the constructors of individual modules.
  3. Refactored inheritance and naming: Modified classes that could be modularly inherited to extend existing models, removed standalone implementations for modules that PyTorch supports directly, and added model prefixes (e.g., PPOCRV5MobileRec) to classes that cannot be directly inherited in a modular fashion.
  4. Removed unused functionality from various model modules, such as the DropPath class you pointed out.

We will continue to refine the code according to the transformers library standards and conventions. Please let us know if you have any further comments or suggestions.
Thank you again for your help!

@yonigozlan (Member)

Thanks a lot for iterating @liu-jiaxuan! I'll have a look in the coming days.

> We replaced cv2 in image preprocessing and switched to numpy-based image processing. However, since all three models are designed for text or table recognition tasks and are highly sensitive to pixel-level perturbations in images, replacing the cv2 operations has had some impact on model accuracy.

We do support using PIL resize in the slow processor and torchvision resize in the fast one; maybe you'll get closer results with those, when choosing the interpolation equivalent to the one used in cv2, than with custom numpy code?
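
For instance, here is a hypothetical sketch of matching the cv2 interpolation; it assumes the original preprocessing used cv2.INTER_LINEAR and a 48×320 target size, both of which are illustrative assumptions rather than details from this PR.

```python
# Hypothetical illustration: pick the interpolation closest to the original
# cv2 resize instead of re-implementing it in numpy. cv2.INTER_LINEAR maps
# roughly to PIL BILINEAR and torchvision InterpolationMode.BILINEAR
# (results are close but not guaranteed to be bit-identical).
from PIL import Image
from torchvision.transforms import InterpolationMode
from torchvision.transforms.v2 import functional as F

image = Image.open("sample.png").convert("RGB")  # placeholder input

# "Slow" (PIL) path: PIL takes (width, height).
resized_slow = image.resize((320, 48), resample=Image.Resampling.BILINEAR)

# "Fast" (torchvision) path: resize takes [height, width]; antialias=False
# tends to track cv2's bilinear behavior more closely than antialias=True.
tensor = F.pil_to_tensor(image)
resized_fast = F.resize(
    tensor, size=[48, 320], interpolation=InterpolationMode.BILINEAR, antialias=False
)
```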

@liu-jiaxuan (Contributor, Author)

> Thanks a lot for iterating @liu-jiaxuan! I'll have a look in the coming days.
>
> > We replaced cv2 in image preprocessing and switched to numpy-based image processing. However, since all three models are designed for text or table recognition tasks and are highly sensitive to pixel-level perturbations in images, replacing the cv2 operations has had some impact on model accuracy.
>
> We do support using PIL resize in the slow processor and torchvision resize in the fast one; maybe you'll get closer results with those, when choosing the interpolation equivalent to the one used in cv2, than with custom numpy code?

Hi @yonigozlan, thank you very much for your suggestion! Based on our current experiments, using PIL/torchvision for image preprocessing in these three models results in a larger accuracy loss compared to numpy. Therefore, we have chosen the numpy-based approach. We will continue to iterate on the PIL/torchvision-based preprocessing method, and we will update the PR immediately if we achieve a better version.

@vasqu (Contributor)

vasqu commented Mar 19, 2026

Closing in favor of #44808

@vasqu closed this Mar 19, 2026