[Model] Add PP-OCRV5_mobile_rec Model Support #43793
liu-jiaxuan wants to merge 5 commits into huggingface:main
Conversation
yonigozlan left a comment
Hello @liu-jiaxuan! Thanks for opening this PR, however there is quite a bit to change here to fit the standards of the Transformers library.
The biggest issue is that you've written everything from scratch without inheriting from existing models. The modular file should maximize inheritance. Even if this is a novel architecture (especially the Conv modules part, which might not exist elsewhere in the library), components like MLP blocks, attention, and layer norms should use standard library patterns by inheriting from an existing model's module in modular.
The novel modules that can't be inherited through modular should also follow library standards in terms of naming, formatting, structure, and good practices ("PPOCRV5MobileRec" prefix for all module names, weight names standardized with other similar modules in the library, no single-letter variables, type hints, docstrings when args are not standard or obvious, never use `eval()`, etc.), and the model should support as many transformers features as possible, such as the attention interface through flags on PreTrainedModel (`_supports_attention_backend`, `_supports_sdpa`, `_supports_flash_attn`, etc.)
Some other big things wrong or missing:
- We shouldn't have a cv2 dependency in image processors: the "slow" processor should use PIL/numpy functions, and the fast one torch/torchvision.
- Weight initialization shouldn't be scattered across individual module constructors but centralized in _init_weights() on the PreTrainedModel class, using the transformers "init" module.
- Attention modules are standardized across models in the transformers library, so using modular for attention modules is a must.
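The points above can be sketched as follows. This is a minimal, hedged illustration of the pattern (capability flags on the pretrained base class plus a single centralized `_init_weights` applied via `self.apply`), not code from the PR: the class names and the `hidden_size` argument are illustrative, and a real model would subclass `transformers.PreTrainedModel` rather than `nn.Module`.

```python
import torch
from torch import nn


class PPOCRV5MobileRecPreTrainedModel(nn.Module):
    # In transformers these flags live on the PreTrainedModel subclass
    # and opt the model into the shared attention interface:
    _supports_attention_backend = True
    _supports_sdpa = True
    _supports_flash_attn = True

    def _init_weights(self, module: nn.Module) -> None:
        """Centralized weight init, applied once via `self.apply(...)`."""
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            module.weight.data.normal_(mean=0.0, std=0.02)
            if module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.LayerNorm):
            module.weight.data.fill_(1.0)
            module.bias.data.zero_()


class PPOCRV5MobileRecModel(PPOCRV5MobileRecPreTrainedModel):
    def __init__(self, hidden_size: int = 64) -> None:
        super().__init__()
        # Constructors only build layers; no per-module init here.
        self.proj = nn.Linear(hidden_size, hidden_size)
        self.norm = nn.LayerNorm(hidden_size)
        self.apply(self._init_weights)


model = PPOCRV5MobileRecModel()
print(torch.allclose(model.norm.weight, torch.ones(64)))  # True
```

In transformers itself, `post_init()` on PreTrainedModel takes care of calling the initialization, so modules never initialize their own weights inline.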
Before we go deeper into reviewing this new model addition (and the other PaddlePaddle ones opened recently that are very similar), please have a good look at how other models are implemented in the library. Notably, you can look at the recently merged PP-DocLayoutV3 PR (here's its modular file).
We also have resources to learn more about how to contribute a new model and how to use modular: Contributing a new model, using modular.
Also, as the multiple PaddlePaddle models that currently have new model addition PRs open seem to be quite similar, I'd recommend focusing on one (the simplest) for now; then we'll be able to leverage modular to easily add the other models.
Happy to answer any questions you may have!
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, pp_ocrv5_mobile_rec
Hello @yonigozlan , thank you very much for your detailed review and valuable guidance! Specifically, we have implemented the following improvements:
We will continue to refine the code according to the transformers library standards and conventions. Please let us know if you have any further comments or suggestions.
Thanks a lot for iterating @liu-jiaxuan ! I'll have a look in the coming days.
We do support using PIL resize in the slow processor and torchvision resize in the fast one; maybe you'll get closer results by choosing the interpolation equivalent to the one in cv2 than with custom numpy code?
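To illustrate the suggestion, here is a hedged sketch of the cv2-to-PIL/torchvision interpolation correspondence and a PIL-based slow-path resize. The mapping is the standard one between the libraries; the helper name `resize_slow` and the (48, 320) OCR-style input size are illustrative, not part of the transformers API or this PR.

```python
import numpy as np
from PIL import Image

# cv2 flag           -> PIL filter     -> torchvision InterpolationMode
# cv2.INTER_NEAREST  -> Image.NEAREST  -> InterpolationMode.NEAREST
# cv2.INTER_LINEAR   -> Image.BILINEAR -> InterpolationMode.BILINEAR
# cv2.INTER_CUBIC    -> Image.BICUBIC  -> InterpolationMode.BICUBIC
# cv2.INTER_LANCZOS4 -> Image.LANCZOS  -> InterpolationMode.LANCZOS


def resize_slow(image: np.ndarray, size: tuple) -> np.ndarray:
    """Resize an HWC uint8 array with PIL bilinear (slow-processor path).

    `size` is (height, width); PIL's resize takes (width, height).
    """
    height, width = size
    pil_image = Image.fromarray(image)
    resized = pil_image.resize((width, height), resample=Image.BILINEAR)
    return np.asarray(resized)


image = np.random.randint(0, 256, (48, 320, 3), dtype=np.uint8)
resized = resize_slow(image, (32, 160))
print(resized.shape)  # (32, 160, 3)
```

Note that even with the matching filter, results won't be bit-identical to cv2: PIL applies antialiasing when downscaling while cv2's INTER_LINEAR does not, which is one plausible source of the accuracy gap discussed below.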
Hi @yonigozlan, thank you very much for your suggestion! Based on our current experiments, using PIL/torchvision for image preprocessing in these three models results in a larger accuracy loss compared to numpy. Therefore, we have chosen the numpy-based approach. We will continue to iterate on the PIL/torchvision-based preprocessing method, and we will update the PR immediately if we achieve a better version.
Closing in favor of #44808 |
What does this PR do?

Fixes # (issue)

Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.