[Model] Add PP-OCRv5_server_rec Model Support #43795
liu-jiaxuan wants to merge 10 commits into huggingface:main
Conversation
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, pp_ocrv5_server_det, pp_ocrv5_server_rec

@vasqu
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43795&sha=72497f |
vasqu left a comment
Leaving my initial comments. I think this model is more standard at its core, so we can support a few more features 🤗
I nudged where I could, but sometimes I jumped around because I noticed a few details later, so let me know if anything is unclear.
```python
image_std = IMAGENET_STANDARD_STD
size = {"height": 48, "width": 320}
pad_size = {"height": 48, "width": 320}
do_resize = True
```
Noticing it just now, but it slipped through in the previous model additions: we should just have `do_convert_rgb = True`. Then we don't need the workarounds in the other models and can avoid doing the conversion explicitly in the examples.
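For context, the manual channel swap that a `do_convert_rgb = True` default would make unnecessary can be sketched standalone (toy numpy data, not the PR's actual tensors):

```python
import numpy as np

# Toy batch of 2 images in BGR order, shape (batch, channels, height, width)
bgr = np.arange(2 * 3 * 2 * 2, dtype=np.float32).reshape(2, 3, 2, 2)

# The explicit workaround currently in the processors: reverse the channel axis
rgb = bgr[:, [2, 1, 0], :, :]

# Red in the output is blue in the input, and vice versa
assert (rgb[:, 0] == bgr[:, 2]).all()
assert (rgb[:, 2] == bgr[:, 0]).all()
```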
Talking about lines like

```python
# BGR to RGB conversion
stacked_images = stacked_images[:, [2, 1, 0], :, :]
```

```python
if width > max_width:
    max_width = width
    max_height = height
```
Suggested change:

```diff
 if width > max_width:
     max_width = width
-    max_height = height
+if height > max_height:
+    max_height = height
```
We need the width and height of the widest image. The maximum height doesn’t matter.
Can we add a small comment here then? It looks a bit weird from the outside.
```python
return BatchFeature(data={"pixel_values": processed_images}, tensor_type=return_tensors)
```

```python
def get_target_size(self, images):
```
Let's add a small docstring and typing here.
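A possible shape for it, sketched standalone. The body and the (channels, height, width) layout are assumptions about the implementation, not the PR's actual code:

```python
import numpy as np


def get_target_size(images: list[np.ndarray]) -> tuple[int, int]:
    """Compute the (height, width) every image in the batch is padded to.

    The target is the size of the *widest* image in the batch, which is
    why width and height are tracked together rather than independently.

    Args:
        images: List of images as arrays in (channels, height, width) format.

    Returns:
        Tuple of (target_height, target_width) taken from the widest image.
    """
    max_width, max_height = 0, 0
    for image in images:
        height, width = image.shape[-2:]
        if width > max_width:
            # keep the height *of the widest image*, not the max height
            max_width, max_height = width, height
    return max_height, max_width
```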
```python
def post_process_text_recognition(
    self,
    outputs,
) -> tuple[list[str], list[float]]:
```
Same here, let's add a docstring + typing.
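To illustrate the requested docstring/typing style, here is a standalone numpy sketch of the greedy CTC-style decoding the snippet below suggests. This is not the PR's implementation; the function name, `charset` parameter, and `blank_id` convention are assumptions:

```python
import numpy as np


def ctc_greedy_decode(
    logits: np.ndarray, charset: list[str], blank_id: int = 0
) -> tuple[list[str], list[float]]:
    """Decode a batch of recognition logits into texts and confidences.

    Greedy CTC decoding: take the argmax character at each timestep,
    collapse repeated characters, drop blanks, and average the kept
    probabilities as a per-sample confidence score.

    Args:
        logits: Array of shape (batch, seq_len, vocab_size).
        charset: Mapping from class index to character.
        blank_id: Index of the CTC blank token.

    Returns:
        Tuple (texts, scores) with one string and one float per sample.
    """
    preds_idx = logits.argmax(axis=-1)   # (batch, seq_len)
    preds_prob = logits.max(axis=-1)     # (batch, seq_len)
    texts, scores = [], []
    for idx_seq, prob_seq in zip(preds_idx, preds_prob):
        # keep positions that differ from their predecessor (collapse repeats)
        keep = np.ones(len(idx_seq), dtype=bool)
        keep[1:] = idx_seq[1:] != idx_seq[:-1]
        # drop blanks
        keep &= idx_seq != blank_id
        texts.append("".join(charset[i] for i in idx_seq[keep]))
        scores.append(float(prob_seq[keep].mean()) if keep.any() else 0.0)
    return texts, scores
```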
We need tests for the image processor
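A minimal standalone sketch of the kind of shape check such a test could make (pure numpy; `resize_and_pad` is a hypothetical stand-in for the processor's padding behavior, not the PR's API):

```python
import numpy as np


def resize_and_pad(image: np.ndarray, target: tuple[int, int]) -> np.ndarray:
    """Hypothetical stand-in: right/bottom-pad an image (C, H, W) with
    zeros up to the target (height, width)."""
    c, h, w = image.shape
    th, tw = target
    out = np.zeros((c, th, tw), dtype=image.dtype)
    out[:, :h, :w] = image
    return out


def test_padding_shape():
    image = np.ones((3, 48, 100), dtype=np.float32)
    padded = resize_and_pad(image, (48, 320))
    assert padded.shape == (3, 48, 320)
    # original content preserved, padding region is zeros
    assert padded[:, :, :100].all()
    assert not padded[:, :, 100:].any()
```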
```python
preds_prob, preds_idx = logits.max(dim=-1)
results = []
for idx in range(batch_size):
    selection = torch.ones(len(preds_idx[idx]), dtype=torch.bool, device=preds_idx.device)
```
A torch-backend requirement is needed here (see the other models, I did the same there).
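The real helper is `transformers.utils.requires_backends`; here is a minimal self-contained stand-in showing the behavior it provides (raise a clear error at call time when the backend is missing):

```python
import importlib.util


def requires_backends(obj, backends: list[str]) -> None:
    """Minimal stand-in for transformers.utils.requires_backends: raise a
    clear ImportError when a required backend is not installed."""
    for name in backends:
        if importlib.util.find_spec(name) is None:
            raise ImportError(
                f"{type(obj).__name__} requires the {name} backend, which is not installed."
            )


class Decoder:
    def post_process_text_recognition(self, outputs):
        # torch-only code path: guard it explicitly
        requires_backends(self, ["torch"])
```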
```python
class PPOCRV5ServerRecModelTest(ModelTesterMixin, PipelineTesterMixin, unittest.TestCase):
    all_model_classes = (PPOCRV5ServerRecForTextRecognition,) if is_torch_available() else ()
    pipeline_model_mapping = (
        {"image-feature-extraction": PPOCRV5ServerRecForTextRecognition} if is_torch_available() else {}
```
We could add a pipeline example then in the docs?
I’m looking for a text recognition pipeline. What is that task called in transformers?
Just thought because you added "image-feature-extraction" to the pipeline mapping. I guess we don't have one at the moment tbh
cc @yonigozlan @zucchini-nlp if you know of any maybe
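Since `image-feature-extraction` is in the pipeline mapping, a docs example could look like the following sketch. The checkpoint id and image path are placeholders, and this assumes the model is actually served by the existing image-feature-extraction pipeline:

```python
from transformers import pipeline

# NOTE: placeholder repo id; replace with the final checkpoint name once published.
feature_extractor = pipeline(
    task="image-feature-extraction",
    model="PaddlePaddle/PP-OCRv5_server_rec",
)

# Returns nested lists of floats (the encoder features for the text-line crop)
features = feature_extractor("path/to/cropped_text_line.png")
```

This only exposes features, not decoded text, which is why a dedicated text-recognition task string would be nicer if one existed.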
```python
@unittest.skip(reason="PPOCRV5ServerRec does not use inputs_embeds")
def test_inputs_embeds(self):
    pass

@unittest.skip(reason="PPOCRV5ServerRec does not use inputs_embeds")
def test_inputs_embeds_matches_input_ids(self):
    pass

@unittest.skip(reason="PPOCRV5ServerRec does not support input and output embeddings")
def test_model_get_set_embeddings(self):
    pass

@unittest.skip(reason="PPOCRV5ServerRec does not support input and output embeddings")
def test_model_common_attributes(self):
    pass

@unittest.skip(reason="Feed forward chunking is not implemented")
def test_feed_forward_chunking(self):
    pass

@unittest.skip(reason="PPOCRV5ServerRec does not support attention")
def test_retain_grad_hidden_states_attentions(self):
    pass

@unittest.skip(reason="PPOCRV5ServerRec does not support attention")
def test_attention_outputs(self):
    pass

@unittest.skip(reason="PPOCRV5ServerRec does not support train")
def test_problem_types(self):
    pass
```
Can you double check if we really need all these skips?
I fully refactored server_rec and found it's reusable for mobile_rec, so I've combined them both into #44808

Let's close this PR then?

Closing in favor of #44808