[Model] Add PP-OCRV5_mobile_det Model Support #43247

Merged
vasqu merged 55 commits into huggingface:main from XingweiDeng:feat/pp_ocr_v5_mobile_det
Mar 13, 2026

Conversation

@XingweiDeng
Contributor

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Member

@yonigozlan yonigozlan left a comment

Hello @XingweiDeng, thanks for opening this PR! As I said to @liu-jiaxuan in the PP-OCRV5_mobile_rec PR, there is quite a bit to change here to fit the standards of the Transformers library.

The biggest issue is that you've written everything from scratch without inheriting from existing models. The modular file should maximize inheritance. Even if this is a novel architecture (especially the Conv modules part, which might not exist elsewhere in the library), components like MLP blocks, attention, and layer norms should use standard library patterns by inheriting from an existing model's module in modular.

The novel modules that can't be inherited through modular should also follow library standards in terms of naming, formatting, structure, and good practices (a "PPOCRV5MobileDet" prefix for all module names, weight names standardized with other similar modules in the library, no single-letter variables, type hints, docstrings when args are not standard or obvious, etc.), and the model should support as many Transformers features as possible, such as the attention interface through flags on PreTrainedModel (_supports_attention_backend, _supports_sdpa, _supports_flash_attn, etc.).
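
For illustration, here's a minimal sketch of the modular pattern being asked for; the base module chosen here (LlamaMLP) is an arbitrary example, not a prescription for this PR:

```python
# Hypothetical modular-file excerpt: instead of re-implementing a standard
# block from scratch, subclass an existing model's module and override only
# what differs; the modular converter generates the full modeling file.
from transformers.models.llama.modeling_llama import LlamaMLP


class PPOCRV5MobileDetMLP(LlamaMLP):
    # Inherits the standard MLP wiring unchanged.
    pass
```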

Some other big things wrong or missing:

  • We shouldn't have a cv2 dependency in image processors: the "slow" processor should use PIL/NumPy functions, and the fast one torch/torchvision.
  • Weight initialization shouldn't be scattered in individual module constructors but centralized in _init_weights() on the PreTrainedModel class, using the transformers "init" module (see the sketch after this list).
  • Attention modules are standardized across models in the transformers library, so using modular for attention modules is a must.
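
A minimal sketch of the centralized initialization being asked for, assuming typical layers for a convolutional model (the exact init scheme must of course match the original PaddlePaddle weights):

```python
import torch.nn as nn

from transformers import PreTrainedModel


class PPOCRV5MobileDetPreTrainedModel(PreTrainedModel):
    def _init_weights(self, module: nn.Module):
        # All weight init lives here, not in individual module constructors.
        if isinstance(module, nn.Conv2d):
            nn.init.kaiming_normal_(module.weight, mode="fan_out")
            if module.bias is not None:
                nn.init.zeros_(module.bias)
        elif isinstance(module, nn.BatchNorm2d):
            nn.init.ones_(module.weight)
            nn.init.zeros_(module.bias)
```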

Before we go deeper in reviewing this new model addition (and the other PaddlePaddle ones opened recently that are very similar), please have a good look at how other models are implemented in the library. Notably, you can have a look at the recently merged PP-DocLayoutV3 PR (here's its modular file).
We also have resources to learn more about how to contribute a new model and how to use modular: Contributing a new model, using modular.

Also, as the multiple PaddlePaddle models that currently have new model addition PRs open seem to be quite similar, I'd recommend focusing on one (the simplest) for now; then we'll be able to leverage modular to easily add the other models.

Happy to answer any questions you may have!

@XingweiDeng
Contributor Author

Hello @yonigozlan, thank you very much for your detailed review and valuable suggestions!
We have revised the four models (pp_ocrv5_mobile_det, pp_ocrv5_server_det, pp_lcnet, uvdoc) to address the issues you mentioned.
Notably, all four of these models are convolutional neural networks and do not involve attention modules, so we do not implement related attention components.
Regarding the image processing for pp_ocrv5_mobile_det and pp_ocrv5_server_det, we have replaced as many cv2 operations as possible with PIL and NumPy implementations.
However, we were unable to find drop-in replacements that preserve model accuracy for two specific functions: cv2.fillPoly and cv2.findContours, so we have not been able to remove these usages for now.
We will continue to refine the implementation in line with the library’s standards and conventions. Please let us know if you have any further comments or guidance.
Thank you again for your help!
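
For reference, a hedged sketch of a PIL-based stand-in for cv2.fillPoly (the helper name is ours, not from the PR); the catch is that PIL rasterizes polygon edges slightly differently from OpenCV, which is exactly the accuracy concern mentioned above:

```python
import numpy as np
from PIL import Image, ImageDraw


def fill_poly(mask: np.ndarray, polygon: np.ndarray, value: int = 1) -> np.ndarray:
    """Fill `polygon` (shape (N, 2), x/y order) into a 2D uint8 mask."""
    image = Image.fromarray(mask)
    ImageDraw.Draw(image).polygon([tuple(point) for point in polygon], fill=value)
    return np.asarray(image)
```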

@XingweiDeng
Contributor Author

Hello @yonigozlan, we have revised the four models (pp_ocrv5_mobile_det, pp_ocrv5_server_det, pp_lcnet, uvdoc) to address the issues you mentioned.

@yonigozlan
Member

Hi @XingweiDeng! Thanks a lot for iterating! I'll have a look in the coming days.

@XingweiDeng
Contributor Author

Hi @yonigozlan, we have made further updates on top of the previous refinements to make the code more compliant with the Transformers library standards. Please feel free to share any comments or suggestions you may have - we will promptly revise and follow up accordingly.

Member

@yonigozlan yonigozlan left a comment

Hello @XingweiDeng! Thanks a lot for iterating on this. This looks much better!!
The main comment is that you seem to have quite an outdated branch here, so the first thing to do would be to rebase or merge with main. Then update the model to the latest standards as indicated in my review, and once that's done we shouldn't be too far from ready to merge!
Also, let's add a test file for the image processor(s).
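
As a starting point, a hedged sketch of what such a test could look like (the checkpoint name is taken from the doc examples later in this thread; the assertions assume the resize-to-multiple-of-32 behavior of the processor):

```python
import unittest

import numpy as np

from transformers import AutoImageProcessor


class PPOCRV5MobileDetImageProcessingTest(unittest.TestCase):
    def test_call_numpy(self):
        processor = AutoImageProcessor.from_pretrained("PaddlePaddle/PP-OCRv5_mobile_det_safetensors")
        image = np.zeros((480, 640, 3), dtype=np.uint8)
        outputs = processor(images=image, return_tensors="pt")
        # The processor resizes both spatial dims to multiples of 32.
        self.assertEqual(outputs.pixel_values.shape[-1] % 32, 0)
        self.assertEqual(outputs.pixel_values.shape[-2] % 32, 0)
```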

self.interpolate_mode = interpolate_mode

# ---- Head ----
self.k = k
Member

Let's use a better name for this. Also, is it used anywhere?

Comment on lines +507 to +536

def process(
    logit: np.ndarray,
    size: np.ndarray,
    threshold: float,
    box_thresh: float,
    unclip_ratio: float,
    min_size: int,
    max_candidates: int,
) -> tuple[Union[list[np.ndarray], np.ndarray], list[float]]:
    """
    Main post-processing function to convert model predictions into text boxes.

    Args:
        logit (np.ndarray): Model output of shape (1, H, W).
        size (np.ndarray): Original image size (height, width).
        threshold (float): Threshold for binarizing the prediction map.
        box_thresh (float): Score threshold for filtering boxes.
        unclip_ratio (float): Expansion ratio for unclipping.
        min_size (int): Minimum side length of valid boxes.
        max_candidates (int): Maximum number of boxes to extract.

    Returns:
        tuple:
            - boxes (list or np.ndarray): Extracted text boxes.
            - scores (list): Corresponding confidence scores.
    """
    src_height, src_width = size
    mask = logit > threshold
    boxes, scores = boxes_from_bitmap(logit, mask, src_width, src_height, box_thresh, unclip_ratio, min_size, max_candidates)
    return boxes, scores
Member

I don't think we need a separate method for this; let's unroll it directly in the post_process method.

Comment on lines +539 to +783

    outputs,
    threshold: float = 0.3,
    target_sizes: Optional[Union[list[tuple[int, int]], torch.Tensor]] = None,
    box_thresh: float = 0.6,
    max_candidates: int = 1000,
    min_size: int = 3,
    unclip_ratio: float = 1.5,
):
    """
    Converts model outputs into detected text boxes.

    Args:
        outputs (torch.Tensor): Model outputs.
        target_sizes (TensorType or list[tuple]): Original image sizes.
        threshold (float): Binarization threshold.
        box_thresh (float): Box score threshold.
        max_candidates (int): Maximum number of boxes.
        min_size (int): Minimum box size.
        unclip_ratio (float): Expansion ratio.

    Returns:
        list[dict]: List of detection results.
    """
    results = []
    for logit, size in zip(outputs.logits, target_sizes):
        box, score = process(
            logit=logit[0, :, :].cpu().detach().numpy(),
            size=size.cpu().detach().numpy(),
            threshold=threshold,
            box_thresh=box_thresh,
            unclip_ratio=unclip_ratio,
            min_size=min_size,
            max_candidates=max_candidates,
        )
        results.append({"scores": score, "boxes": box})
    return results

def get_image_size(
    self,
    image: np.ndarray,
    limit_side_len: int,
    limit_type: str,
    max_side_limit: int = 4000,
) -> tuple[dict, np.ndarray]:
    """
    Computes the target size for resizing an image while preserving aspect ratio.

    Args:
        image (np.ndarray): Input image.
        limit_side_len (int): Maximum or minimum side length.
        limit_type (str): Resizing strategy: "max", "min", or "resize_long".
        max_side_limit (int): Maximum allowed side length.

    Returns:
        tuple:
            - dict: Target size ({"height": ..., "width": ...}).
            - np.ndarray: Original size.
    """
    limit_side_len = limit_side_len or self.limit_side_len
    limit_type = limit_type or self.limit_type
    height, width, _ = image.shape

    if limit_type == "max":
        if max(height, width) > limit_side_len:
            if height > width:
                ratio = float(limit_side_len) / height
            else:
                ratio = float(limit_side_len) / width
        else:
            ratio = 1.0
    elif limit_type == "min":
        if min(height, width) < limit_side_len:
            if height < width:
                ratio = float(limit_side_len) / height
            else:
                ratio = float(limit_side_len) / width
        else:
            ratio = 1.0
    elif limit_type == "resize_long":
        ratio = float(limit_side_len) / max(height, width)
    else:
        raise ValueError(f"Unsupported limit_type: {limit_type}")
    resize_height = int(height * ratio)
    resize_width = int(width * ratio)

    if max(resize_height, resize_width) > max_side_limit:
        ratio = float(max_side_limit) / max(resize_height, resize_width)
        resize_height, resize_width = int(resize_height * ratio), int(resize_width * ratio)

    resize_height = max(int(round(resize_height / 32) * 32), 32)
    resize_width = max(int(round(resize_width / 32) * 32), 32)

    if resize_height == height and resize_width == width:
        return {"height": resize_height, "width": resize_width}, np.array([height, width])

    if int(resize_width) <= 0 or int(resize_height) <= 0:
        return None, (None, None)

    return {"height": resize_height, "width": resize_width}, np.array([height, width])
Member

If the results obtained with the fast image processor are similar enough, we can completely discard the PIL-based processor and only keep the fast one.

Comment on lines +799 to +801
limit_side_len = 960
limit_type = "max"
max_side_limit = 4000
Member

Let's add these 3 attributes to the supported kwargs for this model, so that they can be provided at call time and init time.

You can have a look at other fast image processors in the library (like llava-next) to see how that should be done.
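
A hedged sketch of the pattern (the base-class and attribute names should be double-checked against llava-next's fast processor):

```python
from typing import Optional

from transformers.image_processing_utils_fast import (
    BaseImageProcessorFast,
    DefaultFastImageProcessorKwargs,
)


class PPOCRV5MobileDetFastImageProcessorKwargs(DefaultFastImageProcessorKwargs):
    limit_side_len: Optional[int]
    limit_type: Optional[str]
    max_side_limit: Optional[int]


class PPOCRV5MobileDetImageProcessorFast(BaseImageProcessorFast):
    # Defaults, overridable at init and call time via the kwargs above.
    limit_side_len = 960
    limit_type = "max"
    max_side_limit = 4000
    valid_kwargs = PPOCRV5MobileDetFastImageProcessorKwargs
```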

Comment on lines +843 to +887

def post_process_object_detection(
    self,
    outputs,
    threshold: float = 0.3,
    target_sizes: Optional[Union[list[tuple[int, int]], torch.Tensor]] = None,
    box_thresh: float = 0.6,
    max_candidates: int = 1000,
    min_size: int = 3,
    unclip_ratio: float = 1.5,
):
    """
    Converts model outputs into detected text boxes.

    Args:
        outputs (torch.Tensor): Model outputs.
        threshold (float): Binarization threshold.
        target_sizes (TensorType or list[tuple]): Original image sizes.
        box_thresh (float): Box score threshold.
        max_candidates (int): Maximum number of boxes.
        min_size (int): Minimum box size.
        unclip_ratio (float): Expansion ratio.

    Returns:
        list[dict]: List of detection results.
    """
    results = []
    for logit, size in zip(outputs.logits, target_sizes):
        box, score = process(
            logit=logit[0, :, :].cpu().detach().numpy(),
            size=size.cpu().detach().numpy(),
            threshold=threshold,
            box_thresh=box_thresh,
            unclip_ratio=unclip_ratio,
            min_size=min_size,
            max_candidates=max_candidates,
        )
        results.append(
            {
                "boxes": box,
                "scores": score,
            }
        )
    return results
Member

Let's put this method at the bottom of the module

    and returns outputs compatible with the Transformers object detection API.
    """

    _keys_to_ignore_on_load_missing = ["num_batches_tracked"]
Member

Is this key present in the original checkpoint? Otherwise, we can remove it.

def forward(
    self,
    pixel_values: torch.FloatTensor,
    labels: Optional[list[dict]] = None,
Member

Let's remove labels if it's not used.

    Returns:
        Union[tuple[torch.FloatTensor], PPOCRV5MobileDetForObjectDetectionOutput]: Detection output containing
        segmentation logits, last hidden state, and optional hidden states.
    """
Member

Fully remove and use auto_docstring

Suggested change
"""

Comment on lines +1713 to +1777

class PPOCRV5MobileDetForObjectDetection(PPOCRV5MobileDetPreTrainedModel):
    """
    PPOCRV5 Mobile Det model for object (text) detection tasks. Wraps the core PPOCRV5MobileDetModel
    and returns outputs compatible with the Transformers object detection API.
    """

    _keys_to_ignore_on_load_missing = ["num_batches_tracked"]

    def __init__(self, config: PPOCRV5MobileDetConfig):
        """
        Initialize the PPOCRV5MobileDetForObjectDetection with the specified configuration.

        Args:
            config (PPOCRV5MobileDetConfig): Configuration object containing all model hyperparameters.
        """
        super().__init__(config)
        self.model = PPOCRV5MobileDetModel(config)
        self.post_init()

    def forward(
        self,
        pixel_values: torch.FloatTensor,
        labels: Optional[list[dict]] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        **kwargs,
    ) -> Union[tuple[torch.FloatTensor], PPOCRV5MobileDetForObjectDetectionOutput]:
        """
        Forward pass of the PPOCRV5MobileDetForObjectDetection model, processing input images to generate
        text detection logits.

        Args:
            pixel_values (torch.FloatTensor): Input image tensor of shape (B, 3, H, W) (preprocessed pixel values).
            labels (list[dict], optional): Unused placeholder for training (object detection labels). Defaults to None.
            output_hidden_states (bool, optional): Whether to return all intermediate hidden states from the backbone.
                If None, uses the configuration's `output_hidden_states` value.
            return_dict (bool, optional): Whether to return a `PPOCRV5MobileDetForObjectDetectionOutput` object or a tuple.
                If None, uses the configuration's `use_return_dict` value.
            **kwargs: Additional unused keyword arguments for compatibility.

        Returns:
            Union[tuple[torch.FloatTensor], PPOCRV5MobileDetForObjectDetectionOutput]: Detection output containing
            segmentation logits, last hidden state, and optional hidden states.
        """
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.model(pixel_values, output_hidden_states=output_hidden_states, return_dict=return_dict)

        if not return_dict:
            output = (outputs[0],)
            if output_hidden_states:
                output += (outputs[1], outputs[2])
            else:
                output += (outputs[1],)

            return output

        return PPOCRV5MobileDetForObjectDetectionOutput(
            logits=outputs.logits,
            last_hidden_state=outputs.last_hidden_state,
            hidden_states=outputs.hidden_states if output_hidden_states else None,
        )
Member

It looks like PPOCRV5MobileDetForObjectDetection and PPOCRV5MobileDetModel actually return the same thing. Would it make sense to remove the head from PPOCRV5MobileDetModel and have it only in PPOCRV5MobileDetForObjectDetection? Could PPOCRV5MobileDetModel be of any use then (for custom heads, for example)?
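
Concretely, something like this hedged sketch (backbone/neck wiring elided; not a definitive implementation): the head moves out of the base model, so PPOCRV5MobileDetModel only exposes features that custom heads could consume.

```python
class PPOCRV5MobileDetModel(PPOCRV5MobileDetPreTrainedModel):
    def __init__(self, config: PPOCRV5MobileDetConfig):
        super().__init__(config)
        self.backbone = ...  # feature extractor, elided
        self.neck = ...      # feature fusion, elided

    def forward(self, pixel_values):
        # Returns fused feature maps only, no detection head.
        return self.neck(self.backbone(pixel_values))


class PPOCRV5MobileDetForObjectDetection(PPOCRV5MobileDetPreTrainedModel):
    def __init__(self, config: PPOCRV5MobileDetConfig):
        super().__init__(config)
        self.model = PPOCRV5MobileDetModel(config)
        self.head = PPOCRV5MobileDetHead(config)  # segmentation-logits head

    def forward(self, pixel_values):
        return self.head(self.model(pixel_values))
```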

Comment thread src/transformers/models/pp_ocrv5_mobile_det/__init__.py
Contributor

@vasqu vasqu left a comment

Great work! There are a few smaller things but nothing major imo. Let's wrap it up tomorrow then 🤗

Comment thread docs/source/en/model_doc/pp_lcnet.md Outdated
image_processor = AutoImageProcessor.from_pretrained(model_path).to(model.device)

image = Image.open(requests.get("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/img_rot180_demo.jpg", stream=True).raw)
inputs = image_processor(images=[image, image], return_tensors="pt")
Contributor

Suggested change
inputs = image_processor(images=[image, image], return_tensors="pt")
inputs = image_processor(images=[image, image], return_tensors="pt").to(model.device)

Contributor

Oh, I see that the image processor is moved to the device here. Let's at least sync the examples to do either that or the other version.

Contributor

Done

from transformers import AutoImageProcessor, AutoModelForObjectDetection

model_path="PaddlePaddle/PP-OCRv5_mobile_det_safetensors"
model = AutoModelForObjectDetection.from_pretrained(model_path)
Contributor

Let's add device_map here as well (with either version of device movement for the processor input).
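
I.e., something along these lines (a sketch; checkpoint name from the example above):

```python
from transformers import AutoImageProcessor, AutoModelForObjectDetection

model_path = "PaddlePaddle/PP-OCRv5_mobile_det_safetensors"
model = AutoModelForObjectDetection.from_pretrained(model_path, device_map="auto")
image_processor = AutoImageProcessor.from_pretrained(model_path)
# `image` as loaded in the doc example above.
inputs = image_processor(images=image, return_tensors="pt").to(model.device)
```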

Contributor

Done

from transformers import AutoImageProcessor, AutoModelForObjectDetection

model_path="PaddlePaddle/PP-OCRv5_mobile_det_safetensors"
model = AutoModelForObjectDetection.from_pretrained(model_path)
Contributor

Same

Contributor

Done

Comment on lines +430 to +431
@filter_output_hidden_states
@can_return_tuple
Contributor

Suggested change
@filter_output_hidden_states
@can_return_tuple
@can_return_tuple
@filter_output_hidden_states

super nitpicky, but let's keep the same order as elsewhere

Contributor

Done

>>> feature_maps = outputs.feature_maps
>>> list(feature_maps[-1].shape)
```"""
kwargs["output_hidden_states"] = True
Contributor

Suggested change
kwargs["output_hidden_states"] = True
kwargs["output_hidden_states"] = True # required to extract layers for the stages

Contributor

Done

return fused_feature_map


class PPOCRV5MobileDetHead(nn.Module):
Contributor

Reopening this one - can we somehow inherit from the server head or similar?

    interpolation: Optional["tvF.InterpolationMode"],
    **kwargs,
) -> BatchFeature:
    requires_backends(self, ["torch"])
Contributor

That's on me. I think we can directly move this on top of the image processor - there is a decorator, @requires(backends=("torch",)), that should be it. I think we should do the same for the server image processor iirc - missed that in the other PR.

Contributor

Better late than never, I’ve just added it for server_det as well.

@@ -0,0 +1,121 @@
# coding = utf-8
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
Contributor

Suggested change
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
# Copyright 2026 The HuggingFace Inc. team. All rights reserved.

slipped through, too many files :D

Contributor

Sorry, I will check all the files.

Comment on lines +153 to +156

@is_flaky()
def test_batching_equivalence(self, atol=5e-2, rtol=5e-2):
    super().test_batching_equivalence(atol=atol, rtol=rtol)

Contributor

Oh, is this needed? I'd like to avoid it, but no problem if not.

Contributor

Removed.

@zhang-prog
Contributor

@vasqu I’ve finished the changes, PTAL. You can also run the slow tests. 🤗

Contributor

Just pushed this. There's still something to fix on our side so that this is moved, but this was missing.

Contributor

@vasqu vasqu left a comment

My last comments now! I quickly fixed something else on pp_lcnet re modular, but it should be good now.

return fused_feature_map


class PPOCRV5MobileDetHead(nn.Module):
Contributor

Can we inherit from PPOCRV5ServerDetSegmentationHead via modular here? The structure seems very similar.
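
Something like this hedged modular sketch (the import path is assumed from the server_det model added earlier in this series):

```python
from transformers.models.pp_ocrv5_server_det.modeling_pp_ocrv5_server_det import (
    PPOCRV5ServerDetSegmentationHead,
)


class PPOCRV5MobileDetHead(PPOCRV5ServerDetSegmentationHead):
    # The modular converter expands this; only real differences need overrides.
    pass
```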

Contributor

Done

Comment on lines +292 to +293

if getattr(self.config, "output_hidden_states", False):  # get output_hidden_states from config
    kwargs["output_hidden_states"] = True
Contributor

Suggested change
if getattr(self.config, "output_hidden_states", False):  # get output_hidden_states from config
    kwargs["output_hidden_states"] = True

not needed, it is forced within the backbone either way

Contributor

Without this, test_hidden_states_output would fail, because I found that simply setting config.output_hidden_states = True in test_hidden_states_output doesn’t propagate to the backbone’s config. Do you have any suggestions for a better way to handle this?

Contributor

Check out _set_subconfig_attributes in test_modeling_common - it recursively sets the attribute on all subconfigs.
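
For context, a hedged sketch of what such a recursive setter does (the real helper in test_modeling_common may differ in details):

```python
from transformers import PretrainedConfig


def set_subconfig_attributes(config: PretrainedConfig, **attributes):
    for key, value in attributes.items():
        setattr(config, key, value)
    # Recurse into nested configs, e.g. a backbone config.
    for sub_config_key in getattr(config, "sub_configs", {}):
        set_subconfig_attributes(getattr(config, sub_config_key), **attributes)
```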

Contributor

It works, thanks.



@auto_docstring
@requires(backends=("torch",))
Contributor

Thanks 🤗

Comment on lines +152 to +154

# @is_flaky()
# def test_batching_equivalence(self, atol=5e-2, rtol=5e-2):
#     super().test_batching_equivalence(atol=atol, rtol=rtol)
Contributor

Suggested change
# @is_flaky()
# def test_batching_equivalence(self, atol=5e-2, rtol=5e-2):
#     super().test_batching_equivalence(atol=atol, rtol=rtol)

Contributor

Done

@vasqu
Contributor

vasqu commented Mar 13, 2026

run-slow: pp_lcnet, pp_lcnet_v3, pp_ocrv5_mobile_det

@vasqu
Contributor

vasqu commented Mar 13, 2026

@zhang-prog Running the slow tests in the meantime; you can push changes after the results.

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/pp_lcnet", "models/pp_lcnet_v3", "models/pp_ocrv5_mobile_det"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 3d315365 workflow commit (merge commit)
PR 05f75611 branch commit (from PR)
main 3dd82faf base commit (on main)

Model CI Report

1 new failed test from this PR 😭

  • pp_lcnet:
    tests/models/pp_lcnet/test_modeling_pp_lcnet.py::PPLCNetModelIntegrationTest::test_inference_image_classification_head (✅ ⟹ ❌)

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, pp_lcnet, pp_lcnet_v3, pp_ocrv5_mobile_det, pp_ocrv5_server_det

@zhang-prog
Contributor

@vasqu PTAL. Please run-slow: pp_lcnet again.

@vasqu
Contributor

vasqu commented Mar 13, 2026

run-slow: pp_lcnet, pp_lcnet_v3, pp_ocrv5_mobile_det

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/pp_lcnet", "models/pp_lcnet_v3", "models/pp_ocrv5_mobile_det"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 50a01fe0 workflow commit (merge commit)
PR 7ac220e9 branch commit (from PR)
main 2548d0db base commit (on main)

✅ No failing test specific to this PR 🎉 👏 !

@vasqu
Contributor

vasqu commented Mar 13, 2026

Some weird errors on CI; checking again, but I would merge if it turns green - I don't think it's related to you.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vasqu vasqu enabled auto-merge March 13, 2026 15:05
@vasqu vasqu added this pull request to the merge queue Mar 13, 2026
@zhang-prog
Contributor

@vasqu Thank you so much for your thorough reviews. Your feedback has been incredibly professional and efficient, and it’s been a pleasure collaborating with you.

Looking ahead, our team hopes to merge five models next week (before March 20th):
PP-OCRv5_server_rec, PP-OCRv5_mobile_rec, PP-Chart2Table, UVDoc, and SLANeXt.

These PRs will be created by other members of my team, and I will be involved to ensure the code meets your standards, aiming to make the review process as smooth as possible for you.

I was wondering if you would be available to help us review them as well.

Looking forward to your reply.

Thanks again! 🤗

@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Mar 13, 2026
@vasqu vasqu merged commit 036340b into huggingface:main Mar 13, 2026
28 checks passed
@vasqu
Contributor

vasqu commented Mar 13, 2026

@zhang-prog @XingweiDeng Thanks a lot for all these iterations and pulling through - it has been a pleasure from my side as well! 🫡 Also congrats, that's a lot of models in a short timespan already :)

I will definitely check around; I also raised some discussion internally so that other members of the team can help out (cc @yonigozlan @zucchini-nlp @ArthurZucker @Cyrilvallez). In any case, you can always ping me on our shared Slack channel.

Let us know if there is anything else we can do 🤗

@zhang-prog
Contributor

@vasqu This is great. We’ll be in touch next week once our PRs are ready for your review.
Hope you have a great weekend!

@vasqu
Contributor

vasqu commented Mar 13, 2026

Same, have a great weekend and rest up for next week 🤗

@ydshieh
Collaborator

ydshieh commented Mar 14, 2026

Hi @XingweiDeng and @zhang-prog

Thank you for this work 🚀 !

I saw there is

checkpoint="PaddlePaddle/Not_yet_released",

in PPLCNetV3Config
(transformers/models/pp_lcnet_v3/configuration_pp_lcnet_v3.py),

and the test file doesn't have any integration tests.

Looking forward to them being added once the checkpoint is released 🙏.
