Fix MaskFormer/Mask2Former fast image processors by yonigozlan · Pull Request #41393 · huggingface/transformers

yonigozlan · 2025-10-06T21:58:56Z

What does this PR do?

Depends on #41391.
These two fast image processors had issues and were not properly tested:

There was an issue where the processors would crash if do_resize=False.
After conversion to binary masks, the grouped masks can no longer be stacked, as their channel dimensions are not the same. This fix uses the method introduced in "Add MLlama fast image processor" (#41391) to group the masks according to the shapes of the corresponding images.

This PR fixes these issues and ensures that the integration tests are also run with the fast image processors.
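To make the second issue concrete, here is a minimal sketch in plain Python. It is not the code from this PR: `group_by_image_shape` is a hypothetical helper, shapes are plain tuples, and the masks are placeholder strings. It also assumes that each binary mask has one channel per segment, which would explain why masks for same-sized images can still differ in shape.

```python
from collections import defaultdict


def group_by_image_shape(image_shapes, masks):
    """Group `masks` by the shape of the image each one belongs to.

    Hypothetical helper for illustration only; it is not part of transformers.
    """
    grouped_masks = defaultdict(list)
    index_map = {}  # original index -> (shape key, position inside that group)
    for i, (shape, mask) in enumerate(zip(image_shapes, masks)):
        index_map[i] = (shape, len(grouped_masks[shape]))
        grouped_masks[shape].append(mask)
    return dict(grouped_masks), index_map


# The first two images share a shape, but their masks are assumed to hold a
# different number of channels, so the masks are kept in lists, not stacked.
image_shapes = [(3, 4, 4), (3, 4, 4), (3, 2, 2)]
masks = ["mask_with_2_segments", "mask_with_5_segments", "mask_with_1_segment"]

grouped, index_map = group_by_image_shape(image_shapes, masks)
```

Keying the groups on the image shape keeps every mask aligned with the batch its image ends up in, and `index_map` lets the original order be restored afterwards.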


HuggingFaceDocBuilderDev · 2025-10-06T22:07:29Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

molbap

Did an initial review and will take another look when the parent PR is merged!

molbap · 2025-10-08T13:23:19Z


-def _group_images_by_shape(nested_images, is_nested: bool = False):
-    """Helper function to flatten a single level of nested image structures and group by shape."""
+def _group_images_by_shape(nested_images, *paired_inputs, is_nested: bool = False):


I'd prefer to leave variadic args out unless we have no choice!

As this is more of an internal tool not really exposed to users, I think it should be ok

It's more that it's harder to read, even for internal use: at a glance, on this method I don't know what paired_inputs is or how to structure it from the get-go. Just my 2 cents, though, we can merge

paired_inputs is documented in group_images_by_shape, but I can add the docs here as well
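The readability concern can be illustrated with a small sketch. `describe_call` is a hypothetical stand-in that only copies the shape of the signature in the diff above; it does not reproduce the helper's behaviour. What it shows is standard Python: any parameter declared after `*paired_inputs` becomes keyword-only, and every extra positional argument is collected into the tuple.

```python
def describe_call(nested_images, *paired_inputs, is_nested: bool = False):
    """Hypothetical stand-in that reports how its arguments were bound."""
    return {
        "n_images": len(nested_images),
        "n_paired": len(paired_inputs),
        "is_nested": is_nested,
    }


images = ["img0", "img1"]
masks = ["mask0", "mask1"]
labels = ["label0", "label1"]

# Two paired inputs, with is_nested passed by keyword as intended.
intended = describe_call(images, masks, labels, is_nested=True)

# A positional True is silently collected as a paired input,
# and is_nested keeps its default value.
surprising = describe_call(images, True)
```

With this signature the flag has to be passed by keyword, and nothing at the call site says what each extra positional argument represents, which is why documenting `paired_inputs` on the helper itself helps.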


molbap

Thanks! I suggested some other modifications to the image transforms, let's get this merged soon 🚀

molbap · 2025-10-20T12:24:56Z

+    paired_inputs_lists = []
    paired_grouped_values = [defaultdict(list) for _ in paired_inputs]
-
-    # Normalize inputs to consistent nested structure
-    normalized_images = [nested_images] if not is_nested else nested_images
-    normalized_paired = []
    for paired_input in paired_inputs:
-        normalized_paired.append([paired_input] if not is_nested else paired_input)
-
-    # Process each image and group by shape
-    for i, (sublist, *paired_sublists) in enumerate(zip(normalized_images, *normalized_paired)):
+        paired_inputs_lists.append([paired_input]) if not is_nested else paired_inputs_lists.append(paired_input)
+    for i, (sublist, *paired_sublists) in enumerate(zip(nested_images, *paired_inputs_lists)):
        for j, (image, *paired_values) in enumerate(zip(sublist, *paired_sublists)):


It's clearer. I think adding a doc about the expected shapes/dimensions of tensors here would make the API crystal clear 👌 We can use typing as a safety net here

The idea here is that the paired inputs don't have to be tensors, they can be anything. They just have to be paired 1-1 with the images (follow the same nesting)
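A sketch of what "paired 1-1 with the same nesting" can look like, modelled loosely on the nested loop in the diff above. Everything here is an assumption made for illustration: `group_nested_by_shape` is a hypothetical function, images are represented by their shape tuples only, and the paired inputs are plain strings and dicts to show that they need not be tensors.

```python
from collections import defaultdict


def group_nested_by_shape(nested_shapes, *paired_inputs):
    """Group (i, j) positions and paired values by image shape.

    Hypothetical sketch; every paired input must mirror the nesting of
    `nested_shapes` exactly.
    """
    grouped_indices = defaultdict(list)
    paired_grouped = [defaultdict(list) for _ in paired_inputs]
    for i, (sublist, *paired_sublists) in enumerate(zip(nested_shapes, *paired_inputs)):
        for j, (shape, *paired_values) in enumerate(zip(sublist, *paired_sublists)):
            grouped_indices[shape].append((i, j))
            # The k-th paired input contributes one value per image.
            for k, value in enumerate(paired_values):
                paired_grouped[k][shape].append(value)
    return dict(grouped_indices), [dict(group) for group in paired_grouped]


# Two samples: the first holds two images, the second holds one.
nested_shapes = [[(3, 4, 4), (3, 2, 2)], [(3, 4, 4)]]
captions = [["cat", "dog"], ["bird"]]
metadata = [[{"id": 0}, {"id": 1}], [{"id": 2}]]

indices, (grouped_captions, grouped_metadata) = group_nested_by_shape(
    nested_shapes, captions, metadata
)
```

Because `zip` stops at the shortest input, a paired input whose nesting does not match would be silently truncated in this sketch; a stricter version could check lengths first.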


yonigozlan · 2025-11-07T19:17:34Z

Hey @molbap ! This should be ready to merge if you can approve it :)

molbap

Sure, sounds good!

molbap · 2025-11-10T08:46:39Z



github-actions · 2025-11-10T16:38:02Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: mask2former, maskformer

* Merge conflict
* add fast processor
* add fast processor
* make style
* add new convert rgb
* use nested group by shape in mllama fast, add support for multiple inputs in group by shape
* fix maskformer mask2 former fast im proc and add tests
* refactor after review
* add _iterate_items utility
* Fix failing tests
* fix copies and improve docs

---------

Co-authored-by: Vincent <phamvinh257@gmail.com>

rootonchair and others added 8 commits April 15, 2025 23:46

Merge conflict

f2e7da2

add fast processor

9c44566

add fast processor

d8c9133

make style

4afa022

add new convert rgb

b37409b

Merge remote-tracking branch 'upstream/main' into mllama_fast_image_processor

8652eb2

use nested group by shape in mllama fast, add support for multiple inputs in group by shape

b65c1d2

fix maskformer mask2 former fast im proc and add tests

1ee6991

yonigozlan requested review from ArthurZucker, Cyrilvallez and molbap October 6, 2025 21:59

molbap reviewed Oct 8, 2025


Merge remote-tracking branch 'upstream/main' into add-mllama-fast-image-proc

5cfa430

yonigozlan mentioned this pull request Oct 8, 2025

Add MLlama fast image processor #41391

Merged

refactor after review

f4843e3

molbap reviewed Oct 14, 2025


Comment thread src/transformers/models/detr/modeling_detr.py

yonigozlan force-pushed the fix-maskformer-mask2former-fast-im-proc branch from 113b35d to 1ee6991 Compare October 14, 2025 13:33

molbap reviewed Oct 20, 2025


yonigozlan added 3 commits November 7, 2025 19:10

add _iterate_items utility

f55026c

Merge branch 'add-mllama-fast-image-proc' into fix-maskformer-mask2former-fast-im-proc

ab473f0

Merge remote-tracking branch 'upstream/main' into fix-maskformer-mask2former-fast-im-proc

43cc004

Fix failing tests

590dcc3

molbap approved these changes Nov 10, 2025


yonigozlan added 2 commits November 10, 2025 16:36

fix copies and improve docs

bb930c5

Merge remote-tracking branch 'upstream/main' into fix-maskformer-mask2former-fast-im-proc

b652213

yonigozlan enabled auto-merge (squash) November 10, 2025 16:37

yonigozlan merged commit 21913b2 into huggingface:main Nov 10, 2025
23 checks passed
