Fix MaskFormer/Mask2Former fast image processors#41393

Merged
yonigozlan merged 16 commits intohuggingface:mainfrom
yonigozlan:fix-maskformer-mask2former-fast-im-proc
Nov 10, 2025

@yonigozlan
Member

@yonigozlan yonigozlan commented Oct 6, 2025

What does this PR do?

Depends on #41391.
These two fast image processors had issues and were not properly tested:

  • The processors would crash if do_resize=False.
  • After conversion to binary masks, the grouped masks can no longer be stacked, because their channel dimensions differ. This fix uses the method introduced in #41391 (Add MLlama fast image processor) to group the masks according to the shapes of the corresponding images.

This PR fixes these issues and ensures that the integration tests are also run with the fast image processors.
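
To illustrate the second point, here is a minimal sketch of grouping masks by the shape of their corresponding image rather than by their own shape. It is not the code from this PR: `FakeTensor` and `group_by_image_shape` are hypothetical names, and `FakeTensor` is a stand-in that only carries a `shape`, so the example runs without torch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a tensor; only the shape matters here."""
    shape: tuple


def group_by_image_shape(images, masks):
    """Group images by spatial shape and file each mask under its image's key."""
    grouped_images = defaultdict(list)
    grouped_masks = defaultdict(list)
    for image, mask in zip(images, masks):
        key = image.shape[1:]  # drop the channel dimension, keep (height, width)
        grouped_images[key].append(image)
        grouped_masks[key].append(mask)
    return dict(grouped_images), dict(grouped_masks)


images = [FakeTensor((3, 64, 64)), FakeTensor((3, 32, 48)), FakeTensor((3, 64, 64))]
# Binary masks have one channel per segment, so the channel count varies per image.
masks = [FakeTensor((2, 64, 64)), FakeTensor((5, 32, 48)), FakeTensor((4, 64, 64))]

grouped_images, grouped_masks = group_by_image_shape(images, masks)
for shape, group in grouped_masks.items():
    print(shape, [m.shape for m in group])
```

Masks that land in the same group share a spatial size but not a channel count, which is why they are kept as a list here instead of being stacked into one batch.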

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@molbap molbap left a comment

Did an initial review and will take another look when the parent PR is merged!


- def _group_images_by_shape(nested_images, is_nested: bool = False):
-     """Helper function to flatten a single level of nested image structures and group by shape."""
+ def _group_images_by_shape(nested_images, *paired_inputs, is_nested: bool = False):
Contributor

I'd prefer to leave variadic args out unless we have no choice!

Member Author

As this is more of an internal tool not really exposed to users, I think it should be ok

Contributor

It's more that it's harder to read, even for internal use: at a glance, I can't tell from this method what paired_inputs is or how to structure it. Just my 2 cents, though; we can merge.

Member Author

paired_inputs is documented in group_images_by_shape, but I can add the docs here as well

Comment thread src/transformers/models/detr/modeling_detr.py
@yonigozlan yonigozlan force-pushed the fix-maskformer-mask2former-fast-im-proc branch from 113b35d to 1ee6991 Compare October 14, 2025 13:33
Contributor

@molbap molbap left a comment

Thanks! I suggested some further modifications to the image transforms. Let's get this merged soon 🚀

Comment thread src/transformers/image_transforms.py Outdated
Comment on lines 811 to 816
-     paired_inputs_lists = []
      paired_grouped_values = [defaultdict(list) for _ in paired_inputs]
+
+     # Normalize inputs to consistent nested structure
+     normalized_images = [nested_images] if not is_nested else nested_images
+     normalized_paired = []
      for paired_input in paired_inputs:
+         normalized_paired.append([paired_input] if not is_nested else paired_input)
+
+     # Process each image and group by shape
+     for i, (sublist, *paired_sublists) in enumerate(zip(normalized_images, *normalized_paired)):
-         paired_inputs_lists.append([paired_input]) if not is_nested else paired_inputs_lists.append(paired_input)
-     for i, (sublist, *paired_sublists) in enumerate(zip(nested_images, *paired_inputs_lists)):
          for j, (image, *paired_values) in enumerate(zip(sublist, *paired_sublists)):
Contributor

It's clearer. I think adding documentation about the expected shapes/dimensions of tensors here would make the API crystal clear 👌 We can use typing as a safety net here.

Member Author

The idea here is that the paired inputs don't have to be tensors, they can be anything. They just have to be paired 1-1 with the images (follow the same nesting)
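
As a rough illustration of that pairing contract, the sketch below groups images by shape while carrying along any number of paired inputs of arbitrary type. It is loosely modelled on the hunk quoted in this thread, not copied from the library: `group_by_shape_sketch`, `FakeImage`, and the returned `index_map` layout are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeImage:
    """Hypothetical stand-in for an image tensor; only the shape is used."""
    shape: tuple


def group_by_shape_sketch(nested_images, *paired_inputs, is_nested=False):
    """Group images by spatial shape; paired inputs follow their image's group."""
    grouped_images = defaultdict(list)
    paired_grouped = [defaultdict(list) for _ in paired_inputs]
    index_map = {}

    # Wrap flat inputs in one outer list so both cases share the same loops.
    normalized_images = nested_images if is_nested else [nested_images]
    normalized_paired = [p if is_nested else [p] for p in paired_inputs]

    for i, (sublist, *paired_sublists) in enumerate(zip(normalized_images, *normalized_paired)):
        for j, (image, *paired_values) in enumerate(zip(sublist, *paired_sublists)):
            key = (i, j) if is_nested else j
            shape = image.shape[1:]
            # Remember where this item ends up: (group key, position in group).
            index_map[key] = (shape, len(grouped_images[shape]))
            grouped_images[shape].append(image)
            for store, value in zip(paired_grouped, paired_values):
                store[shape].append(value)

    return dict(grouped_images), [dict(g) for g in paired_grouped], index_map


# Nested case: the paired input mirrors the nesting and holds arbitrary objects.
images = [[FakeImage((3, 8, 8)), FakeImage((3, 4, 4))], [FakeImage((3, 8, 8))]]
labels = [["cat", "dog"], [{"id": 7}]]
grouped, (grouped_labels,), index_map = group_by_shape_sketch(images, labels, is_nested=True)

# Flat case: a plain list of images with a plain list of paired values.
flat_grouped, flat_paired, flat_index = group_by_shape_sketch([FakeImage((3, 8, 8))], ["a"])
```

Only `image.shape` is ever inspected, so the paired values can be strings, dicts, tensors, or anything else, as long as each paired input has the same nesting as the images.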

Comment thread src/transformers/image_transforms.py Outdated
@yonigozlan
Member Author

Hey @molbap ! This should be ready to merge if you can approve it :)

Contributor

@molbap molbap left a comment

Sure, sounds good!


@yonigozlan yonigozlan enabled auto-merge (squash) November 10, 2025 16:37
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: mask2former, maskformer

@yonigozlan yonigozlan merged commit 21913b2 into huggingface:main Nov 10, 2025
23 checks passed
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* Merge conflict

* add fast processor

* add fast processor

* make style

* add new convert rgb

* use nested group by shape in mllama fast, add support for multiple inputs in group by shape

* fix maskformer mask2 former fast im proc and add tests

* refactor after review

* add _iterate_items utility

* Fix failing tests

* fix copies and improve docs

---------

Co-authored-by: Vincent <phamvinh257@gmail.com>