Fix MaskFormer/Mask2Former fast image processors#41393
Fix MaskFormer/Mask2Former fast image processors#41393yonigozlan merged 16 commits intohuggingface:mainfrom
Conversation
…puts in group by shape
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
molbap
left a comment
There was a problem hiding this comment.
Did an initial review and will take another look when the parent PR is merged!
|
|
||
| def _group_images_by_shape(nested_images, is_nested: bool = False): | ||
| """Helper function to flatten a single level of nested image structures and group by shape.""" | ||
| def _group_images_by_shape(nested_images, *paired_inputs, is_nested: bool = False): |
There was a problem hiding this comment.
I'd prefer to leave variadic args out unless we have no choice!
There was a problem hiding this comment.
As this is more of an internal tool not really exposed to users, I think it should be ok
There was a problem hiding this comment.
It's more that it's harder to read, even for internal use: at a glance, on this method I don't know what is paired_inputs and how to structure it from the get-go. Just my 2 cents, though, we can merge
There was a problem hiding this comment.
paired_inputs is documented in group_images_by_shape, but I can add the docs here as well
113b35d to
1ee6991
Compare
molbap
left a comment
There was a problem hiding this comment.
Thanks! I suggested another modifs on the image transforms, let's get this merged soon 🚀
| paired_inputs_lists = [] | ||
| paired_grouped_values = [defaultdict(list) for _ in paired_inputs] | ||
|
|
||
| # Normalize inputs to consistent nested structure | ||
| normalized_images = [nested_images] if not is_nested else nested_images | ||
| normalized_paired = [] | ||
| for paired_input in paired_inputs: | ||
| normalized_paired.append([paired_input] if not is_nested else paired_input) | ||
|
|
||
| # Process each image and group by shape | ||
| for i, (sublist, *paired_sublists) in enumerate(zip(normalized_images, *normalized_paired)): | ||
| paired_inputs_lists.append([paired_input]) if not is_nested else paired_inputs_lists.append(paired_input) | ||
| for i, (sublist, *paired_sublists) in enumerate(zip(nested_images, *paired_inputs_lists)): | ||
| for j, (image, *paired_values) in enumerate(zip(sublist, *paired_sublists)): |
There was a problem hiding this comment.
It's clearer, I think adding a doc about expected shapes/dimensions of tensors here would make the API crystal clear 👌 we can use typing as a safety net here
There was a problem hiding this comment.
The idea here is that the paired inputs don't have to be tensors, they can be anything. They just have to be paired 1-1 with the images (follow the same nesting)
|
Hey @molbap ! This should be ready to merge if you can approve it :) |
|
|
||
| def _group_images_by_shape(nested_images, is_nested: bool = False): | ||
| """Helper function to flatten a single level of nested image structures and group by shape.""" | ||
| def _group_images_by_shape(nested_images, *paired_inputs, is_nested: bool = False): |
There was a problem hiding this comment.
It's more that it's harder to read, even for internal use: at a glance, on this method I don't know what is paired_inputs and how to structure it from the get-go. Just my 2 cents, though, we can merge
|
[For maintainers] Suggested jobs to run (before merge) run-slow: mask2former, maskformer |
* Merge conflict * add fast processor * add fast processor * make style * add new convert rgb * use nested group by shape in mllama fast, add support for multiple inputs in group by shape * fix maskformer mask2 former fast im proc and add tests * refactor after review * add _iterate_items utility * Fix failing tests * fix copies and improve docs --------- Co-authored-by: Vincent <phamvinh257@gmail.com>
What does this PR do?
Depends on #41391.
These two fast image processors had issues and were not properly tested:
do_resize-FalseThis PR fixes the issues and ensure that the integration tests are also ran with the fast image processors