Add Fast Image Processor for mobileViT by MinJu-Ha · Pull Request #37143 · huggingface/transformers

MinJu-Ha · 2025-03-31T14:05:07Z

I added a fast image processor for MobileViT and noticed a significant difference between the slow and fast outputs after preprocessing.

Here’s the code I used to compare them:

# encoding_slow / encoding_fast: outputs of the slow and fast processors on the same input
diff = (encoding_slow.pixel_values - encoding_fast.pixel_values).abs()
print("\n📊 Difference statistics:")
print(f"  Max difference: {diff.max().item():.10f}")
print(f"  Mean difference: {diff.mean().item():.10f}")
print(f"  Slow min/max: {encoding_slow.pixel_values.min().item():.10f} ~ {encoding_slow.pixel_values.max().item():.10f}")
print(f"  Fast min/max: {encoding_fast.pixel_values.min().item():.10f} ~ {encoding_fast.pixel_values.max().item():.10f}")
print(f"Slow implementation dtype: {encoding_slow.pixel_values.dtype}")
print(f"Fast implementation dtype: {encoding_fast.pixel_values.dtype}")

Results:
📊 Difference statistics:
Max difference: 0.3411765397
Mean difference: 0.1117687449
Slow min/max: 0.0313725509 ~ 0.9764705896
Fast min/max: 0.0313725509 ~ 0.9764706492
Slow implementation dtype: torch.float32
Fast implementation dtype: torch.float32


Even though the size configs look the same ({'shortest_edge': 20}), and both use torch.float32, the output difference seems quite significant for a slow/fast equivalence test.

yonigozlan

Hi @MinJu-Ha! Thanks for contributing this! Looks like you're missing the do_flip_channel_order arg that is present in the slow image processor, which could explain the differences you're getting (we should have ~1e-05 in mean diff). Could you try to implement the necessary changes? Thanks!
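
To illustrate why a missing channel flip could produce a mean difference of this size while leaving the global min/max essentially unchanged, here is a dependency-free sketch. It uses nested Python lists in place of tensors; the helper names (`flip_channel_order`, `flatten`, `mean_abs_diff`) and the pixel values are made up for illustration and are not the processor's actual code.

```python
# Illustrative only: a tiny channels-first "image" as nested lists [C][H][W].
# The values are invented and assumed to be already rescaled to [0, 1].

def flip_channel_order(image):
    """Return a copy with the channel order reversed (RGB -> BGR)."""
    return [[row[:] for row in channel] for channel in reversed(image)]

def flatten(image):
    """Flatten [C][H][W] nested lists into a single list of values."""
    return [v for channel in image for row in channel for v in row]

def mean_abs_diff(a, b):
    """Mean absolute element-wise difference between two images."""
    fa, fb = flatten(a), flatten(b)
    return sum(abs(x - y) for x, y in zip(fa, fb)) / len(fa)

rgb = [
    [[0.9, 0.8], [0.7, 0.6]],  # R
    [[0.5, 0.5], [0.5, 0.5]],  # G
    [[0.1, 0.2], [0.3, 0.4]],  # B
]

slow_like = flip_channel_order(rgb)  # a processor that flips channels
fast_without_flip = rgb              # a processor that skips the flip

# The same values appear in both outputs, only in different channels,
# so min/max agree while the element-wise difference is large.
print(mean_abs_diff(slow_like, fast_without_flip))
print(min(flatten(slow_like)) == min(flatten(fast_without_flip)))
print(max(flatten(slow_like)) == max(flatten(fast_without_flip)))
```

This matches the pattern in the statistics above: near-identical min/max but a large mean difference.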

MinJu-Ha · 2025-04-09T11:51:47Z

It really was—thank you so much for the advice! I'm still working on this issue and have finally passed 18 tests. This week, I’ll be focusing on the remaining 2 skipped ones, which are related to batched image processing and CUDA.


MinJu-Ha · 2025-04-17T08:47:27Z

@yonigozlan, waiting for your review!

MinJu-Ha · 2025-05-11T08:11:56Z

@yonigozlan , I'm still waiting for your review!

and just to clarify:
The test failures in tests/models/mistral3/test_processor_mistral3.py seem to be caused by external image download issues (e.g., PIL.UnidentifiedImageError when loading from https://www.ilankelman.org/stopsigns/australia.jpg).
Since my changes don’t touch this file or its logic, I believe these failures are unrelated to my PR.

Let me know if you’d like me to skip the tests or mock the image loading as a workaround!

yonigozlan

Hey @MinJu-Ha , thanks for iterating :) . You are still missing the logic for handling segmentation_maps

yonigozlan · 2025-05-12T18:54:26Z

    ]
 else:
    _import_structure["image_processing_utils_fast"] = ["BaseImageProcessorFast"]
-


revert changes here

yonigozlan · 2025-05-12T19:03:48Z

+    def flip_channel_order(self, image):
+        # Check if we have 3 or more channels
+        if image.shape[0] >= 3:
+            # Flip only the first 3 channels (RGB → BGR)
+            flipped = image.clone()
+            flipped[0:3] = image[[2, 1, 0], ...]
+            return flipped
+        # For grayscale or other formats, return as is
+        return image


This is not used

yonigozlan · 2025-05-12T19:06:17Z

+    def post_process_semantic_segmentation(self, *args, **kwargs):
+        raise NotImplementedError("This method is not implemented for MobileViTImageProcessorFast.")


You need to implement it
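
For context, semantic-segmentation post-processing typically reduces per-class logits to a label map by taking the argmax over the class dimension, optionally after resizing the logits to a target size. The sketch below shows only the argmax step on nested lists. It is an assumption about the general technique, not the code that was merged, and `argmax_over_classes` is a hypothetical helper.

```python
def argmax_over_classes(logits):
    """Map logits shaped [num_classes][H][W] to an [H][W] label map."""
    num_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    return [
        [
            # Pick the class index with the highest logit at pixel (y, x).
            max(range(num_classes), key=lambda c: logits[c][y][x])
            for x in range(width)
        ]
        for y in range(height)
    ]

# Made-up logits for 3 classes on a 2x2 image.
logits = [
    [[2.0, 0.1], [0.3, 0.2]],  # class 0
    [[1.0, 0.9], [0.1, 0.8]],  # class 1
    [[0.5, 0.4], [0.9, 0.7]],  # class 2
]
segmentation = argmax_over_classes(logits)
print(segmentation)  # [[0, 1], [2, 1]]
```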

yonigozlan · 2025-05-12T19:06:50Z

    def test_image_processor_from_dict_with_kwargs(self):
        image_processor = self.image_processing_class.from_dict(self.image_processor_dict)


This test and the following one need to include both image processors
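
One way to read this request is to loop the test body over every processor class instead of only `image_processing_class`. The sketch below uses stand-in classes and an `image_processor_list` attribute so it runs without Transformers installed; the stub classes are hypothetical, and the attribute names in the real test suite may differ.

```python
import unittest

class SlowProcessorStub:
    """Hypothetical stand-in for the slow image processor."""

    def __init__(self, size):
        self.size = size

    @classmethod
    def from_dict(cls, config, **overrides):
        # Keyword overrides take precedence over the stored config.
        return cls(**{**config, **overrides})

class FastProcessorStub(SlowProcessorStub):
    """Hypothetical stand-in for the fast image processor."""

class ProcessorTest(unittest.TestCase):
    image_processor_list = [SlowProcessorStub, FastProcessorStub]
    image_processor_dict = {"size": {"shortest_edge": 20}}

    def test_image_processor_from_dict_with_kwargs(self):
        # Run the same assertions against both the slow and fast classes.
        for image_processing_class in self.image_processor_list:
            with self.subTest(processor=image_processing_class.__name__):
                image_processor = image_processing_class.from_dict(self.image_processor_dict)
                self.assertEqual(image_processor.size, {"shortest_edge": 20})

                image_processor = image_processing_class.from_dict(
                    self.image_processor_dict, size={"shortest_edge": 42}
                )
                self.assertEqual(image_processor.size, {"shortest_edge": 42})
```

Using `subTest` keeps the failure report specific to whichever processor class broke.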


MinJu-Ha · 2025-05-29T07:40:50Z

@yonigozlan, Thank you for your kind comments and advice! I’ve completed the changes — could you please review them? Many thanks in advance for your time and help!

yonigozlan

Hi @MinJu-Ha, thanks for iterating, it's looking a lot better! There are still some things to modify and some tests to add, but it's almost ready to merge :)

yonigozlan · 2025-06-02T20:29:40Z

+        self,
+        images,
+        do_resize=True,
+        size=None,
+        interpolation=None,
+        do_rescale=True,
+        rescale_factor=None,
+        do_center_crop=True,
+        crop_size=None,
+        do_flip_channel_order=True,
+        do_convert_rgb=False,
+        return_tensors=None,
+        do_normalize=None,
+        image_mean=None,
+        image_std=None,


Let's not have default values for the args in _preprocess, consistent with the other fast image processors. It would also be nice to type these args. You don't need do_normalize, image_mean, or image_std here; just add **kwargs at the end of the signature.

Thank you for your advice! I addressed this in the commit "add **kwargs and remove default values in _preprocess".
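
A minimal sketch of the signature style being requested: typed arguments, no defaults in `_preprocess`, and a trailing `**kwargs` to absorb options this processor does not use. The class below is a hypothetical stand-in, and the idea that a public `preprocess` resolves defaults before delegating is an assumption about the design rather than a description of the merged code.

```python
from typing import Any

class ProcessorSketch:
    """Hypothetical stand-in; not the merged MobileViT code."""

    # Defaults live on the class, not in the _preprocess signature.
    do_resize = True
    do_flip_channel_order = True

    def preprocess(self, images: list, **kwargs: Any) -> dict:
        # Resolve each option once, falling back to class-level defaults.
        resolved = {
            "do_resize": kwargs.pop("do_resize", self.do_resize),
            "do_flip_channel_order": kwargs.pop(
                "do_flip_channel_order", self.do_flip_channel_order
            ),
        }
        return self._preprocess(images, **resolved, **kwargs)

    def _preprocess(
        self,
        images: list,
        do_resize: bool,
        do_flip_channel_order: bool,
        **kwargs: Any,
    ) -> dict:
        # No defaults here: the caller must pass every option explicitly.
        # Unused options (e.g. do_normalize) are absorbed by **kwargs.
        return {
            "n_images": len(images),
            "do_resize": do_resize,
            "do_flip_channel_order": do_flip_channel_order,
            "ignored": sorted(kwargs),
        }

out = ProcessorSketch().preprocess([1, 2], do_resize=False, do_normalize=True)
print(out)
```

Keeping defaults out of `_preprocess` means a forgotten argument fails loudly with a `TypeError` instead of silently using a stale default.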

yonigozlan · 2025-06-02T20:33:21Z

@@ -18,7 +18,7 @@
 from datasets import load_dataset


Let's also have slow_fast equivalence tests for segmentation maps by overriding test_slow_fast_equivalence and test_slow_fast_equivalence_batched in MobileViTImageProcessingTest
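
As a rough illustration of what such an equivalence check could compare: pixel values within a tolerance, and segmentation labels exactly, since labels are integer class ids. The sketch uses flat Python lists; `check_equivalence`, the `"labels"` key, and the tolerance value are assumptions for illustration, not the test that was added.

```python
def check_equivalence(slow_out, fast_out, mean_tol):
    """Compare two processor outputs given as dicts of flat lists.

    'pixel_values' (floats) must agree on average within mean_tol;
    'labels' (integer class ids) must match exactly.
    """
    pv_slow, pv_fast = slow_out["pixel_values"], fast_out["pixel_values"]
    mean_diff = sum(abs(a - b) for a, b in zip(pv_slow, pv_fast)) / len(pv_slow)
    labels_match = slow_out["labels"] == fast_out["labels"]
    return mean_diff <= mean_tol and labels_match

# Made-up outputs: tiny float noise in pixel values, identical labels.
slow = {"pixel_values": [0.1, 0.5, 0.9], "labels": [0, 2, 1]}
fast = {"pixel_values": [0.1, 0.5, 0.90001], "labels": [0, 2, 1]}
print(check_equivalence(slow, fast, mean_tol=1e-4))

# A single wrong label should fail even when the pixel values agree.
bad_labels = {"pixel_values": fast["pixel_values"], "labels": [0, 1, 1]}
print(check_equivalence(slow, bad_labels, mean_tol=1e-4))
```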


yonigozlan · 2025-06-18T14:58:42Z

Hello @MinJu-Ha , don't hesitate to ping me if you need help or if the PR is ready for review!

MinJu-Ha · 2025-06-24T06:33:08Z

Hi @yonigozlan ! Sorry for the delay — I've been busy recently due to my U.S. visa process.
Thanks so much for your patience!
I've just finished incorporating your suggestions into the code.
Please have a look when you have time, and let me know if any further updates are needed!

HuggingFaceDocBuilderDev · 2025-06-26T21:09:05Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

yonigozlan

Thanks @MinJu-Ha for iterating on this! Had to make some small changes mainly because of recent updates in Transformers, but LGTM! Waiting for the PR to be green then I'll merge

* Add image_processing_mobilevit_fast.py
* Fix copies
* update _preprocess for channel_flip
* Update for batched image processing
* Resolve merge conflicts with main
* Fix import order and remove trailing whitespace (ruff clean-up)
* Fix copy inconsistencies
* Add NotImplementedError for post_process_semantic_segmentation to satisfy repo checks
* Add auto_docstring
* Adjust style
* Update docs/source/en/model_doc/mobilevit.md
  Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* Update src/transformers/models/mobilevit/image_processing_mobilevit_fast.py
  Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* Update src/transformers/models/mobilevit/image_processing_mobilevit_fast.py
  Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* Delete not used function
* test: add missing tests for and
* Add post_process_semantic_segmentation to mobilevit_fast.py
* Add preprocess function to image_processing_mobilebit_fast.py
* ruff check for formatting
* fix: modify preprocess method to handle BatchFeature correctly
* Remove logic for default value assignment
  Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* Remove normalization adn RGB conversion logic not used in slow processor
  Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* Simplify return_tensors logic using one-liner conditional expression
  Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* Remove unused normalization and format parameters
  Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* add **kwargs and remove default values in _preprocess
* add slow_fast equivalence tests for segmentation
* style: autoformat code with ruff
* Fix slow_fast equivalence test
* merge + remove skipped test

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>

Add image_processing_mobilevit_fast.py

440a0fd

MinJu-Ha force-pushed the mobilevit branch from 98b8b71 to 440a0fd March 31, 2025 14:05

MinJu-Ha marked this pull request as ready for review March 31, 2025 14:11

github-actions Bot requested review from ydshieh and yonigozlan March 31, 2025 14:11

yonigozlan reviewed Mar 31, 2025


yonigozlan mentioned this pull request Mar 31, 2025

[Contributions Welcome] Add Fast Image Processors #36978

Closed

81 tasks

MinJu-Ha and others added 2 commits April 7, 2025 21:18

Merge branch 'main' into mobilevit

1301b36

Fix copies

4afc594

MinJu-Ha added 7 commits April 9, 2025 23:30

update _preprocess for channel_flip

4c98e6b

Update for batched image processing

8a86bca

Resolve merge conflicts with main

a7d07b0

Resolve merge conflict between mobilevit and main

5dec720

Fix import order and remove trailing whitespace (ruff clean-up)

d5b7cfb

Fix copy inconsistencies

dbc6438

Add NotImplementedError for post_process_semantic_segmentation to sat…

d1f7d41

…isfy repo checks

MinJu-Ha requested a review from yonigozlan April 14, 2025 08:01

MinJu-Ha added 3 commits May 11, 2025 16:21

Resolve conflicts

325d1a6

Add auto_docstring

735ed76

Adjust style

70ce171

yonigozlan reviewed May 12, 2025


MinJu-Ha and others added 6 commits May 17, 2025 14:48

Update docs/source/en/model_doc/mobilevit.md

3c88b22

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

Update src/transformers/models/mobilevit/image_processing_mobilevit_f…

6f86ab3

…ast.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

Update src/transformers/models/mobilevit/image_processing_mobilevit_f…

05a49d8

…ast.py Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

Delete not used function

ad723bb

test: add missing tests for and

d60118d

Add post_process_semantic_segmentation to mobilevit_fast.py

09614ab

MinJu-Ha added 2 commits May 29, 2025 16:24

fix: modify preprocess method to handle BatchFeature correctly

e39f7fa

Resolve merge conflict in __init__.py

75d2228

yonigozlan reviewed Jun 2, 2025


MinJu-Ha and others added 4 commits June 9, 2025 19:44

Remove logic for default value assignment

2e30f25

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

Remove normalization adn RGB conversion logic not used in slow processor

ce2b0fb

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

Simplify return_tensors logic using one-liner conditional expression

d6f6c0b

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

Remove unused normalization and format parameters

2526b16

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

yonigozlan mentioned this pull request Jun 18, 2025

Add MobileViT fast image processor #38859

Open

MinJu-Ha added 2 commits June 22, 2025 19:02

add **kwargs and remove default values in _preprocess

c0295bf

add slow_fast equivalence tests for segmentation

342d220

MinJu-Ha and others added 4 commits June 24, 2025 15:37

style: autoformat code with ruff

319efe4

Fix slow_fast equivalence test

2d7f2c0

Merge remote-tracking branch 'upstream/main' into mobilevit

46b1c05

merge + remove skipped test

e6dfaca

Merge branch 'main' into mobilevit

b1117f2

yonigozlan approved these changes Jun 26, 2025


Merge branch 'main' into mobilevit

d676896

yonigozlan enabled auto-merge (squash) June 27, 2025 14:27

yonigozlan merged commit 49d9fd4 into huggingface:main Jun 27, 2025
20 checks passed

