BridgeTower fast image processor #37373
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the
cc @yonigozlan
yonigozlan left a comment
Thanks for working on this @rootonchair ! Some small issues but overall very nice!
    image_std = image_std if image_std is not None else self.image_std
    do_pad = do_pad if do_pad is not None else self.do_pad
-   do_center_crop if do_center_crop is not None else self.do_center_crop
+   do_center_crop = do_center_crop if do_center_crop is not None else self.do_center_crop
    if not is_batched(images):
        images = [images]
This can be removed if we use make_flat_list_of_images
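For reference, the flattening helper behaves roughly like this. This is a minimal sketch, not the actual `make_flat_list_of_images` from `transformers`, which also validates input types:

```python
# Sketch of what a flattening helper like make_flat_list_of_images does:
# accept a single image, a list of images, or a nested list of images,
# and always return a flat list. (Hypothetical simplified version.)
def make_flat_list_of_images_sketch(images):
    # A bare image (here: anything that is not a list) becomes a one-element list,
    # which is exactly what the is_batched check above was emulating.
    if not isinstance(images, list):
        return [images]
    # A nested list of lists is flattened one level.
    if images and isinstance(images[0], list):
        return [img for sublist in images for img in sublist]
    return images
```

Using the shared helper means the processor no longer needs its own batching check.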
    def get_max_height_width(images: List["torch.Tensor"]) -> List[int]:
        """
        Get the maximum height and width across all images in a batch.
        """
        _, max_height, max_width = max_across_indices([img.shape for img in images])
        return (max_height, max_width)
This can be imported from image_processing_utils_fast
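The helper just takes the per-axis maximum over the batch; a shape-only sketch, using plain `(channels, height, width)` tuples instead of the torch tensors the real version in `image_processing_utils_fast` operates on:

```python
# Shape-only sketch of get_max_height_width: given (channels, height, width)
# shapes for each image in a batch, return the largest height and width.
# (Simplified stand-in; the library version works on torch tensor shapes.)
def get_max_height_width_sketch(shapes):
    max_height = max(h for _, h, _ in shapes)
    max_width = max(w for _, _, w in shapes)
    return (max_height, max_width)
```

This is the target size every image in the batch gets padded to.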
    processed_masks = torch.stack(processed_masks, dim=0) if return_tensors else processed_masks
    data["pixel_mask"] = processed_masks

    return BatchFeature(data=data, tensor_type=return_tensors)
Better to build the BatchFeature outside this function (in _preprocess); let's just return processed_images and processed_masks here.
    def to_dict(self):
        encoder_dict = super().to_dict()
        encoder_dict.pop("_valid_processor_keys", None)
        encoder_dict.pop("crop_size", None)
        return encoder_dict
I don't think this is needed?
In the slow processor, crop_size is not used for center cropping; size is used instead. It would be better to assign it to crop_size, but that would be redundant for the slow processor, and test_save_load_fast_slow would fail.
    def get_expected_values(self, image_inputs, batched=False):
        """
        This function computes the expected height and width when providing images to BridgeTowerImageProcessor,
        assuming do_resize is set to True with a scalar size and size_divisor.
        """
-       if not batched:
-           size = self.size["shortest_edge"]
-           image = image_inputs[0]
-           if isinstance(image, Image.Image):
-               w, h = image.size
-           elif isinstance(image, np.ndarray):
-               h, w = image.shape[0], image.shape[1]
-           else:
-               h, w = image.shape[1], image.shape[2]
-           scale = size / min(w, h)
-           if h < w:
-               newh, neww = size, scale * w
-           else:
-               newh, neww = scale * h, size
-
-           max_size = int((1333 / 800) * size)
-           if max(newh, neww) > max_size:
-               scale = max_size / max(newh, neww)
-               newh = newh * scale
-               neww = neww * scale
-
-           newh, neww = int(newh + 0.5), int(neww + 0.5)
-           expected_height, expected_width = (
-               newh // self.size_divisor * self.size_divisor,
-               neww // self.size_divisor * self.size_divisor,
-           )
-
-       else:
-           expected_values = []
-           for image in image_inputs:
-               expected_height, expected_width = self.get_expected_values([image])
-               expected_values.append((expected_height, expected_width))
-           expected_height = max(expected_values, key=lambda item: item[0])[0]
-           expected_width = max(expected_values, key=lambda item: item[1])[1]
-
-       return expected_height, expected_width
+       return self.size["shortest_edge"], self.size["shortest_edge"]
It is expected that center_crop is performed before returning, so that all images end up at 288x288. But the code fails to assign the default value of do_center_crop (it is None all the time), so the slow processor just resizes all images to shortest_edge and returns. These old expected sizes, which mimic the behavior of resize, therefore have the wrong values; the correct expected size is simply 288x288.
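For reference, the resize-style expected size the old test computed can be sketched as a standalone function. The defaults `size=288` and `size_divisor=32` are illustrative values, not taken from this diff:

```python
# Sketch of the old resize-based expected-size computation (before the fix).
# With center crop actually applied, the expected size collapses to
# (size, size) regardless of input, which is the point of the review comment.
def resize_expected_size(h, w, size=288, size_divisor=32):
    # Scale so the shorter edge matches `size`.
    scale = size / min(w, h)
    if h < w:
        newh, neww = size, scale * w
    else:
        newh, neww = scale * h, size
    # Cap the longer edge at (1333 / 800) * size, DETR-style.
    max_size = int((1333 / 800) * size)
    if max(newh, neww) > max_size:
        scale = max_size / max(newh, neww)
        newh, neww = newh * scale, neww * scale
    # Round to the nearest integer, then floor to a multiple of size_divisor.
    newh, neww = int(newh + 0.5), int(neww + 0.5)
    return newh // size_divisor * size_divisor, neww // size_divisor * size_divisor
```

For a 480x640 input this yields (288, 384); only after the 288x288 center crop do all images share the same size.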
    @require_vision
    @require_torch
    def test_slow_fast_equivalence(self):
        if not self.test_slow_image_processor or not self.test_fast_image_processor:
            self.skipTest(reason="Skipping slow/fast equivalence test")

        if self.image_processing_class is None or self.fast_image_processing_class is None:
            self.skipTest(reason="Skipping slow/fast equivalence test as one of the image processors is not defined")

        dummy_image = Image.open(
            requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw
        )
        image_processor_slow = self.image_processing_class(**self.image_processor_dict)
        image_processor_fast = self.fast_image_processing_class(**self.image_processor_dict)

        encoding_slow = image_processor_slow(dummy_image, return_tensors="pt")
        encoding_fast = image_processor_fast(dummy_image, return_tensors="pt")

        self._assertEquivalence(encoding_slow.pixel_values, encoding_fast.pixel_values)
        self._assertEquivalence(encoding_slow.pixel_mask.float(), encoding_fast.pixel_mask.float())
Nice. Would be great to do the same for test_slow_fast_equivalence_batched
Sorry for hijacking this PR to relax the slow_fast_equivalence mean diff; there are some issues with CI.
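An equivalence check of this kind compares the mean absolute difference between slow and fast outputs against a tolerance; a minimal plain-Python sketch, with flat float lists standing in for the pixel tensors and an illustrative tolerance value:

```python
# Sketch of a relaxed slow/fast equivalence check: rather than requiring
# exact equality, accept outputs whose mean absolute difference is below
# a tolerance. (Tolerance value and names are illustrative, not the PR's.)
def mean_abs_diff_ok(slow_vals, fast_vals, tol=1e-1):
    diffs = [abs(a - b) for a, b in zip(slow_vals, fast_vals)]
    return sum(diffs) / len(diffs) <= tol
```

Relaxing the tolerance trades strictness for robustness against small numeric differences between the two implementations (e.g. interpolation backends).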
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* add support for fast tokenizer
* make style
* fix according to reviews
* make style
* relax slow_fast_equivalence mean diff

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
What does this PR do?
Related #36978
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.