BridgeTower fast image processor #37373
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the
cc @yonigozlan
yonigozlan left a comment
Thanks for working on this @rootonchair ! Some small issues but overall very nice!
    image_std = image_std if image_std is not None else self.image_std
    do_pad = do_pad if do_pad is not None else self.do_pad
-   do_center_crop if do_center_crop is not None else self.do_center_crop
+   do_center_crop = do_center_crop if do_center_crop is not None else self.do_center_crop
    if not is_batched(images):
        images = [images]
This can be removed if we use make_flat_list_of_images
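For reference, the flattening helper behaves roughly like this. This is a minimal sketch, not the actual `make_flat_list_of_images` from `transformers`, which also validates input types:

```python
# Sketch of what a flattening helper like make_flat_list_of_images does:
# accept a single image, a list of images, or a nested list of images,
# and always return a flat list. (Hypothetical simplified version.)
def make_flat_list_of_images_sketch(images):
    # A bare image (here: anything that is not a list) becomes a one-element list,
    # which is exactly what the is_batched check above was emulating.
    if not isinstance(images, list):
        return [images]
    # A nested list of lists is flattened one level.
    if images and isinstance(images[0], list):
        return [img for sublist in images for img in sublist]
    return images
```

Using the shared helper means the processor no longer needs its own batching check.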
    def get_max_height_width(images: List["torch.Tensor"]) -> List[int]:
        """
        Get the maximum height and width across all images in a batch.
        """
        _, max_height, max_width = max_across_indices([img.shape for img in images])
        return (max_height, max_width)
This can be imported from image_processing_utils_fast
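The helper just takes the per-axis maximum over the batch; a shape-only sketch, using plain `(channels, height, width)` tuples instead of the torch tensors the real version in `image_processing_utils_fast` operates on:

```python
# Shape-only sketch of get_max_height_width: given (channels, height, width)
# shapes for each image in a batch, return the largest height and width.
# (Simplified stand-in; the library version works on torch tensor shapes.)
def get_max_height_width_sketch(shapes):
    max_height = max(h for _, h, _ in shapes)
    max_width = max(w for _, _, w in shapes)
    return (max_height, max_width)
```

This is the target size every image in the batch gets padded to.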
    processed_masks = torch.stack(processed_masks, dim=0) if return_tensors else processed_masks
    data["pixel_mask"] = processed_masks

    return BatchFeature(data=data, tensor_type=return_tensors)
Better to build the BatchFeature outside this function (in _preprocess); let's just return processed_images and processed_masks here.
    def to_dict(self):
        encoder_dict = super().to_dict()
        encoder_dict.pop("_valid_processor_keys", None)
        encoder_dict.pop("crop_size", None)
        return encoder_dict
I don't think this is needed?
In the slow processor, crop_size is not used for center cropping; size is used instead. It would be better to assign it to crop_size, but that would be redundant for the slow processor, and test_save_load_fast_slow would fail.
    def get_expected_values(self, image_inputs, batched=False):
        """
        This function computes the expected height and width when providing images to BridgeTowerImageProcessor,
        assuming do_resize is set to True with a scalar size and size_divisor.
        """
-       if not batched:
-           size = self.size["shortest_edge"]
-           image = image_inputs[0]
-           if isinstance(image, Image.Image):
-               w, h = image.size
-           elif isinstance(image, np.ndarray):
-               h, w = image.shape[0], image.shape[1]
-           else:
-               h, w = image.shape[1], image.shape[2]
-           scale = size / min(w, h)
-           if h < w:
-               newh, neww = size, scale * w
-           else:
-               newh, neww = scale * h, size
-
-           max_size = int((1333 / 800) * size)
-           if max(newh, neww) > max_size:
-               scale = max_size / max(newh, neww)
-               newh = newh * scale
-               neww = neww * scale
-
-           newh, neww = int(newh + 0.5), int(neww + 0.5)
-           expected_height, expected_width = (
-               newh // self.size_divisor * self.size_divisor,
-               neww // self.size_divisor * self.size_divisor,
-           )
-
-       else:
-           expected_values = []
-           for image in image_inputs:
-               expected_height, expected_width = self.get_expected_values([image])
-               expected_values.append((expected_height, expected_width))
-           expected_height = max(expected_values, key=lambda item: item[0])[0]
-           expected_width = max(expected_values, key=lambda item: item[1])[1]
-
-       return expected_height, expected_width
+       return self.size["shortest_edge"], self.size["shortest_edge"]
It is expected that center_crop is performed before returning, so that all images end up at 288x288. But the code fails to assign the default value of do_center_crop (it is None all the time), so the slow processor just resizes all images to shortest_edge and returns. These old expected sizes, which mimic the behavior of resize, therefore have the wrong values; the correct expected size is simply 288x288.
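For reference, the resize-style expected size the old test computed can be sketched as a standalone function. The defaults `size=288` and `size_divisor=32` are illustrative values, not taken from this diff:

```python
# Sketch of the old resize-based expected-size computation (before the fix).
# With center crop actually applied, the expected size collapses to
# (size, size) regardless of input, which is the point of the review comment.
def resize_expected_size(h, w, size=288, size_divisor=32):
    # Scale so the shorter edge matches `size`.
    scale = size / min(w, h)
    if h < w:
        newh, neww = size, scale * w
    else:
        newh, neww = scale * h, size
    # Cap the longer edge at (1333 / 800) * size, DETR-style.
    max_size = int((1333 / 800) * size)
    if max(newh, neww) > max_size:
        scale = max_size / max(newh, neww)
        newh, neww = newh * scale, neww * scale
    # Round to the nearest integer, then floor to a multiple of size_divisor.
    newh, neww = int(newh + 0.5), int(neww + 0.5)
    return newh // size_divisor * size_divisor, neww // size_divisor * size_divisor
```

For a 480x640 input this yields (288, 384); only after the 288x288 center crop do all images share the same size.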
    @require_vision
    @require_torch
    def test_slow_fast_equivalence(self):
        if not self.test_slow_image_processor or not self.test_fast_image_processor:
            self.skipTest(reason="Skipping slow/fast equivalence test")

        if self.image_processing_class is None or self.fast_image_processing_class is None:
            self.skipTest(reason="Skipping slow/fast equivalence test as one of the image processors is not defined")

        dummy_image = Image.open(
            requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw
        )
        image_processor_slow = self.image_processing_class(**self.image_processor_dict)
        image_processor_fast = self.fast_image_processing_class(**self.image_processor_dict)

        encoding_slow = image_processor_slow(dummy_image, return_tensors="pt")
        encoding_fast = image_processor_fast(dummy_image, return_tensors="pt")

        self._assertEquivalence(encoding_slow.pixel_values, encoding_fast.pixel_values)
        self._assertEquivalence(encoding_slow.pixel_mask.float(), encoding_fast.pixel_mask.float())
Nice. Would be great to do the same for test_slow_fast_equivalence_batched
Sorry for hijacking this PR to relax the slow_fast_equivalence mean diff; there are some issues with CI.
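An equivalence check of this kind compares the mean absolute difference between slow and fast outputs against a tolerance; a minimal plain-Python sketch, with flat float lists standing in for the pixel tensors and an illustrative tolerance value:

```python
# Sketch of a relaxed slow/fast equivalence check: rather than requiring
# exact equality, accept outputs whose mean absolute difference is below
# a tolerance. (Tolerance value and names are illustrative, not the PR's.)
def mean_abs_diff_ok(slow_vals, fast_vals, tol=1e-1):
    diffs = [abs(a - b) for a, b in zip(slow_vals, fast_vals)]
    return sum(diffs) / len(diffs) <= tol
```

Relaxing the tolerance trades strictness for robustness against small numeric differences between the two implementations (e.g. interpolation backends).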
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* add support for fast tokenizer
* make style
* fix according to reviews
* make style
* relax slow_fast_equivalence mean diff

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
What does this PR do?
Related #36978
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.