added fast image processor for ZoeDepth and expanded tests accordingly #38515
Conversation

yonigozlan left a comment
Hi @henrikm11! Thanks for working on this, looks almost ready to go! There are still a few things left to simplify in the fast image processor.
```python
def _resize(
    self,
    images: "torch.Tensor",
    size: SizeDict,
    keep_aspect_ratio: bool = False,
    ensure_multiple_of: int = 1,
    resample: PILImageResampling = PILImageResampling.BILINEAR,
) -> "torch.Tensor":
    """
    Resize an image or batched images to target size `(size["height"], size["width"])`. If `keep_aspect_ratio` is
    `True`, the image is resized to the largest possible size such that the aspect ratio is preserved. If
    `ensure_multiple_of` is set, the image is resized to a size that is a multiple of this value.

    Args:
        images (`torch.Tensor`):
            Images to resize.
        size (`Dict[str, int]`):
            Target size of the output image.
        keep_aspect_ratio (`bool`, *optional*, defaults to `False`):
            If `True`, the image is resized to the largest possible size such that the aspect ratio is preserved.
        ensure_multiple_of (`int`, *optional*, defaults to 1):
            The image is resized to a size that is a multiple of this value.
        interpoation (`F.InterpolationMode`, *optional*, defaults to `InterpolationMode.BILINEAR`):
            Defines the resampling filter to use if resizing the image. Otherwise, the image is resized to size
            specified in `size`.
    """
    if not size.height or not size.width:
        raise ValueError(f"The size dictionary must contain the keys 'height' and 'width'. Got {size}")
    output_size = get_resize_output_image_size(
        images,
        output_size=(size.height, size.width),
        keep_aspect_ratio=keep_aspect_ratio,
        multiple=ensure_multiple_of,
        input_data_format=ChannelDimension.FIRST,
    )
    height, width = output_size

    resample_to_mode = {PILImageResampling.BILINEAR: "bilinear", PILImageResampling.BICUBIC: "bicubic"}
    mode = resample_to_mode[resample]

    resized_images = torch.nn.functional.interpolate(
        images, (int(height), int(width)), mode=mode, align_corners=True
    )

    return resized_images
```
yonigozlan: Why not use the resize function from BaseImageProcessorFast?
henrikm11: Isn't align_corners=True an issue here? As far as I know, that's simply not available in torchvision.transforms.functional / it is set to False, and thus one gets significantly different results.
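To make the align_corners discussion concrete, here is a dependency-free sketch (the function names are mine, not torch's) of how the two conventions map an output pixel index back to a source coordinate; the differing sampling grids are what produce the significantly different resized values:

```python
def src_coord_align_true(dst_idx: int, src_len: int, dst_len: int) -> float:
    # align_corners=True: the corner pixels of input and output are aligned,
    # so the first and last samples land exactly on 0 and src_len - 1.
    if dst_len == 1:
        return 0.0
    return dst_idx * (src_len - 1) / (dst_len - 1)


def src_coord_align_false(dst_idx: int, src_len: int, dst_len: int) -> float:
    # align_corners=False: pixels are treated as unit areas and samples are
    # taken at pixel centers, shifting every coordinate by half a pixel.
    scale = src_len / dst_len
    return (dst_idx + 0.5) * scale - 0.5


# Upsampling a length-4 row to length 8: the two grids disagree at every
# position, so bilinear interpolation reads different source values.
grid_true = [src_coord_align_true(i, 4, 8) for i in range(8)]
grid_false = [src_coord_align_false(i, 4, 8) for i in range(8)]
```

Since torchvision's resize follows the align_corners=False convention, matching the slow processor's output here requires calling torch.nn.functional.interpolate directly, as discussed above.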
yonigozlan: I see, in this case let's override the resize function from BaseImageProcessorFast. Also, you shouldn't need to define resample_to_mode; just use the interpolation arg in _preprocess instead of resample, as it is already the PILImageResampling converted to a torch InterpolationMode. And there's still no need to override the whole preprocess function.
henrikm11: Indeed, thanks for pointing that out; I'm not even sure why I overrode preprocess in the first place now... All other suggested changes make perfect sense to me. Sorry for not being as diligent with cleaning and simplifying the code as I should have been; I will make all the changes tomorrow.
```python
def _pad_images(
    self,
    images: "torch.Tensor",
    mode: TorchPaddingMode = TorchPaddingMode.REFLECT,
```
yonigozlan: Let's just have

```diff
-    mode: TorchPaddingMode = TorchPaddingMode.REFLECT,
+    mode="reflect",
```

So that we can remove TorchPaddingMode altogether.
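For context, torch's padding accepts plain string modes ("reflect", "replicate", etc.), which is what makes the custom enum unnecessary. As an illustration of what "reflect" means, here is a pure-Python sketch of the 1-D rule (the helper name is mine); torch applies the same rule along each spatial dimension:

```python
def reflect_pad_1d(seq, pad_left, pad_right):
    # "reflect" mirrors around the edge element, excluding the edge itself:
    # [1, 2, 3] padded by 2 on each side -> [3, 2, 1, 2, 3, 2, 1].
    n = len(seq)
    assert pad_left < n and pad_right < n, "reflect padding must be smaller than the input"
    left = [seq[i] for i in range(pad_left, 0, -1)]
    right = [seq[n - 2 - i] for i in range(pad_right)]
    return left + list(seq) + right
```

Unlike "replicate", the border element is never duplicated, which avoids over-weighting edge pixels when the padded region is later processed.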
```python
@auto_docstring
def preprocess(
    self,
    images: ImageInput,
    *args,
    **kwargs: Unpack[ZoeDepthFastImageProcessorKwargs],
) -> BatchFeature:
    validate_kwargs(captured_kwargs=kwargs.keys(), valid_processor_keys=self.valid_kwargs.__annotations__.keys())
    # Set default kwargs from self. This ensures that if a kwarg is not provided
    # by the user, it gets its default value from the instance, or is set to None.
    for kwarg_name in self.valid_kwargs.__annotations__:
        kwargs.setdefault(kwarg_name, getattr(self, kwarg_name, None))

    # Extract parameters that are only used for preparing the input images
    do_convert_rgb = kwargs.pop("do_convert_rgb")
    input_data_format = kwargs.pop("input_data_format")
    device = kwargs.pop("device")

    # Prepare input images
    images = self._prepare_input_images(
        images=images, do_convert_rgb=do_convert_rgb, input_data_format=input_data_format, device=device
    )
    # return list of torch.Tensor in ChannelDimension.FIRST format
    # Update kwargs that need further processing before being validated
    kwargs = self._further_process_kwargs(**kwargs)

    # Validate kwargs
    self._validate_preprocess_kwargs(**kwargs)

    # Pop kwargs that are not needed in _preprocess
    kwargs.pop("default_to_square")
    kwargs.pop("data_format")
    kwargs.pop("do_center_crop")
    kwargs.pop("crop_size")

    return self._preprocess(images, *args, **kwargs)
```
yonigozlan: I don't think there's a need to override anything here, especially if we use the BaseImageProcessorFast resize. You can just have this to account for ZoeDepthFastImageProcessorKwargs in the signature:
Suggested change (replacing the whole override above):

```python
@auto_docstring
def preprocess(self, images: ImageInput, *args, **kwargs: Unpack[ZoeDepthFastImageProcessorKwargs]) -> BatchFeature:
    return super().preprocess(images, **kwargs)
```
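The shape of that suggestion — keep the override only to narrow the accepted kwargs and delegate everything to the base class — can be sketched without the transformers dependency (all class, attribute, and kwarg names below are illustrative, not the real API):

```python
class BaseProcessor:
    def preprocess(self, images, **kwargs):
        # Fill in instance defaults for anything the caller did not pass,
        # mirroring the setdefault loop in the base implementation above.
        for name in ("do_resize", "keep_aspect_ratio"):
            kwargs.setdefault(name, getattr(self, name, None))
        return {"images": images, **kwargs}


class ZoeLikeProcessor(BaseProcessor):
    keep_aspect_ratio = True  # model-specific default picked up by the base class

    def preprocess(self, images, **kwargs):
        # The override exists only to document the extra accepted kwargs in
        # the signature; all real work stays in the base class.
        return super().preprocess(images, **kwargs)


out = ZoeLikeProcessor().preprocess(["img"], do_resize=False)
```

Because the subclass adds no logic of its own, any future fix to the base preprocess pipeline is inherited for free, which is the point of the review comment.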
```python
if do_rescale:
    stacked_images = self.rescale(stacked_images, rescale_factor)
if do_pad:
    stacked_images = self._pad_images(images=stacked_images, mode=PaddingMode.REFLECT)
```
yonigozlan: No need to provide PaddingMode.REFLECT, as there's no way to specify the mode in the slow image processor.
```python
stacked_images = self._pad_images(images=stacked_images, mode=PaddingMode.REFLECT)
if do_resize:
    stacked_images = self._resize(stacked_images, size, keep_aspect_ratio, ensure_multiple_of, resample)
print(stacked_images.dtype)
```

Suggested change: remove the leftover debug print(stacked_images.dtype).
```python
print(stacked_images.dtype)
if do_normalize:
    stacked_images = self.normalize(stacked_images, image_mean, image_std)
print(stacked_images.dtype)
```

Suggested change: remove the leftover debug print(stacked_images.dtype).
```python
# Group images by size for further processing
# Needed in case do_resize is False, or resize returns images with different sizes
grouped_images, grouped_images_index = group_images_by_shape(resized_images)
```

Suggested change: remove these three lines.
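For context, the grouping helper batches same-shaped images together so each group can be processed in a single vectorized call, and keeps an index so the original ordering can be restored afterwards. A pure-Python sketch of the idea (the real transformers helpers operate on torch tensors; these names mirror them but the bodies are mine):

```python
def group_images_by_shape(images):
    # Map each distinct (height, width) to the images sharing that shape,
    # and record where each input image landed so order can be restored.
    grouped = {}
    index = []  # position i -> (shape, offset within that shape's group)
    for img in images:
        shape = (len(img), len(img[0]))  # (height, width) of a nested list
        grouped.setdefault(shape, [])
        index.append((shape, len(grouped[shape])))
        grouped[shape].append(img)
    return grouped, index


def regroup_images(grouped, index):
    # Invert the grouping: walk the index to rebuild the original order.
    return [grouped[shape][offset] for shape, offset in index]
```

Grouping like this is what makes per-shape batched ops (resize, normalize) efficient even when do_resize is False and the batch contains mixed sizes.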
```python
return F.pad(images, padding=(pad_width, pad_height), padding_mode=mode)

@filter_out_non_signature_kwargs()
```

Suggested change: remove the @filter_out_non_signature_kwargs() line.
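For readers unfamiliar with the decorator being removed here: a hypothetical minimal version of a filter_out_non_signature_kwargs-style decorator, built on inspect, looks like this (a sketch, not the transformers implementation):

```python
import inspect
from functools import wraps


def filter_out_non_signature_kwargs(func=None):
    # Decorator factory: silently discards keyword arguments that are not
    # named parameters of the wrapped function.
    def decorator(fn):
        allowed = set(inspect.signature(fn).parameters)

        @wraps(fn)
        def wrapper(*args, **kwargs):
            kept = {k: v for k, v in kwargs.items() if k in allowed}
            return fn(*args, **kept)

        return wrapper

    # Support both @filter_out_non_signature_kwargs and @...()
    return decorator if func is None else decorator(func)


@filter_out_non_signature_kwargs()
def resize(image, size=None):
    return (image, size)
```

Dropping it here is reasonable: once the fast processor routes kwargs through the base class's validation, unknown kwargs are already rejected or filtered upstream.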
```python
modified_dict = self.image_processor_dict
modified_dict["size"] = 42
image_processor = image_processing_class(**modified_dict)
print(self.image_processor_dict)
```

Suggested change: remove the leftover print(self.image_processor_dict).
```python
self.assertEqual(list(pixel_values.shape), [1, 3, 512, 672])
self.assertTrue(pixel_values.shape[2] % multiple == 0)
self.assertTrue(pixel_values.shape[3] % multiple == 0)
self.image_processor_tester.do_pad = True
```

Suggested change: remove the self.image_processor_tester.do_pad = True line.
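The divisibility the two assertions above check comes from the ensure_multiple_of rounding rule; a simplified sketch of that rule (the slow ZoeDepth processor also clamps to min/max sizes, which is omitted here, and the function name is assumed):

```python
def constrain_to_multiple_of(val: float, multiple: int) -> int:
    # Round to the nearest multiple of `multiple`, never below one multiple,
    # so the resulting height/width always satisfies shape % multiple == 0.
    x = round(val / multiple) * multiple
    return int(max(x, multiple))
```

Applying this to both target dimensions is what guarantees shapes like 512 x 672 when multiple is 32, regardless of the aspect-ratio-preserving scale chosen beforehand.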
Force-pushed: 8fe819e to bde23db (Compare)
yonigozlan left a comment

Very nice! Two very small things left to change; after that, LGTM!
```python
size: SizeDict,
keep_aspect_ratio: bool = False,
ensure_multiple_of: int = 1,
interpolation: "F.InterpolationMode" = InterpolationMode.BILINEAR,
```
yonigozlan: I think having InterpolationMode.BILINEAR as a default can cause some issues with the CI when torchvision is not available... Since we have resample = PILImageResampling.BILINEAR as a default class attribute, we shouldn't need to set a default here and can just set it to None. That way we can also remove the from torchvision.transforms import InterpolationMode above.
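The pattern being suggested — a None default in the signature, resolved inside the method — keeps optional-dependency objects out of function definition time. A generic, dependency-free sketch (all names are illustrative):

```python
def _default_interpolation():
    # Stand-in for lazily importing torchvision and returning
    # InterpolationMode.BILINEAR; a plain string keeps this sketch
    # free of the optional dependency.
    return "bilinear"


def resize_image(image, interpolation=None):
    # None means "resolve the default at call time", so merely defining
    # this function never touches the optional dependency, and a CI run
    # without torchvision can still import the module.
    if interpolation is None:
        interpolation = _default_interpolation()
    return (image, interpolation)
```

With a module-level default like InterpolationMode.BILINEAR in the signature, the import must succeed just to define the function, which is exactly the CI failure mode described above.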
henrikm11: Removing the default is no problem at all; the import is still required, however, because post_process_depth_estimation uses it.
```python
    If `True`, the image is resized to the largest possible size such that the aspect ratio is preserved.
ensure_multiple_of (`int`, *optional*, defaults to 1):
    The image is resized to a size that is a multiple of this value.
interpoation (`F.InterpolationMode`, *optional*, defaults to `InterpolationMode.BILINEAR`):
```

Suggested change (fix the typo):

```diff
-interpoation (`F.InterpolationMode`, *optional*, defaults to `InterpolationMode.BILINEAR`):
+interpolation (`F.InterpolationMode`, *optional*, defaults to `InterpolationMode.BILINEAR`):
```
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…y, hopefully fixed repo consistency issue too now
Force-pushed: 64444e5 to be46058 (Compare)
henrikm11: The two failed tests are entirely unrelated to the changes I have made; do you have any suggestions for how to deal with this? Even worse, one fails because of a request timeout, and the other because the max difference of (I assume random) tensors is 5e-05 rather than 1e-05, both of which do not seem to be fully deterministic conditions... maybe you can just rerun the tests, @yonigozlan?
yonigozlan: The CI can be a bit flaky sometimes; running the tests again and merging when they pass!
huggingface#38515)

* added fast image processor for ZoeDepth and expanded tests accordingly
* added fast image processor for ZoeDepth and expanded tests accordingly, hopefully fixed repo consistency issue too now
* final edits for zoedepth fast image processor
* final minor edit for zoedepth fast image processor
potential reviewer: @yonigozlan