Fast image processor for DPT model #37481
yonigozlan merged 43 commits into huggingface:main
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the "Ready for review" button.
cc @yonigozlan
@yonigozlan I've looked at the test failures in the pipeline and they seem unrelated to my change. The branch was updated from upstream main just before the tests ran. Any advice on what to do?
yonigozlan left a comment
Hey @samrae7! Thanks for iterating :). The Beit fast image processor was recently merged, and it seems quite similar to this one. Could you have a look and try to uniformize this PR a bit based on it? In particular the handling of segmentation maps, as I'm not a fan of dragging them as ImageInput inside the _preprocess function.
```python
if size.shortest_edge and size.longest_edge:
    # Resize the image so that the shortest edge or the longest edge is of the given size
    # while maintaining the aspect ratio of the original image.
    new_size = get_size_with_aspect_ratio(
        image.size()[-2:],
        size.shortest_edge,
        size.longest_edge,
    )
elif size.shortest_edge:
    new_size = get_resize_output_image_size(
        image,
        size=size.shortest_edge,
        default_to_square=False,
        input_data_format=ChannelDimension.FIRST,
    )
elif size.max_height and size.max_width:
    new_size = get_image_size_for_max_height_width(image.size()[-2:], size.max_height, size.max_width)
```
I think we might not need that, as the slow processor enforces the use of height and width in the size dict.
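For readers following along, this is what an aspect-ratio-preserving resize under a shortest-edge plus longest-edge constraint computes. The sketch below is illustrative only, not the actual transformers implementation of `get_size_with_aspect_ratio`:

```python
def get_size_with_aspect_ratio(image_size, shortest_edge, longest_edge=None):
    """Illustrative re-implementation (not the transformers source): scale the
    image so its short side equals `shortest_edge`, shrinking further if the
    resulting long side would exceed `longest_edge`, keeping the aspect ratio."""
    height, width = image_size
    short, long = min(height, width), max(height, width)
    scale = shortest_edge / short
    if longest_edge is not None and long * scale > longest_edge:
        # The longest-edge cap wins: rescale so the long side fits exactly.
        scale = longest_edge / long
    return int(round(height * scale)), int(round(width * scale))
```

With only a shortest-edge constraint, a 400x600 image scales to 300x450; adding `longest_edge=400` forces the smaller scale so the long side fits.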
```python
isBatched = False
if isinstance(segmentation_maps, list):
    isBatched = True
    if not is_pil_image(segmentation_maps[0]) and segmentation_maps[0].ndim == 2:
        segmentation_maps = [map.unsqueeze(0) for map in segmentation_maps]
elif not is_pil_image(segmentation_maps) and segmentation_maps.ndim == 2:
    segmentation_maps = segmentation_maps.unsqueeze(0)

segmentation_maps = self._prepare_input_images(segmentation_maps)

if isBatched:
    segmentation_maps = [map.squeeze(0) for map in segmentation_maps]
else:
    segmentation_maps = segmentation_maps[0]
return segmentation_maps
```
Could you have a look at how this is handled in the newly merged BeitImageProcessorFast?
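As a side note for readers, the unsqueeze/squeeze round trip in the snippet above only adds and later removes a channel axis, so that 2D segmentation maps can pass through transforms that expect channel-first image tensors:

```python
import torch

# Demo of the channel-dim round trip used for 2D segmentation maps:
# add a leading channel axis, run the map through image transforms,
# then drop the axis again afterwards.
seg_map = torch.zeros(16, 16, dtype=torch.int64)  # (H, W), no channel dim
with_channel = seg_map.unsqueeze(0)               # (1, H, W), channel-first
restored = with_channel.squeeze(0)                # back to (H, W)
```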
Sure, I'll do that, thanks 👍
Thanks for this suggestion. I've refactored based on the approach used in the Beit fast processor.
yonigozlan left a comment
Thanks for iterating! I'd prefer having this be closer to the Beit fast image processor, as the logic is almost the same.
```python
@auto_docstring
class DPTImageProcessorFast(BaseImageProcessorFast, SemanticSegmentationMixin):
```
The SemanticSegmentationMixin should have been deprecated already, sorry about that. Could you copy the post_process_semantic_segmentation here instead?
Suggested change:

```diff
-class DPTImageProcessorFast(BaseImageProcessorFast, SemanticSegmentationMixin):
+class DPTImageProcessorFast(BaseImageProcessorFast):
```
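For reference, here is a minimal sketch of what a copied-in post_process_semantic_segmentation typically does in transformers image processors: upsample each image's logits to the requested target size, then take the per-pixel argmax over the class dimension. The simplified body and the free-function signature below are assumptions for illustration, not the library source:

```python
import torch

def post_process_semantic_segmentation(logits, target_sizes=None):
    """Turn model logits of shape (batch, num_labels, h, w) into a list of
    per-image segmentation maps, optionally resized to the original sizes."""
    if target_sizes is not None:
        semantic_maps = []
        for idx, size in enumerate(target_sizes):
            # Interpolate this image's logits up to its original (height, width)
            resized = torch.nn.functional.interpolate(
                logits[idx].unsqueeze(0), size=size, mode="bilinear", align_corners=False
            )
            semantic_maps.append(resized[0].argmax(dim=0))
    else:
        # No resizing requested: argmax at the logits' native resolution
        semantic_maps = [logits[i].argmax(dim=0) for i in range(logits.shape[0])]
    return semantic_maps
```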
```python
processed_segmentation_maps = processed_segmentation_maps.to(torch.int64)
return processed_segmentation_maps

@auto_docstring
```
```python
segmentation_maps = make_list_of_images(images=segmentation_maps, expected_ndims=2)
segmentation_maps = self._preprocess_segmentation_maps(
    segmentation_maps=segmentation_maps,
    return_tensors=return_tensors,
    **kwargs,
)
```
This is kind of weird to have here; I'd prefer if we handled segmentation_maps by overriding preprocess instead, like for Beit.
Ok, will make this and the other changes asap. Thanks!
```python
ensure_multiple_of: Optional[int]
size_divisor: Optional[int]
do_pad: Optional[bool]
keep_aspect_ratio: Optional[bool]
segmentation_maps: Optional[ImageInput] = (None,)
```
You are missing the do_reduce_labels logic
Added now, thanks for spotting that!
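For context, "reduce labels" in the transformers segmentation processors maps the background class 0 to the ignore index 255 and shifts every other class down by one. A self-contained sketch of that step (the helper name here is illustrative):

```python
import numpy as np

def reduce_label(label: np.ndarray) -> np.ndarray:
    """Shift segmentation labels so class 0 (background) becomes the
    ignore index 255 and all other classes are decremented by one."""
    label = label.astype(np.int64)
    label[label == 0] = 255   # background -> ignore index
    label = label - 1         # shift remaining classes down by one
    label[label == 254] = 255 # keep the ignore index stable after the shift
    return label
```

So a map containing classes [0, 1, 2] comes out as [255, 0, 1].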
- copied from Beit fast processor
- refactor preprocess logic to make it consistent with other processors
- add missing reduce labels logic
@yonigozlan this is ready for re-review
yonigozlan left a comment
Hey @samrae7 ! Thanks a lot for working on this, looks great!
I made some final changes, mainly to use modular as a large part of the processing code was the same as Beit. Waiting for the CI then LGTM!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
What does this PR do?
Add Fast Image Processor for DPT model
Fixes #36978
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case. Link to issue
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.