
36978 | Fast image processor for DPT model #37481

Merged
yonigozlan merged 43 commits into huggingface:main from samrae7:36978/dpt-fast-image
Jun 18, 2025

Conversation

@samrae7
Contributor

@samrae7 samrae7 commented Apr 14, 2025

What does this PR do?

Add Fast Image Processor for DPT model

Fixes #36978

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@github-actions github-actions bot marked this pull request as draft April 14, 2025 07:03
@github-actions
Contributor

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@samrae7 samrae7 changed the title 36978 | Fast image processor for DPT model [WIP] 36978 | Fast image processor for DPT model Apr 14, 2025
Comment thread src/transformers/models/dpt/image_processing_dpt_fast.py Outdated
@Rocketknight1
Member

cc @yonigozlan

Comment thread src/transformers/models/dpt/image_processing_dpt_fast.py Outdated
@samrae7 samrae7 force-pushed the 36978/dpt-fast-image branch from ac2ba5a to 9931944 Compare April 15, 2025 07:50
Comment thread src/transformers/models/dpt/image_processing_dpt_fast.py Outdated
Comment thread src/transformers/models/dpt/image_processing_dpt_fast.py Outdated
@samrae7 samrae7 force-pushed the 36978/dpt-fast-image branch from 72f08d5 to 7c559ea Compare April 17, 2025 21:15
@samrae7 samrae7 changed the title [WIP] 36978 | Fast image processor for DPT model 36978 | Fast image processor for DPT model Apr 17, 2025
@samrae7 samrae7 marked this pull request as ready for review April 17, 2025 22:17
@samrae7
Contributor Author

samrae7 commented Apr 19, 2025

@yonigozlan I've looked at the test failures in the pipeline and they seem unrelated to my change. The branch was updated from upstream main just before the tests ran. Any advice on what to do?


@samrae7
Contributor Author

samrae7 commented Apr 30, 2025

Update: I was on holiday for a week and can now pick this up again. There were apparently unrelated tests failing in the pipeline, which I planned to investigate. The pipeline is passing now and this is ready for review.

@samrae7 samrae7 force-pushed the 36978/dpt-fast-image branch from ff665c6 to c746e3b Compare May 2, 2025 06:59
Member

@yonigozlan yonigozlan left a comment


Hey @samrae7! Thanks for iterating :). The Beit fast image processor was recently merged and looks quite similar to this one. Could you have a look and uniformize this PR based on it, especially the handling of segmentation maps? I'm not a fan of passing them as ImageInput inside the _preprocess function.

Comment on lines +377 to +393
if size.shortest_edge and size.longest_edge:
# Resize the image so that the shortest edge or the longest edge is of the given size
# while maintaining the aspect ratio of the original image.
new_size = get_size_with_aspect_ratio(
image.size()[-2:],
size.shortest_edge,
size.longest_edge,
)
elif size.shortest_edge:
new_size = get_resize_output_image_size(
image,
size=size.shortest_edge,
default_to_square=False,
input_data_format=ChannelDimension.FIRST,
)
elif size.max_height and size.max_width:
new_size = get_image_size_for_max_height_width(image.size()[-2:], size.max_height, size.max_width)
Member


I think we might not need this, as the slow processor enforces the use of height and width in the size dict.

Contributor Author


removed this
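For reference, the shortest/longest-edge resizing that was removed boils down to simple aspect-ratio arithmetic. The helper below is a simplified, dependency-free sketch of that logic (it stands in for the `get_size_with_aspect_ratio` helper quoted above, not the actual `transformers` implementation):

```python
def size_with_aspect_ratio(image_size, shortest_edge, longest_edge=None):
    """Scale (height, width) so the shorter side matches `shortest_edge`,
    capping the longer side at `longest_edge` when one is given."""
    height, width = image_size
    short_side, long_side = min(height, width), max(height, width)
    scale = shortest_edge / short_side
    # If scaling by the short side would push the long side past the cap,
    # rescale so the long side lands exactly on the cap instead.
    if longest_edge is not None and long_side * scale > longest_edge:
        scale = longest_edge / long_side
    return round(height * scale), round(width * scale)
```

For example, `size_with_aspect_ratio((480, 640), 256)` gives `(256, 341)`, while adding `longest_edge=320` gives `(240, 320)`.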

Comment on lines +290 to +304
isBatched = False
if isinstance(segmentation_maps, list):
isBatched = True
if not is_pil_image(segmentation_maps[0]) and segmentation_maps[0].ndim == 2:
segmentation_maps = [map.unsqueeze(0) for map in segmentation_maps]
elif not is_pil_image(segmentation_maps) and segmentation_maps.ndim == 2:
segmentation_maps = segmentation_maps.unsqueeze(0)

segmentation_maps = self._prepare_input_images(segmentation_maps)

if isBatched:
segmentation_maps = [map.squeeze(0) for map in segmentation_maps]
else:
segmentation_maps = segmentation_maps[0]
return segmentation_maps
Member


Could you have a look at how this is handled in the newly merged BeitImageProcessorFast?

Contributor Author


Sure I'll do that, thanks 👍

Contributor Author


thanks for this suggestion. I've refactored based on the approach used in Beit Fast Processor

@samrae7 samrae7 force-pushed the 36978/dpt-fast-image branch from 7bb9a27 to f854e58 Compare May 10, 2025 16:42
Member

@yonigozlan yonigozlan left a comment


Thanks for iterating! I'd prefer having this be closer to Beit fast image processor, as the logic is almost the same



@auto_docstring
class DPTImageProcessorFast(BaseImageProcessorFast, SemanticSegmentationMixin):
Member


The SemanticSegmentationMixin should have been deprecated already, sorry about that. Could you copy the post_process_semantic_segmentation here instead?

Suggested change
class DPTImageProcessorFast(BaseImageProcessorFast, SemanticSegmentationMixin):
class DPTImageProcessorFast(BaseImageProcessorFast):

Contributor Author


done. thanks
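Copying `post_process_semantic_segmentation` into the class essentially amounts to an argmax over the class dimension of the model's logits (plus an optional resize to the requested target sizes). A dependency-free sketch of the core step, using nested lists in place of tensors and omitting the resize (this is an illustration, not the actual `transformers` method):

```python
def semantic_segmentation_argmax(logits):
    """logits: nested list shaped (num_classes, height, width).
    Returns a (height, width) map of the highest-scoring class per pixel."""
    num_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    return [
        [max(range(num_classes), key=lambda c: logits[c][i][j]) for j in range(width)]
        for i in range(height)
    ]
```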

processed_segmentation_maps = processed_segmentation_maps.to(torch.int64)
return processed_segmentation_maps

@auto_docstring
Member


Suggested change
@auto_docstring

Comment on lines +171 to +218
segmentation_maps = make_list_of_images(images=segmentation_maps, expected_ndims=2)
segmentation_maps = self._preprocess_segmentation_maps(
segmentation_maps=segmentation_maps,
return_tensors=return_tensors,
**kwargs,
)
Member


This is kind of weird to have here, I'd prefer if we handle segmentation_maps by overriding preprocess instead, like for Beit

Contributor Author


ok will make this and the other changes asap. thanks

Contributor Author


done
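The Beit-style pattern the reviewer is asking for can be sketched in miniature: `preprocess` is overridden so that segmentation maps travel through the same pipeline as images but skip rescaling/normalization. The class and method bodies below are hypothetical stand-ins for illustration (the real processor resizes, rescales, and normalizes tensors), not the actual `transformers` code:

```python
class FastProcessorSketch:
    """Hypothetical sketch of overriding preprocess() to accept
    segmentation maps alongside images, Beit-style."""

    def _shared_pipeline(self, arrays, normalize):
        # Stand-in for the shared resize/rescale/normalize steps; here we
        # just rescale pixel values to [0, 1] when normalize is requested.
        # Label maps must keep their integer class ids, so they skip it.
        if normalize:
            return [[v / 255 for v in a] for a in arrays]
        return arrays

    def preprocess(self, images, segmentation_maps=None):
        batch = {"pixel_values": self._shared_pipeline(images, normalize=True)}
        if segmentation_maps is not None:
            batch["labels"] = self._shared_pipeline(segmentation_maps, normalize=False)
        return batch
```

The point of the design is that segmentation maps never enter `_preprocess` disguised as an `ImageInput`; they get their own explicit path at the `preprocess` boundary.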

Comment on lines +127 to +131
ensure_multiple_of: Optional[int]
size_divisor: Optional[int]
do_pad: Optional[bool]
keep_aspect_ratio: Optional[bool]
segmentation_maps: Optional[ImageInput] = (None,)
Member


You are missing the do_reduce_labels logic

Contributor Author


added now thanks for spotting that!
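The `do_reduce_labels` semantics being requested (based on how similar processors such as Beit handle it) can be sketched per pixel: class id 0 is treated as background and mapped to the ignore index 255, and every other class id shifts down by one. A minimal stand-in, assuming 255 is the ignore index:

```python
def reduce_label(label_map):
    """Sketch of do_reduce_labels: 0 (background) -> 255 (ignore index),
    all other class ids shift down by one; 255 stays the ignore index."""
    def reduce_pixel(v):
        return 255 if v in (0, 255) else v - 1
    return [[reduce_pixel(v) for v in row] for row in label_map]
```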

@samrae7 samrae7 force-pushed the 36978/dpt-fast-image branch from af8e5e1 to 1d83e60 Compare May 16, 2025 07:19
@samrae7 samrae7 force-pushed the 36978/dpt-fast-image branch from 471e8a6 to 7eefd5d Compare June 16, 2025 06:31
@samrae7
Contributor Author

samrae7 commented Jun 16, 2025

@yonigozlan this is ready for re-review

Member

@yonigozlan yonigozlan left a comment


Hey @samrae7 ! Thanks a lot for working on this, looks great!
I made some final changes, mainly to use modular as a large part of the processing code was the same as Beit. Waiting for the CI then LGTM!

@yonigozlan yonigozlan enabled auto-merge (squash) June 18, 2025 17:32
@yonigozlan yonigozlan merged commit b922b22 into huggingface:main Jun 18, 2025
20 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.



Development

Successfully merging this pull request may close these issues.

[Contributions Welcome] Add Fast Image Processors

4 participants