Fast image processor for DPT model #37481
yonigozlan merged 43 commits into huggingface:main
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the "Ready for review" button.
cc @yonigozlan
@yonigozlan I've looked at the test failures in the pipeline and they seem unrelated to my change. The branch was updated from upstream main just before the tests ran. Any advice on what to do?
yonigozlan left a comment
Hey @samrae7! Thanks for iterating :). The Beit fast image processor was recently merged, and it seems quite similar to this one. Could you have a look and try to uniformize this PR a bit based on it? In particular the handling of segmentation maps, as I'm not a fan of dragging them as ImageInput inside the _preprocess function.
```python
if size.shortest_edge and size.longest_edge:
    # Resize the image so that the shortest edge or the longest edge is of the given size
    # while maintaining the aspect ratio of the original image.
    new_size = get_size_with_aspect_ratio(
        image.size()[-2:],
        size.shortest_edge,
        size.longest_edge,
    )
elif size.shortest_edge:
    new_size = get_resize_output_image_size(
        image,
        size=size.shortest_edge,
        default_to_square=False,
        input_data_format=ChannelDimension.FIRST,
    )
elif size.max_height and size.max_width:
    new_size = get_image_size_for_max_height_width(image.size()[-2:], size.max_height, size.max_width)
```
I think we might not need that, as the slow processor enforces the use of height and width in the size dict.
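For readers following along, this is what an aspect-ratio-preserving resize under a shortest-edge plus longest-edge constraint computes. The sketch below is illustrative only, not the actual transformers implementation of `get_size_with_aspect_ratio`:

```python
def get_size_with_aspect_ratio(image_size, shortest_edge, longest_edge=None):
    """Illustrative re-implementation (not the transformers source): scale the
    image so its short side equals `shortest_edge`, shrinking further if the
    resulting long side would exceed `longest_edge`, keeping the aspect ratio."""
    height, width = image_size
    short, long = min(height, width), max(height, width)
    scale = shortest_edge / short
    if longest_edge is not None and long * scale > longest_edge:
        # The longest-edge cap wins: rescale so the long side fits exactly.
        scale = longest_edge / long
    return int(round(height * scale)), int(round(width * scale))
```

With only a shortest-edge constraint, a 400x600 image scales to 300x450; adding `longest_edge=400` forces the smaller scale so the long side fits.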
```python
isBatched = False
if isinstance(segmentation_maps, list):
    isBatched = True
    if not is_pil_image(segmentation_maps[0]) and segmentation_maps[0].ndim == 2:
        segmentation_maps = [map.unsqueeze(0) for map in segmentation_maps]
elif not is_pil_image(segmentation_maps) and segmentation_maps.ndim == 2:
    segmentation_maps = segmentation_maps.unsqueeze(0)

segmentation_maps = self._prepare_input_images(segmentation_maps)

if isBatched:
    segmentation_maps = [map.squeeze(0) for map in segmentation_maps]
else:
    segmentation_maps = segmentation_maps[0]
return segmentation_maps
```
Could you have a look at how this is handled in the newly merged BeitImageProcessorFast?
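As a side note for readers, the unsqueeze/squeeze round trip in the snippet above only adds and later removes a channel axis, so that 2D segmentation maps can pass through transforms that expect channel-first image tensors:

```python
import torch

# Demo of the channel-dim round trip used for 2D segmentation maps:
# add a leading channel axis, run the map through image transforms,
# then drop the axis again afterwards.
seg_map = torch.zeros(16, 16, dtype=torch.int64)  # (H, W), no channel dim
with_channel = seg_map.unsqueeze(0)               # (1, H, W), channel-first
restored = with_channel.squeeze(0)                # back to (H, W)
```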
Sure, I'll do that, thanks 👍
Thanks for this suggestion. I've refactored based on the approach used in the Beit fast processor.
yonigozlan left a comment
Thanks for iterating! I'd prefer having this be closer to the Beit fast image processor, as the logic is almost the same.
```python
@auto_docstring
class DPTImageProcessorFast(BaseImageProcessorFast, SemanticSegmentationMixin):
```
The SemanticSegmentationMixin should have been deprecated already, sorry about that. Could you copy the post_process_semantic_segmentation here instead?
Suggested change:

```diff
-class DPTImageProcessorFast(BaseImageProcessorFast, SemanticSegmentationMixin):
+class DPTImageProcessorFast(BaseImageProcessorFast):
```
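For reference, here is a minimal sketch of what a copied-in post_process_semantic_segmentation typically does in transformers image processors: upsample each image's logits to the requested target size, then take the per-pixel argmax over the class dimension. The simplified body and the free-function signature below are assumptions for illustration, not the library source:

```python
import torch

def post_process_semantic_segmentation(logits, target_sizes=None):
    """Turn model logits of shape (batch, num_labels, h, w) into a list of
    per-image segmentation maps, optionally resized to the original sizes."""
    if target_sizes is not None:
        semantic_maps = []
        for idx, size in enumerate(target_sizes):
            # Interpolate this image's logits up to its original (height, width)
            resized = torch.nn.functional.interpolate(
                logits[idx].unsqueeze(0), size=size, mode="bilinear", align_corners=False
            )
            semantic_maps.append(resized[0].argmax(dim=0))
    else:
        # No resizing requested: argmax at the logits' native resolution
        semantic_maps = [logits[i].argmax(dim=0) for i in range(logits.shape[0])]
    return semantic_maps
```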
```python
processed_segmentation_maps = processed_segmentation_maps.to(torch.int64)
return processed_segmentation_maps

@auto_docstring
```
```python
segmentation_maps = make_list_of_images(images=segmentation_maps, expected_ndims=2)
segmentation_maps = self._preprocess_segmentation_maps(
    segmentation_maps=segmentation_maps,
    return_tensors=return_tensors,
    **kwargs,
)
```
This is kind of weird to have here; I'd prefer if we handled segmentation_maps by overriding preprocess instead, like for Beit.
Ok, will make this and the other changes asap. Thanks!
```python
ensure_multiple_of: Optional[int]
size_divisor: Optional[int]
do_pad: Optional[bool]
keep_aspect_ratio: Optional[bool]
segmentation_maps: Optional[ImageInput] = (None,)
```
You are missing the do_reduce_labels logic
Added now, thanks for spotting that!
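For context, "reduce labels" in the transformers segmentation processors maps the background class 0 to the ignore index 255 and shifts every other class down by one. A self-contained sketch of that step (the helper name here is illustrative):

```python
import numpy as np

def reduce_label(label: np.ndarray) -> np.ndarray:
    """Shift segmentation labels so class 0 (background) becomes the
    ignore index 255 and all other classes are decremented by one."""
    label = label.astype(np.int64)
    label[label == 0] = 255   # background -> ignore index
    label = label - 1         # shift remaining classes down by one
    label[label == 254] = 255 # keep the ignore index stable after the shift
    return label
```

So a map containing classes [0, 1, 2] comes out as [255, 0, 1].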
- copied from Beit fast processor
- refactor preprocess logic to make it consistent with other processors
- add missing reduce labels logic
@yonigozlan this is ready for re-review
yonigozlan left a comment
Hey @samrae7 ! Thanks a lot for working on this, looks great!
I made some final changes, mainly to use modular as a large part of the processing code was the same as Beit. Waiting for the CI then LGTM!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
What does this PR do?
Add Fast Image Processor for DPT model
Fixes #36978
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case. Link to issue
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.