
[Fast Processor] BEiT#37005

Merged
yonigozlan merged 27 commits into huggingface:main from ariG23498:aritra/fp-beit
May 6, 2025
Conversation

@ariG23498
Contributor

Linked: #36978

Adding fast processor for BEiT.

Here is how I tested for image classification:

from transformers import (
    BeitImageProcessor, BeitImageProcessorFast, BeitForImageClassification
)
import torch
from datasets import load_dataset

dataset = load_dataset("huggingface/cats-image", trust_remote_code=True)
image = dataset["test"]["image"][0]

image_processor = BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224")
image_processor_fast = BeitImageProcessorFast.from_pretrained("microsoft/beit-base-patch16-224")

model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")

inputs = image_processor(image, return_tensors="pt")
inputs_fast = image_processor_fast(image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

with torch.no_grad():
    logits_fast = model(**inputs_fast).logits


torch.testing.assert_close(
    actual=inputs_fast["pixel_values"],
    expected=inputs["pixel_values"],
    rtol=1e-2,
    atol=1e-2,
)

torch.testing.assert_close(
    actual=logits,
    expected=logits_fast,
    rtol=1e-2,
    atol=1e-2,
)

# model predicts one of the 1000 ImageNet classes
predicted_label = logits.argmax(-1).item()
predicted_label_fast = logits_fast.argmax(-1).item()

print(model.config.id2label[predicted_label])
print(model.config.id2label[predicted_label_fast])

While inputs and inputs_fast pass the assertion at 1e-2, the logits do not. Can I get some advice on the next steps for debugging?

CC: @yonigozlan

@github-actions github-actions Bot marked this pull request as draft March 26, 2025 15:24
@github-actions
Contributor

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@ariG23498 ariG23498 marked this pull request as ready for review March 26, 2025 15:26
@github-actions github-actions Bot requested review from stevhliu and yonigozlan March 26, 2025 15:27
Member

@yonigozlan yonigozlan left a comment


Hi @ariG23498 ! Thanks a lot for taking this on!
There seem to be a lot of processing functions in the slow image processor that are not standard and should be adapted here, notably the handling of segmentation maps.

As for the precision issues, it is expected that the processed image outputs differ slightly from the slow image processor outputs, but usually by no more than 1e-04 on average. Can you check what the average diff is in your case?
However, if the model is sensitive to small variations, it could be "normal" that your second test on the logits doesn't pass.

Comment thread tests/models/beit/test_image_processing_beit.py
Comment thread src/transformers/models/beit/image_processing_beit_fast.py Outdated
Comment thread src/transformers/models/beit/image_processing_beit_fast.py Outdated
Comment thread src/transformers/models/beit/image_processing_beit_fast.py
@ariG23498 ariG23498 requested a review from yonigozlan March 27, 2025 13:05
@ariG23498
Copy link
Copy Markdown
Contributor Author

@yonigozlan doing this

from transformers import (
    BeitImageProcessor, BeitImageProcessorFast
)
import torch
from datasets import load_dataset

dataset = load_dataset("huggingface/cats-image", trust_remote_code=True)
image = dataset["test"]["image"][0]

image_processor = BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224")
image_processor_fast = BeitImageProcessorFast.from_pretrained("microsoft/beit-base-patch16-224")

inputs = image_processor(image, return_tensors="pt")
inputs_fast = image_processor_fast(image, return_tensors="pt")

print(
    torch.mean(inputs_fast["pixel_values"] - inputs["pixel_values"])
)

Results in

tensor(-9.0261e-05)
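Worth noting: a signed mean near zero can hide larger offsetting errors, so the maximum absolute difference is a stricter check. A small stand-in sketch (synthetic tensors, not the processor outputs above):

```python
import torch

# Synthetic stand-ins for the slow/fast processor outputs compared above:
# half the elements are +1e-4 off, half are -1e-4 off.
slow = torch.zeros(1, 3, 4, 4)
fast = slow + torch.tensor([1e-4, -1e-4]).repeat(24).reshape(1, 3, 4, 4)

diff = fast - slow
print("mean signed diff:", diff.mean().item())   # ~0: opposite-sign errors cancel
print("max abs diff:", diff.abs().max().item())  # ~1e-4: the true worst-case gap
```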

Member

@yonigozlan yonigozlan left a comment


Nice work! Definitely not a standard processor given the processing of segmentation_maps, but I think the way you handled it is great.
The last thing left to address is running the fast image processor through all existing tests.
It would also be great to override test_slow_fast_equivalence and test_slow_fast_equivalence_batched from ImageProcessingTestMixin to also test segmentation_maps/labels.

Comment thread tests/models/beit/test_image_processing_beit.py Outdated
Comment thread src/transformers/models/beit/image_processing_beit_fast.py Outdated
Comment thread src/transformers/models/beit/image_processing_beit_fast.py
@ariG23498
Contributor Author

@yonigozlan could I get a quick review on the tests to let me know if I am proceeding in the right direction?

There are some tests breaking, but an overview of the way I am implementing it would be nice to have 🤗

@yonigozlan
Member

@ariG23498 Looks great to me! Also looks like the failing tests are unrelated from what I see. Are the beit specific image processing tests passing for you locally?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ariG23498
Contributor Author

I have three failing tests locally

FAILED tests/models/beit/test_image_processing_beit.py::BeitImageProcessingTest::test_call_segmentation_maps - ValueError: Could not make a flat list of images from tensor([[0, 0, 0,  ..., 0, 0, 0],
FAILED tests/models/beit/test_image_processing_beit.py::BeitImageProcessingTest::test_can_compile_fast_image_processor - torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
FAILED tests/models/beit/test_image_processing_beit.py::BeitImageProcessingTest::test_removed_deprecated_kwargs - AssertionError: False != True

I might need a little help here 😅

Member

@yonigozlan yonigozlan left a comment


This should help with your errors. I'm not sure what's happening with the torch compile issue though. If it's still there with these fixes, could you give me the full error message?


# Prepare segmentation maps
if segmentation_maps is not None:
    segmentation_maps = self._prepare_input_images(
Member


This is not really adapted to masks with a single-channel dim or no channel dim. I'll try to change this in a future PR; in the meantime you can use make_list_of_images from image_utils with expected_ndims=1.
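For reference, make_list_of_images wraps a bare array into a list based on how many dims a single image is expected to have. A minimal sketch with a plain (H, W) numpy map, assuming the current transformers.image_utils API (expected_ndims=2 matches a map with no channel dim, which is what the final code in this PR uses):

```python
import numpy as np
from transformers.image_utils import make_list_of_images

# A single (H, W) segmentation map with no channel dimension.
segmap = np.zeros((16, 16), dtype=np.int64)

# With expected_ndims=2 the lone 2D array is wrapped into a one-element
# list instead of being iterated row by row.
maps = make_list_of_images(segmap, expected_ndims=2)
print(len(maps), maps[0].shape)  # 1 (16, 16)
```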

Contributor Author


Ah! Thanks for the tip. Will do that.

Comment thread src/transformers/models/beit/image_processing_beit_fast.py Outdated
@ariG23498
Contributor Author

Hi @yonigozlan, apologies for the late commits. With the current changes I have three failing tests, all due to a PIL segmentation-map issue with the make_list_of_images API.

FAILED tests/models/beit/test_image_processing_beit.py::BeitImageProcessingTest::test_call_segmentation_maps - AttributeError: 'PngImageFile' object has no attribute 'shape'. Did you mean: 'save'?
FAILED tests/models/beit/test_image_processing_beit.py::BeitImageProcessingTest::test_reduce_labels - AttributeError: 'PngImageFile' object has no attribute 'shape'. Did you mean: 'save'?
FAILED tests/models/beit/test_image_processing_beit.py::BeitImageProcessingTest::test_slow_fast_equivalence - AttributeError: 'PngImageFile' object has no attribute 'shape'. Did you mean: 'save'?

Is there a better way to handle this?

Member

@yonigozlan yonigozlan left a comment


Hey @ariG23498 ! Thanks a lot for iterating on this. I had missed the fact that the segmentation_maps weren't being converted to tensors, so solving this should solve the test errors you're getting

Comment on lines +196 to +197
if segmentation_maps is not None:
    segmentation_maps = make_list_of_images(images=segmentation_maps, expected_ndims=2)
Member


Yes, my bad, sorry. We do need to convert the segmentation maps to tensors before doing this, and we should also handle the added_dimension logic as in the slow processor.

Contributor Author


Do you think the best way would be to use self._prepare_input_images instead of make_list_of_images?

    segmentation_maps=segmentation_maps,
    **kwargs,
)
data["labels"] = segmentation_maps
Member


Here we will need to squeeze or not the channel dimension depending on added_dimension
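A sketch of that squeeze logic (hypothetical helper name; the real change lives in image_processing_beit_fast.py): the channel axis is dropped only if it was added by the processor itself earlier.

```python
import torch

def finalize_segmentation_map(seg_map: torch.Tensor, added_dimension: bool) -> torch.Tensor:
    # Maps that came in as (H, W) got a leading axis for the transforms
    # and should go back to (H, W); maps that already had a channel dim
    # are left untouched.
    if added_dimension:
        return seg_map.squeeze(0)
    return seg_map

processed = torch.zeros(1, 20, 20)
print(finalize_segmentation_map(processed, added_dimension=True).shape)   # torch.Size([20, 20])
print(finalize_segmentation_map(processed, added_dimension=False).shape)  # torch.Size([1, 20, 20])
```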

@ariG23498
Contributor Author

@yonigozlan all the tests pass on my end!

@ariG23498
Contributor Author

@yonigozlan I was able to make the style tests pass, but the current CI failures seem unrelated to the changes in this PR. Should I push an empty commit to check whether they are flaky?

Member

@yonigozlan yonigozlan left a comment


Thanks @ariG23498! Almost ready to go :), I just suggested some improvements to the conversion of the segmentation_maps

Comment thread demo.py Outdated
Comment on lines +1 to +7
from transformers import BeitImageProcessor, BeitImageProcessorFast

im_pro = BeitImageProcessor(size={"height": 20, "width": 20})
im_pro_fast = BeitImageProcessorFast(size={"height": 20, "width": 20})

print(im_pro)
print(im_pro_fast)
Member


file to delete :)

Comment on lines +160 to +161
if input_data_format is None:
    input_data_format = infer_channel_dimension_format(segmentation_map, num_channels=1)
Member


Don't think this is useful as we force kwargs["input_data_format"] = ChannelDimension.FIRST in any case

Comment on lines +152 to +163
segmentation_map = to_numpy_array(segmentation_map)
# Add an axis to the segmentation maps for transformations.
if segmentation_map.ndim == 2:
    segmentation_map = segmentation_map[None, ...]
    added_dimension = True
    input_data_format = ChannelDimension.FIRST
else:
    added_dimension = False
    if input_data_format is None:
        input_data_format = infer_channel_dimension_format(segmentation_map, num_channels=1)

processed_segmentation_maps.append(torch.tensor(segmentation_map))
Member


It would be great to directly convert to tensor here, instead of having numpy arrays in intermediate steps. Something like _process_image in BaseImageProcessorFast, but with added logic to account for added_dimension
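A hedged sketch of that idea: convert straight to a torch tensor while tracking added_dimension (the helper name and exact handling are illustrative, not the final implementation):

```python
import numpy as np
import torch
from PIL import Image

def process_segmentation_map(seg_map):
    # Accept PIL / numpy / tensor inputs and convert directly to torch,
    # skipping intermediate numpy copies where possible. Record whether
    # we add a channel axis so it can be squeezed back out later.
    if isinstance(seg_map, Image.Image):
        seg_map = torch.from_numpy(np.array(seg_map))
    elif isinstance(seg_map, np.ndarray):
        seg_map = torch.from_numpy(seg_map)
    added_dimension = False
    if seg_map.ndim == 2:  # (H, W) -> (1, H, W) for the transforms
        seg_map = seg_map.unsqueeze(0)
        added_dimension = True
    return seg_map, added_dimension

tensor_map, added = process_segmentation_map(np.zeros((8, 8), dtype=np.uint8))
print(tensor_map.shape, added)  # torch.Size([1, 8, 8]) True
```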

Member

@yonigozlan yonigozlan left a comment


Very nice, thanks for iterating! One tiny thing to fix in the docs and we can merge!

Comment thread docs/source/en/model_doc/beit.md Outdated
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
@ariG23498
Contributor Author

@yonigozlan done!

@yonigozlan yonigozlan merged commit 3c0796a into huggingface:main May 6, 2025
20 checks passed
@simonreise
Contributor

simonreise commented May 8, 2025

@yonigozlan @ariG23498

Looks like when do_reduce_labels=True the image processor reduces not only the labels but also the images

import transformers
import torch


processor = transformers.BeitImageProcessorFast(
    do_resize=False,
    do_center_crop=False,
    do_rescale=False,
    do_normalize=False,
    do_convert_rgb=False,
    do_reduce_labels=True,
)

image = torch.zeros([1, 3, 256, 256])
segmap = torch.zeros([1, 256, 256])

batch = processor.preprocess(
    images=image,
    segmentation_maps=segmap,
    return_tensors="pt",
    do_reduce_labels=True,
)

The resulting pixel_values are 255 but should be 0.

UPD: opened #38042 that fixes it
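For context, label reduction is only supposed to touch the segmentation maps: background pixels (0) map to the ignore index 255 and the remaining class ids shift down by one, while pixel_values stay untouched. A sketch of that intended behavior (standalone function, not the transformers implementation):

```python
import torch

def reduce_label(label: torch.Tensor) -> torch.Tensor:
    # Background (0) becomes the ignore index 255; other ids shift down by 1.
    # Existing 255s pass through unchanged (255 -> 254 -> 255).
    label = label.clone()
    label[label == 0] = 255
    label = label - 1
    label[label == 254] = 255
    return label

segmap = torch.tensor([[0, 1, 2], [3, 0, 255]])
print(reduce_label(segmap))
# Only `labels` should pass through this step; images must be left alone.
```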

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* adding fast processor for beit

* adding resample

* address review issues and add segmentation maps logic

* style

* chore: adding tests

* reduce label test

* adding batched tests

* Update src/transformers/models/beit/image_processing_beit_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* fix imports and make segmentation masks

* fix tests

* build segmentation maps

* all tests pass

* style

* style fix

* style

* chore: delete demo.py file

* review suggestions

* Update docs/source/en/model_doc/beit.md

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>