
[Fast Processor] BEiT#37005

Merged
yonigozlan merged 27 commits into huggingface:main from ariG23498:aritra/fp-beit
May 6, 2025
Conversation

@ariG23498
Contributor

Linked: #36978

Adding fast processor for BEiT.

Here is how I tested for image classification:

from transformers import (
    BeitImageProcessor, BeitImageProcessorFast, BeitForImageClassification
)
import torch
from datasets import load_dataset

dataset = load_dataset("huggingface/cats-image", trust_remote_code=True)
image = dataset["test"]["image"][0]

image_processor = BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224")
image_processor_fast = BeitImageProcessorFast.from_pretrained("microsoft/beit-base-patch16-224")

model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")

inputs = image_processor(image, return_tensors="pt")
inputs_fast = image_processor_fast(image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

with torch.no_grad():
    logits_fast = model(**inputs_fast).logits


torch.testing.assert_close(
    actual=inputs_fast["pixel_values"],
    expected=inputs["pixel_values"],
    rtol=1e-2,
    atol=1e-2,
)

torch.testing.assert_close(
    actual=logits,
    expected=logits_fast,
    rtol=1e-2,
    atol=1e-2,
)

# model predicts one of the 1000 ImageNet classes
predicted_label = logits.argmax(-1).item()
predicted_label_fast = logits_fast.argmax(-1).item()

print(model.config.id2label[predicted_label])
print(model.config.id2label[predicted_label_fast])

While inputs and inputs_fast pass the assertion at 1e-2, the logits do not. Can I get some advice on the next steps for debugging?

CC: @yonigozlan

@github-actions github-actions Bot marked this pull request as draft March 26, 2025 15:24
@github-actions
Contributor

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@ariG23498 ariG23498 marked this pull request as ready for review March 26, 2025 15:26
@github-actions github-actions Bot requested review from stevhliu and yonigozlan March 26, 2025 15:27
Member

@yonigozlan yonigozlan left a comment


Hi @ariG23498 ! Thanks a lot for taking this on!
There seem to be a lot of processing functions in the slow image processor that are not standard and should be adapted here, notably the handling of segmentation maps.

As for the precision issues, it is expected that the processed image outputs differ slightly from the slow image processor outputs, but usually by no more than 1e-04 on average. Can you check what the average diff is in your case?
However, if the model is sensitive to small variations, it could be "normal" that your second test on the logits doesn't pass.

Comment thread tests/models/beit/test_image_processing_beit.py
Comment thread src/transformers/models/beit/image_processing_beit_fast.py Outdated
Comment thread src/transformers/models/beit/image_processing_beit_fast.py Outdated
Comment thread src/transformers/models/beit/image_processing_beit_fast.py
@ariG23498 ariG23498 requested a review from yonigozlan March 27, 2025 13:05
@ariG23498
Copy link
Copy Markdown
Contributor Author

@yonigozlan doing this

from transformers import (
    BeitImageProcessor, BeitImageProcessorFast
)
import torch
from datasets import load_dataset

dataset = load_dataset("huggingface/cats-image", trust_remote_code=True)
image = dataset["test"]["image"][0]

image_processor = BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224")
image_processor_fast = BeitImageProcessorFast.from_pretrained("microsoft/beit-base-patch16-224")

inputs = image_processor(image, return_tensors="pt")
inputs_fast = image_processor_fast(image, return_tensors="pt")

print(
    torch.mean(inputs_fast["pixel_values"] - inputs["pixel_values"])
)

Results in

tensor(-9.0261e-05)
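Worth noting: a signed mean near zero can hide larger offsetting errors, so the maximum absolute difference is a stricter check. A small stand-in sketch (synthetic tensors, not the processor outputs above):

```python
import torch

# Synthetic stand-ins for the slow/fast processor outputs compared above:
# half the elements are +1e-4 off, half are -1e-4 off.
slow = torch.zeros(1, 3, 4, 4)
fast = slow + torch.tensor([1e-4, -1e-4]).repeat(24).reshape(1, 3, 4, 4)

diff = fast - slow
print("mean signed diff:", diff.mean().item())   # ~0: opposite-sign errors cancel
print("max abs diff:", diff.abs().max().item())  # ~1e-4: the true worst-case gap
```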

Member

@yonigozlan yonigozlan left a comment


Nice work! Definitely not a standard processor given the processing of segmentation_maps, but I think the way you handled it is great.
The last thing left to address is running the fast image processor through all existing tests.
It would also be great to override test_slow_fast_equivalence and test_slow_fast_equivalence_batched from ImageProcessingTestMixin to also test segmentation_maps/labels.

Comment thread tests/models/beit/test_image_processing_beit.py Outdated
Comment thread src/transformers/models/beit/image_processing_beit_fast.py Outdated
Comment thread src/transformers/models/beit/image_processing_beit_fast.py
@ariG23498
Contributor Author

@yonigozlan could I get a quick review on the tests to let me know if I am proceeding in the right direction?

There are some tests breaking, but an overview of the way I am implementing it would be nice to have 🤗

@yonigozlan
Member

@ariG23498 Looks great to me! Also looks like the failing tests are unrelated from what I see. Are the beit specific image processing tests passing for you locally?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ariG23498
Contributor Author

I have three failing tests locally

FAILED tests/models/beit/test_image_processing_beit.py::BeitImageProcessingTest::test_call_segmentation_maps - ValueError: Could not make a flat list of images from tensor([[0, 0, 0,  ..., 0, 0, 0],
FAILED tests/models/beit/test_image_processing_beit.py::BeitImageProcessingTest::test_can_compile_fast_image_processor - torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
FAILED tests/models/beit/test_image_processing_beit.py::BeitImageProcessingTest::test_removed_deprecated_kwargs - AssertionError: False != True

I might need a little help here 😅

Member

@yonigozlan yonigozlan left a comment


This should help with your errors. I'm not sure what's happening with the torch compile issue though. If it's still there with these fixes, could you give me the full error message?


# Prepare segmentation maps
if segmentation_maps is not None:
    segmentation_maps = self._prepare_input_images(
Member


This is not really adapted to masks with a single-channel dim or no channel dim. I'll try to change this in a future PR; in the meantime you can use make_list_of_images from image_utils with expected_ndims=1.
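For reference, make_list_of_images wraps a bare array into a list based on how many dims a single image is expected to have. A minimal sketch with a plain (H, W) numpy map, assuming the current transformers.image_utils API (expected_ndims=2 matches a map with no channel dim, which is what the final code in this PR uses):

```python
import numpy as np
from transformers.image_utils import make_list_of_images

# A single (H, W) segmentation map with no channel dimension.
segmap = np.zeros((16, 16), dtype=np.int64)

# With expected_ndims=2 the lone 2D array is wrapped into a one-element
# list instead of being iterated row by row.
maps = make_list_of_images(segmap, expected_ndims=2)
print(len(maps), maps[0].shape)  # 1 (16, 16)
```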

Contributor Author


Ah! Thanks for the tip. Will do that.

Comment thread src/transformers/models/beit/image_processing_beit_fast.py Outdated
@ariG23498
Contributor Author

Hi @yonigozlan, apologies for the late commits. With the current changes I have three failing tests, all due to a PIL segmentation-map issue with the make_list_of_images API.

FAILED tests/models/beit/test_image_processing_beit.py::BeitImageProcessingTest::test_call_segmentation_maps - AttributeError: 'PngImageFile' object has no attribute 'shape'. Did you mean: 'save'?
FAILED tests/models/beit/test_image_processing_beit.py::BeitImageProcessingTest::test_reduce_labels - AttributeError: 'PngImageFile' object has no attribute 'shape'. Did you mean: 'save'?
FAILED tests/models/beit/test_image_processing_beit.py::BeitImageProcessingTest::test_slow_fast_equivalence - AttributeError: 'PngImageFile' object has no attribute 'shape'. Did you mean: 'save'?

Is there a better way to handle this?

Member

@yonigozlan yonigozlan left a comment


Hey @ariG23498 ! Thanks a lot for iterating on this. I had missed the fact that the segmentation_maps weren't being converted to tensors, so solving this should solve the test errors you're getting

Comment on lines +196 to +197
if segmentation_maps is not None:
    segmentation_maps = make_list_of_images(images=segmentation_maps, expected_ndims=2)
Member


Yes, my bad, sorry. We do need to convert the segmentation maps to tensors before doing this, and we should also handle the added_dimension logic as in the slow processor.

Contributor Author


Do you think the best way would be to use self._prepare_input_images instead of make_list_of_images?

    segmentation_maps=segmentation_maps,
    **kwargs,
)
data["labels"] = segmentation_maps
Member


Here we will need to squeeze or not the channel dimension depending on added_dimension
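A sketch of that squeeze logic (hypothetical helper name; the real change lives in image_processing_beit_fast.py): the channel axis is dropped only if it was added by the processor itself earlier.

```python
import torch

def finalize_segmentation_map(seg_map: torch.Tensor, added_dimension: bool) -> torch.Tensor:
    # Maps that came in as (H, W) got a leading axis for the transforms
    # and should go back to (H, W); maps that already had a channel dim
    # are left untouched.
    if added_dimension:
        return seg_map.squeeze(0)
    return seg_map

processed = torch.zeros(1, 20, 20)
print(finalize_segmentation_map(processed, added_dimension=True).shape)   # torch.Size([20, 20])
print(finalize_segmentation_map(processed, added_dimension=False).shape)  # torch.Size([1, 20, 20])
```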

@ariG23498
Contributor Author

@yonigozlan all the tests pass on my end!

@ariG23498
Contributor Author

@yonigozlan I was able to make the style tests pass, but the current CI failures seem unrelated to the changes in this PR. Should I push an empty commit to check whether they are flaky?

Member

@yonigozlan yonigozlan left a comment


Thanks @ariG23498! Almost ready to go :), I just suggested some improvements to the conversion of the segmentation_maps

Comment thread demo.py Outdated
Comment on lines +1 to +7
from transformers import BeitImageProcessor, BeitImageProcessorFast

im_pro = BeitImageProcessor(size={"height": 20, "width": 20})
im_pro_fast = BeitImageProcessorFast(size={"height": 20, "width": 20})

print(im_pro)
print(im_pro_fast)
Member


file to delete :)

Comment on lines +160 to +161
if input_data_format is None:
    input_data_format = infer_channel_dimension_format(segmentation_map, num_channels=1)
Member


Don't think this is useful as we force kwargs["input_data_format"] = ChannelDimension.FIRST in any case

Comment on lines +152 to +163
segmentation_map = to_numpy_array(segmentation_map)
# Add an axis to the segmentation maps for transformations.
if segmentation_map.ndim == 2:
    segmentation_map = segmentation_map[None, ...]
    added_dimension = True
    input_data_format = ChannelDimension.FIRST
else:
    added_dimension = False
    if input_data_format is None:
        input_data_format = infer_channel_dimension_format(segmentation_map, num_channels=1)

processed_segmentation_maps.append(torch.tensor(segmentation_map))
Member


It would be great to directly convert to tensor here, instead of having numpy arrays in intermediate steps. Something like _process_image in BaseImageProcessorFast, but with added logic to account for added_dimension
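A hedged sketch of that idea: convert straight to a torch tensor while tracking added_dimension (the helper name and exact handling are illustrative, not the final implementation):

```python
import numpy as np
import torch
from PIL import Image

def process_segmentation_map(seg_map):
    # Accept PIL / numpy / tensor inputs and convert directly to torch,
    # skipping intermediate numpy copies where possible. Record whether
    # we add a channel axis so it can be squeezed back out later.
    if isinstance(seg_map, Image.Image):
        seg_map = torch.from_numpy(np.array(seg_map))
    elif isinstance(seg_map, np.ndarray):
        seg_map = torch.from_numpy(seg_map)
    added_dimension = False
    if seg_map.ndim == 2:  # (H, W) -> (1, H, W) for the transforms
        seg_map = seg_map.unsqueeze(0)
        added_dimension = True
    return seg_map, added_dimension

tensor_map, added = process_segmentation_map(np.zeros((8, 8), dtype=np.uint8))
print(tensor_map.shape, added)  # torch.Size([1, 8, 8]) True
```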

Member

@yonigozlan yonigozlan left a comment


Very nice, thanks for iterating! One tiny thing to fix in the docs and we can merge!

Comment thread docs/source/en/model_doc/beit.md Outdated
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
@ariG23498
Contributor Author

@yonigozlan done!

@yonigozlan yonigozlan merged commit 3c0796a into huggingface:main May 6, 2025
20 checks passed
@simonreise
Contributor

simonreise commented May 8, 2025

@yonigozlan @ariG23498

Looks like when do_reduce_labels=True the image processor reduces not only the labels but also the images

import transformers
import torch


processor = transformers.BeitImageProcessorFast(
    do_resize=False,
    do_center_crop=False,
    do_rescale=False,
    do_normalize=False,
    do_convert_rgb=False,
    do_reduce_labels=True,
)

image = torch.zeros([1, 3, 256, 256])
segmap = torch.zeros([1, 256, 256])

batch = processor.preprocess(
    images=image,
    segmentation_maps=segmap,
    return_tensors="pt",
    do_reduce_labels=True,
)

The resulting pixel_values are 255 but should be 0.

UPD: opened #38042 that fixes it
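For context, label reduction is only supposed to touch the segmentation maps: background pixels (0) map to the ignore index 255 and the remaining class ids shift down by one, while pixel_values stay untouched. A sketch of that intended behavior (standalone function, not the transformers implementation):

```python
import torch

def reduce_label(label: torch.Tensor) -> torch.Tensor:
    # Background (0) becomes the ignore index 255; other ids shift down by 1.
    # Existing 255s pass through unchanged (255 -> 254 -> 255).
    label = label.clone()
    label[label == 0] = 255
    label = label - 1
    label[label == 254] = 255
    return label

segmap = torch.tensor([[0, 1, 2], [3, 0, 255]])
print(reduce_label(segmap))
# Only `labels` should pass through this step; images must be left alone.
```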

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* adding fast processor for beit

* adding resample

* address review issues and add segmentation maps logic

* style

* chore: adding tests

* reduce label test

* adding batched tests

* Update src/transformers/models/beit/image_processing_beit_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* fix imports and make segmentation masks

* fix tests

* build segmentation maps

* all tests pass

* style

* style fix

* style

* chore: delete demo.py file

* review suggestions

* Update docs/source/en/model_doc/beit.md

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>