[Fast Processor] BEiT #37005
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the "Ready for review" button.
yonigozlan left a comment
Hi @ariG23498 ! Thanks a lot for taking this on!
There seem to be a lot of processing functions in the slow image processor that are not standard and need to be adapted here, notably the handling of segmentation maps.
As for the precision issues, it is expected that the processed image outputs are slightly different from the slow image processor outputs, but usually not more than 1e-04 in mean. Can you check what the average diff is in your case?
However, if the model is sensitive to small variations, it could be "normal" that your second test on the logits doesn't pass.
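The mean-diff check suggested above could look something like this (a minimal sketch: the arrays are stand-ins for real `pixel_values` outputs, and the `5e-5` offset just simulates a small precision gap):

```python
import numpy as np

# Hypothetical outputs standing in for the slow and fast processor results;
# in practice these would come from BeitImageProcessor and
# BeitImageProcessorFast run on the same images.
pixel_values_slow = np.zeros((1, 3, 20, 20), dtype=np.float32)
pixel_values_fast = pixel_values_slow + 5e-5  # simulate a tiny precision gap

# The rule of thumb from the review: the mean absolute difference
# between slow and fast outputs should stay under 1e-4.
mean_diff = np.abs(pixel_values_slow - pixel_values_fast).mean()
print(mean_diff < 1e-4)
```

In a real check you would compare `out_slow["pixel_values"]` against `out_fast["pixel_values"]` the same way.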
@yonigozlan doing this results in:
yonigozlan left a comment
Nice work! Definitely not a standard processor given the processing of segmentation_maps, but I think the way you handled it is great.
The last thing left to address is running the fast image processor against all existing tests.
It would also be great to override test_slow_fast_equivalence and test_slow_fast_equivalence_batched from ImageProcessingTestMixin so that they also test segmentation_maps/labels.
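A minimal, self-contained sketch of what such an override might check. The two "processors" below are hypothetical stand-ins, not the real BEiT processors; in the actual override they would be the slow and fast processor instances from the test mixin's setup:

```python
import numpy as np

# Stand-in "slow" and "fast" processors that return the BEiT output keys.
def slow_processor(images, segmentation_maps):
    return {
        "pixel_values": np.stack(images),
        "labels": np.stack(segmentation_maps),
    }

def fast_processor(images, segmentation_maps):
    return {
        "pixel_values": np.stack(images) + 5e-5,  # simulate a tiny precision gap
        "labels": np.stack(segmentation_maps),     # labels must match exactly
    }

def test_slow_fast_equivalence_batched():
    images = [np.zeros((3, 20, 20), dtype=np.float32) for _ in range(2)]
    maps = [np.zeros((20, 20), dtype=np.int64) for _ in range(2)]
    out_slow = slow_processor(images, maps)
    out_fast = fast_processor(images, maps)
    # Pixel values: allow a small numerical tolerance.
    assert np.abs(out_slow["pixel_values"] - out_fast["pixel_values"]).mean() < 1e-4
    # Labels are integer class ids, so they must be bit-identical.
    assert (out_slow["labels"] == out_fast["labels"]).all()
    return True

print(test_slow_fast_equivalence_batched())
```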
@yonigozlan could I get a quick review on the tests to let me know if I am proceeding in the right direction? Some tests are breaking, but an overview of the way I am implementing it would be nice to have 🤗
@ariG23498 Looks great to me! Also, the failing tests look unrelated from what I see. Are the BEiT-specific image processing tests passing for you locally?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I have three failing tests locally. I might need a little help here 😅
yonigozlan left a comment
This should help with your errors. I'm not sure what's happening with the torch compile issue though. If it's still there with these fixes, could you give me the full error message?
```python
# Prepare segmentation maps
if segmentation_maps is not None:
    segmentation_maps = self._prepare_input_images(
```
This is not really adapted to masks with a single channel dimension or no channel dimension. I'll try to change this in a future PR; in the meantime you can use make_list_of_images from image_utils with expected_ndims=1.

Ah! Thanks for the tip. Will do that.
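For illustration, a rough sketch of the make_list_of_images behavior being suggested. This is a simplified stand-in, not the actual transformers.image_utils implementation (note that the diff later in this thread ends up calling it with expected_ndims=2 for HxW masks):

```python
import numpy as np

# Simplified stand-in for transformers.image_utils.make_list_of_images:
# wrap a single array into a list, or split a batch into per-item arrays,
# based on the expected number of dimensions for one item.
def make_list_of_images(images, expected_ndims=3):
    if isinstance(images, (list, tuple)):
        return list(images)
    if images.ndim == expected_ndims:
        return [images]      # a single image/mask -> one-element list
    if images.ndim == expected_ndims + 1:
        return list(images)  # a batch -> list of per-item arrays
    raise ValueError(
        f"expected {expected_ndims} or {expected_ndims + 1} dims, got {images.ndim}"
    )

single_mask = np.zeros((16, 16))      # one HxW mask
batch = np.zeros((4, 16, 16))         # a batch of four HxW masks
print(len(make_list_of_images(single_mask, expected_ndims=2)),
      len(make_list_of_images(batch, expected_ndims=2)))
```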
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Hi @yonigozlan, apologies for the late commits. With the current changes I have three failing tests, all due to a PIL segmentation maps issue. Is there a better way to handle this?
yonigozlan left a comment
Hey @ariG23498! Thanks a lot for iterating on this. I had missed the fact that the segmentation_maps weren't being converted to tensors, so fixing this should resolve the test errors you're getting.
```python
if segmentation_maps is not None:
    segmentation_maps = make_list_of_images(images=segmentation_maps, expected_ndims=2)
```

Yes, my bad, sorry. We do need to convert the segmentation maps to tensors before doing this, and we should also handle the added_dimension logic as in the slow processor.

Do you think the best way would be to use self._prepare_input_images instead of make_list_of_images?
```python
    segmentation_maps=segmentation_maps,
    **kwargs,
)
data["labels"] = segmentation_maps
```

Here we will need to squeeze or not the channel dimension depending on added_dimension.
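A sketch of that squeeze logic, assuming added_dimension tracks whether a channel axis was added earlier (finalize_labels is a hypothetical helper name, not transformers API):

```python
import numpy as np

# If a channel axis was added just for the transforms, remove it again so
# labels come back as (H, W) rather than (1, H, W). Labels are integer
# class ids, so cast back to int64 at the end.
def finalize_labels(segmentation_map, added_dimension):
    if added_dimension:
        segmentation_map = np.squeeze(segmentation_map, axis=0)
    return segmentation_map.astype(np.int64)

with_channel = np.zeros((1, 16, 16))  # (1, H, W): an axis was added earlier
print(finalize_labels(with_channel, added_dimension=True).shape)
```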
@yonigozlan all the tests pass on my end!
@yonigozlan I was able to make the style tests pass, but the current CI issues seem unrelated to the changes in this PR. Should I push an empty commit to check whether they are flaky test failures?
yonigozlan left a comment
Thanks @ariG23498! Almost ready to go :), I just suggested some improvements for converting the segmentation_maps.
```python
from transformers import BeitImageProcessor, BeitImageProcessorFast

im_pro = BeitImageProcessor(size={"height": 20, "width": 20})
im_pro_fast = BeitImageProcessorFast(size={"height": 20, "width": 20})

print(im_pro)
print(im_pro_fast)
```
```python
if input_data_format is None:
    input_data_format = infer_channel_dimension_format(segmentation_map, num_channels=1)
```

Don't think this is useful, as we force `kwargs["input_data_format"] = ChannelDimension.FIRST` in any case.
```python
segmentation_map = to_numpy_array(segmentation_map)
# Add an axis to the segmentation maps for transformations.
if segmentation_map.ndim == 2:
    segmentation_map = segmentation_map[None, ...]
    added_dimension = True
    input_data_format = ChannelDimension.FIRST
else:
    added_dimension = False
    if input_data_format is None:
        input_data_format = infer_channel_dimension_format(segmentation_map, num_channels=1)

processed_segmentation_maps.append(torch.tensor(segmentation_map))
```

It would be great to convert directly to a tensor here, instead of having numpy arrays in intermediate steps. Something like _process_image in BaseImageProcessorFast, but with added logic to account for added_dimension.
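A hedged sketch of the direct-to-tensor approach being suggested (prepare_segmentation_map is a hypothetical helper name, not the actual _process_image implementation):

```python
import numpy as np
import torch

# Convert a segmentation map straight to a torch tensor, folding in the
# added_dimension bookkeeping: a bare (H, W) map gets a channel axis for
# the transforms, and the flag records that it must be squeezed later.
def prepare_segmentation_map(segmentation_map):
    tensor = torch.as_tensor(np.asarray(segmentation_map))
    added_dimension = tensor.ndim == 2
    if added_dimension:
        tensor = tensor.unsqueeze(0)  # (H, W) -> (1, H, W)
    return tensor, added_dimension

tensor, added = prepare_segmentation_map(np.zeros((16, 16)))
print(tuple(tensor.shape), added)
```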
yonigozlan left a comment
Very nice, thanks for iterating! One tiny thing to fix in the docs and we can merge!
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
@yonigozlan done!
Looks like in this case, pixel_values will be 255 when it should be 0:

```python
import transformers
import torch

processor = transformers.BeitImageProcessorFast(
    do_resize=False,
    do_center_crop=False,
    do_rescale=False,
    do_normalize=False,
    do_convert_rgb=False,
    do_reduce_labels=True,
)
image = torch.zeros([1, 3, 256, 256])
segmap = torch.zeros([1, 256, 256])
batch = processor.preprocess(
    images=image,
    segmentation_maps=segmap,
    return_tensors="pt",
    do_reduce_labels=True,
)
```

UPD: opened #38042, which fixes it.
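For reference, do_reduce_labels is meant to touch only the labels, never the pixel values: the background id 0 maps to the ignore index 255 and the remaining class ids shift down by one. A minimal sketch of that mapping:

```python
import numpy as np

# Sketch of the reduce-labels mapping (applies to labels only, never to
# pixel_values): background id 0 becomes the ignore index 255, and all
# other class ids are shifted down by one.
def reduce_label(label):
    label = label.astype(np.int64)
    label[label == 0] = 255  # background -> ignore index
    label = label - 1        # shift remaining ids down by one
    label[label == 254] = 255  # restore the ignore index after the shift
    return label

print(reduce_label(np.array([0, 1, 2])).tolist())  # → [255, 0, 1]
```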
* adding fast processor for beit
* adding resample
* address review issues and add segmentation maps logic
* style
* chore: adding tests
* reduce label test
* adding batched tests
* Update src/transformers/models/beit/image_processing_beit_fast.py
  Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* fix imports and make segmentation masks
* fix tests
* build segmentation maps
* all tests pass
* style
* style fix
* style
* chore: delete demo.py file
* review suggestions
* Update docs/source/en/model_doc/beit.md
  Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Linked: #36978
Adding fast processor for BEiT.
Here is how I tested for image classification:
While the `inputs` and `inputs_fast` pass the assertion with `1e-2`, the logits do not pass. Can I get some advice on the next steps of debugging?

CC: @yonigozlan