Add Fast Image Processor for Flava #37135
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the "Ready for review" button.
yonigozlan left a comment
Hi again @rootonchair, very nice work! Quite an exotic image processor, but very nicely handled. I left a few comments on some things that could be simplified, and on writing a torch FlavaMaskingGenerator. Other than that, LGTM!
```python
resample = kwargs.pop("codebook_resample")
kwargs["codebook_interpolation"] = (
    pil_torch_interpolation_mapping[resample] if isinstance(resample, (PILImageResampling, int)) else resample
)
```
It would be better to do this by overriding `self._further_process_kwargs`, to avoid overriding the whole `preprocess` function.
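As a self-contained sketch of that idea (the enum value and mapping below are toy stand-ins, not the real `PILImageResampling` enum or `pil_torch_interpolation_mapping` dict from transformers), the remapping can live in a single overridable hook instead of a full `preprocess` override:

```python
PIL_BICUBIC = 3  # stand-in for PILImageResampling.BICUBIC
pil_torch_interpolation_mapping = {PIL_BICUBIC: "InterpolationMode.BICUBIC"}  # stand-in mapping

def further_process_kwargs(**kwargs):
    # Remap the slow-processor kwarg "codebook_resample" to the fast-processor
    # kwarg "codebook_interpolation" in one place.
    resample = kwargs.pop("codebook_resample", None)
    if resample is not None:
        # Map PIL-style enum values to their torchvision equivalents;
        # pass anything else through unchanged.
        kwargs["codebook_interpolation"] = pil_torch_interpolation_mapping.get(resample, resample)
    return kwargs
```

In the real processor this logic would sit in an overridden `_further_process_kwargs` that also calls the parent implementation for all other kwargs.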
Sure. Changed accordingly
```python
    mask_group_min_aspect_ratio,
    mask_group_max_aspect_ratio,
) -> FlavaMaskingGenerator:
    return FlavaMaskingGenerator(
```
The FlavaMaskingGenerator in the slow processing file generates numpy array masks; it would be great to write a FlavaMaskingGenerator generating torch tensor masks in this file.
I have written another FlavaMaskingGenerator that operates on tensors and optimizes away redundant loops. However, I don't think we can optimize further to have it operate on batched input.
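For illustration, a minimal torch-native blockwise (BEiT-style) mask generator along these lines — a sketch, not the PR's exact FlavaMaskingGenerator, with made-up parameter names — draws rectangular patch groups directly on a torch bool tensor, so the only remaining loop is the rectangle-sampling loop itself:

```python
import math
import random

import torch

def block_mask(height=14, width=14, num_masking_patches=75,
               min_group_patches=16, min_aspect=0.3, max_attempts=100):
    """Mask rectangular patch groups until enough patches are masked."""
    mask = torch.zeros(height, width, dtype=torch.bool)
    log_aspect = (math.log(min_aspect), math.log(1.0 / min_aspect))
    attempts = 0
    while mask.sum().item() < num_masking_patches and attempts < max_attempts:
        attempts += 1
        remaining = num_masking_patches - mask.sum().item()
        # Sample a group area and aspect ratio, then derive its height/width.
        target_area = random.uniform(min_group_patches, max(min_group_patches, remaining))
        aspect = math.exp(random.uniform(*log_aspect))
        h = int(round(math.sqrt(target_area * aspect)))
        w = int(round(math.sqrt(target_area / aspect)))
        if 0 < h < height and 0 < w < width:
            top = random.randint(0, height - h)
            left = random.randint(0, width - w)
            # Vectorized slice assignment replaces the per-pixel inner loops.
            mask[top:top + h, left:left + w] = True
    return mask
```

Because each mask is sampled independently with data-dependent loop counts, batching would only stack `num_images` such masks rather than truly vectorize the sampling.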
yonigozlan left a comment
Thanks for iterating @rootonchair! Looks ready to go after adding a short comment about the Bicubic/Lanczos issue. Let's wait for @ArthurZucker's final approval, then LGTM.
```diff
 codebook_do_resize = True
 codebook_size = {"height": 112, "width": 112}
-codebook_resample = PILImageResampling.LANCZOS
+codebook_resample = PILImageResampling.BICUBIC
```
As I said in other PRs, ideally we would keep Lanczos here, and add a warning that fast image processors don't support Lanczos before forcing Bicubic in preprocessing. Seeing that this is only for codebook pixels, and that return_codebook_pixels is False by default, a short comment explaining why we have Bicubic here instead of Lanczos might be enough.
Changed accordingly!
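The warn-then-fall-back behavior suggested above could be sketched as follows (the function name and string values are hypothetical, standing in for the real resampling enums; the underlying constraint is that torchvision's tensor resize path does not implement a Lanczos kernel):

```python
import logging

logger = logging.getLogger(__name__)

def resolve_fast_interpolation(resample: str) -> str:
    # Fast (torchvision-tensor-based) image processors cannot resize with
    # LANCZOS, so the best they can do is warn and fall back to BICUBIC.
    if resample == "lanczos":
        logger.warning(
            "Fast image processors do not support LANCZOS resampling; "
            "falling back to BICUBIC."
        )
        return "bicubic"
    return resample
```

Since `return_codebook_pixels` is `False` by default, the fallback only affects users who explicitly request codebook pixels, which is why a short code comment was deemed sufficient here.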
```python
encoding_slow = image_processor_slow(
    dummy_image, return_tensors="pt", return_codebook_pixels=True, return_image_mask=True
)
encoding_fast = image_processor_fast(
    dummy_image, return_tensors="pt", return_codebook_pixels=True, return_image_mask=True
)
```
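The equivalence test then typically asserts elementwise closeness between the two encodings; a toy, self-contained sketch of that pattern (stand-in tensors rather than the real processor outputs, which would compare fields such as `pixel_values` and `codebook_pixel_values`):

```python
import torch

# Stand-ins for matching fields of the slow/fast encodings.
slow_out = torch.tensor([[0.1, 0.2], [0.3, 0.4]])
fast_out = slow_out + 1e-6  # small numerical drift, as between backends

# assert_close tolerates tiny float deviations between the two pipelines.
torch.testing.assert_close(fast_out, slow_out, atol=1e-4, rtol=1e-4)
```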
Thanks for iterating! Updating the branch and running the full CI. If everything passes I'll merge :)
* support flava fast image processor
* run style and quality
* update test
* update according to reviews
* make style
* update comment on BICUBIC
* make style

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
What does this PR do?
Related #36978
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.