
Add Fast Image Processor for Flava#37135

Merged
yonigozlan merged 10 commits into huggingface:main from rootonchair:flava_fast_image_processor on Apr 14, 2025

Conversation

@rootonchair
Contributor

What does this PR do?

Related #36978

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@github-actions
Contributor

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@github-actions bot marked this pull request as draft on March 31, 2025 11:46
@rootonchair marked this pull request as ready for review on March 31, 2025 11:46
@github-actions bot requested review from ydshieh and yonigozlan on March 31, 2025 11:47
Member

@yonigozlan left a comment

Hi again @rootonchair, very nice work! Quite an exotic image processor, but very nicely handled. I left a few comments on some things that could be simplified, and on writing a torch FlavaMaskingGenerator. Other than that, LGTM!

Comment on lines +267 to +270
resample = kwargs.pop("codebook_resample")
kwargs["codebook_interpolation"] = (
    pil_torch_interpolation_mapping[resample] if isinstance(resample, (PILImageResampling, int)) else resample
)

Member

It would be better to do this by overriding self._further_process_kwargs, which avoids overriding the whole preprocess function.
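
To illustrate the suggestion, here is a minimal, dependency-free sketch of the hook pattern, assuming a simplified base class whose preprocess() calls _further_process_kwargs() before doing any work. BaseFastProcessor, FlavaLikeProcessor, Resampling and INTERPOLATION_MAPPING are hypothetical stand-ins: in transformers the counterparts are the fast image processor base class, PILImageResampling and pil_torch_interpolation_mapping (which maps to torchvision interpolation modes, not strings), and the real hook's signature may differ.

```python
from enum import IntEnum


class Resampling(IntEnum):
    # Hypothetical stand-in for PILImageResampling, using PIL's integer codes.
    NEAREST = 0
    LANCZOS = 1
    BILINEAR = 2
    BICUBIC = 3


# Hypothetical stand-in for pil_torch_interpolation_mapping; keyed by plain ints
# so both enum members and raw integers can be looked up.
INTERPOLATION_MAPPING = {
    0: "nearest",
    1: "lanczos",
    2: "bilinear",
    3: "bicubic",
}


class BaseFastProcessor:
    """Simplified base class: preprocess() runs a kwargs hook before processing."""

    def _further_process_kwargs(self, **kwargs) -> dict:
        # Default hook: pass kwargs through unchanged.
        return kwargs

    def preprocess(self, images, **kwargs) -> dict:
        kwargs = self._further_process_kwargs(**kwargs)
        # A real processor would resize/normalize here; the sketch just echoes kwargs.
        return {"images": images, **kwargs}


class FlavaLikeProcessor(BaseFastProcessor):
    def _further_process_kwargs(self, **kwargs) -> dict:
        # Translate the PIL-style codebook resample into an interpolation value,
        # leaving the inherited preprocess() untouched.
        resample = kwargs.pop("codebook_resample", None)
        if resample is not None:
            kwargs["codebook_interpolation"] = (
                INTERPOLATION_MAPPING[int(resample)]
                if isinstance(resample, (Resampling, int))
                else resample
            )
        return super()._further_process_kwargs(**kwargs)
```

Because only the kwargs translation lives in the subclass, later changes to the base preprocess() flow are inherited automatically, which is the benefit of not overriding preprocess() wholesale.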

Contributor Author

Sure, changed accordingly.

    mask_group_min_aspect_ratio,
    mask_group_max_aspect_ratio,
) -> FlavaMaskingGenerator:
    return FlavaMaskingGenerator(

Member

The FlavaMaskingGenerator in the slow image processing file generates masks as NumPy arrays; it would be great to write a FlavaMaskingGenerator in this file that generates the masks as torch tensors.

Contributor Author

I have written another FlavaMaskingGenerator that operates on tensors and removes redundant loops. However, I don't think we can optimize it further to operate on batched inputs.
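
As background for this thread, below is a rough, pure-Python sketch of block-wise (BEiT-style) mask generation, written under the assumption that it resembles the slow FlavaMaskingGenerator: sample rectangles with a random area and a log-uniform aspect ratio, and keep one only if it masks at least one new patch without exceeding the remaining budget. BlockMaskingSketch and its parameters are hypothetical names, not the transformers API, and nested lists stand in for torch tensors so the example has no dependencies. The already-masked cells of a candidate block are counted once and the whole block is then filled; on a tensor those two steps would become a slice sum and a slice assignment, which is one plausible way to drop per-cell loops.

```python
import math
import random


class BlockMaskingSketch:
    """Hypothetical, simplified block-wise mask generator (not the transformers class)."""

    def __init__(self, height=14, width=14, total_mask_patches=75,
                 min_patches=16, max_patches=None, min_aspect=0.3, max_aspect=None):
        self.height, self.width = height, width
        self.total_mask_patches = total_mask_patches
        self.min_patches = min_patches
        self.max_patches = total_mask_patches if max_patches is None else max_patches
        max_aspect = max_aspect or 1 / min_aspect
        # Aspect ratios are sampled uniformly in log space.
        self.log_aspect = (math.log(min_aspect), math.log(max_aspect))

    def _mask_block(self, mask, budget, rng):
        """Try up to 10 rectangles; return how many cells were newly masked (0 on failure)."""
        for _ in range(10):
            area = rng.uniform(self.min_patches, budget)
            aspect = math.exp(rng.uniform(*self.log_aspect))
            h = int(round(math.sqrt(area * aspect)))
            w = int(round(math.sqrt(area / aspect)))
            if h < self.height and w < self.width:
                top = rng.randint(0, self.height - h)
                left = rng.randint(0, self.width - w)
                # Count already-masked cells in the candidate block once...
                already = sum(sum(mask[i][left:left + w]) for i in range(top, top + h))
                new = h * w - already
                if 0 < new <= budget:
                    # ...then fill the whole block row by row (a slice assignment on a tensor).
                    for i in range(top, top + h):
                        mask[i][left:left + w] = [1] * w
                    return new
        return 0

    def __call__(self, seed=None):
        rng = random.Random(seed)
        mask = [[0] * self.width for _ in range(self.height)]
        count = 0
        while count < self.total_mask_patches:
            budget = min(self.total_mask_patches - count, self.max_patches)
            delta = self._mask_block(mask, budget, rng)
            if delta == 0:
                break  # no acceptable block found; stop early
            count += delta
        return mask
```

Each call still builds a single mask by sequential rejection sampling, where every accepted block depends on the mask built so far, which fits the observation that batching the generator is hard.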

Member

@yonigozlan left a comment

Thanks for iterating, @rootonchair! Looks ready to go after adding a short comment about the Bicubic/Lanczos issue. Let's wait for @ArthurZucker's final approval, then LGTM.

 codebook_do_resize = True
 codebook_size = {"height": 112, "width": 112}
-codebook_resample = PILImageResampling.LANCZOS
+codebook_resample = PILImageResampling.BICUBIC

Member

As I said in other PRs, ideally we would keep Lanczos here and add a warning that fast image processors don't support Lanczos before forcing Bicubic in preprocessing. Since this only affects codebook pixels, and return_codebook_pixels is False by default, a short comment explaining why we use Bicubic here instead of Lanczos might be enough.
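
The approach described as ideal here (keep Lanczos as the declared default and warn before substituting Bicubic at preprocessing time) might look roughly like the sketch below. resolve_codebook_interpolation and SUPPORTED_INTERPOLATIONS are hypothetical names operating on plain strings; they are not part of transformers, and the supported set is an assumption.

```python
import warnings

# Assumed set of modes the fast (tensor) path can handle; illustrative only.
SUPPORTED_INTERPOLATIONS = {"nearest", "bilinear", "bicubic"}


def resolve_codebook_interpolation(requested: str) -> str:
    """Return an interpolation the fast path supports, warning when substituting."""
    if requested == "lanczos":
        warnings.warn(
            "Lanczos resampling is not supported by the fast image processor; "
            "falling back to bicubic.",
            UserWarning,
        )
        return "bicubic"
    if requested not in SUPPORTED_INTERPOLATIONS:
        raise ValueError(f"Unsupported interpolation: {requested!r}")
    return requested
```

Compared with hard-coding Bicubic, this keeps the configured default visible and makes the substitution explicit at runtime, at the cost of a warning every time codebook pixels are requested.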

Contributor Author

Changed accordingly!

Comment on lines +415 to +419
encoding_slow = image_processor_slow(
    dummy_image, return_tensors="pt", return_codebook_pixels=True, return_image_mask=True
)
encoding_fast = image_processor_fast(
    dummy_image, return_tensors="pt", return_codebook_pixels=True, return_image_mask=True

Member

Nice, thanks for that!
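
The test excerpt above runs the slow and fast processors on the same image with return_codebook_pixels=True and return_image_mask=True. A dependency-free sketch of how such outputs could be compared follows; assert_outputs_close, its tolerances, and the mask key name bool_masked_pos are assumptions for illustration, and nested lists stand in for tensors. Because the image masks are sampled randomly, the sketch checks them only by shape, and the loose tolerances allow for resizing differences between backends, including the Lanczos versus Bicubic codebook resampling discussed above.

```python
def flatten(x):
    """Flatten arbitrarily nested lists/tuples of numbers into a flat list."""
    if isinstance(x, (list, tuple)):
        return [v for item in x for v in flatten(item)]
    return [x]


def shape(x):
    """Shape of a rectangular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(x, (list, tuple)):
        dims.append(len(x))
        x = x[0] if x else None
    return tuple(dims)


def assert_outputs_close(slow, fast, atol=1e-1, mean_atol=5e-3, mask_keys=("bool_masked_pos",)):
    """Hypothetical slow/fast comparison: same keys and shapes, values close on average."""
    assert slow.keys() == fast.keys(), "slow and fast outputs expose different keys"
    for key in slow:
        assert shape(slow[key]) == shape(fast[key]), f"shape mismatch for {key}"
        if key in mask_keys:
            continue  # random masks: only the shapes are expected to agree
        diffs = [abs(a - b) for a, b in zip(flatten(slow[key]), flatten(fast[key]))]
        if not diffs:
            continue
        assert max(diffs) <= atol, f"{key}: max abs diff {max(diffs)}"
        assert sum(diffs) / len(diffs) <= mean_atol, f"{key}: mean abs diff too large"
```

Checking both a maximum and a mean difference tolerates a few outlier pixels from interpolation while still catching systematic mismatches such as a wrong normalization.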

@yonigozlan
Member

Thanks for iterating! Updating the branch and running the full CI. If everything passes, I'll merge :)

@yonigozlan merged commit 49b9a69 into huggingface:main on Apr 14, 2025
19 of 20 checks passed
@rootonchair deleted the flava_fast_image_processor branch on April 15, 2025 18:02
cyr0930 pushed a commit to cyr0930/transformers that referenced this pull request Apr 18, 2025
* support flava fast image processor

* run style and quality

* update test

* update according to reviews

* make style

* update comment on BICUBIC

* make style

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025

2 participants