Add Fast Image Processor for vilt (#37304)
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the "Ready for review" button.
yonigozlan left a comment
Hi @devxaitist, thanks for contributing, looks great! A few things to modify to make the processor cleaner.
Hi @yonigozlan, thank you for pointing out the need to separate functionality according to the single responsibility principle. Breaking down the _preprocess function into specialized methods like _resize and _pad_batch was definitely the right approach. I appreciate your guidance on improving the code structure. I've implemented all the requested changes, so please review the updated code when you have a chance.
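For illustration, here is a minimal sketch of what a dedicated batch-padding step could look like. The function name, the top-left placement, and the mask convention (1 = real pixel, 0 = padding) are assumptions based on the discussion, not the PR's actual implementation, which operates on torch tensors:

```python
import numpy as np

def pad_batch(images):
    """Pad a list of HxWxC images to the batch's max height/width and
    return the padded stack plus a pixel mask (1 = real pixel, 0 = pad).

    Illustrative sketch only; the real fast processor works on torch
    tensors and stacks images grouped by shape.
    """
    max_h = max(img.shape[0] for img in images)
    max_w = max(img.shape[1] for img in images)
    padded, masks = [], []
    for img in images:
        h, w, c = img.shape
        canvas = np.zeros((max_h, max_w, c), dtype=img.dtype)
        canvas[:h, :w] = img  # top-left placement, bottom/right zero padding
        mask = np.zeros((max_h, max_w), dtype=np.int64)
        mask[:h, :w] = 1
        padded.append(canvas)
        masks.append(mask)
    return np.stack(padded), np.stack(masks)
```

Returning the images and mask as a tuple (rather than a dict) mirrors the refactor requested later in this review.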
Hi @yonigozlan, I've updated the branch to reflect the latest changes in the main branch. When you have a moment, I'd appreciate your review on the updated PR. Thank you!
yonigozlan left a comment
Thanks for iterating @devxaitist! There are still some changes to be done; it would also be great to override the test_slow_fast_equivalence function in the tests to compare the padding masks as well.
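One way such an equivalence check could be extended to cover padding masks is sketched below. The helper name and the output keys ("pixel_values", "pixel_mask") are assumptions for illustration; the actual test overrides a method on the transformers test mixin:

```python
import numpy as np

def check_slow_fast_equivalence(slow_out, fast_out, atol=1e-4):
    """Compare slow vs. fast processor outputs, including padding masks.

    Sketch only: assumes both outputs are dicts holding float
    'pixel_values' and binary 'pixel_mask' arrays.
    """
    np.testing.assert_allclose(
        slow_out["pixel_values"], fast_out["pixel_values"], atol=atol
    )
    # Masks are binary, so require exact equality rather than a tolerance.
    assert (slow_out["pixel_mask"] == fast_out["pixel_mask"]).all()
```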
Thank you so much for your valuable feedback, @yonigozlan! I've implemented all the requested changes, including overriding the test_slow_fast_equivalence functions to compare padding masks. I greatly appreciate your time and guidance throughout this process. This is my first open-source contribution, and your feedback has been incredibly helpful for my learning. When you have a moment, I'd be grateful if you could review these latest changes. Thank you again for your patience and support!
yonigozlan left a comment
Thanks for iterating @devxaitist looks much better! I suggested a small improvement in the padding function, other than that LGTM!
You'll also have to rebase on main, and change the way the docstrings are handled, as this was recently updated. You'll need to replace the @add_start_docstrings with @auto_docstring from ...utils, and move the docstrings of the custom args to the ViltFastImageProcessorKwargs class. You can look at other fast image processors to see how it's done.
Thanks again for the great work!
Thank you for your continuous support and detailed reviews, @yonigozlan! I've implemented all the suggested changes, including the reorder_images approach for both masks and images. The tests are passing smoothly, and the code looks much cleaner now. I'm both excited and a bit nostalgic as we're approaching the final review. Your thorough feedback throughout this process has been incredibly valuable, and I've learned so much from our interactions. It's been a wonderful journey contributing to the transformers library, and I'm grateful for your patience and guidance. Could you please take a look at these final changes when you have a moment? Thank you again for making this first open-source contribution such a meaningful experience!
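The group-by-shape and reorder pattern mentioned above can be sketched roughly as follows. This is a pure-Python illustration of the idea; the actual transformers helpers (group_images_by_shape / reorder_images) operate on torch tensors so that each same-shape group can be stacked and processed in one batched op:

```python
def group_by_shape(images):
    """Group images by shape so each group can be processed as one
    batch, and record where each image came from in the input order."""
    grouped, index_map = {}, []
    for img in images:
        shape = (len(img), len(img[0]))  # (H, W) for this list-based sketch
        grouped.setdefault(shape, []).append(img)
        index_map.append((shape, len(grouped[shape]) - 1))
    return grouped, index_map

def reorder(processed_groups, index_map):
    """Restore the original batch order after per-group processing."""
    return [processed_groups[shape][i] for shape, i in index_map]
```

Applying the same reorder step to both the processed images and their padding masks keeps the two outputs aligned, which is what the reorder_images change in this PR achieves.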
yonigozlan left a comment
Very nice, LGTM!
I'm glad to hear you enjoyed working on this PR, and congrats on your first open-source contribution! 🤗
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* init vilt image processor fast
* Refactor image processor tests to use loop for all processors
* Add ViltImageProcessorFast with PyTorch-based optimized image processing
* Change made automatically by make fixup command
* Change made automatically by make fix-copies command
* Fix type hints in ViltImageProcessorFast for Python compatibility
* Define constants for image resizing based on COCO dataset aspect ratio
* Add missing property initializations to ViltImageProcessorFast
* Extract resize logic into dedicated method in ViltImageProcessorFast
* Extract padding logic into dedicated method
* Implement shape-based image grouping for optimized processing in Vilt
* Update test suite to verify ViltImageProcessorFast attributes
* Move variable declarations to _preprocess method parameters
* Remove unused parameters
* Rename _resize method to resize to override existing function
* Remove whitespace
* Remove unnecessary type check and conversion for stacked_images
* Remove redundant loop and apply padding directly to stacked images
* Refactor pad function to return images and mask as tuple instead of dict
* Add tests comparing padding masks in slow and fast implementations
* Update ViltImageProcessor tests to ensure compatibility between slow and fast implementations
* Replace add_start_docstrings with auto_docstring in ViltImageProcessorFast
* Move docstrings of custom args to ViltFastImageProcessorKwargs
* Use reorder_images function for both masks and images

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
What does this PR do?
Add ViltImageProcessorFast implementation for faster image processing using PyTorch. (#36978)
This PR adds a fast image processor for the Vilt model that leverages PyTorch and torchvision functions instead of PIL/numpy. The implementation improves performance by using tensor operations and enabling GPU processing.
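As context for the resizing constants defined in this PR ("based on COCO dataset aspect ratio"), ViLT-style resizing scales the shorter side toward a target while capping the longer side, then snaps both sides to a size divisor. A rough sketch follows; the constants 384 / 1333 / 32 are taken from the commit history's COCO-based defaults, and the exact rounding may differ from the real processor:

```python
def get_resize_output_size(height, width, shorter=384, longer=1333, size_divisor=32):
    """Scale so the shorter side is ~`shorter`, cap the longer side at
    `longer`, then round both sides down to a multiple of `size_divisor`.

    Sketch of the target-size computation only; the actual resize is
    done with torchvision tensor ops in the fast processor.
    """
    scale = shorter / min(height, width)
    if max(height, width) * scale > longer:
        scale = longer / max(height, width)
    new_h, new_w = int(height * scale), int(width * scale)
    # Snap to the divisor so the patch embedding tiles the image evenly.
    return (new_h // size_divisor * size_divisor,
            new_w // size_divisor * size_divisor)
```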
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@yonigozlan
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.