Use torchvision decode_image to load images in the torchvision backend #45195

yonigozlan merged 7 commits into huggingface:main

Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
vasqu left a comment:
Thanks 🫡, that makes 100% sense to avoid too many intermediate conversions.

- For completeness' sake, can you add your benchmark script to the PR description?
- I assume this is not sensitive to the torch version, but just asking in case.
```python
else:
    pil_torch_interpolation_mapping = {}
    torch_pil_interpolation_mapping = {}

if is_torchvision_available():
```
I prefer not to have nested imports here, but that's just me 😅. I can change it back if needed.
All good, shouldn't be too bad
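For context, the guard pattern being discussed is a module-level availability check: the mappings are populated only when torchvision is importable, falling back to the empty dicts shown in the `else:` branch. The sketch below is illustrative, not the library's exact code; the mapping contents and the `is_torchvision_available` stand-in are assumptions.

```python
# Module-level availability guard: populate interpolation mappings only
# when torchvision can actually be imported. The mapping entries here
# are illustrative; the empty fallbacks mirror the diff's else branch.
import importlib.util


def is_torchvision_available() -> bool:
    # Stand-in for transformers' own availability helper.
    return importlib.util.find_spec("torchvision") is not None


if is_torchvision_available():
    import PIL.Image
    from torchvision.transforms import InterpolationMode

    pil_torch_interpolation_mapping = {
        PIL.Image.Resampling.NEAREST: InterpolationMode.NEAREST,
        PIL.Image.Resampling.BILINEAR: InterpolationMode.BILINEAR,
        PIL.Image.Resampling.BICUBIC: InterpolationMode.BICUBIC,
    }
    torch_pil_interpolation_mapping = {v: k for k, v in pil_torch_interpolation_mapping.items()}
else:
    pil_torch_interpolation_mapping = {}
    torch_pil_interpolation_mapping = {}
```

Either way the module exposes the same two names, so downstream code can look up interpolation modes without its own try/except around the import.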
Let's figure out the one failing processor test as well 👀
Thanks for the review! The issue with the test should be resolved, and added the source code for the benchmarks |
|
@vasqu re your review question: no, this is not sensitive to the torch version. |
```python
elif isinstance(image, PIL.Image.Image):
    image = PIL.ImageOps.exif_transpose(image)
    return pil_to_tensor(image.convert("RGB"))
```
Hm, do we need to convert from PIL when loading? I'd expect a no-op if an already-decoded image is provided.
This is just to be consistent with the helper function's name `load_image_as_tensor`. In practice, in the processors we will have a no-op (and a `pil_to_tensor` later on for the torchvision backend) because of this:
https://github.com/yonigozlan/transformers/blob/2b5d481df844c19907c15fab7cd547ccf9f27c7f/src/transformers/image_processing_backends.py#L122-L125
-> We only enter if we have a str, and no-op if we have a valid image.
@yonigozlan sounds good to me, so we can merge? I want to avoid force-merging now as CI is kind of flaky atm and has trouble with the network / hub. You can ping me again if it gets too annoying.
[For maintainers] Suggested jobs to run (before merge): run-slow: colpali
Ok, just flaky CI I guess; it merged itself.
Use torchvision decode_image to load images in the torchvision backend (huggingface#45195)

* use torchvision's decode_image to load images for torchvision backend
* fix video processor issue

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
What this PR does

Adds a new `load_image_as_tensor` utility leveraging torchvision's `decode_image` to `image_utils.py`, and overrides `fetch_images` in `TorchvisionBackend` to use it. Previously, all image loading went through PIL regardless of which backend was used.
Benchmarks

Hardware: NVIDIA A10G + CPU (AWS)
Method: 20 repetitions, median, 3 warm-up runs
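The stated methodology (3 warm-up runs, then the median of 20 timed repetitions) corresponds to a harness roughly like the one below; the workload is a stand-in, not the PR's actual benchmark script.

```python
# Minimal timing harness matching the stated methodology: warm up,
# then report the median wall-clock time over the timed repetitions.
import statistics
import time


def bench(fn, warmup=3, reps=20):
    for _ in range(warmup):  # prime caches/allocators before timing
        fn()
    times = []
    for _ in range(reps):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.median(times)


# Stand-in workload; in the PR this would be image loading/processing.
median_s = bench(lambda: sum(range(10_000)))
```

Using the median rather than the mean keeps one-off stalls (GC, disk cache misses) from skewing the reported numbers.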
Image loading
Pixel values are identical between both paths for JPEG and PNG across all sizes tested.
End-to-end processor pipeline
The pipeline speedups exceed the standalone loading gains because the old path also ran `pil_to_tensor` inside `process_image`, which is skipped entirely when the image is already a tensor.

Summary
Benchmarks script:
Cc @zucchini-nlp @NicolasHug