Fix AMD CI: rebuild torchvision with libjpeg + refresh expectations #45533

Merged
Abdennacer-Badaoui merged 3 commits into huggingface:main from Abdennacer-Badaoui:fix-ci-amd
Apr 21, 2026
Conversation

@Abdennacer-Badaoui
Member

Abdennacer-Badaoui commented Apr 20, 2026

Summary

  • Rebuild torchvision from source in the AMD CI image so that torchvision.io.decode_image (used by load_image_as_tensor) has libjpeg/libpng support. This unblocks the wave of "decode_jpeg: torchvision not compiled with libjpeg support" failures that appeared when image loading switched from PIL to torchvision.io in this PR. The build is working; see here.
  • Keep ROCm image ops (nms, roi_align, deform_conv) on GPU for CUDA parity by passing FORCE_CUDA=1 and filtering hipify-emitted *_hip.cpp breadcrumbs out of the sources globs.
  • Refresh expectations for qwen2_5_vl and modernbert. These drifted because the AMD image pins flash-attention to an older commit (6387433…) to keep the Docker build under the 6h GitHub Actions limit (~4h with the pin, >6h on newer heads), and that version produces slightly different numerics on the affected tests.
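
The source filtering mentioned in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `drop_hipified` and the example paths are hypothetical, and it assumes the hipify-generated files are exactly those whose file names end in `_hip.cpp`.

```python
from pathlib import PurePosixPath


def drop_hipified(sources):
    """Return the source paths whose file name does not end in `_hip.cpp`.

    Assumption: hipify leaves translated copies named `<stem>_hip.cpp`
    next to the originals, and a broad glob would pick them up again.
    """
    return [s for s in sources if not PurePosixPath(s).name.endswith("_hip.cpp")]


# Hypothetical glob result: one hand-written file, one hipify leftover,
# and one CUDA kernel that should be kept.
globbed = [
    "csrc/ops/nms.cpp",
    "csrc/ops/nms_hip.cpp",
    "csrc/ops/cuda/roi_align_kernel.cu",
]
kept = drop_hipified(globbed)
print(kept)
```

Filtering on the file name rather than the full path avoids dropping a file just because a parent directory happens to contain `_hip.cpp` in its name.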

Abdennacer-Badaoui marked this pull request as draft April 20, 2026 14:40
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: modernbert, qwen2_5_vl

Abdennacer-Badaoui marked this pull request as ready for review April 20, 2026 15:46
@tarekziade
Collaborator

run-slow: modernbert, qwen2_5_vl

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/modernbert", "models/qwen2_5_vl"]
quantizations: []


# Rebuild torchvision so decode_image has libjpeg and ROCm image ops stay on GPU.
RUN python3 -m pip install --no-cache-dir "setuptools<81" pybind11
RUN TV_VERSION=$(python3 -c "import torchvision; print(torchvision.__version__.split('+')[0])") && \
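
The `split('+')[0]` in the command above drops the local version suffix from the installed torchvision version, presumably so the result can be matched against a source release. A minimal sketch of that step in isolation; the function name `base_version` and the sample version strings are hypothetical, not taken from the PR:

```python
def base_version(version: str) -> str:
    """Strip a local version suffix such as '+rocm6.2' from a version string."""
    # Everything after the first '+' is the local version label; a string
    # without '+' is returned unchanged.
    return version.split("+")[0]


# Hypothetical version strings for illustration only.
with_suffix = base_version("0.21.0+rocm6.2")
without_suffix = base_version("0.21.0")
print(with_suffix, without_suffix)
```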
Collaborator

Just curious: what installed torchvision prior to this line, such that we have to remove it? Could we skip that initial install?

Collaborator

tarekziade left a comment

LGTM to land, just a question about the torchvision dep.

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context  Commit    Description
RUN      d3ff7892  workflow commit (merge commit)
PR       5a4d5390  branch commit (from PR)
main     9dff7ca5  base commit (on main)

✅ No failing test specific to this PR 🎉 👏 !

Abdennacer-Badaoui added this pull request to the merge queue Apr 21, 2026
Merged via the queue into huggingface:main with commit 67fb8bb Apr 21, 2026
24 checks passed
Abdennacer-Badaoui deleted the fix-ci-amd branch April 21, 2026 11:09