ci: replace deprecated zmq with pyzmq in CI scripts #3007
Pull request overview
This PR updates CI dependency installation to use pyzmq (the actual ZeroMQ Python bindings) instead of the deprecated zmq meta-package, addressing intermittent CI setup failures during Triton test shard runs.
Changes:
- Replace `pip install ... zmq ...` with `pip install ... pyzmq ...` in CI workflows.
- Update the Triton CI build helper script to install `pyzmq` instead of `zmq`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `.github/workflows/aiter-test.yaml` | Switch CI dependency install from `zmq` to `pyzmq` in the main test job(s). |
| `.github/workflows/vllm_benchmark.yaml` | Switch benchmark workflow dependency install from `zmq` to `pyzmq`. |
| `.github/scripts/build_aiter_triton.sh` | Switch Triton setup script dependency install from `zmq` to `pyzmq`. |
The `zmq` meta-package fails to install on some CI runners because it cannot resolve the `pyzmq` dependency. Use `pyzmq` directly, which is the actual package providing ZeroMQ bindings for Python. Fixes Triton Test Shard 7 setup failures.
Set pip global retries=15 and timeout=120s in build_aiter_triton.sh to handle transient PyPI network failures on self-hosted runners. Shard 5/7 failures were caused by RemoteDisconnected during pip install.
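One way to apply these settings is through pip's environment variables, sketched below. This is an assumption about the mechanism: the commit message does not say whether the script uses environment variables, `pip config set`, or command-line flags, so treat this as an illustration rather than the actual contents of `build_aiter_triton.sh`.

```shell
# Sketch only: the real script may set these differently.
# pip reads PIP_RETRIES and PIP_DEFAULT_TIMEOUT as defaults for
# --retries and --timeout on every later invocation in this shell.
export PIP_RETRIES=15
export PIP_DEFAULT_TIMEOUT=120   # seconds

# Persistent alternative (writes to the user's pip config file):
#   pip config set global.retries 15
#   pip config set global.timeout 120
```

Environment variables keep the change scoped to the CI job and leave the runner's pip configuration untouched, which can matter on shared self-hosted runners.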
pyzmq is only used by aiter.dist.shm_broadcast, not by any triton test. When PyPI is unreachable on self-hosted runners, the pyzmq install failure should not block the entire CI shard. Split pyzmq into a separate pip install with || fallback so triton tests can proceed even when PyPI connectivity is degraded.
When batch pip install fails (e.g., PyPI connectivity issues on self-hosted runners), retry each package individually. Only pyzmq is allowed to fail silently since it's only used by aiter.dist.shm_broadcast and not required by any CI test suite. Critical packages (pandas, einops, numpy) must still succeed.
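The fallback described in the last two commits could look roughly like the sketch below. The function name `install_with_fallback`, the `PIP_INSTALL` override, and the two package-list variables are hypothetical names introduced for illustration; only the package names and the rule that `pyzmq` alone may fail come from the commit messages.

```shell
# Hypothetical sketch; not the actual contents of build_aiter_triton.sh.

# Installer command; overridable so the logic can be tried without network access.
PIP_INSTALL=${PIP_INSTALL:-"pip install"}

REQUIRED_PKGS="pandas einops numpy"   # must succeed
OPTIONAL_PKGS="pyzmq"                 # only used by aiter.dist.shm_broadcast

install_with_fallback() {
    # Fast path: install everything in one batch.
    if $PIP_INSTALL $REQUIRED_PKGS $OPTIONAL_PKGS; then
        return 0
    fi
    echo "batch install failed; retrying packages individually" >&2

    # Required packages: any failure aborts the setup.
    for pkg in $REQUIRED_PKGS; do
        if ! $PIP_INSTALL "$pkg"; then
            echo "ERROR: required package $pkg failed to install" >&2
            return 1
        fi
    done

    # Optional packages: warn and continue.
    for pkg in $OPTIONAL_PKGS; do
        $PIP_INSTALL "$pkg" || echo "WARNING: optional package $pkg failed; continuing" >&2
    done
    return 0
}
```

A setup script would call `install_with_fallback || exit 1` once. Keeping the optional list explicit makes it clear which failures are tolerated, so a missing `numpy` still fails the shard while a missing `pyzmq` does not.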
CI Status: ALL GREEN (32 pass / 0 fail / 0 pending)
This PR fixes intermittent CI failures caused by
Root cause: PR #2897 introduced
@lipeng-amd Ready for review.
Summary
- Replace `zmq` (deprecated meta-package) with `pyzmq` (actual ZeroMQ Python bindings) across all CI workflow files and build scripts
- The `zmq` package intermittently fails to resolve its `pyzmq` dependency on CI runners, causing Triton Test Shard setup failures (e.g., PR #3005 "[Silo] Bulk merge: kernel fixes and features (SplitK, MoE fixes, Qwen3-Next, pa_mqa OOB)", Shard 7)

Files changed
- `.github/scripts/build_aiter_triton.sh`: triton test setup
- `.github/workflows/aiter-test.yaml`: standard + MI300X tests (3 occurrences)
- `.github/workflows/vllm_benchmark.yaml`: vLLM benchmark

Test plan
- `pyzmq` resolution errors