[CI][DO NOT MERGE] Test new Isaac Sim image latest-develop sha256:06197a67 #5630
hujc7 wants to merge 1 commit into
Probe whether the newer Isaac Sim image (pushed 2026-05-15 03:57 UTC) ships a NumPy >= 2.4.1 that includes the OpenBLAS atfork fix from numpy/numpy#30132 (which bundles scipy-openblas 0.3.30.7 with the OpenMathLib/OpenBLAS#5520 patch). The current pin (sha256:0dd49a11..., from 2026-05-11) carries NumPy 2.3.5 + libscipy_openblas64_-fdde5778.so, which is the .so named in the SIGSEGV backtrace from the C06HLQ6CB41 Slack thread.

This PR is purely diagnostic. The dep-manifest print (cherry-picked from isaac-sim#5626) will write to the GitHub Actions log, before pytest starts, whichever numpy/scipy/openblas versions the newer image ships.

If the newer image ships NumPy >= 2.4.1, my env-var workaround (isaac-sim#5625) can be reverted in a follow-up. If it still ships NumPy 2.3.5, the env-var workaround stays in place and we wait for the Isaac Sim base image to bump numpy.

Do not merge.
Review Summary
This is a well-structured diagnostic PR for testing whether the newer Isaac Sim container image (sha256:06197a67...) ships NumPy ≥ 2.4.1 with the OpenBLAS atfork fix.
✅ What Looks Good
- Clear intent: PR is appropriately marked `[DO NOT MERGE]` and the purpose is well-documented in the description.
- Non-intrusive diagnostics: The dependency manifest print in `action.yml` uses defensive error handling (`2>/dev/null || true`), ensuring CI will not fail if the diagnostic commands encounter issues.
- Comprehensive context: The PR body provides excellent traceability to:
  - Root cause issue: numpy/numpy#30092 (OpenBLAS atfork regression)
  - Related scipy issue: scipy#23686
  - OpenBLAS fix: OpenMathLib/OpenBLAS#5520
  - Companion PRs (#5620, #5625, #5626) for workarounds and additional diagnostics
- Actionable interpretation guide: The outcome table in the PR description makes it easy to determine next steps based on CI results.
📋 Minor Observations
- Diagnostic complexity: The Python one-liner that scans for OpenBLAS .so files is functional but dense. For a non-merge diagnostic PR, this is acceptable.
- CI impact: The additional `pip show` and Python diagnostic commands add minimal overhead (~1-2s), acceptable for a diagnostic run.
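For comparison, the dense one-liner could be unpacked into a few readable lines. The following is a minimal sketch, not the code in this PR: the function name `find_bundled_openblas` is hypothetical, and it assumes wheels vendor OpenBLAS under a `<package>.libs` directory with a file name starting with `libscipy_openblas`, as the file names quoted in this thread suggest.

```python
from pathlib import Path


def find_bundled_openblas(site_packages):
    """Return sorted relative paths of bundled OpenBLAS shared objects.

    Assumption: wheels vendor the library as
    <site-packages>/<package>.libs/libscipy_openblas*.so
    """
    root = Path(site_packages)
    hits = root.glob("*.libs/libscipy_openblas*.so*")
    return sorted(str(path.relative_to(root)) for path in hits)


if __name__ == "__main__":
    import sysconfig

    # Scan the site-packages directory of the interpreter running this script.
    site_packages = sysconfig.get_paths()["purelib"]
    print("=== Dep manifest (numpy/scipy/openblas) ===")
    for rel in find_bundled_openblas(site_packages):
        print("bundled openblas:", rel)
    print("=== /Dep manifest ===")
```

Keeping the scan in a function that takes the directory as an argument makes it possible to point it at more than one install location in the same run.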
🔍 What to Watch
Once CI completes, check the build logs for:
=== Dep manifest (numpy/scipy/openblas) ===
numpy <version>
scipy <version>
bundled openblas: ...
=== /Dep manifest ===
If numpy shows ≥ 2.4.1 with a changed OpenBLAS hash (not -fdde5778), the fix has propagated and a production PR can update the image pin.
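The check described above can be written down as a small predicate. This is only a sketch of the reviewer's criterion: the function name `fix_has_propagated` is hypothetical, and the two constants are taken from the version and hash quoted in this thread.

```python
import re

FIXED_NUMPY = (2, 4, 1)   # first NumPy release named in the PR as carrying the fix
BROKEN_HASH = "fdde5778"  # hash suffix of the .so named in the SIGSEGV backtrace


def fix_has_propagated(numpy_version, openblas_filename):
    """Return True if the manifest shows NumPy >= 2.4.1 and a changed OpenBLAS hash."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", numpy_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {numpy_version!r}")
    # Compare as integer tuples so that 2.10.0 sorts above 2.4.1.
    version = tuple(int(group) for group in match.groups())
    return version >= FIXED_NUMPY and BROKEN_HASH not in openblas_filename


if __name__ == "__main__":
    print(fix_has_propagated("2.3.5", "libscipy_openblas64_-fdde5778.so"))
```

Comparing integer tuples rather than raw strings avoids the usual trap where `"2.10.0" < "2.4.1"` lexicographically.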
No blocking issues identified. This diagnostic PR is appropriately scoped for its investigative purpose.
Closing: diagnostic proved the Isaac Sim base image isn't the source of the OpenBLAS atfork SIGSEGV (both pinned 5/11 and rolling 5/15 prebundle the safe numpy 2.3.1 + …
Purpose (updated 2026-05-15 with verified data)
Originally opened to test whether the newer Isaac Sim image (built today, 2026-05-15 03:57 UTC) ships a fixed OpenBLAS bundle that would let us drop the env-var workaround in #5625.
Verified result: no. The image bump alone cannot fix the SIGSEGV. The broken numpy comes from IsaacLab's own pip install, not from the Isaac Sim base image.
This PR is diagnostic and should not be merged as-is.
Verified library versions from `docker run` (not from CI grep counts)

| sha256:0dd49a11 (5/11) | sha256:06197a67 (5/15) |
| --- | --- |
| `libscipy_openblas64_-56d6093b.so` ← safe hash | `libscipy_openblas64_-56d6093b.so` ← same safe hash |
| `libscipy_openblas-6cdc3b4a.so` | `libscipy_openblas-6cdc3b4a.so` |
| `+e3a24436` | `+6312fa25` |

The only difference is the kit-archive packaging timestamp. The library binaries have identical hashes. The safe hash `-56d6093b` matches the OpenBLAS bundle that Piotr couldn't reproduce the crash with in his local environment (Slack thread reply 93).

What the CI dep-manifest dump shows (different layer, different numpy)
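A "different layer, different numpy" situation can be detected by walking the import search path in order and recording every directory that contains the package; for ordinary path-based imports, the first hit is the one Python loads. The sketch below assumes exactly that ordering rule, and the helper name `find_package_dirs` is hypothetical rather than anything from this PR.

```python
from pathlib import Path


def find_package_dirs(package, search_path):
    """Return every '<entry>/<package>' directory on search_path, in order.

    The first element is the copy a plain 'import <package>' would pick up;
    later elements are shadowed copies.
    """
    found = []
    for entry in search_path:
        candidate = Path(entry) / package
        if (candidate / "__init__.py").is_file():
            found.append(str(candidate))
    return found


if __name__ == "__main__":
    import sys

    for rank, location in enumerate(find_package_dirs("numpy", sys.path)):
        marker = "imported" if rank == 0 else "shadowed"
        print(f"{marker}: {location}")
```

Run inside the test container, a listing like this would show whether more than one copy is present and which one wins.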
The diagnostic print I added in this PR captured inside the running CI test container:
So there are two numpy installs in the running container:
- `extscache/omni.kit.pip_archive/pip_prebundle/numpy/` → numpy 2.3.1 + safe `-56d6093b` (from Isaac Sim base image)
- `_isaac_sim/kit/python/lib/python3.12/site-packages/numpy/` → numpy 2.3.5 + broken `-fdde5778` (installed by IsaacLab's CI Docker layer when pip resolves `numpy>=2` from `source/isaaclab/setup.py:21`)

The `site-packages` numpy takes precedence at runtime → CI imports the broken one → atfork SIGSEGV.

Why pip resolves to 2.3.5 and not 2.4.1
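As a rough illustration of how a version cap from one dependency limits what a looser requirement can resolve to, here is a toy resolver. It is not pip's algorithm: it handles only plain dotted versions and the basic comparison operators, and the candidate list is illustrative rather than the full set of published numpy releases.

```python
import re

_OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def _key(version):
    """Turn '2.3.5' into (2, 3, 5, 0) so short and long versions compare cleanly."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (4 - len(parts)))


def satisfies(version, spec):
    """Check a dotted version against a comma-separated spec such as '>=2,<2.4'."""
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(>=|<=|==|!=|>|<)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not _OPS[op](_key(version), _key(bound)):
            return False
    return True


def highest_match(candidates, specs):
    """Return the highest candidate satisfying every spec, or None on conflict."""
    ok = [v for v in candidates if all(satisfies(v, s) for s in specs)]
    return max(ok, key=_key) if ok else None


if __name__ == "__main__":
    candidates = ["2.2.6", "2.3.1", "2.3.5", "2.4.0", "2.4.1"]  # illustrative subset
    print(highest_match(candidates, [">=2"]))            # no cap applied
    print(highest_match(candidates, [">=2", "<2.4"]))    # with a transitive <2.4 cap
    print(highest_match(candidates, [">=2.4", "<2.4"]))  # contradictory requirements
```

With the cap in place the loose requirement settles on the highest release below it, and asking for a version above the cap leaves no candidate at all.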
Bisected against the IsaacLab dependency graph:
`pin-pink` → `pin` (Pinocchio) → `libpinocchio 3.9.0` → `cmeel-boost ~=1.89.0` transitively caps numpy at <2.4. Adding `numpy>=2.4` + `pin>=2.6.3` to pip produces:

So `numpy>=2` resolves to the highest 2.x compatible with cmeel-boost, which is 2.3.5.

CI results on this PR (attempt 2 after re-queue, partial)
Across 8 completed jobs: 0 actual `signal 11/SIGSEGV` events. 5 pass, 3 fail (none OpenBLAS-related: pre-existing pink_ik NaN, Multirotor API rename, cartpole-integration test timeout). The SIGSEGV-prone heavy jobs (core[2/3], core[3/3], mimic, tasks[N/3]) are still running.

Where the fix actually has to land
- … `numpy>=2,!=2.3.5` or `numpy>=2,<2.3.5`): pinned image stays, no Isaac Sim change
- `source/isaaclab/setup.py:21`, `source/isaaclab_tasks/setup.py:21`, `source/isaaclab_rl/setup.py:22`, `source/isaaclab_visualizers/setup.py:13`
- `cmeel-boost` to lift its numpy<2.4 cap so `numpy>=2.4.1` can be pulled
- `.github/actions/run-tests/action.yml`, `.github/workflows/config.yaml`

Type of change