[CI][DO NOT MERGE] Test new Isaac Sim image latest-develop sha256:06197a67 #5630

Closed
hujc7 wants to merge 1 commit into isaac-sim:develop from hujc7:jichuanh/ci-test-newer-isaacsim

Conversation

@hujc7
Collaborator

@hujc7 hujc7 commented May 15, 2026

Purpose (updated 2026-05-15 with verified data)

Originally opened to test whether the newer Isaac Sim image (built today, 2026-05-15 03:57 UTC) ships a fixed OpenBLAS bundle that would let us drop the env-var workaround in #5625.

Verified result: no. The image bump alone cannot fix the SIGSEGV. The broken numpy comes from IsaacLab's own pip install, not from the Isaac Sim base image.

This PR is diagnostic and should not be merged as-is.

Verified library versions from docker run (not from CI grep counts)

| | Pinned image sha256:0dd49a11 (5/11) | New image sha256:06197a67 (5/15) |
|---|---|---|
| Base image numpy | 2.3.1 | 2.3.1 ← same |
| Base image scipy | 1.17.0 | 1.17.0 ← same |
| Base image bundled openblas (numpy.libs) | libscipy_openblas64_-56d6093b.so (safe hash) | libscipy_openblas64_-56d6093b.so (same safe hash) |
| Base image bundled openblas (scipy.libs) | libscipy_openblas-6cdc3b4a.so | libscipy_openblas-6cdc3b4a.so |
| Kit-archive path suffix | +e3a24436 | +6312fa25 |

The only difference is the kit-archive packaging timestamp. The bundled library hashes are identical in both images. The safe hash -56d6093b matches the OpenBLAS bundle with which Piotr could not reproduce the crash in his local environment (Slack thread reply 93).
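The hash comparison above can be sketched as a small lookup. This is a minimal illustration, not the check used in this PR: the function name `classify_openblas` and the `KNOWN_HASHES` table are hypothetical, and the "safe" and "broken" labels only restate the findings reported in this description.

```python
import re

# Filename hash suffixes quoted in this PR; the labels restate its findings.
KNOWN_HASHES = {
    "56d6093b": "safe",
    "fdde5778": "broken",
}

# Matches both the 64-bit-index name (numpy.libs) and the plain name (scipy.libs).
_OPENBLAS_RE = re.compile(r"libscipy_openblas(?:64_)?-([0-9a-f]{8})\.so")


def classify_openblas(path):
    """Return (hash, label) for a bundled OpenBLAS path, or None if it does not match."""
    match = _OPENBLAS_RE.search(path)
    if match is None:
        return None
    digest = match.group(1)
    # Hashes that were never investigated are reported as "unknown", not guessed.
    return digest, KNOWN_HASHES.get(digest, "unknown")
```

Applied to the filenames in the table, the numpy.libs entry of both images maps to "safe", while the scipy.libs entry maps to "unknown" because this PR makes no claim about it.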

What the CI dep-manifest dump shows (different layer, different numpy)

The diagnostic print I added in this PR captured inside the running CI test container:

=== Dep manifest (numpy/scipy/openblas) ===
Name: numpy
Version: 2.3.5                                          ← different from base image!
Location: /workspace/isaaclab/_isaac_sim/kit/python/lib/python3.12/site-packages
Name: scipy
Version: 1.17.1
bundled openblas: .../numpy.libs/libscipy_openblas64_-fdde5778.so   ← BROKEN hash, matches Slack crash backtrace
=== /Dep manifest ===

So there are two numpy installs in the running container:

  • extscache/omni.kit.pip_archive/pip_prebundle/numpy/ → numpy 2.3.1 + safe -56d6093b (from Isaac Sim base image)
  • _isaac_sim/kit/python/lib/python3.12/site-packages/numpy/ → numpy 2.3.5 + broken -fdde5778 (installed by IsaacLab's CI Docker layer when pip resolves numpy>=2 from source/isaaclab/setup.py:21)

The site-packages numpy takes precedence at runtime → CI imports the broken one → atfork SIGSEGV.
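To see which install wins, one could walk the import search path in order and record every directory that holds a numpy package together with its bundled OpenBLAS. The sketch below assumes the default path-based import machinery (no custom finders or `.pth` tricks), in which the first matching directory is the one that gets imported. The helper name `find_numpy_installs` is hypothetical and is not part of the diagnostic added in this PR.

```python
import glob
import os


def find_numpy_installs(search_paths):
    """List numpy installs found on search_paths, in search order.

    The first record corresponds to the package a plain `import numpy`
    would pick up under the default path-based finder; later records
    are shadowed copies.
    """
    installs = []
    for base in search_paths:
        # A regular package is identified by numpy/__init__.py.
        if not os.path.isfile(os.path.join(base, "numpy", "__init__.py")):
            continue
        # Wheels place their vendored OpenBLAS next to the package in numpy.libs/.
        pattern = os.path.join(base, "numpy.libs", "libscipy_openblas*.so")
        libs = sorted(os.path.basename(p) for p in glob.glob(pattern))
        installs.append({"path": base, "openblas": libs})
    return installs
```

Calling `find_numpy_installs(sys.path)` inside the test container would be expected to list the site-packages copy before the pip_prebundle copy if the precedence described above holds.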

Why pip resolves to 2.3.5 and not 2.4.1

Bisected against the IsaacLab dependency graph: pin-pink → pin (Pinocchio) → libpinocchio 3.9.0 → cmeel-boost ~=1.89.0 transitively caps numpy at <2.4. Adding numpy>=2.4 + pin>=2.6.3 to pip produces:

ERROR: ResolutionImpossible
cmeel-boost 1.83.0 depends on numpy~=1.26.0; python_version >= "3.9"

So numpy>=2 resolves to the highest 2.x compatible with cmeel-boost, which is 2.3.5.
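The "highest version under the cap" behaviour can be illustrated with a toy selector. This is a deliberate simplification of what pip does: it handles only dotted numeric versions, ignores the rest of the dependency graph, and the candidate list is an illustrative subset rather than a full index listing.

```python
def parse_version(text):
    """Turn '2.3.5' into (2, 3, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def highest_allowed(candidates, floor, cap):
    """Return the newest candidate v with floor <= v < cap, or None if none fits."""
    low, high = parse_version(floor), parse_version(cap)
    allowed = [v for v in candidates if low <= parse_version(v) < high]
    return max(allowed, key=parse_version) if allowed else None


# Illustrative subset of numpy releases (assumption: not a complete list).
CANDIDATES = ["1.26.4", "2.2.6", "2.3.1", "2.3.4", "2.3.5", "2.4.0", "2.4.1"]

# numpy>=2 from setup.py combined with the transitive <2.4 cap.
print(highest_allowed(CANDIDATES, floor="2", cap="2.4"))  # prints 2.3.5
```

With the cap removed (for example `cap="3"`), the same selector would return 2.4.1, which is the version the OpenBLAS fix is expected to ship in according to this PR.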

CI results on this PR (attempt 2 after re-queue, partial)

Across 8 completed jobs: 0 actual signal 11/SIGSEGV events. 5 pass, 3 fail (none OpenBLAS-related — pre-existing pink_ik NaN, Multirotor API rename, cartpole-integration test timeout). The SIGSEGV-prone heavy jobs (core[2/3], core[3/3], mimic, tasks[N/3]) are still running.

Where the fix actually has to land

| Approach | Where | Effort |
|---|---|---|
| Pin numpy in IsaacLab's own setup.py to a non-broken version (e.g. numpy>=2,!=2.3.5 or numpy>=2,<2.3.5); pinned image stays, no Isaac Sim change | source/isaaclab/setup.py:21, source/isaaclab_tasks/setup.py:21, source/isaaclab_rl/setup.py:22, source/isaaclab_visualizers/setup.py:13 | One-line PR (× 4 files) |
| Wait for cmeel-boost to lift its numpy<2.4 cap so numpy>=2.4.1 can be pulled | Upstream cmeel-boost | Out of our hands |
| #5625 env-var workaround (CI-only) | .github/actions/run-tests/action.yml | Already open |
| Image bump (this PR) | .github/workflows/config.yaml | Doesn't work (proven by this PR) |
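To compare the two candidate pins from the first row, a small specifier check is enough. This is a sketch under simplifying assumptions: it supports only comma-separated comparison clauses on dotted numeric versions and is not a PEP 440 implementation. The helper name `satisfies` is hypothetical.

```python
import operator

# Two-character operators are listed first so ">=" is not misread as ">".
_OPS = [
    (">=", operator.ge), ("<=", operator.le), ("!=", operator.ne),
    ("==", operator.eq), ("<", operator.lt), (">", operator.gt),
]


def _key(text, width=3):
    """Parse '2' or '2.3.5' into a tuple padded with zeros to a common length."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, spec):
    """Return True if version meets every clause in a spec such as '>=2,!=2.3.5'."""
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS:
            if clause.startswith(symbol):
                if not compare(_key(version), _key(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True
```

Both pins reject 2.3.5 and accept 2.3.1 and 2.3.4. They differ only above the broken release: `>=2,!=2.3.5` still admits 2.4.1 once the upstream cap is lifted, whereas `>=2,<2.3.5` would have to be edited again.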

Type of change

  • Diagnostic / non-functional (do not merge)

Probe whether the newer Isaac Sim image (pushed 2026-05-15 03:57 UTC)
ships a NumPy >= 2.4.1 that includes the OpenBLAS atfork fix from
numpy/numpy#30132 (which bundles scipy-openblas 0.3.30.7 with the
OpenMathLib/OpenBLAS#5520 patch).

The current pin (sha256:0dd49a11..., from 2026-05-11) carries
NumPy 2.3.5 + libscipy_openblas64_-fdde5778.so, which is the .so
named in the SIGSEGV backtrace from the C06HLQ6CB41 Slack thread.

This PR is purely diagnostic.  The dep-manifest print (cherry-picked
from isaac-sim#5626) will reveal whichever numpy/scipy/openblas the newer
image ships in the GitHub Actions log, before pytest starts.

If the newer image ships NumPy >= 2.4.1, my env-var workaround
(isaac-sim#5625) can be reverted in a follow-up.  If it still ships NumPy
2.3.5, the env-var workaround stays in place and we wait for the
Isaac Sim base image to bump numpy.

Do not merge.

@isaaclab-review-bot isaaclab-review-bot Bot left a comment


Review Summary

This is a well-structured diagnostic PR for testing whether the newer Isaac Sim container image (sha256:06197a67...) ships NumPy ≥ 2.4.1 with the OpenBLAS atfork fix.

✅ What Looks Good

  1. Clear intent: PR is appropriately marked [DO NOT MERGE] and the purpose is well-documented in the description.

  2. Non-intrusive diagnostics: The dependency manifest print in action.yml uses defensive error handling (2>/dev/null || true) ensuring CI will not fail if the diagnostic commands encounter issues.

  3. Comprehensive context: The PR body provides excellent traceability to:

  4. Actionable interpretation guide: The outcome table in the PR description makes it easy to determine next steps based on CI results.

📋 Minor Observations

  1. Diagnostic complexity: The Python one-liner that scans for OpenBLAS .so files is functional but dense. For a non-merge diagnostic PR, this is acceptable.

  2. CI impact: The additional pip show and Python diagnostic commands add minimal overhead (~1-2s), acceptable for a diagnostic run.

🔍 What to Watch

Once CI completes, check the build logs for:

=== Dep manifest (numpy/scipy/openblas) ===
numpy <version>
scipy <version>
bundled openblas: ...
=== /Dep manifest ===

If numpy shows ≥ 2.4.1 with a changed OpenBLAS hash (not -fdde5778), the fix has propagated and a production PR can update the image pin.
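The decision rule above could be automated against a captured log. The sketch below assumes the manifest uses the `Name:` / `Version:` / `bundled openblas:` layout shown in the PR description's dump (not the abbreviated placeholder form above) and that versions are purely numeric; the function names are hypothetical.

```python
import re


def parse_manifest(log_text):
    """Extract {package: version} and the list of OpenBLAS hash suffixes from a manifest dump."""
    versions, hashes, current = {}, [], None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("Name:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("Version:") and current:
            # Keep only the first token so trailing annotations are ignored.
            versions[current] = line.split(":", 1)[1].split()[0]
        elif line.startswith("bundled openblas:"):
            hashes.extend(re.findall(r"-([0-9a-f]{8})\.so", line))
    return versions, hashes


def fix_propagated(log_text, min_numpy=(2, 4, 1), broken_hash="fdde5778"):
    """True only if numpy is at least min_numpy and the broken hash is absent."""
    versions, hashes = parse_manifest(log_text)
    numpy_version = tuple(int(p) for p in versions.get("numpy", "0").split("."))
    return numpy_version >= min_numpy and broken_hash not in hashes
```

Fed the manifest captured in this PR (numpy 2.3.5 with the -fdde5778 bundle), `fix_propagated` returns False, consistent with the conclusion in the description.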


No blocking issues identified. This diagnostic PR is appropriately scoped for its investigative purpose.

@hujc7
Collaborator Author

hujc7 commented May 15, 2026

Closing — diagnostic proved the Isaac Sim base image isn't the source of the OpenBLAS atfork SIGSEGV (both pinned 5/11 and rolling 5/15 prebundle the safe numpy 2.3.1 + libscipy_openblas64_-56d6093b.so). The broken numpy 2.3.5 enters via IsaacLab's own isaaclab.sh --install Docker layer. Real fix at the root cause is PR #5642 (pin numpy!=2.3.5 in IsaacLab's setup.py).

@hujc7 hujc7 closed this May 15, 2026