Fix: ci.py crash on macOS from duplicate libomp load #520

Open
ChaoWao wants to merge 1 commit into hw-native-sys:main from ChaoWao:fix/ci-py-crash-on-macos-due-to-duplicate-libomp
@ChaoWao ChaoWao commented Apr 11, 2026

Summary

On macOS, python ci.py -p a2a3sim (or a5sim) aborts every task with OMP: Error #15: Initializing libomp.dylib, but found libomp.dylib already initialized (SIGABRT) before any DeviceRunner code runs.

Root cause: Two distinct libomp.dylib copies get mapped into the single CI process:

  • Homebrew's /opt/homebrew/opt/libomp/lib/libomp.dylib pulled in by numpy → openblas
  • PyTorch's bundled .venv/.../torch/lib/libomp.dylib

They have different install names, so dyld loads both and Intel's libomp aborts on the second init. Surfaced after #493 collapsed sim CI into one long-lived Python process — now every golden's import numpy / import torch accumulates conflicting libomps in the same address space.
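
One way to confirm the diagnosis above is to list the images dyld has mapped into the running process and look for more than one `libomp.dylib`. The following is a minimal sketch, not code from this PR: the helper names `loaded_image_paths` and `find_duplicate_libs` are hypothetical, and it assumes the `_dyld_image_count` / `_dyld_get_image_name` functions are reachable through `ctypes.CDLL(None)` on macOS.

```python
import os
import sys


def loaded_image_paths():
    """Return the paths of all images dyld has mapped into this process.

    macOS only; returns an empty list on other platforms.
    """
    if sys.platform != "darwin":
        return []
    import ctypes

    libsystem = ctypes.CDLL(None)  # handle to the already-loaded process image
    libsystem._dyld_image_count.restype = ctypes.c_uint32
    libsystem._dyld_get_image_name.restype = ctypes.c_char_p
    libsystem._dyld_get_image_name.argtypes = [ctypes.c_uint32]

    paths = []
    for index in range(libsystem._dyld_image_count()):
        name = libsystem._dyld_get_image_name(index)
        if name:
            paths.append(name.decode("utf-8", "replace"))
    return paths


def find_duplicate_libs(paths, basename="libomp.dylib"):
    """Return the distinct paths named `basename` if there is more than one."""
    matches = sorted({p for p in paths if os.path.basename(p) == basename})
    return matches if len(matches) > 1 else []
```

Calling `find_duplicate_libs(loaded_image_paths())` after `import numpy` and `import torch` should return both paths when the collision is present, and an empty list otherwise.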

Changes

  • ci.py: Set KMP_DUPLICATE_LIB_OK=TRUE at the top of the file on darwin, before any import that can transitively pull in numpy or torch. This is Intel's documented escape hatch; safe for our workload where numpy/torch are only used for golden reference math, not parallel OMP regions.
  • docs/macos-libomp-collision.md (new): Full root cause analysis, debugging steps, reproducer, and explicit "what NOT to do" list so future contributors don't re-investigate the same rabbit hole. Linked from docs/ci.md.
  • examples/a2a3/{aicpu,host}_build_graph/bgemm/golden.py: Rewrite the two remaining numpy-based goldens in torch for style consistency with the rest of examples/. Note this does not avoid the libomp collision on its own — import torch transitively imports numpy.
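
For readers who want the shape of the ci.py change without opening the diff, here is a minimal sketch of the guard. It is an illustration rather than the exact code in this PR: the function name `allow_duplicate_libomp` is hypothetical, and using `setdefault` (so that a value the user already exported wins) is an assumption of this sketch.

```python
import os
import sys


def allow_duplicate_libomp(platform=sys.platform, environ=os.environ):
    """Set KMP_DUPLICATE_LIB_OK=TRUE on macOS; leave other platforms untouched."""
    if platform == "darwin":
        # setdefault keeps any value the user exported explicitly.
        environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")
    return environ


# Must run before any import that can transitively load numpy or torch,
# because libomp reads the variable when it initializes.
allow_duplicate_libomp()
```

The `platform` and `environ` parameters exist only so the guard can be exercised on any host; in ci.py itself the defaults are all that is needed.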

Also investigated: alternatives like ctypes.CDLL(..., RTLD_GLOBAL) pre-loading and DYLD_INSERT_LIBRARIES do not fix this, because the two dylibs have distinct LC_ID_DYLIB install names and dyld resolves dependencies by install name, not by symbol. See the doc for details.
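
To check the install-name claim on a given machine, the `LC_ID_DYLIB` of each copy can be read with `otool -D`. The sketch below is illustrative only: the helper names are hypothetical, and it assumes `otool -D` prints a header line ending in `:` followed by the install name on its own line.

```python
import subprocess
import sys


def parse_install_names(otool_output):
    """Extract install names from `otool -D` output, skipping 'path:' header lines."""
    names = []
    for line in otool_output.splitlines():
        line = line.strip()
        if line and not line.endswith(":"):
            names.append(line)
    return names


def install_names(dylib_path):
    """Run `otool -D` on a dylib (macOS with the Xcode command line tools only)."""
    if sys.platform != "darwin":
        raise RuntimeError("otool is only expected to be available on macOS")
    result = subprocess.run(
        ["otool", "-D", dylib_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_install_names(result.stdout)
```

Running `install_names(...)` on the Homebrew copy and on the torch-bundled copy and comparing the results should show two different install names, which is why dyld treats them as unrelated libraries.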

Test plan

  • python ci.py -p a2a3sim on macOS — 20/20 pass (previously 20/20 fail with SIGABRT)
  • python ci.py -p a5sim on macOS — 12/12 pass (previously 12/12 fail with SIGABRT)
  • python ci.py (both sims together) on macOS — 32/32 pass
  • Linux sim CI still green (unchanged path — KMP_DUPLICATE_LIB_OK is only set on sys.platform == "darwin")

On macOS, `python ci.py -p a2a3sim` (or a5sim) aborts every task with
"OMP: Error #15: Initializing libomp.dylib, but found libomp.dylib
already initialized" (SIGABRT) before any DeviceRunner code runs.

Two distinct libomp.dylib copies get mapped into the single CI process:
homebrew's /opt/homebrew/opt/libomp/lib/libomp.dylib (via numpy ->
openblas) and pip torch's .venv/.../torch/lib/libomp.dylib. They have
different install names, so dyld loads them both and Intel's libomp
aborts on the second init. Surfaced after #493 collapsed sim CI into
one long-lived Python process; each golden's `import numpy`/`import
torch` now accumulates conflicting libomps in the same address space.

- Set KMP_DUPLICATE_LIB_OK=TRUE at the top of ci.py on darwin, before
  any import that can transitively pull in numpy or torch. This is
  Intel's documented escape hatch; safe for our workload where numpy
  and torch are only used for golden reference math, not parallel
  OMP regions.
- Document the full root cause, debugging steps, and explicit
  "what not to do" list in docs/macos-libomp-collision.md so future
  contributors don't re-investigate. Link it from docs/ci.md.
- Rewrite the two remaining numpy-based goldens
  (a2a3/{aicpu,host}_build_graph/bgemm) in torch for style consistency
  with the rest of examples/. Note this does not avoid the libomp
  collision on its own -- `import torch` transitively imports numpy.

Verified: `python ci.py` passes 32/32 sim tests (20 a2a3sim +
12 a5sim) on macOS without KMP_DUPLICATE_LIB_OK needing to be set
manually.
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a workaround for a libomp collision issue on macOS that causes SIGABRT when both numpy and torch are loaded in the same process. The fix involves setting KMP_DUPLICATE_LIB_OK=TRUE at the top of ci.py before other imports. Detailed documentation explaining the root cause and mitigation has been added in docs/macos-libomp-collision.md and referenced in docs/ci.md. Additionally, several golden reference scripts were updated to use torch instead of numpy for input generation and computation. I have no feedback to provide as there were no review comments to evaluate.
