Fix: ci.py crash on macOS from duplicate libomp load by ChaoWao · Pull Request #520 · hw-native-sys/simpler

ChaoWao · 2026-04-11T04:54:16Z

Summary

On macOS, `python ci.py -p a2a3sim` (or `a5sim`) aborts every task with "OMP: Error #15: Initializing libomp.dylib, but found libomp.dylib already initialized" (SIGABRT) before any DeviceRunner code runs.

Root cause: two distinct `libomp.dylib` copies get mapped into the single CI process:

- Homebrew's `/opt/homebrew/opt/libomp/lib/libomp.dylib`, pulled in by numpy → openblas
- PyTorch's bundled `.venv/.../torch/lib/libomp.dylib`

They have different install names, so dyld loads both, and Intel's libomp aborts on the second init. The problem surfaced after #493 collapsed sim CI into one long-lived Python process: every golden's `import numpy` / `import torch` now accumulates conflicting libomps in the same address space.
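To confirm that two copies really are mapped into one process, the images dyld has loaded can be listed from inside Python. The following is a minimal sketch, not the procedure from the PR's doc: the helper names `find_libomp_images` and `loaded_images` are hypothetical, and the listing relies on the macOS dyld functions `_dyld_image_count` and `_dyld_get_image_name` being reachable through `ctypes.CDLL(None)`, which is an assumption about the local setup.

```python
import ctypes
import sys


def find_libomp_images(image_paths):
    """Return the distinct paths whose file name is exactly libomp.dylib."""
    seen = []
    for path in image_paths:
        if path.rsplit("/", 1)[-1] == "libomp.dylib" and path not in seen:
            seen.append(path)
    return seen


def loaded_images():
    """List the images dyld has mapped into this process (macOS only).

    Returns an empty list on other platforms, where the dyld API is absent.
    """
    if sys.platform != "darwin":
        return []
    libc = ctypes.CDLL(None)  # handle to the current process
    libc._dyld_image_count.restype = ctypes.c_uint32
    libc._dyld_get_image_name.restype = ctypes.c_char_p
    libc._dyld_get_image_name.argtypes = [ctypes.c_uint32]
    names = []
    for index in range(libc._dyld_image_count()):
        name = libc._dyld_get_image_name(index)
        if name:
            names.append(name.decode())
    return names
```

Calling `find_libomp_images(loaded_images())` after `import numpy` and `import torch` (with the workaround in place so the process survives) should list more than one path if the collision described above is present.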

Changes

- `ci.py`: Set `KMP_DUPLICATE_LIB_OK=TRUE` at the top of the file on darwin, before any import that can transitively pull in numpy or torch. This is Intel's documented escape hatch; it is safe for our workload, where numpy/torch are only used for golden reference math, not parallel OMP regions.
- `docs/macos-libomp-collision.md` (new): Full root cause analysis, debugging steps, reproducer, and an explicit "what NOT to do" list so future contributors don't re-investigate the same rabbit hole. Linked from `docs/ci.md`.
- `examples/a2a3/{aicpu,host}_build_graph/bgemm/golden.py`: Rewrite the two remaining numpy-based goldens in torch for style consistency with the rest of `examples/`. Note that this does not avoid the libomp collision on its own, because `import torch` transitively imports numpy.
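The `ci.py` change amounts to a platform-gated environment variable that must be set before numpy or torch is imported. A minimal sketch of that pattern follows; the function name `allow_duplicate_libomp` and the use of `setdefault` (which leaves a value already chosen by the user untouched) are assumptions for illustration, not the actual diff.

```python
import os
import sys


def allow_duplicate_libomp(platform=sys.platform, environ=os.environ):
    """Set KMP_DUPLICATE_LIB_OK=TRUE on macOS only.

    Must run before any import that can load libomp (numpy, torch).
    Returns the resulting value, or None when the variable is unset.
    """
    if platform == "darwin":
        # setdefault keeps an explicit user setting if one exists (assumption).
        environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")
    return environ.get("KMP_DUPLICATE_LIB_OK")


# Call first, then import the heavy libraries.
allow_duplicate_libomp()
```

Passing `platform` and `environ` as parameters is only there to make the gate easy to check in isolation; at the top of a script the two lines inside the `if` would be enough.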

Also investigated: alternatives like ctypes.CDLL(..., RTLD_GLOBAL) pre-loading and DYLD_INSERT_LIBRARIES do not fix this, because the two dylibs have distinct LC_ID_DYLIB install names and dyld resolves dependencies by install name, not by symbol. See the doc for details.
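Since the argument above hinges on the two dylibs carrying different install names, comparing them directly is a quick sanity check. The sketch below shells out to `otool -D`, which prints the file path followed by the install name; it assumes the macOS developer tools are installed, and the helper names are hypothetical.

```python
import subprocess


def parse_install_name(otool_output):
    """Extract the install name from the text printed by `otool -D`.

    The first non-empty line is "<path>:", the second is the install name.
    Returns None if no install name line is present.
    """
    lines = [line.strip() for line in otool_output.splitlines() if line.strip()]
    return lines[1] if len(lines) > 1 else None


def install_name(dylib_path):
    """Run `otool -D` on one dylib and return its install name (macOS only)."""
    result = subprocess.run(
        ["otool", "-D", dylib_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_install_name(result.stdout)


def share_install_name(path_a, path_b):
    """True when both dylibs report the same install name."""
    return install_name(path_a) == install_name(path_b)
```

Running `share_install_name` on the Homebrew copy and the torch-bundled copy would be expected to return `False` in the situation this PR describes, which is why pre-loading one of them does not satisfy the other's dependency.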

Test plan

- `python ci.py -p a2a3sim` on macOS: 20/20 pass (previously 20/20 fail with SIGABRT)
- `python ci.py -p a5sim` on macOS: 12/12 pass (previously 12/12 fail with SIGABRT)
- `python ci.py` (both sims together) on macOS: 32/32 pass
- Linux sim CI still green (unchanged path: `KMP_DUPLICATE_LIB_OK` is only set when `sys.platform == "darwin"`)

On macOS, `python ci.py -p a2a3sim` (or a5sim) aborts every task with "OMP: Error #15: Initializing libomp.dylib, but found libomp.dylib already initialized" (SIGABRT) before any DeviceRunner code runs.

Two distinct libomp.dylib copies get mapped into the single CI process: homebrew's /opt/homebrew/opt/libomp/lib/libomp.dylib (via numpy -> openblas) and pip torch's .venv/.../torch/lib/libomp.dylib. They have different install names, so dyld loads them both and Intel's libomp aborts on the second init. Surfaced after #493 collapsed sim CI into one long-lived Python process; each golden's `import numpy`/`import torch` now accumulates conflicting libomps in the same address space.

- Set KMP_DUPLICATE_LIB_OK=TRUE at the top of ci.py on darwin, before any import that can transitively pull in numpy or torch. This is Intel's documented escape hatch; safe for our workload where numpy and torch are only used for golden reference math, not parallel OMP regions.
- Document the full root cause, debugging steps, and explicit "what not to do" list in docs/macos-libomp-collision.md so future contributors don't re-investigate. Link it from docs/ci.md.
- Rewrite the two remaining numpy-based goldens (a2a3/{aicpu,host}_build_graph/bgemm) in torch for style consistency with the rest of examples/. Note this does not avoid the libomp collision on its own -- `import torch` transitively imports numpy.

Verified: `python ci.py` passes 32/32 sim tests (20 a2a3sim + 12 a5sim) on macOS without KMP_DUPLICATE_LIB_OK needing to be set manually.

gemini-code-assist

Code Review

This pull request introduces a workaround for a libomp collision issue on macOS that causes SIGABRT when both numpy and torch are loaded in the same process. The fix involves setting KMP_DUPLICATE_LIB_OK=TRUE at the top of ci.py before other imports. Detailed documentation explaining the root cause and mitigation has been added in docs/macos-libomp-collision.md and referenced in docs/ci.md. Additionally, several golden reference scripts were updated to use torch instead of numpy for input generation and computation. I have no feedback to provide as there were no review comments to evaluate.

gemini-code-assist bot reviewed Apr 11, 2026

ChaoWao wants to merge 1 commit into hw-native-sys:main from ChaoWao:fix/ci-py-crash-on-macos-due-to-duplicate-libomp