numpy.dot crashes when openBLAS is built against an older TARGET #2893

@zlscherr

Hi All,

I'm having issues using Homebrew's bottled OpenBLAS library with numpy. When I execute

python3 -c "import numpy as np; x=np.random.randn(1000000,2); y=np.random.randn(2,2); x.dot(y)"

then Python crashes with a seg fault (the crash report below shows it as a bus error, SIGBUS).

My initial suspicion was that, since Homebrew sets DYNAMIC_ARCH=1 but doesn't set TARGET, the bottle was probably built on a machine with a newer CPU, which is why it crashes on my computer.

If I build OpenBLAS from source without setting a TARGET variable, the above code runs correctly. However, if I build from source with an older target such as TARGET=PRESCOTT together with DYNAMIC_ARCH=1, I still get the same crash.

If I export OPENBLAS_VERBOSE=2, OpenBLAS reports that I have a HASWELL core. If I export OPENBLAS_CORETYPE=Prescott, the above code doesn't crash regardless of TARGET, so maybe this is somehow related to HASWELL specifically.
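
For anyone who wants to try the same comparison, here is a rough sketch of how the two runs could be scripted. It starts the one-liner in a child interpreter so that the environment variables are in place before OpenBLAS is loaded. The helper names `openblas_env` and `run_repro` are made up for this illustration; only OPENBLAS_VERBOSE and OPENBLAS_CORETYPE come from the report above.

```python
import os
import subprocess
import sys

# The reproduction from this report, as a string to run in a child interpreter.
REPRO = (
    "import numpy as np; "
    "x = np.random.randn(1000000, 2); "
    "y = np.random.randn(2, 2); "
    "x.dot(y)"
)


def openblas_env(coretype=None, verbose=2, base=None):
    """Return a copy of the environment with the OpenBLAS overrides applied.

    Hypothetical helper: `coretype=None` leaves the core type auto-detected.
    """
    env = dict(os.environ if base is None else base)
    env["OPENBLAS_VERBOSE"] = str(verbose)
    if coretype is not None:
        env["OPENBLAS_CORETYPE"] = coretype
    return env


def run_repro(coretype=None, code=REPRO):
    """Run `code` in a fresh interpreter and return its exit status.

    On POSIX systems a negative return code means the child process
    was terminated by a signal.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=openblas_env(coretype),
    )
    return result.returncode
```

Calling `run_repro()` and then `run_repro("Prescott")` and comparing the two return codes should show the same difference I see when exporting the variables by hand, assuming the same OpenBLAS build is picked up by numpy in both runs.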

I've included some of the stack trace in case it is helpful:

Process: Python [75262]
Path: /usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/Resources/Python.app/Contents/MacOS/Python
Identifier: Python
Version: 3.8.6 (3.8.6)
Code Type: X86-64 (Native)
Parent Process: zsh [52413]
Responsible: iTerm2 [52408]
User ID: 501

Date/Time: 2020-10-13 12:24:06.518 -0400
OS Version: Mac OS X 10.15.7 (19H2)
Report Version: 12
Bridge OS Version: 4.6 (17P6610)
Anonymous UUID: D73D8776-8CCE-EF0E-5C9F-EC31EA1FE681

Sleep/Wake UUID: 501BF3EB-523A-47AC-89E7-7460810CD543

Time Awake Since Boot: 40000 seconds
Time Since Wake: 9700 seconds

System Integrity Protection: disabled

Crashed Thread: 4

Exception Type: EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000111d08008
Exception Note: EXC_CORPSE_NOTIFY

Termination Signal: Bus error: 10
Termination Reason: Namespace SIGNAL, Code 0xa
Terminating Process: exc handler [75262]

VM Regions Near 0x111d08008:
VM_ALLOCATE 000000010fc8c000-0000000111c8c000 [ 32.0M] rw-/rwx SM=PRV
--> __TEXT 0000000111c8c000-0000000111d0c000 [ 512K] r-x/rwx SM=COW /usr/local/Cellar/numpy/1.19.2/lib/python3.8/site-packages/numpy/random/_generator.cpython-38-darwin.so
__DATA_CONST 0000000111d0c000-0000000111d10000 [ 16K] r--/rwx SM=COW /usr/local/Cellar/numpy/1.19.2/lib/python3.8/site-packages/numpy/random/_generator.cpython-38-darwin.so

Thread 0:: Dispatch queue: com.apple.main-thread
0 libopenblas.0.dylib 0x0000000103ca6ad4 dgemm_oncopy_HASWELL + 1078
1 libopenblas.0.dylib 0x00000001011c4113 inner_thread + 1195
2 libopenblas.0.dylib 0x0000000101344eb1 exec_blas._omp_fn.0 + 727
3 libgomp.1.dylib 0x0000000104f40bd2 GOMP_parallel + 66
4 libopenblas.0.dylib 0x000000010134508a exec_blas + 76
5 libopenblas.0.dylib 0x00000001011c4891 gemm_driver.isra.0 + 754
6 libopenblas.0.dylib 0x00000001011c498a dgemm_thread_nn + 208
7 libopenblas.0.dylib 0x00000001010b9889 cblas_dgemm + 901
8 _multiarray_umath.cpython-38-darwin.so 0x0000000100ee1fcc cblas_matrixproduct + 4252
9 _multiarray_umath.cpython-38-darwin.so 0x0000000100d0a36c PyArray_MatrixProduct2 + 236
10 _multiarray_umath.cpython-38-darwin.so 0x0000000100d06051 array_dot + 177
11 org.python.python 0x00000001006f6097 method_vectorcall_VARARGS_KEYWORDS + 309
12 org.python.python 0x000000010078f67d call_function + 346
13 org.python.python 0x000000010078c23b _PyEval_EvalFrameDefault + 29833
14 org.python.python 0x0000000100790197 _PyEval_EvalCodeWithName + 1947
15 org.python.python 0x0000000100784d0f PyEval_EvalCode + 51
16 org.python.python 0x00000001007bdf9d run_eval_code_obj + 102
17 org.python.python 0x00000001007bd3ec run_mod + 82
18 org.python.python 0x00000001007bc3ec PyRun_StringFlags + 120
19 org.python.python 0x00000001007bc337 PyRun_SimpleStringFlags + 69
20 org.python.python 0x00000001007d2d91 Py_RunMain + 424
21 org.python.python 0x00000001007d367e pymain_main + 306
22 org.python.python 0x00000001007d36cc Py_BytesMain + 42
23 libdyld.dylib 0x00007fff6b7f4cc9 start + 1

Thread 1:
0 libopenblas.0.dylib 0x0000000103ca6ae0 dgemm_oncopy_HASWELL + 1090
1 libopenblas.0.dylib 0x00000001011c4113 inner_thread + 1195
2 libopenblas.0.dylib 0x0000000101344eb1 exec_blas._omp_fn.0 + 727
3 libgomp.1.dylib 0x0000000104f47fb1 gomp_thread_start + 369

Thread 2:
0 libopenblas.0.dylib 0x0000000103ca2a55 .L13_39 + 165

Thread 3:
0 libopenblas.0.dylib 0x0000000103ca2a20 .L13_39 + 112

Thread 4 Crashed:
0 libopenblas.0.dylib 0x0000000103ca6ad4 dgemm_oncopy_HASWELL + 1078
1 libopenblas.0.dylib 0x00000001011c4113 inner_thread + 1195
2 libopenblas.0.dylib 0x0000000101344eb1 exec_blas._omp_fn.0 + 727
3 libgomp.1.dylib 0x0000000104f47fb1 gomp_thread_start + 369

Thread 5:
0 libopenblas.0.dylib 0x0000000103ca0aa0 .L12_39 + 112

Thread 6:
0 libopenblas.0.dylib 0x0000000103ca6ae0 dgemm_oncopy_HASWELL + 1090
1 libopenblas.0.dylib 0x00000001011c4113 inner_thread + 1195
2 libopenblas.0.dylib 0x0000000101344eb1 exec_blas._omp_fn.0 + 727
3 libgomp.1.dylib 0x0000000104f47fb1 gomp_thread_start + 369

Thread 7:
0 libopenblas.0.dylib 0x0000000103ca6ad4 dgemm_oncopy_HASWELL + 1078
1 libopenblas.0.dylib 0x00000001011c4113 inner_thread + 1195
2 libopenblas.0.dylib 0x0000000101344eb1 exec_blas._omp_fn.0 + 727
3 libgomp.1.dylib 0x0000000104f47fb1 gomp_thread_start + 369

Thread 8:
0 libopenblas.0.dylib 0x0000000103ca6b16 dgemm_oncopy_HASWELL + 1144
1 libopenblas.0.dylib 0x00000001011c4113 inner_thread + 1195
2 libopenblas.0.dylib 0x0000000101344eb1 exec_blas._omp_fn.0 + 727
3 libgomp.1.dylib 0x0000000104f47fb1 gomp_thread_start + 369

Thread 9:
0 libopenblas.0.dylib 0x0000000103ca0a6f .L12_39 + 63

Thread 10:
0 libopenblas.0.dylib 0x0000000103ca2a20 .L13_39 + 112

Thread 11:
0 libopenblas.0.dylib 0x0000000103ca6b45 dgemm_oncopy_HASWELL + 1191
1 libopenblas.0.dylib 0x00000001011c4113 inner_thread + 1195
2 libopenblas.0.dylib 0x0000000101344eb1 exec_blas._omp_fn.0 + 727
3 libgomp.1.dylib 0x0000000104f47fb1 gomp_thread_start + 369

Thread 4 crashed with X86 Thread State (64-bit):
rax: 0x00007fb393476bd0 rbx: 0x00007fb393476ba0 rcx: 0x00007fb393476bb0 rdx: 0x00007fb393476b80
rdi: 0x0000000000000002 rsi: 0x00007fb393476b80 rbp: 0x00007fb393476b90 rsp: 0x0000700005fdbd68
r8: 0x0000000111d08000 r9: 0x00007fb393476bf0 r10: 0x00007fb393476be0 r11: 0x00007fb393476bc0
r12: 0x0000000000000000 r13: 0x0000000000000010 r14: 0x0000000111d08040 r15: 0x00007fb393476b90
rip: 0x0000000103ca6ad4 rfl: 0x0000000000010202 cr2: 0x0000000111d08008

Logical CPU: 9
Error Code: 0x00000007 (invalid protections for user data write)
Trap Number: 14
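
One possible way to read the numbers above, offered as an arithmetic check rather than a diagnosis: the faulting address (cr2) can be compared against the "VM Regions Near" entries and against r8. The region table and the `find_region` helper below are just an illustration that copies the values from this report.

```python
# Values copied from the crash report above.
FAULT_ADDR = 0x0000000111D08008  # cr2 / KERN_PROTECTION_FAILURE address
R8 = 0x0000000111D08000          # r8 in the crashed thread's register state

# (name, start, end, current protection) from "VM Regions Near 0x111d08008".
REGIONS = [
    ("VM_ALLOCATE", 0x010FC8C000, 0x0111C8C000, "rw-"),
    ("__TEXT", 0x0111C8C000, 0x0111D0C000, "r-x"),
    ("__DATA_CONST", 0x0111D0C000, 0x0111D10000, "r--"),
]


def find_region(addr, regions=REGIONS):
    """Return the (name, start, end, prot) entry containing addr, or None."""
    for name, start, end, prot in regions:
        if start <= addr < end:
            return name, start, end, prot
    return None


name, start, end, prot = find_region(FAULT_ADDR)
# Region name, its protection, offset into the region, and distance from r8.
print(name, prot, hex(FAULT_ADDR - start), FAULT_ADDR - R8)
# prints: __TEXT r-x 0x7c008 8
```

So the write went to r8 + 8, which falls inside the read-only __TEXT mapping of numpy's _generator extension, consistent with the "invalid protections for user data write" error code. Why dgemm_oncopy_HASWELL ends up with a destination pointer there is the part I can't explain.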

Thanks, please let me know if I can offer any more information.
