add error handling in fbridgecreate (fcheck.exe crashes if simd mode is not supported) #520

@valassi

fcheck.exe crashes if simd mode is not supported

If I run throughputX.sh with the sanity checks removed, check.exe fails gently while fcheck.exe crashes with SIGILL:

runExe /data/avalassi/gpu2021/madgraph4gpu/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/build.512z_d_inl0_hrd0/check.exe -p 64 256 1 OMP=
ERROR! The application is built for skylake-avx512 (AVX512VL) but the host does not support it
           652,423      cycles:u                  #    0.175 GHz                      (2.62%)
           172,400      instructions:u            #    0.26  insn per cycle           (24.62%)
       0.005887559 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4:    0) (avx2: 1266) (512y:   60) (512z: 9903)
-------------------------------------------------------------------------
cmpExe /data/avalassi/gpu2021/madgraph4gpu/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/build.512z_d_inl0_hrd0/check.exe --common -p 2 64 2
cmpExe /data/avalassi/gpu2021/madgraph4gpu/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/build.512z_d_inl0_hrd0/fcheck.exe 2 64 2

Program received signal SIGILL: Illegal instruction.

Backtrace for this error:
#0  0x7fc518dac3ff in ???
#1  0x7fc519b20eb4 in ???
#2  0x7fc519b01e62 in ???
#3  0x7fc519b0c8a3 in ???
#4  0x40415e in ???
#5  0x40440a in ???
#6  0x7fc518d98554 in ???
#7  0x403adf in ???
#8  0xffffffffffffffff in ???
Avg ME (C++/C++)    = 
Avg ME (F77/C++)    = 
ERROR! Fortran calculation (F77/C++)  crashed

We should add gentle termination to fcheck as well, and eventually to madevent (it is not yet clear how we will bridge the different AVX builds; for the moment madevent would crash in the same way!)
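For reference, the kind of host check behind the gentle check.exe failure above can be sketched as follows. This is only an illustration, not the actual madgraph4gpu code: the function names `buildSIMDTag`, `hostSupportsBuildSIMD` and `checkHostSIMD` are hypothetical, and the sketch assumes a GCC or clang build on x86, where `__builtin_cpu_supports` is available.

```cpp
#include <iostream>

// Hypothetical helper: name of the SIMD level this file was compiled for,
// derived from the compiler's predefined macros.
const char* buildSIMDTag()
{
#if defined( __AVX512VL__ )
  return "avx512vl";
#elif defined( __AVX2__ )
  return "avx2";
#elif defined( __SSE4_2__ )
  return "sse4.2";
#else
  return "none";
#endif
}

// Hypothetical helper: true if the host CPU supports the SIMD level of the build.
bool hostSupportsBuildSIMD()
{
#if( defined( __x86_64__ ) || defined( __i386__ ) ) && ( defined( __GNUC__ ) || defined( __clang__ ) )
  __builtin_cpu_init(); // safe to call more than once
#if defined( __AVX512VL__ )
  return __builtin_cpu_supports( "avx512vl" );
#elif defined( __AVX2__ )
  return __builtin_cpu_supports( "avx2" );
#elif defined( __SSE4_2__ )
  return __builtin_cpu_supports( "sse4.2" );
#else
  return true; // this build requires no SIMD extension
#endif
#else
  return true; // assumption: no runtime check is available on this platform
#endif
}

// Hypothetical helper: returns 0 if the build can run on this host,
// otherwise prints an error and returns 1 so the caller can exit cleanly.
int checkHostSIMD()
{
  if( hostSupportsBuildSIMD() ) return 0;
  std::cerr << "ERROR! The application is built for " << buildSIMDTag()
            << " but the host does not support it" << std::endl;
  return 1;
}
```

A check like this only helps if it runs before any code compiled for the unsupported instruction set is executed, which is presumably what is missing on the fcheck.exe path.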

We should probably:

  • add "if( !MatrixElementKernelHost::hostSupportsSIMD() )" in the Bridge constructor, and throw if the host does not support the SIMD mode of the build
  • catch the exception wherever we call the Bridge constructor, e.g. in fbridgecreate, and return 0 as the bridge pointer in that case
  • check in driver.f and fcheck.f that fbridgecreate returned a non-zero bridge pointer, otherwise fail gently
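
The three steps above could look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the real code: `Bridge` is reduced to a single member, `fbridgecreate_sketch` and `fbridgedelete_sketch` are hypothetical names with a hypothetical signature, and `hostSupportsSIMD()` is a stand-in driven by a flag (`g_hostSupportsSIMD`) so that both outcomes can be exercised.

```cpp
#include <iostream>
#include <stdexcept>

// Hypothetical stand-in for MatrixElementKernelHost::hostSupportsSIMD():
// a flag replaces the real CPU query so that both paths can be tested.
static bool g_hostSupportsSIMD = true;
bool hostSupportsSIMD() { return g_hostSupportsSIMD; }

// Hypothetical, heavily simplified Bridge: the constructor throws
// if the host cannot run the SIMD mode of this build.
class Bridge
{
public:
  explicit Bridge( int nevt )
    : m_nevt( nevt )
  {
    if( !hostSupportsSIMD() )
      throw std::runtime_error( "Bridge: host does not support the SIMD mode of this build" );
  }
  int nevt() const { return m_nevt; }
private:
  int m_nevt;
};

// Hypothetical Fortran-callable factory. Exceptions must not propagate
// into Fortran, so a failure is reported as a null (0) bridge pointer.
extern "C" void fbridgecreate_sketch( Bridge** ppbridge, const int* pnevt )
{
  try
  {
    *ppbridge = new Bridge( *pnevt );
  }
  catch( const std::exception& e )
  {
    std::cerr << "ERROR! " << e.what() << std::endl;
    *ppbridge = nullptr;
  }
}

// Hypothetical matching destructor wrapper; deleting a null pointer is a no-op.
extern "C" void fbridgedelete_sketch( Bridge** ppbridge )
{
  delete *ppbridge;
  *ppbridge = nullptr;
}
```

On the Fortran side, driver.f and fcheck.f would then compare the returned handle with 0 and stop with an error message instead of calling into the bridge. That part is not sketched here because it depends on how the handle is declared in those files.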
