Fix segfault from constexpr_math.h and adapt testmisc.cc to higher tolerances when running through valgrind by valassi · Pull Request #908 · madgraph5/madgraph4gpu

valassi · 2024-07-12T12:35:52Z

This is a PR to

fix the segfault from constexpr_math.h (#903: segfault in constexpr_sin_quad when running runTest.exe through valgrind)
adapt testmisc.cc to higher tolerances when running through valgrind (#906: testmisc.cc tests require different tolerances under valgrind)

This was motivated by the WIP on adding tests for channelid (#896) within the WIP on master_june24 (#882).

Note: the latter is still blocked by another segfault (#907).
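A minimal sketch of the kind of valgrind-aware check that #906 calls for, assuming detection through the RUNNING_ON_VALGRIND client-request macro from valgrind.h (the header this PR adds to all processes). The helper names (runningOnValgrind, testTolerance, nearlyEqual, HYPOTHETICAL_ON_VALGRIND), the tolerance values and the fallback used when valgrind.h is not installed are all hypothetical; this is not the actual testmisc.cc code. It also illustrates the second relaxation from the #906 commit: under valgrind only, a +inf result is accepted against a -inf reference (and vice versa), as can happen for tan(x) near a pole. A likely reason for needing looser tolerances is the documented valgrind limitation that x87 80-bit long double arithmetic is performed at 64-bit precision.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Use valgrind's client-request header if it is installed; otherwise assume
// we are never running under valgrind (hypothetical fallback).
#if defined(__has_include)
#if __has_include(<valgrind/valgrind.h>)
#include <valgrind/valgrind.h>
#define HYPOTHETICAL_ON_VALGRIND() ( RUNNING_ON_VALGRIND != 0 )
#endif
#endif
#ifndef HYPOTHETICAL_ON_VALGRIND
#define HYPOTHETICAL_ON_VALGRIND() false
#endif

// Hypothetical: true if this process runs under valgrind
inline bool runningOnValgrind()
{
  return HYPOTHETICAL_ON_VALGRIND();
}

// Hypothetical tolerances (illustrative values only): looser under valgrind
inline long double testTolerance( bool onValgrind )
{
  return onValgrind ? 1e-10L : 1e-14L;
}

// Hypothetical check of a computed value against a reference value:
// relative difference for |ref| >= 1, absolute difference below that
inline bool nearlyEqual( long double value, long double ref, bool onValgrind )
{
  assert( !std::isnan( ref ) ); // a NaN reference would be a bug in the test itself
  if( std::isinf( value ) || std::isinf( ref ) )
  {
    if( value == ref ) return true; // same-signed infinities always match
    // Under valgrind only, accept +inf against -inf (e.g. tan(x) at a pole)
    return onValgrind && std::isinf( value ) && std::isinf( ref );
  }
  const long double scale = std::max( std::fabs( ref ), 1.0L );
  return std::fabs( value - ref ) <= testTolerance( onValgrind ) * scale;
}
```

In a test, the flag would be computed once (for instance `const bool onValgrind = runningOnValgrind();`) and passed to every comparison, so the strict tolerances still apply in normal runs.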

…ase-1.11.0
Revert "[gtest/june24] in CODEGEN cudacpp_test.mk, try to upgrade googletest from release-1.11.0 to v1.14.0 to solve madgraph5#903, but the segfault remains - will revert"
This reverts commit 34cd623.

…cc build in CUDA while debugging madgraph5#903
With testmisc.cc, valgrind gives a confusing error:
==2887713== Stack overflow in thread #1: can't grow stack to 0x1ffe801000
==2887713==
==2887713== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==2887713== Access not within mapped region at address 0x1FFE801FF8
==2887713== Stack overflow in thread #1: can't grow stack to 0x1ffe801000
==2887713== at 0x449C06: mg5amcGpu::constexpr_sin_quad(long double, bool) (constexpr_math.h:156)
==2887713== If you believe this happened as a result of a stack
==2887713== overflow in your program's main thread (unlikely but
==2887713== possible), you can try to increase the size of the
==2887713== main thread stack using the --main-stacksize= flag.
==2887713== The main thread stack size used in this run was 8388608.
==2887713==
==2887713== HEAP SUMMARY:
==2887713== in use at exit: 21,309,363 bytes in 13,995 blocks
==2887713== total heap usage: 18,083 allocs, 4,088 frees, 51,971,780 bytes allocated
==2887713==
==2887713== LEAK SUMMARY:
==2887713== definitely lost: 0 bytes in 0 blocks
==2887713== indirectly lost: 0 bytes in 0 blocks
==2887713== possibly lost: 2,599,608 bytes in 825 blocks
==2887713== still reachable: 18,709,755 bytes in 13,170 blocks
==2887713== suppressed: 0 bytes in 0 blocks
==2887713== Rerun with --leak-check=full to see details of leaked memory
==2887713==
==2887713== For lists of detected and suppressed errors, rerun with: -s
==2887713== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
Without testmisc.cc instead:
[ RUN ] SIGMA_SM_GG_TTX_GPU2/MadgraphTest.CompareMomentaAndME/0
INFO: Opening reference file ../../test/ref/dump_CPUTest.Sigma_sm_gg_ttx.txt
==2889432== Invalid write of size 8
==2889432== at 0x484E2DB: memmove (vg_replace_strmem.c:1385)
==2889432== by 0x41A6EA: double* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<double>(double const*, double const*, double*) (stl_algobase.h:431)
==2889432== by 0x41A49B: double* std::__copy_move_a2<false, double*, double*>(double*, double*, double*) (stl_algobase.h:494)
==2889432== by 0x41A1A5: double* std::__copy_move_a1<false, double*, double*>(double*, double*, double*) (stl_algobase.h:522)
==2889432== by 0x419F4D: double* std::__copy_move_a<false, __gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, double*>(__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, __gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, double*) (stl_algobase.h:529)
==2889432== by 0x419D0C: double* std::copy<__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, double*>(__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, __gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, double*) (stl_algobase.h:619)
==2889432== by 0x419950: mg5amcGpu::CommonRandomNumberKernel::generateRnarray() (CommonRandomNumberKernel.cc:34)
==2889432== by 0x44443D: CUDATest::prepareRandomNumbers(unsigned int) (runTest.cc:202)
==2889432== by 0x440D98: MadgraphTest_CompareMomentaAndME_Test::TestBody() (MadgraphTest.h:253)
==2889432== by 0x48790F: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2607)
==2889432== by 0x480EF8: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2643)
==2889432== by 0x459587: testing::Test::Run() (gtest.cc:2682)
==2889432== Address 0x2fc0f200 is not stack'd, malloc'd or (recently) free'd
==2889432==
==2889432==
==2889432== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==2889432== Access not within mapped region at address 0x2FC0F200
==2889432== at 0x484E2DB: memmove (vg_replace_strmem.c:1385)
...
Segmentation fault (core dumped)

…cc build while debugging madgraph5#903 also for C++
The test does not segfault without valgrind, but it does segfault in valgrind! (NB: this is all related to debug builds, in C++ and in CUDA.)
And with testmisc.cc, valgrind gives a confusing error for C++ (cppnone here), as in CUDA:
==2893804== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==2893804== Access not within mapped region at address 0x1FFE801FF8
==2893804== Stack overflow in thread #1: can't grow stack to 0x1ffe801000
==2893804== at 0x431835: mg5amcCpu::constexpr_sin_quad(long double, bool) (in /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/runTest_cpp.exe)
So I disable testmisc, but now the C++ test (cppnone here) no longer segfaults...?!
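The valgrind traces above point at a stack overflow inside constexpr_sin_quad(long double, bool) (constexpr_math.h:156) with valgrind's 8 MiB main stack, which suggests a very deep (likely recursive) call chain. This page does not show how constexpr_math.h was actually fixed. As a general illustration only, here is a sketch of a constexpr sine for the first quadrant written as a bounded loop instead of recursion, so its stack usage stays constant even if floating-point results differ under valgrind. The name sinFirstQuadrant, the Taylor-series approach and the 30-term cap are assumptions, not the cudacpp code.

```cpp
// Hypothetical constexpr sine for x in [0, pi/2] (not the constexpr_math.h code):
// Taylor series summed in a loop with a hard cap on the number of terms,
// so the call uses constant stack space and cannot recurse or run away.
constexpr long double sinFirstQuadrant( long double x )
{
  const long double x2 = x * x;
  long double term = x; // k-th Taylor term: (-1)^k x^(2k+1) / (2k+1)!
  long double sum = x;
  for( int k = 1; k <= 30; ++k ) // hard cap on the number of terms
  {
    term *= -x2 / ( ( 2 * k ) * ( 2 * k + 1 ) );
    const long double next = sum + term;
    if( next == sum ) break; // the new term no longer changes the result
    sum = next;
  }
  return sum;
}
```

The early exit keeps the common case cheap, while the fixed cap guarantees termination regardless of how the arithmetic rounds.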

Revert "[gtest/june24] in gg_tt.mad cudacpp.mk, TEMPORARELY disable testmisc.cc build while debugging madgraph5#903 also for C++"
This reverts commit 944caab.
Will now test with clang16 (after recent fixes) and valgrind (after upgrading to 3.23).

… now valgrind runTest_cpp.exe will fail
Revert "[gtest/june24] in gg_tt.mad testmisc.cc, comment out the section using constexpr_sin: now valgrind on c++ runTest succeds again?!"
This reverts commit 975f7aacb8661807a329ec1f51b2d7d8dba45167.

valassi · 2024-07-12T12:37:03Z

Hi @oliviermattelaer can you please review?

Hopefully not controversial.

Note: this also includes both PR #900 on OMP and PR #905 on clang, so it might be better to merge those two independently first.

Thanks

…tempts to duplicate tests using the current infrastructure
The code is now identical to the current gtest branch for PR madgraph5#908. Will instead solve madgraph5#907 by using a much simpler test infrastructure with fewer template levels.
Revert "[gtest2/june24] in gg_tt.mad runTest.cc, try a different way to duplicate the tests, it still segfaults - will revert"
This reverts commit 24e00a2.
Revert "[gtest2/june24] in gg_tt.mad runTest.cc, add debug printout for test ctor/dtor"
This reverts commit 0700a85.
Revert "[gtest2/june24] in gg_tt.mad runTest.cc, temporarely add back the duplicate test"
This reverts commit 0cce7fb.

oliviermattelaer

Ok this sounds good.

Just like I said in #905, I'm confused about why you forbid any activation of the OpenMP mode.
Maybe the best would be to remove that particular commit and then merge the rest.
Or wait until #905 is accepted before merging this... (or any other solution)

I marked this as approved anyway,

Olivier

…aster with OMP madgraph5#900 and submod madgraph5#897) into gtest
Fix conflicts in epochX/cudacpp/gg_tt.mad/CODEGEN_mad_gg_tt_log.txt
git checkout clang gg_tt.mad/CODEGEN_mad_gg_tt_log.txt
Note: MG5AMC has been updated including mg5amcnlo#107

valassi · 2024-07-16T12:21:16Z

> Ok this sounds good.
>
> Just like I said in #905, I'm confused about why you forbid any activation of the OpenMP mode. Maybe the best would be to remove that particular commit and then merge the rest. Or wait until #905 is accepted before merging this... (or any other solution)
>
> I marked this as approved anyway,
>
> Olivier

Thanks Olivier :-)

The clang PR #905 has now been modified as agreed, and merged.

I have updated this PR and will merge it once the CI has run.

valassi · 2024-07-16T12:34:13Z

The CI completed with 163 successful and 6 failing checks. This is as expected.

Merging now

…aster with clang madgraph5#905, OMP madgraph5#900 and submod madgraph5#897) into gtest
Fix conflicts in epochX/cudacpp/gg_tt.mad/CODEGEN_mad_gg_tt_log.txt
git checkout clang gg_tt.mad/CODEGEN_mad_gg_tt_log.txt
Note: MG5AMC has been updated including mg5amcnlo#107

…ion, removing the attempts to add two tests madgraph5#896
My last commit was showing the segfault issue madgraph5#907, solved in upcoming PR madgraph5#909 (and bits of madgraph5#908). I will cherry-pick the CODEGEN from madgraph5#909 (and madgraph5#908) first and try again.
git checkout 3eb4c29 gg_tt.mad/SubProcesses/runTest.cc

…ng PR madgraph5#905, constexpr_math.h PR madgraph5#908 and runTest/cudaDeviceReset PR madgraph5#909
Add valgrind.h and its symlink in the repo for gg_tt.mad.
The new runTest.cc template now has a (commented out) proof of concept for including two tests (with/without multichannel) madgraph5#896; I will resume from there.
After building bldall, the following succeeds:
for bck in none sse4 avx2 512y 512z cuda; do echo $bck; ./build.${bck}_d_inl0_hrd0/runTest_*.exe; done
This instead crashes (again?) for some AVX values:
for bck in none sse4 avx2 512y 512z cuda; do echo $bck; valgrind ./build.${bck}_d_inl0_hrd0/runTest_*.exe; done
On closer inspection, this is because valgrind does not support AVX512, so this is ok.
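Since valgrind does not support AVX512, the AVX512 runTest builds (presumably 512y and 512z) are expected to crash under it. A hedged sketch of how a driver could decide up front which backends to run, assuming GCC/clang's __builtin_cpu_supports("avx512f") builtin and assuming that valgrind's simulated CPU does not advertise AVX-512. The helper names and the backend-to-AVX512 mapping are hypothetical, not part of the cudacpp test infrastructure.

```cpp
#include <cassert>
#include <string>

// Hypothetical: the cudacpp SIMD backends from the loop above that need AVX-512
// (assumption: 512y uses AVX-512 on 256-bit ymm registers, 512z on 512-bit zmm registers)
inline bool backendNeedsAvx512( const std::string& bck )
{
  return bck == "512y" || bck == "512z";
}

// Hypothetical: can this backend's runTest binary run on a CPU with/without AVX-512?
inline bool backendCanRun( const std::string& bck, bool cpuHasAvx512 )
{
  assert( !bck.empty() );
  return !backendNeedsAvx512( bck ) || cpuHasAvx512;
}

// What the CPU, as seen by this process, advertises (GCC/clang builtin, x86 only).
// Assumption: under valgrind this reflects valgrind's simulated CPU, which lacks AVX-512.
inline bool hostReportsAvx512()
{
#if defined( __x86_64__ ) || defined( __i386__ )
  return __builtin_cpu_supports( "avx512f" ) != 0;
#else
  return false;
#endif
}
```

A wrapper could then skip any backend for which `backendCanRun( bck, hostReportsAvx512() )` is false, instead of reporting the expected crash as a test failure.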

valassi added 30 commits July 11, 2024 13:31

[gtest/june24] in gg_tt.mad runTest.cc, include two identical tests o…

8f2c16b

…n both CPU and GPU (prepare for madgraph5#896) - the C++ tests succeed but the CUDA tests segfaults madgraph5#903

[gtest/june24] in CODEGEN cudacpp_test.mk, try to upgrade googletest …

34cd623

…from release-1.11.0 to v1.14.0 to solve madgraph5#903, but the segfault remains - will revert

[clang/june24] in gg_tt.mad cudacpp.mk, fix clang16 build of fcheck_c…

5cdb95a

…pp.exe by adding -no-pie madgraph5#904

[clang/june24] in gg_tt.mad cudacpp.mk, fix clang16 builds by disabli…

a7efcb7

…ng OMP only for clang16 madgraph5#904

[clang/june24] in CODEGEN (backport gg_tt.mad) cudacpp.mk, fix clang1…

8a62982

…6 builds madgraph5#904 (disabling OMP only for clang16; add -no-pie for fcheck_cpp.exe)

[clang/june24] regenerate all processes with cudacpp.mk fixes for cla…

403180d

…ng16 builds madgraph5#904

[clang/june24] in gg_tt.mad cudacpp.mk, fix clang16 build of fcheck_c…

d7a5889

…pp.exe by adding -no-pie madgraph5#904

[clang/june24] in gg_tt.mad cudacpp.mk, fix clang16 builds by disabli…

82202d6

…ng OMP only for clang16 madgraph5#904

[clang/june24] in CODEGEN (backport gg_tt.mad) cudacpp.mk, fix clang1…

74ceaec

…6 builds madgraph5#904 (disabling OMP only for clang16; add -no-pie for fcheck_cpp.exe)

[clang/june24] in tools/compilers add clang17 wrappers for cvmfs

197c0fb

[clang/june24] move again to CODEGEN logs from the latest upstream/ma…

9ab418a

…ster for easier merging git checkout upstream/master $(git ls-tree --name-only HEAD */CODEGEN*txt)

[clang/june24] in tools/compilers add clang17 wrappers for cvmfs

acdac67

[clang/june24] in gg_tt.mad cudacpp.mk, fix clang17 builds by disabli…

ac2602f

…ng OMP only for clang17 madgraph5#904

[clang/june24] in CODEGEN (backport gg_tt.mad) cudacpp.mk, fix clang1…

7f112f8

…7 builds madgraph5#904 (disable OMP also for clang17)

[clang/june24] regenerate all processes with cudacpp.mk fixes for cla…

5c4c80f

…ng17 builds madgraph5#904

[clang/june24] in gg_tt.mad cudacpp.mk, fix clang17 builds by disabli…

c28a7bf

…ng OMP only for clang17 madgraph5#904

[clang/june24] in CODEGEN (backport gg_tt.mad) cudacpp.mk, fix clang1…

c95c43d

…7 builds madgraph5#904 (disable OMP also for clang17)

[clang/june24] move again to CODEGEN logs from the latest upstream/ma…

9ef9e8e

…ster for easier merging git checkout upstream/master $(git ls-tree --name-only HEAD */CODEGEN*txt)

[clang/june24] in gg_tt.mad cudacpp.mk, improve clang16/17 fix madgra…

80023f6

…ph5#904: remove link-time -no-pie, add compiler-time -fPIC to fortran

[clang/june24] in CODEGEN cudacpp.mk, improve clang16/17 fix madgraph…

0d9d036

…5#904: remove link-time -no-pie, add compiler-time -fPIC to fortran

[clang/june24] regenerate all processes with cudacpp.mk new fixes for m…

c673f21

…adgraph5#904, adding -fPIC to fortran compilation

[clang/june24] ** COMPLETE CLANG (clang16/clang17) ** move again to C…

0a8d8b3

…ODEGEN logs from the latest upstream/master for easier merging git checkout upstream/master $(git ls-tree --name-only HEAD */CODEGEN*txt)

[gtest/june24] in gg_tt.mad testmisc.cc, comment out the section usin…

66d6de9

…g constexpr_sin: now valgrind on c++ runTest succeeds again?! However cuda still fails (even without valgrind) madgraph5#903

[clang/june24] in gg_tt.mad cudacpp.mk, improve clang16/17 fix madgra…

70aa0f5

…ph5#904: remove link-time -no-pie, add compiler-time -fPIC to fortran

[clang/june24] in CODEGEN cudacpp.mk, improve clang16/17 fix madgraph…

ee1118b

…5#904: remove link-time -no-pie, add compiler-time -fPIC to fortran

valassi added 4 commits July 12, 2024 14:18

[gtest/june24] in CODEGEN (from gg_tt.mad) testmisc.cc, use higher to…

5b523c8

…lerances when running on valgrind madgraph5#906. Also allow tan(x)=-inf if ctan(x)=+inf, and vice versa, when running on valgrind madgraph5#906

[gtest/june24] regenerate gg_tt.mad, check all ok (now clang formatti…

8c3e32e

…ng is fixed)

[gtest/june24] regenerate all processes, with fixes for constexpr_mat…

f09edc1

…h.h madgraph5#903 and test_misc.cc/valgrind.h madgraph5#906 Add valgrind.h for all processes for d in $(git ls-tree --name-only HEAD */SubProcesses); do git add $d/valgrind.h $d/*/valgrind.h; done

[gtest/june24] move again to CODEGEN logs from the latest upstream/ma…

2d06c03

…ster for easier merging git checkout upstream/master $(git ls-tree --name-only HEAD */CODEGEN*txt)

valassi requested a review from oliviermattelaer July 12, 2024 12:35

valassi self-assigned this Jul 12, 2024

valassi added 2 commits July 12, 2024 14:41

[gtest/june24] in CODEGEN constexpr_math.h, switch off debugging mode…

07275f5

… (I had forgotten it enabled)

[gtest/june24] ** COMPLETE GTEST ** in all processes, copy constexpr_…

90947eb

…math.h with debugging mode switched off for f in $(git ls-tree --name-only HEAD */src/constexpr_math.h); do \cp CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/madgraph/iolibs/template_files/gpu/constexpr_math.h $f; done

This was linked to issues Jul 12, 2024

segfault in constexpr_sin_quad when running runTest.exe through valgrind #903

Closed

testmisc.cc tests require different tolerances under valgrind #906

Closed

This was referenced Jul 12, 2024

testmisc.cc tests require different tolerances under valgrind #906

Closed

segfault in constexpr_sin_quad when running runTest.exe through valgrind #903

Closed

valassi mentioned this pull request Jul 12, 2024

Fix runTest segfault (remove cudaDeviceReset) and simplify googletest template usage #909

Merged

oliviermattelaer approved these changes Jul 16, 2024


oliviermattelaer mentioned this pull request Jul 16, 2024

Fix clang16 and clang17 builds #905

Merged

valassi added 7 commits July 16, 2024 13:44

Merge remote-tracking branch 'upstream/master' (including OMP madgrap…

fb7bd76

…h5#900 and submod madgraph5#897) into clang

[clang/june24] in gg_tt.mad and CODEGEN cudacpp.mk, fail with an erro…

284df28

…r if OpenMP builds are attempted on clang16/17 (as discussed with Olivier in madgraph5#905)

[clang/june24] regenerate all processes

9629535

[clang/june24] ** COMPLETE CLANG (again) ** move again to CODEGEN log…

c05ffdd

…s from the latest upstream/master for easier merging git checkout upstream/master $(git ls-tree --name-only HEAD */CODEGEN*txt)

[gtest/june24] regenerate all processes, check all is ok no changes

41485fd

[gtest/june24] ** COMPLETE GTEST (again) ** move again to CODEGEN log…

4703af2

…s from the latest upstream/master for easier merging git checkout upstream/master $(git ls-tree --name-only HEAD */CODEGEN*txt)

valassi merged commit 328581c into madgraph5:master Jul 16, 2024

valassi merged 61 commits into madgraph5:master from valassi:gtest
