AlphaS against latest master, including code generation and all processes generated #434
valassi merged 310 commits into madgraph5:master
|
I am making progress on integrating this, but there is work to be done. I would say the main problems were:
What I have done so far:
Amongst the things that are not yet ok
|
Was: computeDependentCouplings -> calls dependentCouplings kernel -> calls dependent_coupling
Then: computeDependentCouplings -> calls computeCouplings kernel -> calls G2COUP
Now: (in MEK merged with ComputeMe) -> calls computeDependentCouplings kernel -> calls G2COUP
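To illustrate the "Now" call chain, here is a minimal host-only C++ sketch, not the actual generated code. The function signatures, the `cxtype` alias and the formulas inside `G2COUP` (two SM-like dependent couplings derived from a per-event strong coupling `gs`) are assumptions made for illustration only.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cxtype = std::complex<double>; // assumed complex type for this sketch

// Hypothetical stand-in for G2COUP: derive two dependent couplings from the
// per-event strong coupling gs (the formulas are an assumption, SM-like)
inline void G2COUP( const double gs, cxtype& gc10, cxtype& gc11 )
{
  gc10 = cxtype( -gs, 0. );
  gc11 = cxtype( 0., gs );
}

// Host-side stand-in for the computeDependentCouplings kernel:
// one G2COUP call per event (on a GPU this loop would be one thread per event)
inline void computeDependentCouplings( const std::vector<double>& allgs,
                                       std::vector<cxtype>& allgc10,
                                       std::vector<cxtype>& allgc11 )
{
  assert( allgc10.size() == allgs.size() && allgc11.size() == allgs.size() );
  for( std::size_t ievt = 0; ievt < allgs.size(); ++ievt )
    G2COUP( allgs[ievt], allgc10[ievt], allgc11[ievt] );
}
```

The point of the sketch is only the layering: the caller invokes one kernel, and the kernel calls the per-event coupling function directly, with no intermediate wrapper.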
…200 registers instead of 170 upstream/master
… fails (fptype/FORTRANFPTYPE issue?)
…oat nobridge fgcheck test succeeds
…ter fixing fptype != FORTRANFPTYPE
|
Ok I have now
Still to do amongst other things
In addition
|
…gc11 are the dependent couplings)
…c11 are the dependent couplings)
(NB: now the naming convention is consistent, BufferCouplings and MemoryAccessCouplings)
…ncapsulate ndcoup in BufferCouplings (still to do: gc10 and gc11 are now dereferenced in MEK, one can do better)
|
Some progress: I transformed BufferCouplings into a buffer that holds an arbitrary number ndcoup of coupling arrays. It works so far. Now I need to change the access functions.
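A buffer holding ndcoup coupling arrays could look roughly like the sketch below. This is not the real BufferCouplings class: the class name, the `[icoup][ievt]` contiguous layout and the `coupling()` access function are all hypothetical, chosen only to show the idea of one array per dependent coupling.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cxtype = std::complex<double>; // assumed complex type for this sketch

// Hypothetical buffer: ndcoup coupling arrays, each holding one complex value
// per event, stored contiguously as [icoup][ievt] (the layout is an assumption)
class CouplingsBufferSketch
{
public:
  CouplingsBufferSketch( std::size_t ndcoup, std::size_t nevt )
    : m_ndcoup( ndcoup ), m_nevt( nevt ), m_data( ndcoup * nevt ) {}
  std::size_t ndcoup() const { return m_ndcoup; }
  std::size_t nevt() const { return m_nevt; }
  // Access function: pointer to the first event of the array for coupling icoup
  cxtype* coupling( std::size_t icoup )
  {
    assert( icoup < m_ndcoup );
    return m_data.data() + icoup * m_nevt;
  }
private:
  std::size_t m_ndcoup;
  std::size_t m_nevt;
  std::vector<cxtype> m_data; // zero-initialised on construction
};
```

With such a buffer, a caller would ask for `coupling( 0 )` and `coupling( 1 )` instead of holding separate gc10 and gc11 pointers, so adding a third dependent coupling would only change ndcoup.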
…ts.h, remove temporary cuda 11.1 hacks
|
My idea here was to let the code generation do the work, but if you want to change this, it's also fine with me.
…ray MEs - the previous one was always using one value in an array ("trivial"), strange that it worked?!
./tput/teeThroughputX.sh -flt -hrd -makej -makeclean -eemumu -ggtt -ggttg -ggttgg -ggttggg
Note that it takes ~30 minutes for each of the four ggttggg tests to build (no inlining). Without inlining, all other processes are quite fast:
ls -ltr ee_mumu/lib/build.none_*_inl0_hrd* gg_tt/lib/build.none_*_inl0_hrd* gg_tt*g/lib/build.none_*_inl0_hrd* | egrep -v '(total|\./|\.build|_common|^$)'
ee_mumu/lib/build.none_d_inl0_hrd0:
-rwxr-xr-x. 1 avalassi zg 95296 Apr 28 08:29 libmg5amc_epem_mupmum_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1074160 Apr 28 08:29 libmg5amc_epem_mupmum_cuda.so*
ee_mumu/lib/build.none_d_inl0_hrd1:
-rwxr-xr-x. 1 avalassi zg 94488 Apr 28 08:29 libmg5amc_epem_mupmum_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1069192 Apr 28 08:29 libmg5amc_epem_mupmum_cuda.so*
ee_mumu/lib/build.none_f_inl0_hrd0:
-rwxr-xr-x. 1 avalassi zg 99712 Apr 28 08:29 libmg5amc_epem_mupmum_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1071848 Apr 28 08:29 libmg5amc_epem_mupmum_cuda.so*
ee_mumu/lib/build.none_f_inl0_hrd1:
-rwxr-xr-x. 1 avalassi zg 94752 Apr 28 08:29 libmg5amc_epem_mupmum_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1062704 Apr 28 08:29 libmg5amc_epem_mupmum_cuda.so*
gg_tt/lib/build.none_d_inl0_hrd0:
-rwxr-xr-x. 1 avalassi zg 103904 Apr 28 08:29 libmg5amc_gg_ttx_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1238000 Apr 28 08:29 libmg5amc_gg_ttx_cuda.so*
gg_tt/lib/build.none_d_inl0_hrd1:
-rwxr-xr-x. 1 avalassi zg 98840 Apr 28 08:29 libmg5amc_gg_ttx_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1212608 Apr 28 08:29 libmg5amc_gg_ttx_cuda.so*
gg_tt/lib/build.none_f_inl0_hrd0:
-rwxr-xr-x. 1 avalassi zg 104224 Apr 28 08:29 libmg5amc_gg_ttx_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1211112 Apr 28 08:29 libmg5amc_gg_ttx_cuda.so*
gg_tt/lib/build.none_f_inl0_hrd1:
-rwxr-xr-x. 1 avalassi zg 103152 Apr 28 08:30 libmg5amc_gg_ttx_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1197872 Apr 28 08:31 libmg5amc_gg_ttx_cuda.so*
gg_ttg/lib/build.none_d_inl0_hrd0:
-rwxr-xr-x. 1 avalassi zg 112672 Apr 28 08:31 libmg5amc_gg_ttxg_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1868784 Apr 28 08:31 libmg5amc_gg_ttxg_cuda.so*
gg_ttg/lib/build.none_d_inl0_hrd1:
-rwxr-xr-x. 1 avalassi zg 111704 Apr 28 08:31 libmg5amc_gg_ttxg_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1847488 Apr 28 08:31 libmg5amc_gg_ttxg_cuda.so*
gg_ttg/lib/build.none_f_inl0_hrd0:
-rwxr-xr-x. 1 avalassi zg 117088 Apr 28 08:32 libmg5amc_gg_ttxg_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1813224 Apr 28 08:33 libmg5amc_gg_ttxg_cuda.so*
gg_ttg/lib/build.none_f_inl0_hrd1:
-rwxr-xr-x. 1 avalassi zg 111920 Apr 28 08:34 libmg5amc_gg_ttxg_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1799984 Apr 28 08:35 libmg5amc_gg_ttxg_cuda.so*
gg_ttgg/lib/build.none_d_inl0_hrd0:
-rwxr-xr-x. 1 avalassi zg 154432 Apr 28 08:37 libmg5amc_gg_ttxgg_cpp.so*
-rwxr-xr-x. 1 avalassi zg 4281328 Apr 28 08:38 libmg5amc_gg_ttxgg_cuda.so*
gg_ttgg/lib/build.none_d_inl0_hrd1:
-rwxr-xr-x. 1 avalassi zg 149368 Apr 28 08:39 libmg5amc_gg_ttxgg_cpp.so*
-rwxr-xr-x. 1 avalassi zg 4247744 Apr 28 08:40 libmg5amc_gg_ttxgg_cuda.so*
gg_ttgg/lib/build.none_f_inl0_hrd0:
-rwxr-xr-x. 1 avalassi zg 154752 Apr 28 08:42 libmg5amc_gg_ttxgg_cpp.so*
-rwxr-xr-x. 1 avalassi zg 4119272 Apr 28 08:42 libmg5amc_gg_ttxgg_cuda.so*
gg_ttgg/lib/build.none_f_inl0_hrd1:
-rwxr-xr-x. 1 avalassi zg 145488 Apr 28 08:44 libmg5amc_gg_ttxgg_cpp.so*
-rwxr-xr-x. 1 avalassi zg 4106032 Apr 28 08:45 libmg5amc_gg_ttxgg_cuda.so*
gg_ttggg/lib/build.none_d_inl0_hrd0:
-rwxr-xr-x. 1 avalassi zg 768848 Apr 28 08:48 libmg5amc_gg_ttxggg_cpp.so*
-rwxr-xr-x. 1 avalassi zg 15758320 Apr 28 09:16 libmg5amc_gg_ttxggg_cuda.so*
gg_ttggg/lib/build.none_d_inl0_hrd1:
-rwxr-xr-x. 1 avalassi zg 759696 Apr 28 09:20 libmg5amc_gg_ttxggg_cpp.so*
-rwxr-xr-x. 1 avalassi zg 15769792 Apr 28 09:46 libmg5amc_gg_ttxggg_cuda.so*
gg_ttggg/lib/build.none_f_inl0_hrd0:
-rwxr-xr-x. 1 avalassi zg 707728 Apr 28 09:51 libmg5amc_gg_ttxggg_cpp.so*
-rwxr-xr-x. 1 avalassi zg 14711528 Apr 28 10:18 libmg5amc_gg_ttxggg_cuda.so*
gg_ttggg/lib/build.none_f_inl0_hrd1:
-rwxr-xr-x. 1 avalassi zg 702568 Apr 28 10:23 libmg5amc_gg_ttxggg_cpp.so*
-rwxr-xr-x. 1 avalassi zg 14718768 Apr 28 10:49 libmg5amc_gg_ttxggg_cuda.so*
./tput/teeThroughputX.sh -flt -hrd -makej -makeclean -eemumu -ggtt -ggttgg -inlonly
Note that inline builds take longer - and are now slower in c++ than in cuda! (For non inlined builds cuda is much slower than c++)
ls -ltr ee_mumu/lib/build.none_*_inl1_hrd* gg_tt/lib/build.none_*_inl1_hrd* gg_tt*g/lib/build.none_*_inl1_hrd* | egrep -v '(total|\./|\.build|_common|^$)'
ee_mumu/lib/build.none_d_inl1_hrd0:
-rwxr-xr-x. 1 avalassi zg 107104 Apr 28 11:02 libmg5amc_epem_mupmum_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1074064 Apr 28 11:02 libmg5amc_epem_mupmum_cuda.so*
ee_mumu/lib/build.none_d_inl1_hrd1:
-rwxr-xr-x. 1 avalassi zg 98056 Apr 28 11:02 libmg5amc_epem_mupmum_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1069064 Apr 28 11:02 libmg5amc_epem_mupmum_cuda.so*
ee_mumu/lib/build.none_f_inl1_hrd0:
-rwxr-xr-x. 1 avalassi zg 99224 Apr 28 11:02 libmg5amc_epem_mupmum_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1071720 Apr 28 11:02 libmg5amc_epem_mupmum_cuda.so*
ee_mumu/lib/build.none_f_inl1_hrd1:
-rwxr-xr-x. 1 avalassi zg 94224 Apr 28 11:02 libmg5amc_epem_mupmum_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1062608 Apr 28 11:02 libmg5amc_epem_mupmum_cuda.so*
gg_tt/lib/build.none_d_inl1_hrd0:
-rwxr-xr-x. 1 avalassi zg 111152 Apr 28 11:02 libmg5amc_gg_ttx_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1237904 Apr 28 11:02 libmg5amc_gg_ttx_cuda.so*
gg_tt/lib/build.none_d_inl1_hrd1:
-rwxr-xr-x. 1 avalassi zg 98152 Apr 28 11:02 libmg5amc_gg_ttx_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1212480 Apr 28 11:02 libmg5amc_gg_ttx_cuda.so*
gg_tt/lib/build.none_f_inl1_hrd0:
-rwxr-xr-x. 1 avalassi zg 111472 Apr 28 11:03 libmg5amc_gg_ttx_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1210984 Apr 28 11:03 libmg5amc_gg_ttx_cuda.so*
gg_tt/lib/build.none_f_inl1_hrd1:
-rwxr-xr-x. 1 avalassi zg 102464 Apr 28 11:04 libmg5amc_gg_ttx_cpp.so*
-rwxr-xr-x. 1 avalassi zg 1197776 Apr 28 11:04 libmg5amc_gg_ttx_cuda.so*
gg_ttgg/lib/build.none_d_inl1_hrd0:
-rwxr-xr-x. 1 avalassi zg 4416400 Apr 28 11:06 libmg5amc_gg_ttxgg_cuda.so*
-rwxr-xr-x. 1 avalassi zg 607136 Apr 28 11:09 libmg5amc_gg_ttxgg_cpp.so*
gg_ttgg/lib/build.none_d_inl1_hrd1:
-rwxr-xr-x. 1 avalassi zg 4378688 Apr 28 11:11 libmg5amc_gg_ttxgg_cuda.so*
-rwxr-xr-x. 1 avalassi zg 598160 Apr 28 11:14 libmg5amc_gg_ttxgg_cpp.so*
gg_ttgg/lib/build.none_f_inl1_hrd0:
-rwxr-xr-x. 1 avalassi zg 4201064 Apr 28 11:16 libmg5amc_gg_ttxgg_cuda.so*
-rwxr-xr-x. 1 avalassi zg 627936 Apr 28 11:19 libmg5amc_gg_ttxgg_cpp.so*
gg_ttgg/lib/build.none_f_inl1_hrd1:
-rwxr-xr-x. 1 avalassi zg 4191952 Apr 28 11:21 libmg5amc_gg_ttxgg_cuda.so*
-rwxr-xr-x. 1 avalassi zg 622952 Apr 28 11:24 libmg5amc_gg_ttxgg_cpp.so*
STARTED AT Thu Apr 28 08:28:54 CEST 2022
ENDED(1) AT Thu Apr 28 11:02:01 CEST 2022
ENDED(2) AT Thu Apr 28 11:32:47 CEST 2022
ENDED(3) AT Thu Apr 28 11:37:09 CEST 2022
ENDED(4) AT Thu Apr 28 11:40:19 CEST 2022
ENDED(5) AT Thu Apr 28 11:43:25 CEST 2022
…EFORE THE ALPHAS PR
The typical build times were the same as with alphas: 30 minutes for each of the 4 ggttggg tests (but some builds were cached)
STARTED AT Thu Apr 28 12:13:58 CEST 2022
ENDED(1) AT Thu Apr 28 13:47:08 CEST 2022
ENDED(2) AT Thu Apr 28 14:24:29 CEST 2022
ENDED(3) AT Thu Apr 28 14:28:54 CEST 2022
ENDED(4) AT Thu Apr 28 14:32:07 CEST 2022
ENDED(5) AT Thu Apr 28 14:35:20 CEST 2022
Revert "[alphas] rerun all tests with allTees.sh USING UPSTREAM/MASTER CODE BEFORE THE ALPHAS PR"
This reverts commit 08349369308297560e19a97277049f315cbba078.
…e as in upstream/master)
…remove inl1 data, confusing and irrelevant)
Introducing the alphas memory access does not seem to have degraded performance significantly, good
|
This is FINALLY completed
|
… does not build yet, see madgraph5#439
I will merge this anyway, as standalone cudacpp for SM physics works fine (with one exception: uudd fails, see madgraph5#440; I will fix that a posteriori). Note also that alphas from madevent is still not integrated, so ggtt.mad fails to build, for instance (madgraph5#441). This will be the next big thing.
|
Note the performance difference between pre and post alphas. There is almost no difference in cpp (as expected?). CUDA on GPU seems consistently around 10% slower. But I guess we need to live with that...
|
(Well, about performance: I am not sure at all why eemumu would be slower... I opened #442 to investigate further, if someone has the courage.) This is now complete. I am self-merging.
…CCESS for independent couplings and CD_ACCESS for dependent couplings
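The split between an access class for independent couplings and one for dependent couplings can be sketched as two policies behind a common static interface. Everything below is hypothetical (struct names, the `get` function, the `scaleByCoupling` helper); it only assumes that an independent coupling is a single value shared by all events while a dependent coupling has one value per event.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

using cxtype = std::complex<double>; // assumed complex type for this sketch

// Hypothetical access for an independent coupling:
// a single fixed value, the event index is ignored
struct FixedCouplingAccessSketch
{
  static const cxtype& get( const cxtype* buffer, std::size_t /*ievt*/ ) { return buffer[0]; }
};

// Hypothetical access for a dependent coupling: one value per event
struct PerEventCouplingAccessSketch
{
  static const cxtype& get( const cxtype* buffer, std::size_t ievt ) { return buffer[ievt]; }
};

// A vertex-like helper templated on the access policy
// (C_ACCESS plays the role of either CI_ACCESS or CD_ACCESS)
template<class C_ACCESS>
cxtype scaleByCoupling( const cxtype& amp, const cxtype* coupling, std::size_t ievt )
{
  assert( coupling != nullptr );
  return amp * C_ACCESS::get( coupling, ievt );
}
```

Because the policy is a template parameter, the same function body serves both kinds of coupling and the choice is resolved at compile time.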
…gpu#434 codegen, fails "output standalone_cudacpp CODEGEN_cudacpp_ee_mumu" with error: AttributeError : 'PLUGIN_GPUFOHelasCallWriter' object has no attribute 'model'
Hi @roiser, I am creating a PR here that supersedes your #428 (to address the running of alphas, issue #373). The main differences will be