Merge of master into master_june24 and channelid fixes/reimplementation#882
Merge of master into master_june24 and channelid fixes/reimplementation#882valassi merged 778 commits intomadgraph5:master_june24from
Conversation
|
I am trying to fix issues in MG5AMC. Will do a force push and file an issue |
|
I have tried to upgrade MG5AMC from the current eef200f94 to gpucpp_june24, but this fails codegen #886. I have reverted. I will instead create a branch where I merge gpucpp on top of the eef200f94 which is currently in master_june24. |
|
I have fixed a minor typo in unit_v for MAC #883. Marking it as fixed. Now there are only 45 CI errors instead of 49 |
|
I have fixed another minor issue #884 failing tghe builds for FPTYPE=m. There was one line forgotten from a previous implementation, it should have been removed for FPTYPE=m and was not. Now down from 45 to 39 errors, all related to #885 crashes in the new CI tests I think. The old CI tests are now all succeeding. |
|
I investigated #885 and found that the crash only happens when setting VECSIZE_USED different from VECSIZE_MEMMAX. In the CI in my initial tests VECSIZE_MEMMAX was 16384 and VECSIZE_USED was 32, so this crashed. Looking more into that, I realised that I was not sure what parameters I should use for NB_WARP and WARP_SIZE. This is discussed in #887. I gave it a try to use NB_WARP=512 with WARP_SIZE=32 ie VECSIZE_MEMAMX=16384. With VECSIZE_USED=32, this still crashes in #885. But in addition I also get a Fortran runtime error in symconf #888. |
|
I added a workaround (NOT a fix) for crash #885 just to allow the CI tests to proceed further. Essentially I put down NB_WARP=1 and WARP_SIZE=32 so that VECSIZE_MEMAMX=32 is the same as VECSIZE_USED=32. This avoids the crash (but avoids testing anything interesting in the new warp infrastructure, making it pointless). A proper fix for #885 (and for #888) is needed. Apart from other 'expected' failures, there is a xsec mismatch for ggttggg #889. This can be fixed by increasing tolerance There are ten failures overall. The other 9 are the usual #826 and #872 (pensing in master) and #856 (fixed in master, to be merged here). |
1 similar comment
|
Hi @oliviermattelaer as you see I made some progress here, but I am putting this work on hold. I need some answers on #887 (and it is possible that without VECSIZE_USED I go nowhere and I need to wait for that). (Or at least: probably I will continue merging bits of master into master_june24 so that we avoid a complete divergence, but I would argue against merging back master_june24 into master until many of these issues are fixed... there are just too many things that seem to not have been tested) |
…'make cleanavxs' target as suggested by Olivier
Improvements to complete PRs 979 and 980
PR where I only add CI but not trying to fix them
implement new CI and fixed the one related to the compilation issue.
…ne24) to 4ef15cab1 (current valassi_gpucpp_june24 including merge of mg5amcnlo#132 with Source makefile changes)
…ier merging git checkout upstream/master $(git ls-tree --name-only upstream/master */CODEGEN*txt)
… merging (except gg_tt.mad, to inspect it) git checkout upstream/master $(git ls-tree --name-only upstream/master */bin/internal/banner.py | grep -v ^gg_tt.mad/)
…, madgraph5#980, madgraph5#984 patches for the new CI and Source/makefile) into june24 Fix conflicts: - MG5aMC/mg5amcnlo (keep the current june24 version 4ef15cab1 i.e. current valassi_gpucpp_june24) - epochX/cudacpp/gg_tt.mad/bin/internal/banner.py (keep a debug printout)
…l commit message later (regenerating the patch changes nothing)
…g changes: I just want to mark that Source/makefile is no longer there) The only files that still need to be patched are - 2 in patch.common: Source/genps.inc, SubProcesses/makefile - 2 in patch.P1: driver.f, matrix1.f ./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch git diff --no-ext-diff -R gg_tt.mad/Source/genps.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.P1 git checkout gg_tt.mad
|
Hi @oliviermattelaer I have just merged the latest master Including fixci and the Source/makefile stuff) into my june24 here. I am running some manual tests tonight. Then tomorrow morning I think we are good to go if those tests are ok and the CI is also ok. I would close instead your #981 which is essentialy a duplicate (and may be missing some pieces) |
STARTED AT Sun Sep 1 11:07:02 PM CEST 2024 ./tput/teeThroughputX.sh -mix -hrd -makej -eemumu -ggtt -ggttg -ggttgg -gqttq -ggttggg -makeclean ENDED(1) AT Sun Sep 1 11:30:56 PM CEST 2024 [Status=0] ./tput/teeThroughputX.sh -flt -hrd -makej -eemumu -ggtt -ggttgg -inlonly -makeclean ENDED(2) AT Sun Sep 1 11:39:10 PM CEST 2024 [Status=0] ./tput/teeThroughputX.sh -makej -eemumu -ggtt -ggttg -gqttq -ggttgg -ggttggg -flt -bridge -makeclean ENDED(3) AT Sun Sep 1 11:48:10 PM CEST 2024 [Status=0] ./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -rmbhst ENDED(4) AT Sun Sep 1 11:50:55 PM CEST 2024 [Status=0] ./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -curhst ENDED(5) AT Sun Sep 1 11:53:39 PM CEST 2024 [Status=0] ./tput/teeThroughputX.sh -eemumu -ggtt -ggttgg -flt -common ENDED(6) AT Sun Sep 1 11:56:27 PM CEST 2024 [Status=0] ./tput/teeThroughputX.sh -mix -hrd -makej -susyggtt -susyggt1t1 -smeftggtttt -heftggbb -makeclean ENDED(7) AT Mon Sep 2 12:05:56 AM CEST 2024 [Status=0]
…0 on june24 branch - everything ok STARTED AT Mon Sep 2 06:58:36 AM CEST 2024 (SM tests) ENDED(1) AT Mon Sep 2 11:07:02 AM CEST 2024 [Status=0] (BSM tests) ENDED(1) AT Mon Sep 2 11:17:21 AM CEST 2024 [Status=0] 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_eemumu_mad/log_eemumu_mad_m_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttggg_mad/log_ggttggg_mad_m_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttgg_mad/log_ggttgg_mad_m_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggttg_mad/log_ggttg_mad_m_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_ggtt_mad/log_ggtt_mad_m_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_gqttq_mad/log_gqttq_mad_m_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_d_inl0_hrd0.txt 1 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_heftggbb_mad/log_heftggbb_mad_m_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_smeftggtttt_mad/log_smeftggtttt_mad_m_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggt1t1_mad/log_susyggt1t1_mad_m_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_d_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_f_inl0_hrd0.txt 24 /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/tmad/logs_susyggtt_mad/log_susyggtt_mad_m_inl0_hrd0.txt
|
Hi @oliviermattelaer I completed all my tests. I am good to go. As discussed privately, I would suggest the following:
Then you will be able to work on 360, while I will focus on goodhel Should I go ahead? |
|
Thanks @oliviermattelaer for your comments mg5amcnlo/mg5amcnlo#121 (comment) I put this in WIP again. I need to check if warp_used is used ocrrectly in dsample.f. This is #983 |
|
I checked #983 and I think that there is nothing else to do for nb_warp_used in dsample.f. But can you cross check please? Thanks Andrea |
…c9350 (latest gpucpp, including the merge of the former via mg5amcnlo#121) NB: the contents of 4ef15cab1 and a696c9350 are identical - but a696c9350 is pointing again to the HEAD of gpucpp
|
Ok here is the update
This is DONE. I have also upgraded the mg5amc link in this june24 branch to point to that gpucpp. The CI is as expected, 3 failing tests.
This I will do now. MERGING!
This I have done, closed.
This will be the next step. SO FINALLY MERGING! |
…annelid fixes/reimplementation madgraph5#882 and madgraph5#985) into runcard Fix conflicts (use ~runcard version): epochX/cudacpp/CODEGEN/PLUGIN/CUDACPP_SA_OUTPUT/MG5aMC_patches/PROD/patch.common







Hi @oliviermattelaer this is a WIP PR to start working towards the resync of master and master_june24. From what I understand this is one of the things you want to push with high priority.
(PS IMPORTANT a summary for review is in #882 (comment) below)
This is constructed as a merge into master_june24.
That is to say, I start from what you and Stefan had in master_june24 (as a result of Stefan's channelid PR #830, related to the warp issue #765), and I start porting a few of the master stuff, rather than going the opposite way. This allows me to go in steps with things I know (the various steps in master).
For the moment, here I am just merging the latest master CI (with tmad tests) into master_june24. Since the CI is enabled also for master_june24, I expect that the new tests should run, and the results may be interesting.
Speaking of which, @roiser @oliviermattelaer, how did you test the code in master_june24?
Thanks,
Andrea
PS For the context: master_june24 mainly differs from master because of the addition of Stefan's channelid MR #830 which is connected to Olivier's warp work in #765