Currently, the software layer is built on a host machine for that host machine (i.e. `-march=native`). However, on the client side, we potentially use those same binaries on a variety of architectures. For instance, we have an installation for `haswell`, but in `archdetect` (#187) any AVX2-only system will be identified as `haswell` and use those binaries. This can work in some cases (e.g. `broadwell` and `haswell` share 99% of the same instruction set), but in others it will be problematic.
Since we will (probably) never have the resources to build binaries for all x86_64 CPU micro-architectures, I propose that we move to more generic builds and label those installations accordingly to avoid any confusion (i.e. not with an existing arch name, unless it will only be used on that CPU arch).
As discussed in Slack, a good starting point could be the standard CPU feature levels supported in gcc and clang:
We could replace the existing installations with the following:
- change the build for Haswell to use `-march=x86-64-v3` and distribute those binaries in `x86_64/v3`
- change the build for Skylake to use `-march=x86-64-v4` and distribute those binaries in `x86_64/v4`
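For context, the `x86-64-v2`/`v3`/`v4` levels are cumulative feature sets defined in the x86-64 psABI. A minimal Python sketch of (roughly) which CPU flags each level implies; the exact flag names here follow `/proc/cpuinfo` conventions and are illustrative, not an authoritative list:

```python
# Rough sketch of the cumulative x86-64 micro-architecture levels (psABI).
# Each level adds features on top of the previous one; dicts preserve
# insertion order in Python 3.7+, which the accumulation below relies on.
X86_64_LEVELS = {
    "x86-64":    {"sse", "sse2"},                                   # baseline
    "x86-64-v2": {"ssse3", "sse4_1", "sse4_2", "popcnt", "cx16"},   # ~Nehalem
    "x86-64-v3": {"avx", "avx2", "fma", "bmi1", "bmi2", "f16c",
                  "movbe"},                                         # ~Haswell
    "x86-64-v4": {"avx512f", "avx512bw", "avx512cd", "avx512dq",
                  "avx512vl"},                                      # ~Skylake-SP
}

def features_for(level: str) -> set[str]:
    """Return the cumulative CPU feature set required by a given level."""
    required: set[str] = set()
    for name, feats in X86_64_LEVELS.items():
        required |= feats
        if name == level:
            return required
    raise ValueError(f"unknown level: {level}")
```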
Technically, we could even use `x86_64/v3` for AMD `zen`, `zen2` and `zen3`. But if these generic definitions do not work well for certain CPU archs, we can always add new custom-made ones, or have installations that specifically target a single CPU arch. For instance, based on the current situation, let's say that we wanted to keep `zen3` separate to be able to further optimize for that system:
"x86_64/v3" "GenuineIntel" "avx2 fma" # Intel Haswell, Broadwell
"x86_64/v4" "GenuineIntel" "avx2 fma avx512f avx512bw avx512cd avx512dq avx512vl" # Intel Skylake, Cascade Lake
"x86_64/v3" "AuthenticAMD" "avx2 fma" # AMD Rome
"x86_64/amd/zen3" "AuthenticAMD" "avx2 fma vaes" # AMD Milan, Milan-X
What are your thoughts?