Q2k interleaving implementation - x86/x64 SIMD #14373
ggerganov merged 9 commits into ggml-org:master
I tested this on a 13900k with gcc 13 and clang 19, but the improvement is not very significant. Repacking has a significant cost, since it increases load time and prevents the use of mmap, and as it stands, I find this very hard to justify for AVX2. It may make sense for AVX512, but I cannot test that.
GCC-13:
Clang-19:
Hi @slaren, Thanks
Hi @slaren, @ggerganov,
}
// Store the accumulated values
for (int i = 0; i < 16; i++) {
Deduplicate the generic GEMV and GEMM implementations following #14897.
After that, feel free to merge.
Hi @slaren, @ggerganov, the code has been updated to de-duplicate the generic code. Please let us know if it is ready to merge. Thanks
* Initial Q2_K Block Interleaving Implementation
* Addressed review comments and clean up of the code
* Post rebase fixes
* Initial CI/CD fixes
* Update declarations in arch-fallback.h
* Changes for GEMV Q2_K in arch-fallback.h
* Enable repacking only on AVX-512 machines
* Update comments in repack.cpp
* Address q2k comments
---------
Co-authored-by: Manogna-Sree <elisetti.manognasree@multicorewareinc.com>
Block Interleaving Formats
Block_Q2_Kx8 :
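As a rough illustration of what block interleaving means in general, the sketch below packs the quant bytes of eight blocks into one wider block by taking a fixed-size chunk from each block in turn. It is not the exact Block_Q2_Kx8 layout from this PR: the type names `BlockSketch` and `BlockSketchX8`, the function `interleave8`, and the 8-byte chunk size are all assumptions made for illustration, and the scales and super-block factors that a real Q2_K block carries are left out. The only figure taken from the format itself is that 256 two-bit weights occupy 64 bytes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified stand-ins; these are NOT the ggml definitions.
constexpr std::size_t kQuantBytes = 64; // 256 weights * 2 bits / 8 bits per byte
constexpr std::size_t kRows       = 8;  // number of blocks interleaved together
constexpr std::size_t kChunk      = 8;  // bytes taken from each block per round (assumed)

struct BlockSketch {
    std::array<std::uint8_t, kQuantBytes> qs; // packed 2-bit quants of one block
};

struct BlockSketchX8 {
    std::array<std::uint8_t, kQuantBytes * kRows> qs; // quants of 8 blocks, interleaved
};

// Interleave 8 blocks: round r copies bytes [r*kChunk, (r+1)*kChunk) of
// block 0, then of block 1, ..., then of block 7, before moving to round r+1.
BlockSketchX8 interleave8(const std::array<BlockSketch, kRows>& in) {
    BlockSketchX8 out{};
    const std::size_t rounds = kQuantBytes / kChunk;
    for (std::size_t r = 0; r < rounds; ++r) {
        for (std::size_t row = 0; row < kRows; ++row) {
            std::memcpy(out.qs.data() + (r * kRows + row) * kChunk,
                        in[row].qs.data() + r * kChunk,
                        kChunk);
        }
    }
    return out;
}
```

With a layout of this kind, a kernel that processes eight rows at once can read the matching chunks of all eight blocks from one contiguous region instead of eight separate ones, which is the usual motivation for repacking.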
Performance Impact :
Gains of ~5.5% were seen with the AVX2 version and ~25.5% with the AVX512 version over the base commit with GCC on Linux.
GCC Linux :
Q2_K Model :
GCC Version = 12.3
Clang Linux:
Larger gains of ~26.3% were seen with the AVX2 version and ~53.9% with the AVX512 version over the base commit with Clang on Linux.
Q2_K Model :
Clang Version = 20.1.0
The model tested was - https://huggingface.co/bartowski/Phi-3-mini-4k-instruct-GGUF
The PR was tested on an AMD Ryzen 5 9600X, which supports the following flags by default:
CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
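The commit list mentions enabling repacking only on AVX-512 machines. A minimal compile-time sketch of such a gate is shown below. The function names `compiled_with_avx512` and `should_repack_q2k` are hypothetical and do not come from the PR. `__AVX512F__` is the macro GCC and Clang predefine when AVX-512 Foundation is enabled for the build, so this reflects the compile flags rather than the CPU the binary later runs on; a runtime check would be needed for that.

```cpp
// Hypothetical helper: true when this translation unit was compiled with
// AVX-512F enabled (e.g. via -mavx512f or -march=native on a capable CPU).
constexpr bool compiled_with_avx512() {
#if defined(__AVX512F__)
    return true;
#else
    return false;
#endif
}

// Hypothetical policy: only pay the repacking cost (longer load time,
// no mmap) when the wider AVX-512 kernels are available to benefit from it.
constexpr bool should_repack_q2k() {
    return compiled_with_avx512();
}
```

Gating this way keeps the default mmap-based loading path on AVX2-only builds, where the reported gains were smaller.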
Perplexity was also tested with the Q2_K model and found to be similar.
The perplexity results are tabulated as follows :