Added AVX512F macros to ggml.c #6088

Merged
ggerganov merged 1 commit into ggml-org:master from amiralimi:avx512 on Mar 16, 2024

Conversation

@amiralimi
Contributor

Hi.
I added AVX512F macros to ggml.c, and I ran pre-commit before committing. I wanted to add AVX512_FP16 support too, but I don't have hardware that supports it.

This change showed speed-up when running F16 and F32 models. Here are the results:

Results

avx512 - fp16
llama_print_timings:        50 runs   (  453.77 ms per token,     2.20 tokens per second)

avx - fp16
llama_print_timings:        50 runs   (  625.48 ms per token,     1.60 tokens per second)

avx512 - fp32
llama_print_timings:        50 runs   (  517.39 ms per token,     1.93 tokens per second)

avx - fp32
llama_print_timings:        50 runs   (  638.76 ms per token,     1.57 tokens per second)

I don't know if I need to add more things for this change (I'm new to open-source development).

Member

@ggerganov ggerganov left a comment


Nice 👍

@ggerganov ggerganov merged commit c47cf41 into ggml-org:master Mar 16, 2024
@amiralimi amiralimi deleted the avx512 branch March 16, 2024 17:34
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 3, 2024
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026