Only enable sgemm for prompt processing, not for inference by netrunnereve · Pull Request #9330 · ggml-org/llama.cpp

netrunnereve · 2024-09-06T04:17:12Z

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

While sgemm/tinyBLAS was designed to speed up prompt processing using tiled matrix multiplications, llama.cpp also calls it during inference as a 1x1 computation. Personally I think it makes more sense for us to use our dedicated ggml_vec_dot functions for the inference dot products and leave sgemm for prompt processing only. That way we can optimize each one for its respective workload.

See my PR #8049 for an example where sgemm has faster prompt processing while ggml_vec_dot has faster inference.
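A minimal sketch of the dispatch rule described above, in C. Everything in it is hypothetical: `mul_mat`, `tiled_gemm` and `vec_dot` are illustrative stand-ins, not the actual ggml or llamafile functions, and the plain float kernels leave out the quantized formats and SIMD of the real code. The sketch assumes the batch size is `n`, the number of columns of B (one per token), so single-token inference has `n == 1`.

```c
#include <stddef.h>

/* Hypothetical stand-ins, NOT the ggml/llamafile APIs.
   A is m x k row-major (weights), B holds n columns of length k
   (one per token), and C is stored as C[j*m + i] = dot(A row i, B column j). */

/* Plain dot product, standing in for a ggml_vec_dot-style kernel. */
static float vec_dot(size_t k, const float *x, const float *y) {
    float sum = 0.0f;
    for (size_t p = 0; p < k; ++p) sum += x[p] * y[p];
    return sum;
}

/* Tiled matmul, standing in for an sgemm/tinyBLAS-style kernel: each
   TILE_M x TILE_N block of C is accumulated together, so every loaded
   element of A is reused across several columns (tokens) of B. */
#define TILE_M 2
#define TILE_N 2
static void tiled_gemm(size_t m, size_t n, size_t k,
                       const float *A, const float *B, float *C) {
    for (size_t i0 = 0; i0 < m; i0 += TILE_M) {
        for (size_t j0 = 0; j0 < n; j0 += TILE_N) {
            size_t im = i0 + TILE_M < m ? i0 + TILE_M : m;
            size_t jn = j0 + TILE_N < n ? j0 + TILE_N : n;
            float acc[TILE_M][TILE_N] = {{0}};
            for (size_t p = 0; p < k; ++p)
                for (size_t i = i0; i < im; ++i)
                    for (size_t j = j0; j < jn; ++j)
                        acc[i - i0][j - j0] += A[i * k + p] * B[j * k + p];
            for (size_t i = i0; i < im; ++i)
                for (size_t j = j0; j < jn; ++j)
                    C[j * m + i] = acc[i - i0][j - j0];
        }
    }
}

/* Dispatch: prompt processing (n > 1) takes the tiled path; single-token
   inference (n == 1) falls back to one dot product per row of A. */
static void mul_mat(size_t m, size_t n, size_t k,
                    const float *A, const float *B, float *C) {
    if (n > 1) {
        tiled_gemm(m, n, k, A, B, C);
        return;
    }
    for (size_t i = 0; i < m; ++i)
        C[i] = vec_dot(k, A + i * k, B);
}
```

With a single column the tiled kernel would process 1-wide tiles and gain nothing from reusing A, which is the motivation the PR gives for routing that case to the dot-product path instead.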

only enable sgemm for prompt processing

3222aae

slaren approved these changes Sep 7, 2024

ggerganov merged commit e536426 into ggml-org:master Sep 7, 2024

netrunnereve deleted the sgemm_pp branch September 8, 2024 01:03

netrunnereve mentioned this pull request Sep 11, 2024

IQ4_NL sgemm + Q4_0 AVX optimization #9422 (Merged)

dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024

llamafile : disable sgemm for batch-size 1 (ggml-org#9330)

2430c63

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024

llamafile : disable sgemm for batch-size 1 (ggml-org#9330)

78fceb3

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024

llamafile : disable sgemm for batch-size 1 (ggml-org#9330)

e7f5c7d

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

llamafile : disable sgemm for batch-size 1 (ggml-org#9330)

d48dd9c

ggerganov merged 1 commit into ggml-org:master from netrunnereve:sgemm_pp