Q4 cleanup by sw · Pull Request #1061 · ggml-org/llama.cpp

sw · 2023-04-19T14:18:03Z

Cleanup following #951 and #1046:

* remove unused ggml_vec_dot_q4_0
* warn for unused C functions (I didn't touch C++)
* use ggml_is_quantized for the work buffer calculation
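To illustrate the last item, here is a minimal sketch of why a single "is quantized" predicate tidies up a work buffer calculation. It is not the ggml implementation: the `sketch_*` names, the type list, and the block sizes are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tensor types for this sketch; not the real ggml enum. */
enum sketch_type {
    SKETCH_TYPE_F32,
    SKETCH_TYPE_F16,
    SKETCH_TYPE_Q4_0,
    SKETCH_TYPE_Q4_1,
    SKETCH_TYPE_COUNT
};

/* One lookup table answers "is this type quantized?" for every type. */
static const bool SKETCH_IS_QUANTIZED[SKETCH_TYPE_COUNT] = {
    [SKETCH_TYPE_F32]  = false,
    [SKETCH_TYPE_F16]  = false,
    [SKETCH_TYPE_Q4_0] = true,
    [SKETCH_TYPE_Q4_1] = true,
};

/* Elements per block and bytes per block (illustrative values only). */
static const size_t SKETCH_BLOCK_ELEMS[SKETCH_TYPE_COUNT] = { 1, 1, 32, 32 };
static const size_t SKETCH_BLOCK_BYTES[SKETCH_TYPE_COUNT] = { 4, 2, 20, 24 };

bool sketch_is_quantized(enum sketch_type type) {
    return SKETCH_IS_QUANTIZED[type];
}

/* Scratch bytes needed to store n_elems values in the given quantized
 * format. This sketch models only the quantized branch and returns 0
 * for every other type. */
size_t sketch_work_size(enum sketch_type type, size_t n_elems) {
    if (!sketch_is_quantized(type)) {
        return 0;
    }
    return n_elems / SKETCH_BLOCK_ELEMS[type] * SKETCH_BLOCK_BYTES[type];
}
```

With a predicate like this, the size calculation needs one branch for all quantized types instead of one comparison per type, so adding a new quantization format only touches the tables.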

dfyz · 2023-04-19T14:59:09Z

I think that everything inside this #ifdef should be removed as well. More precisely, bytes_from_q4_0_twoblocks_avx512() and dot_q4_0_twoblocks_avx512() should be removed, since they are only used in ggml_vec_dot_q4_0().

I'm guessing the CI didn't catch it with -Wno-unused-function because we only test AVX-512 under MSVC? Might be a good idea to include an AVX-512 build for Linux as well, instead of ACCELERATE (which is a no-op on Linux).

I just realized that CI wouldn't fail in any case because -Wunused-function is only a warning, not an error.

sw · 2023-04-19T15:33:19Z

You are right. Parts of this clever code may be useful for other quantization types, but that's what the git history is for.

We might want to add -Werror if CI should catch warnings.
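The exchange above can be sketched in a small C file. The function names are hypothetical and the code is not taken from ggml. It only shows how a static function inside an `#if defined(__AVX512F__)` block escapes the unused-function warning on builds that do not target AVX-512.

```c
#include <stddef.h>

#if defined(__AVX512F__)
/* Compiled only when the build targets AVX-512 (for example with
 * -mavx512f on GCC or Clang). Nothing calls it, so an unused-function
 * warning can only appear in such builds. The body is plain C to keep
 * the sketch short; a real kernel would use intrinsics. */
static float dot_avx512_only(const float *x, const float *y, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
#endif

/* Portable path: always compiled and always reachable. */
float dot_portable(const float *x, const float *y, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Reports whether the guarded block above was part of this build. */
int avx512_block_compiled(void) {
#if defined(__AVX512F__)
    return 1;
#else
    return 0;
#endif
}
```

Assuming GCC or Clang, `cc -Wall -c sketch.c` on a target without AVX-512 should compile silently because the preprocessor drops the guarded function. `cc -Wall -mavx512f -c sketch.c` should report `dot_avx512_only` as unused, and adding `-Werror` would turn that report into a build failure that CI can catch.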

dfyz

I don't know if I can/should approve this, but this PR looks pretty uncontroversial to me.

* Q4 cleanup
* Remove unused AVX512 Q4_0 code

* This works and TG is decent, but PP is low
* Better
* Apply f_logit_scale before mul mat with output tensor
* This is better for PP: 600 t/s -> 700 t/s
* To not lose this again
* WIP
* Equal split
* WIP

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

Q4 cleanup

21ee6d9

sw marked this pull request as ready for review April 19, 2023 14:19

sw requested review from dfyz and ggerganov April 19, 2023 14:23

Remove unused AVX512 Q4_0 code

e9657b2

dfyz approved these changes Apr 19, 2023

ggerganov merged commit f3d4edf into ggml-org:master Apr 19, 2023

sw deleted the q4-cleanup branch April 19, 2023 16:14

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

ggml : Q4 cleanup - remove 4-bit dot product code (ggml-org#1061)

578e69a

* Q4 cleanup
* Remove unused AVX512 Q4_0 code

phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026

ggml : Q4 cleanup - remove 4-bit dot product code (ggml-org#1061)

ec0e355

* Q4 cleanup
* Remove unused AVX512 Q4_0 code

ggerganov merged 2 commits into ggml-org:master from sw:q4-cleanup
