Skip to content

Q4 cleanup#1061

Merged
ggerganov merged 2 commits intoggml-org:masterfrom
sw:q4-cleanup
Apr 19, 2023
Merged

Q4 cleanup#1061
ggerganov merged 2 commits intoggml-org:masterfrom
sw:q4-cleanup

Conversation

@sw
Copy link
Copy Markdown
Contributor

@sw sw commented Apr 19, 2023

Cleanup following #951 and #1046:

  • remove unused ggml_vec_dot_q4_0
  • warn for unused C functions (I didn't touch C++)
  • use ggml_is_quantized for the work buffer calculation

@sw sw marked this pull request as ready for review April 19, 2023 14:19
@sw sw requested review from dfyz and ggerganov April 19, 2023 14:23
@dfyz
Copy link
Copy Markdown
Contributor

dfyz commented Apr 19, 2023

I think that everything inside this #ifdef should be removed as well. More precisely, bytes_from_q4_0_twoblocks_avx512() and dot_q4_0_twoblocks_avx512() should be removed, since they are only used it ggml_vec_dot_q4_0().

I'm guessing the CI didn't catch it with -Wno-unused-function because we only test AVX-512 under MSVC? Might be a good idea to include an AVX-512 build for Linux as well, instead of ACCELERATE (which is a no-op on Linux).

I just realized that CI wouldn't fail in any case because -Wno-unused-function is only a warning, not an error.

@sw
Copy link
Copy Markdown
Contributor Author

sw commented Apr 19, 2023

You are right. Parts of this clever code may be useful for other quantization types, but that's what the git history is for.

We might want to add -Werror if CI should catch warnings.

Copy link
Copy Markdown
Contributor

@dfyz dfyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if I can/should approve this, but this PR looks pretty uncontroversial to me.

@ggerganov ggerganov merged commit f3d4edf into ggml-org:master Apr 19, 2023
@sw sw deleted the q4-cleanup branch April 19, 2023 16:14
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* Q4 cleanup

* Remove unused AVX512 Q4_0 code
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026
* Q4 cleanup

* Remove unused AVX512 Q4_0 code
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026
* This works and TG is descent, but PP is low

* Better

* Apply f_logit_scale before mul mat with output tensor

* This is better for PP: 600 t/s -> 700 t/s

* To not lose this again

* WIP

* Equal split

* WIP

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants