
llama : use n_embd_head_v instead of n_embd_head_k when reshaping kqv #7327

Merged
ggerganov merged 2 commits into ggml-org:master from fairydreaming:llm_build_kqv_fix on May 17, 2024
Conversation

@fairydreaming
Collaborator

When reshaping kqv at the end of llm_build_kqv(), n_embd_head_k is incorrectly used instead of n_embd_head_v to calculate the kqv dimensions.

@mofosyne added the bugfix (fixes an issue or bug) and Review Complexity : Medium (generally requires more time to grok, but manageable by beginner to medium expertise level) labels on May 16, 2024
llama : use n_embd_v_gqa and n_embd_head_v instead of n_embd_k_gqa and n_embd_head_k when making a view of cached value vectors.
@fairydreaming
Collaborator Author

I found another place where variables for key vectors were used for processing value vectors, so I added another commit to this PR.

Member

@ggerganov left a comment


Which models are affected by this?

@fairydreaming
Collaborator Author

DeepSeek-V2 needs this, since it has n_embd_head_k != n_embd_head_v; I'm not sure about other models:

llm_load_print_meta: n_embd_head_k    = 192
llm_load_print_meta: n_embd_head_v    = 128

@ggerganov merged commit 27b0406 into ggml-org:master on May 17, 2024
@fairydreaming fairydreaming deleted the llm_build_kqv_fix branch March 22, 2025 17:50
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* llama : use n_embd_head_v instead of n_embd_head_k when reshaping kqv

* llama : use n_embd_v_gqa and n_embd_head_v instead of n_embd_k_gqa and n_embd_head_k when making a view of cached value vectors.

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026

Labels

bugfix (fixes an issue or bug) · Review Complexity : Medium (generally requires more time to grok, but manageable by beginner to medium expertise level)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants