ggml : various fixes by ggerganov · Pull Request #1450 · ggml-org/llama.cpp

ggerganov · 2023-05-14T12:38:40Z

- fix `ggml_rope()` when not inplace ggml-org/ggml@788381e
- fix `ggml_rope()` GPT-NeoX mode (hopefully) ggml-org/ggml@788381e
- fix data race in multi-threaded `ggml_diag_mask_inf()` operator ggml-org/ggml@a483bb2
- compatibility with scratch buffers

The `ggml_rope()` fixes are irrelevant for LLaMA since n_rot == (n_embd / n_head), but they make a difference for other models like GPT-J and GPT-NeoX, where n_rot < (n_embd / n_head). I'm still not sure this is the correct implementation, especially for the GPT-NeoX mode, but the results seem somewhat better than before.

The non-inplace multi-threaded `ggml_diag_mask_inf()` was broken; see #1428. Again, this is irrelevant for LLaMA, since the LLaMA forward pass uses `ggml_diag_mask_inf_inplace()`. Might be relevant to @xaedes.
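One way to avoid this kind of race in a non-inplace masked copy, sketched below, is to give each thread a disjoint set of rows and have it both copy and mask only those rows, so no thread can overwrite another thread's -INFINITY entries with copied values. This is an illustration under stated assumptions, not necessarily how this PR or #1454 fixed it: the names `diag_mask_inf_mt` and `mask_task` are hypothetical, the work is split with plain pthreads rather than ggml's thread pool, and the causal convention (column c of row r is masked when c > n_past + r) is assumed.

```c
#include <math.h>
#include <pthread.h>
#include <string.h>

#define MAX_THREADS 16

/* Hypothetical per-thread work item; not a ggml type. */
typedef struct {
    const float *src;
    float *dst;
    int n_rows, n_cols, n_past;
    int ith, nth; /* this thread's index and the total thread count */
} mask_task;

/* Each thread owns rows ith, ith + nth, ith + 2*nth, ...: it copies those rows
 * from src and then masks them, so no two threads ever write the same row. */
static void *diag_mask_inf_worker(void *arg) {
    const mask_task *t = arg;
    for (int r = t->ith; r < t->n_rows; r += t->nth) {
        float *out = t->dst + (size_t) r * t->n_cols;
        memcpy(out, t->src + (size_t) r * t->n_cols, sizeof(float) * (size_t) t->n_cols);
        /* assumed causal convention: mask column c when c > n_past + r */
        for (int c = t->n_past + r + 1; c < t->n_cols; c++) {
            out[c] = -INFINITY;
        }
    }
    return NULL;
}

/* Non-inplace masked copy of a row-major n_rows x n_cols matrix using nth threads. */
static void diag_mask_inf_mt(const float *src, float *dst, int n_rows, int n_cols,
                             int n_past, int nth) {
    pthread_t threads[MAX_THREADS];
    mask_task tasks[MAX_THREADS];
    int created[MAX_THREADS];
    if (nth < 1) nth = 1;
    if (nth > MAX_THREADS) nth = MAX_THREADS;
    for (int i = 0; i < nth; i++) {
        tasks[i] = (mask_task){ src, dst, n_rows, n_cols, n_past, i, nth };
        created[i] = pthread_create(&threads[i], NULL, diag_mask_inf_worker, &tasks[i]) == 0;
        if (!created[i]) {
            diag_mask_inf_worker(&tasks[i]); /* fall back to running this slice inline */
        }
    }
    for (int i = 0; i < nth; i++) {
        if (created[i]) pthread_join(threads[i], NULL);
    }
}
```

Because copying and masking a row happen in the same thread, in that order, no barrier between a "copy" phase and a "mask" phase is needed.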

The "scratch buffers" fix might be relevant for LLaMA. See the new `ggml_scratch_save()` and `ggml_scratch_load()` functions and their usage in ggml.c: https://github.com/ggerganov/llama.cpp/blob/fixes/ggml.c#LL3925C1-L3939C1
Scratch buffers are a mechanism for reusing memory from previous ops once it is no longer needed. The current way of using them is manual and very error-prone; hopefully I will come up with something better in the future.
More info here: ggml-org/whisper.cpp#431
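The PR does not spell out what `ggml_scratch_save()` and `ggml_scratch_load()` do internally. The toy bump allocator below assumes they temporarily switch off the scratch buffer, so that a small allocation lands in the context's own memory (where it cannot be clobbered when scratch memory is reused), and then restore the scratch state. Every name here (`region`, `toy_ctx`, `toy_alloc`, `toy_scratch_reset`, `toy_scratch_save`, `toy_scratch_load`) is hypothetical, and the layout is not ggml's `ggml_context`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump-allocated region; not ggml's memory layout. */
typedef struct {
    uint8_t *mem;   /* NULL means "not in use" for the scratch region */
    size_t size;
    size_t offs;
} region;

typedef struct {
    region persist;       /* context-owned memory, never reused */
    region scratch;       /* reusable memory for intermediate results */
    region scratch_save;  /* scratch state stashed by toy_scratch_save() */
} toy_ctx;

static void *region_alloc(region *r, size_t n) {
    n = (n + 15) & ~(size_t) 15;   /* round sizes up to a multiple of 16 bytes */
    if (r->mem == NULL || r->offs + n > r->size) return NULL;
    void *p = r->mem + r->offs;
    r->offs += n;
    return p;
}

/* Allocate from scratch when it is active, otherwise from persistent memory. */
static void *toy_alloc(toy_ctx *ctx, size_t n) {
    return ctx->scratch.mem ? region_alloc(&ctx->scratch, n)
                            : region_alloc(&ctx->persist, n);
}

/* Reuse the scratch memory: everything previously allocated there is discarded. */
static void toy_scratch_reset(toy_ctx *ctx) { ctx->scratch.offs = 0; }

/* Stash the scratch state and disable it, so the next allocations are persistent. */
static void toy_scratch_save(toy_ctx *ctx) {
    ctx->scratch_save = ctx->scratch;
    ctx->scratch.mem = NULL;
}

/* Restore the stashed scratch state, including its current offset. */
static void toy_scratch_load(toy_ctx *ctx) { ctx->scratch = ctx->scratch_save; }
```

A caller would bracket a small, long-lived allocation with `toy_scratch_save(&ctx)` and `toy_scratch_load(&ctx)`; since the whole scratch struct is restored, later scratch allocations continue from the same offset. Forgetting either call silently routes memory to the wrong region, which illustrates why this manual style is error-prone.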


ggml : various fixes

9c7dea1

- `ggml_rope()`
- `ggml_diag_mask_inf()` multi-threaded
- compatibility with scratch buffers

ggerganov mentioned this pull request May 14, 2023

Fix race condition bug in non-inplace ggml_compute_forward_diag_mask_f32 #1454

Merged

ggerganov merged commit 13c351a into master May 14, 2023

ggerganov deleted the fixes branch May 14, 2023 15:22

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Closed

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

ggml : various fixes (ggml-org#1450)

b7e9d6b

- `ggml_rope()`
- `ggml_diag_mask_inf()` multi-threaded
- compatibility with scratch buffers

phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026

ggml : various fixes (ggml-org#1450)

e28fe5e

- `ggml_rope()`
- `ggml_diag_mask_inf()` multi-threaded
- compatibility with scratch buffers

phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026

Mistral 4 support (ggml-org#1450)

56477c7

* WIP: mistral4
* CPU FA
* CUDA FA 320, 256

ggerganov merged 1 commit into master from fixes
