
ggml : various fixes #1450

Merged
ggerganov merged 1 commit into master from fixes on May 14, 2023

Conversation

@ggerganov
Member

The ggml_rope() fixes are irrelevant for LLaMA, since there n_rot == (n_embd / n_head), but they make a difference for other models such as GPT-J and GPT-NeoX, where n_rot < (n_embd / n_head). I'm still not sure if this is the correct implementation, especially for the GPT-NeoX mode, but the results seem slightly better than before.

The non-inplace, multi-threaded ggml_diag_mask_inf() was broken in #1428. Again, this is irrelevant for LLaMA, since the forward pass uses ggml_diag_mask_inf_inplace(). Might be relevant to @xaedes

The "scratch buffers" fix might be relevant for LLaMA. See the new ggml_scratch_save() and ggml_scratch_load() functions and their usage in ggml.c: https://github.com/ggerganov/llama.cpp/blob/fixes/ggml.c#LL3925C1-L3939C1
The scratch buffers are a mechanism for reusing memory from previous ops once it is no longer needed. The current way of using them is manual and very error-prone; hopefully I will come up with something better in the future.
More info here: ggml-org/whisper.cpp#431

- `ggml_rope()`
- `ggml_diag_mask_inf()` multi-threaded
- compatibility with scratch buffers
@ggerganov ggerganov merged commit 13c351a into master May 14, 2023
@ggerganov ggerganov deleted the fixes branch May 14, 2023 15:22
