
llama : add comments about experimental flags#7544

Merged
ggerganov merged 1 commit into master from gg/fattn-warn on May 27, 2024

Conversation

@ggerganov ggerganov commented May 26, 2024

Certain combinations of the [EXPERIMENTAL] llama_context_params fields are not always supported:

    struct llama_context_params {
        ...

        enum ggml_type type_k; // data type for K cache [EXPERIMENTAL]
        enum ggml_type type_v; // data type for V cache [EXPERIMENTAL]

        bool flash_attn;  // whether to use flash attention [EXPERIMENTAL]

        ...
    };

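A minimal, self-contained sketch of how such a compatibility check on these experimental fields could look. The stand-in enum values and the params_supported helper are hypothetical (the real definitions live in ggml.h and llama.h, and the actual rules are enforced inside llama.cpp); the single rule shown — a quantized V cache requiring flash attention — is purely an illustrative example of the kind of incompatibility being documented:

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative stand-ins for the ggml/llama types referenced above;
     * enum values here are placeholders, not the real ggml.h constants. */
    enum ggml_type { GGML_TYPE_F32, GGML_TYPE_F16, GGML_TYPE_Q8_0 };

    struct llama_context_params {
        enum ggml_type type_k; // data type for K cache [EXPERIMENTAL]
        enum ggml_type type_v; // data type for V cache [EXPERIMENTAL]
        bool flash_attn;       // whether to use flash attention [EXPERIMENTAL]
    };

    /* Hypothetical helper: rejects one example of an unsupported
     * combination -- a quantized V cache without flash attention. */
    static bool params_supported(const struct llama_context_params *p) {
        bool v_quantized = p->type_v != GGML_TYPE_F32 &&
                           p->type_v != GGML_TYPE_F16;
        return !(v_quantized && !p->flash_attn);
    }

    int main(void) {
        struct llama_context_params p = { GGML_TYPE_F16, GGML_TYPE_Q8_0, false };
        printf("supported: %s\n", params_supported(&p) ? "yes" : "no");
        p.flash_attn = true;
        printf("supported: %s\n", params_supported(&p) ? "yes" : "no");
        return 0;
    }

In the real API, callers would obtain defaults via llama_context_default_params() and override the experimental fields before creating a context.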
Here is a list of known incompatibilities (we can try to update it in the future):

@ggerganov ggerganov merged commit eaf6e03 into master May 27, 2024
@ggerganov ggerganov deleted the gg/fattn-warn branch May 27, 2024 06:24
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026
