
Fix FlashAttention debug test, FP32 assert #7684

Merged
JohannesGaessler merged 1 commit into ggml-org:master from JohannesGaessler:cuda-fa-quant-fixup on Jun 1, 2024
Conversation

@JohannesGaessler
Contributor

Fixup to #7527 (comment).

Removes an incorrect assert for FP32 FlashAttention. Pads the number of elements per KV cache row in the tests to a multiple of the block size. The backend will still report that the head size is not supported, but I think that if something like this were ever to be implemented, padding would be the only sensible way to do it.
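For illustration, a minimal sketch (not taken from this PR; the function name `pad_to_block` and the concrete sizes are made up) of what "padding the per-row element count to a multiple of the block size" means for a quantized KV cache type:

```cpp
// Hypothetical sketch, not the actual test code from this PR.
#include <cstdint>
#include <cstdio>

// Round n up to the next multiple of block_size (block_size > 0).
static int64_t pad_to_block(int64_t n, int64_t block_size) {
    return ((n + block_size - 1) / block_size) * block_size;
}

int main() {
    const int64_t head_size  = 80;  // example head size not divisible by the block size
    const int64_t block_size = 32;  // example block size of a quantized KV cache type
    // The row would be padded from 80 to 96 elements so it holds whole blocks.
    printf("padded row size: %lld\n", (long long) pad_to_block(head_size, block_size));
    return 0;
}
```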

The github-actions bot added the testing (Everything test related), Nvidia GPU (Issues specific to Nvidia GPUs), and ggml (changes relating to the ggml tensor library for machine learning) labels on Jun 1, 2024
JohannesGaessler merged commit e141ce6 into ggml-org:master on Jun 1, 2024
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026
