
Update and fix Vulkan soft_max and argsort implementations #7237

Merged

0cc4m merged 2 commits into master from 0cc4m/soft-max-fix on May 18, 2024

Conversation

@0cc4m (Contributor) commented May 12, 2024

I updated Vulkan for the changes in #7192 and fixed a bug in the soft_max implementation. That allowed me to clean up some code that was only needed for the three-input-tensor soft_max op.

I also updated and fixed the argsort implementation. Now `test-backend-ops` fully passes for the Vulkan backend.

@0cc4m requested a review from ggerganov on May 12, 2024 07:00
@mofosyne added the labels "Vulkan (Issues specific to the Vulkan backend)" and "Review Complexity: High (Generally requires in-depth knowledge of LLMs or GPUs)" on May 12, 2024
@Adriankhl (Contributor) commented May 12, 2024

Not sure if this is the right place to discuss this, but I am digging into issue #7130.

Here is the root cause:

The embedding computation always tries to allocate a buffer of size 0 first.

Because of `size += TENSOR_ALIGNMENT`, `size` is always bigger than 0 for the CPU backend (not sure if this is the correct behaviour, though), so the CPU backend can always allocate a buffer successfully.

https://github.com/ggerganov/llama.cpp/blob/b228aba91ac2cd9eb90e9d423ba1d0d20e0117e2/ggml-backend.c#L625-L631
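
For illustration, here is a minimal sketch of that behaviour (the alignment value and the allocation path are simplified stand-ins, not the actual ggml-backend.c code):

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Assumed value for this sketch; the real constant lives in ggml.
static const std::size_t TENSOR_ALIGNMENT = 32;

static void * cpu_buffer_alloc(std::size_t size) {
    size += TENSOR_ALIGNMENT; // a 0-byte request becomes a nonzero malloc,
    return std::malloc(size); // so this practically never returns NULL
}

int main() {
    // The 0-size request from the embedding path still yields a valid pointer.
    void * buf = cpu_buffer_alloc(0);
    std::printf("cpu_buffer_alloc(0) -> %p\n", buf);
    std::free(buf);
    return 0;
}
```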

For the Vulkan backend, `ptr` is still `nullptr` here after `ggml_vk_host_malloc` if `size` is 0.

https://github.com/ggerganov/llama.cpp/blob/b228aba91ac2cd9eb90e9d423ba1d0d20e0117e2/ggml-vulkan.cpp#L6031-L6043

And because `ggml_vk_host_malloc` returns as if it had succeeded, no exception is thrown, which causes problems later on.

Should there be a null check here that throws an exception? Falling back to a CPU buffer actually works despite the warning.
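
Roughly, the suggested guard would look like this sketch (the names are illustrative stand-ins, not the actual ggml-vulkan.cpp internals):

```cpp
#include <cstddef>
#include <cstdio>
#include <new>
#include <stdexcept>

// Stand-in for ggml_vk_host_malloc: returns nullptr for a 0-size request,
// mirroring the behaviour described above.
static void * vk_host_malloc_stub(std::size_t size) {
    return size == 0 ? nullptr : ::operator new(size, std::nothrow);
}

static void * alloc_host_buffer(std::size_t size) {
    void * ptr = vk_host_malloc_stub(size);
    // The suggested null check: fail loudly instead of handing back a
    // buffer that wraps a null pointer and breaks later on.
    if (ptr == nullptr) {
        throw std::runtime_error("host malloc returned nullptr (size == 0?)");
    }
    return ptr;
}

int main() {
    try {
        alloc_host_buffer(0); // the 0-size case now fails loudly here
    } catch (const std::exception & e) {
        std::fprintf(stderr, "caught: %s\n", e.what());
    }
    return 0;
}
```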

@mofosyne added the label "bugfix (fixes an issue or bug)" on May 12, 2024
@ggerganov (Member) left a comment

Might be a good idea before merging to run the 2 tests from #7192 and verify that the output is reasonable.

@Adriankhl (Contributor) commented, quoting his earlier comment above:

Never mind, the issue is much deeper than this. Please ignore it here.

@0cc4m merged commit c1b295e into master on May 18, 2024
@0cc4m deleted the 0cc4m/soft-max-fix branch on May 18, 2024 06:11
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
…7237)

* Update and fix Vulkan softmax implementation

* Update and fix Vulkan argsort implementation
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026
…7237)

* Update and fix Vulkan softmax implementation

* Update and fix Vulkan argsort implementation

Labels

bugfix (fixes an issue or bug) · Review Complexity: High (Generally requires in-depth knowledge of LLMs or GPUs) · Vulkan (Issues specific to the Vulkan backend)


4 participants