
Update and fix Vulkan soft_max and argsort implementations #7237

Merged

0cc4m merged 2 commits into master from 0cc4m/soft-max-fix on May 18, 2024

Conversation

@0cc4m (Contributor) commented May 12, 2024

I updated Vulkan for the changes in #7192 and fixed a bug in the soft_max implementation. That allowed me to clean up some code that was only needed for the three-input-tensor soft_max op.

I also updated and fixed the argsort implementation. Now `test-backend-ops` fully passes for the Vulkan backend.

@0cc4m requested a review from ggerganov on May 12, 2024 07:00
@mofosyne added the labels "Vulkan (Issues specific to the Vulkan backend)" and "Review Complexity: High (Generally requires in-depth knowledge of LLMs or GPUs)" on May 12, 2024
@Adriankhl (Contributor) commented May 12, 2024

Not sure if this is the right place to discuss this, but I am digging into issue #7130.

Here is the root cause:

The embedding computation always tries to allocate a buffer of size 0 first.

Because of `size += TENSOR_ALIGNMENT`, `size` is always bigger than 0 for the CPU backend (not sure if this is the correct behaviour, though), so the CPU backend can always allocate a buffer successfully.

https://github.com/ggerganov/llama.cpp/blob/b228aba91ac2cd9eb90e9d423ba1d0d20e0117e2/ggml-backend.c#L625-L631
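
For illustration, here is a minimal sketch of that behaviour (the alignment value and the allocation path are simplified stand-ins, not the actual ggml-backend.c code):

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Assumed value for this sketch; the real constant lives in ggml.
static const std::size_t TENSOR_ALIGNMENT = 32;

static void * cpu_buffer_alloc(std::size_t size) {
    size += TENSOR_ALIGNMENT; // a 0-byte request becomes a nonzero malloc,
    return std::malloc(size); // so this practically never returns NULL
}

int main() {
    // The 0-size request from the embedding path still yields a valid pointer.
    void * buf = cpu_buffer_alloc(0);
    std::printf("cpu_buffer_alloc(0) -> %p\n", buf);
    std::free(buf);
    return 0;
}
```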

For the Vulkan backend, `ptr` is still `nullptr` here after `ggml_vk_host_malloc` if `size` is 0.

https://github.com/ggerganov/llama.cpp/blob/b228aba91ac2cd9eb90e9d423ba1d0d20e0117e2/ggml-vulkan.cpp#L6031-L6043

And because `ggml_vk_host_malloc` returns as if it had succeeded, no exception is thrown, which causes problems later on.

Should there be a null check here that throws an exception? Falling back to a CPU buffer actually works despite the warning.
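
Roughly, the suggested guard would look like this sketch (the names are illustrative stand-ins, not the actual ggml-vulkan.cpp internals):

```cpp
#include <cstddef>
#include <cstdio>
#include <new>
#include <stdexcept>

// Stand-in for ggml_vk_host_malloc: returns nullptr for a 0-size request,
// mirroring the behaviour described above.
static void * vk_host_malloc_stub(std::size_t size) {
    return size == 0 ? nullptr : ::operator new(size, std::nothrow);
}

static void * alloc_host_buffer(std::size_t size) {
    void * ptr = vk_host_malloc_stub(size);
    // The suggested null check: fail loudly instead of handing back a
    // buffer that wraps a null pointer and breaks later on.
    if (ptr == nullptr) {
        throw std::runtime_error("host malloc returned nullptr (size == 0?)");
    }
    return ptr;
}

int main() {
    try {
        alloc_host_buffer(0); // the 0-size case now fails loudly here
    } catch (const std::exception & e) {
        std::fprintf(stderr, "caught: %s\n", e.what());
    }
    return 0;
}
```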

@mofosyne added the label "bugfix (fixes an issue or bug)" on May 12, 2024
@ggerganov (Member) left a comment

Might be a good idea before merging to run the 2 tests from #7192 and verify that the output is reasonable.

@Adriankhl (Contributor) commented, quoting his earlier comment above:

Never mind, the issue is much deeper than this. Please ignore it here.

@0cc4m merged commit c1b295e into master on May 18, 2024
@0cc4m deleted the 0cc4m/soft-max-fix branch on May 18, 2024 06:11
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
…7237)

* Update and fix Vulkan softmax implementation

* Update and fix Vulkan argsort implementation
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026
…7237)

* Update and fix Vulkan softmax implementation

* Update and fix Vulkan argsort implementation

Labels

bugfix (fixes an issue or bug) · Review Complexity: High (Generally requires in-depth knowledge of LLMs or GPUs) · Vulkan (Issues specific to the Vulkan backend)


4 participants