cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1 #15038

Merged
ggerganov merged 3 commits into master from
gg/cuda-sycl-mm-batched-fix
Aug 2, 2025
@ggerganov
Member

fix #15015 (comment)

  • Fix the strides for batched GEMM to account for the case where ne02 == 1 (and ne03 > 1)
  • Fix the src1 contiguity condition: src1 is always contiguous once we have converted it
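
The stride fix in the first bullet can be illustrated with a minimal sketch. It assumes ggml-style tensor metadata (extents `ne`, byte strides `nb`); the helper name `batch_stride_elems` is hypothetical and is not a function from ggml or from this PR's diff. The idea is that when dimension 2 has a single entry, consecutive matrices of the batch are separated along dimension 3, so the per-matrix stride should be taken from `nb3` rather than `nb2`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of ggml): distance between consecutive
// matrices of a batch, in elements, for a 4D tensor described by
// ggml-style extents (ne) and byte strides (nb).
inline int64_t batch_stride_elems(int64_t ne2, size_t nb0, size_t nb2, size_t nb3) {
    assert(nb0 > 0);
    // When dim 2 has a single entry, the batch index only advances along
    // dim 3, so nb3 is the stride that matters. nb2 may hold a value that
    // does not describe the step between matrices, e.g. after a permutation.
    const size_t nb = (ne2 == 1) ? nb3 : nb2;
    return static_cast<int64_t>(nb / nb0);
}
```

For example, with 2-byte elements, `nb2 = 128` and `nb3 = 8192`, the sketch yields a stride of 4096 elements when `ne2 == 1` and 64 elements otherwise.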

@github-actions (bot) added the labels Nvidia GPU (issues specific to Nvidia GPUs), ggml (changes relating to the ggml tensor library for machine learning), and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) on Aug 2, 2025
@JohannesGaessler
Contributor

Good catch with src1 being potentially contiguous after a type conversion.
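
As a rough illustration of why a converted src1 can be treated as contiguous: assuming the conversion writes into a freshly allocated, densely packed buffer, its element strides follow from the shape alone, regardless of the strides of the original tensor. The helper name `contiguous_strides` below is hypothetical and is not taken from the PR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: element strides of a densely packed 4D buffer with
// extents ne[0..3], where ne[0] is the fastest-varying dimension.
inline std::array<int64_t, 4> contiguous_strides(const std::array<int64_t, 4> & ne) {
    assert(ne[0] > 0 && ne[1] > 0 && ne[2] > 0 && ne[3] > 0);
    // Each stride is the product of all lower-dimension extents.
    return {1, ne[0], ne[0] * ne[1], ne[0] * ne[1] * ne[2]};
}
```

Under that assumption, any stride-based contiguity check on the original src1 no longer applies after the conversion, because the converted data is indexed with these shape-derived strides.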

@ggerganov
Member Author

The SYCL tests still fail; I think the GGML_SYCL_DNNL path of this function needs to be updated as well. @qnixsynapse I will leave this to your team and merge this for now.

I'm waiting for the CUDA CI to pass and will then merge.

@ggerganov ggerganov merged commit 15e92fd into master Aug 2, 2025
45 of 47 checks passed
@ggerganov ggerganov deleted the gg/cuda-sycl-mm-batched-fix branch August 2, 2025 14:13
@qnixsynapse
Collaborator

@Rbiessy @Alcpz Since you have been maintaining the MUL_MAT kernels, I'm tagging you both for visibility.

In my testing, the dpct path in the batched kernel also doesn't seem to properly support non-contiguous inputs, so I'm not doing anything at this time.

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Aug 7, 2025
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
* cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1

ggml-ci

* cont : fix cont types

ggml-ci

* cont : adopt variable names and comment from the other branch
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026