
CUDA: fix strided GEMM for [0,2,1,3] per && ne2==1 #15037

Closed
JohannesGaessler wants to merge 1 commit into ggml-org:master from JohannesGaessler:cuda-fix-strided-gemm

Conversation

@JohannesGaessler
Contributor

Fixes the failing tests added in #15015. The problem is that for ne02 == 1 the per-matrix strides can be calculated incorrectly.
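A short illustration of why a size-1 dimension is a trap for stride calculations: when ne02 == 1, the byte stride nb[2] is never needed to address data, so two views with identical memory layout can still carry different nb[2] values. The sketch below is a hypothetical model, not ggml code. It assumes ggml's ne/nb naming (elements and byte strides per dimension, dim 0 fastest) and 4-byte elements. The `matrix_stride_canonical` helper shows one possible way to make the stride independent of that ambiguity; it is not necessarily what this PR does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View4D:
    """Hypothetical ggml-like view: ne = elements per dim, nb = byte strides per dim."""
    ne: tuple
    nb: tuple


def contiguous(ne, type_size=4):
    """Byte strides of a freshly allocated contiguous tensor (dim 0 fastest)."""
    nb = [type_size]
    for i in range(1, 4):
        nb.append(nb[i - 1] * ne[i - 1])
    return View4D(tuple(ne), tuple(nb))


def permute_0213(t):
    """Swap dims 1 and 2, like a [0,2,1,3] permutation (a view, no data copy)."""
    ne, nb = t.ne, t.nb
    return View4D((ne[0], ne[2], ne[1], ne[3]), (nb[0], nb[2], nb[1], nb[3]))


def matrix_stride_naive(t):
    """Per-matrix stride in elements, trusting nb[2] even when ne[2] == 1."""
    return t.nb[2] // t.nb[0]


def matrix_stride_canonical(t):
    """Per-matrix stride in elements; a size-1 dim 2 gets the contiguous value nb[1] * ne[1]."""
    nb2 = t.nb[1] * t.ne[1] if t.ne[2] == 1 else t.nb[2]
    return nb2 // t.nb[0]


def offset(t, i0, i1):
    """Byte offset of element (i0, i1) within the first (and only) matrix."""
    return i0 * t.nb[0] + i1 * t.nb[1]


a = contiguous((4, 3, 1, 1))                # ne = (4, 3, 1, 1), nb = (4, 16, 48, 48)
b = permute_0213(contiguous((4, 1, 3, 1)))  # ne = (4, 3, 1, 1), nb = (4, 16, 16, 48)

print(matrix_stride_naive(a), matrix_stride_naive(b))          # 12 4
print(matrix_stride_canonical(a), matrix_stride_canonical(b))  # 12 12
```

Both views address every element at the same byte offset, yet the naive stride differs (12 vs. 4 elements). Any logic that derives a batch stride or a "is this [0,2,1,3]-permuted?" decision from nb[2] alone can therefore diverge for ne02 == 1.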

@github-actions (bot) added the labels "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Aug 2, 2025
@ggerganov
Member

I was also just opening a PR: #15038

Could you review that one and, if it's OK, merge it instead? It also includes a SYCL fix and a fix for the src1 contiguity check.


Labels: ggml (changes relating to the ggml tensor library for machine learning), Nvidia GPU (issues specific to Nvidia GPUs)


2 participants