CUDA: fix MMQ writeback for int8 tensor cores #8100

Merged
JohannesGaessler merged 1 commit into ggml-org:master from JohannesGaessler:cuda-fix-mmq-writeback
Jun 24, 2024

Conversation

@JohannesGaessler
Contributor

The logic that I implemented in #8062 was not quite correct. I added an offset to a pointer but forgot that the out-of-bounds checks relative to that pointer would then also need to be adjusted. I assume this PR fixes #8096 (need confirmation).
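
For readers unfamiliar with this class of bug, here is a minimal host-side C++ sketch of the general pattern described above. It is not the actual MMQ kernel code, which this page does not show; the function names, the one-dimensional "rows" layout, and the parameters `nrows`, `row0`, and `tile_rows` are all hypothetical and chosen only for illustration. The idea is that once a destination pointer has been advanced by `row0`, a loop index relative to that pointer must be compared against `nrows - row0` rather than `nrows`.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical illustration only; none of these names appear in llama.cpp.

// Rows accepted by a loop over i in [0, tile_rows) when the check "i < nrows"
// is left unchanged after the pointer was moved to row0. The bound is still
// relative to the start of the buffer, so it is too permissive.
int rows_accepted_unadjusted(int nrows, int row0, int tile_rows) {
    (void) row0; // the stale check ignores the offset entirely
    return std::max(0, std::min(nrows, tile_rows));
}

// Same loop with the bound shifted by the same offset as the pointer,
// i.e. the check becomes "i < nrows - row0".
int rows_accepted_adjusted(int nrows, int row0, int tile_rows) {
    return std::max(0, std::min(nrows - row0, tile_rows));
}

// Writes `tile` into `dst` starting at row0 using the adjusted bound.
// Returns the number of rows actually written.
int write_tile(std::vector<float> & dst, int row0, const std::vector<float> & tile) {
    const int nrows     = static_cast<int>(dst.size());
    const int tile_rows = static_cast<int>(tile.size());
    if (row0 < 0 || row0 >= nrows) {
        return 0; // the whole tile lies outside the destination
    }
    float * dst_tile = dst.data() + row0; // pointer offset by row0
    const int n = rows_accepted_adjusted(nrows, row0, tile_rows);
    for (int i = 0; i < n; ++i) {
        dst_tile[i] = tile[i]; // i is relative to dst_tile, not to dst
    }
    return n;
}
```

With a 10-row destination, `row0 = 8`, and a 4-row tile, the unadjusted check would accept all 4 rows and overrun the buffer by 2, while the adjusted check accepts only the 2 rows that fit.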

@github-actions (bot) added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Jun 24, 2024
@JohannesGaessler JohannesGaessler merged commit 3b099bc into ggml-org:master Jun 24, 2024
MagnusS0 pushed a commit to MagnusS0/llama.cpp-normistral-tokenizer that referenced this pull request Jul 1, 2024
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

Labels

ggml: changes relating to the ggml tensor library for machine learning
Nvidia GPU: Issues specific to Nvidia GPUs

Development

Successfully merging this pull request may close these issues.

Bug: Crashes at the end of startup during first prompt processing

2 participants