CUDA: fix MMQ writeback for int8 tensor cores by JohannesGaessler · Pull Request #8100 · ggml-org/llama.cpp

JohannesGaessler · 2024-06-24T18:58:52Z

The logic that I implemented in #8062 was not quite correct: I added an offset to a pointer but forgot that the out-of-bounds checks relative to that pointer then also need to be adjusted. I assume this PR fixes #8096 (needs confirmation).
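To illustrate the class of bug described above, here is a minimal host-side C++ sketch. It is not the actual MMQ kernel code: `write_tile` and all of its parameters are hypothetical names chosen for illustration. It only shows that once a destination pointer is advanced by an offset, the bound used in the out-of-bounds check has to shrink by the same offset.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from llama.cpp): writes up to `tile_rows` values
// into `dst`, which holds `n_rows` valid elements, starting at row `row0`.
// Returns the number of elements actually written.
static int write_tile(float * dst, int n_rows, int row0, int tile_rows, float value) {
    float * dst_tile = dst + row0;        // the pointer is advanced by row0 ...
    const int rows_left = n_rows - row0;  // ... so the bound must shrink by row0 as well

    int written = 0;
    for (int i = 0; i < tile_rows; ++i) {
        // Comparing i against n_rows here (the unadjusted bound) would allow
        // writes past the end of dst whenever row0 > 0.
        if (i >= rows_left) {
            break;
        }
        dst_tile[i] = value;
        ++written;
    }
    return written;
}
```

With a 10-element destination and a 4-row tile starting at row 8, only rows 8 and 9 are written. A check against the unadjusted `n_rows` would instead let all 4 rows through and overrun the buffer by 2 elements.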

CUDA: fix MMQ writeback for int8 tensor cores

0402d4f

JohannesGaessler mentioned this pull request Jun 24, 2024

Bug: Crashes at the end of startup during first prompt processing #8096

Closed

github-actions bot added the labels Nvidia GPU (issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) Jun 24, 2024

slaren approved these changes Jun 24, 2024

JohannesGaessler merged commit 3b099bc into ggml-org:master Jun 24, 2024

MagnusS0 pushed a commit to MagnusS0/llama.cpp-normistral-tokenizer that referenced this pull request Jul 1, 2024

CUDA: fix MMQ writeback for int8 tensor cores (ggml-org#8100)

f8152b2

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

CUDA: fix MMQ writeback for int8 tensor cores (ggml-org#8100)

e9f48bc

JohannesGaessler merged 1 commit into ggml-org:master from JohannesGaessler:cuda-fix-mmq-writeback
