CUDA: revert part of the RDNA1 optimizations#8309

Merged
JohannesGaessler merged 1 commit into ggml-org:master from daniandtheweb:gfx1010_optimizations
Jul 5, 2024

Conversation

@daniandtheweb
Contributor

@daniandtheweb daniandtheweb commented Jul 4, 2024

The change to launch_bounds was causing a small performance drop in prompt processing; apparently that change was only beneficial before I tuned the mmq_y values.

| model | size | params | backend | ngl | test | t/s master | t/s PR | Speedup |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q5_K - Small | 5.21 GiB | 8.03 B | ROCm | 99 | pp512 | 276.60 ± 0.41 | 300.60 ± 0.46 | 1.09 |
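For context, `__launch_bounds__` is the CUDA/HIP kernel attribute whose tuning this PR partially reverts. A minimal illustrative sketch of how it is applied (an assumption for illustration only, not the actual mul_mat_q kernel from llama.cpp):

```cuda
// Illustrative sketch, not llama.cpp's real kernel.
// __launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor) tells the
// compiler the maximum block size the kernel will be launched with and a
// desired minimum number of resident blocks per SM/CU. This changes register
// allocation, and therefore occupancy and performance, per architecture.
__global__ void __launch_bounds__(256, 2)
scale_f32(float *x, const float scale, const int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] *= scale;
    }
}
```

Because the second argument trades registers per thread for occupancy, a bound that helps one tile size (mmq_y value) can hurt another, which is consistent with the regression described above.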

The change on the launch_bounds was causing a small performance drop in perplexity of 25 t/s
@github-actions github-actions Bot added the Nvidia GPU Issues specific to Nvidia GPUs label Jul 4, 2024
@JohannesGaessler JohannesGaessler merged commit 0a42380 into ggml-org:master Jul 5, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 13, 2024
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
