ggml : document occupancy heuristics in cuda_op_mean call to reduce_rows kernel #18212

Merged
am17an merged 1 commit into ggml-org:master from Aadeshveer:docs-reduce-rows-heuristic
Dec 20, 2025

Conversation

@Aadeshveer
Contributor

Added comments to ggml_cuda_op_mean explaining the logic behind the thread block size selection (512 vs 128 vs 32) when calling the reduce_rows kernel.

As a newcomer to the codebase, I spent a significant amount of time trying to decipher why these specific "magic numbers" were chosen and how the nsm check works. I dug through the commit history to find the original optimization context regarding latency hiding and scheduler overhead.

These inline comments capture those heuristics so future contributors don't have to repeat that archaeology.

@github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Dec 20, 2025
@am17an
Contributor

am17an commented Dec 20, 2025

It would be better if you just linked the original PR, which has the complete discussion.

…row count and column size, derived from historical commit context
@Aadeshveer Aadeshveer force-pushed the docs-reduce-rows-heuristic branch from 0edb2f7 to 473a670 on December 20, 2025, 09:42
@Aadeshveer
Contributor Author

I've replaced the explanation with a link to PR #15132

@am17an am17an merged commit 10b4f82 into ggml-org:master Dec 20, 2025
71 checks passed
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
…row count and column size, derived from historical commit context (ggml-org#18212)
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
…row count and column size, derived from historical commit context (#18212)
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
…row count and column size, derived from historical commit context (ggml-org#18212)
