ggml : document occupancy heuristics in cuda_op_mean call to reduce_rows kernal by Aadeshveer · Pull Request #18212 · ggml-org/llama.cpp

Aadeshveer · 2025-12-20T02:31:47Z

Added comments to ggml_cuda_op_mean explaining the logic behind the thread block size selection (512 vs. 128 vs. 32) when launching the reduce_rows kernel.

As a newcomer to the codebase, I spent a significant amount of time trying to decipher why these specific "magic numbers" were chosen and how the nsm check works. I dug through the commit history to find the original optimization context regarding latency hiding and scheduler overhead.

These inline comments record those heuristics so future contributors don't have to repeat that archaeology.
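For readers unfamiliar with this kind of heuristic, here is a minimal host-side sketch of how a block size might be chosen from the row count, column count, and SM count (`nsm`). Only the three block sizes (512, 128, 32) and the existence of an `nsm` check come from the description above. The function name `pick_reduce_rows_block_size` is hypothetical, and both cutoffs (fewer than 2 rows per SM, fewer than 1024 columns) are illustrative assumptions; the linked discussion and the actual source are authoritative.

```cpp
#include <cstdint>

// Hypothetical helper, NOT part of ggml: chooses threads per block for a
// row-reduction launch in which each block reduces one row.
// The cutoffs below are assumptions made for illustration only.
// Precondition: nsm > 0 (number of streaming multiprocessors on the device).
inline int pick_reduce_rows_block_size(int64_t nrows, int64_t ncols, int nsm) {
    // Assumed cutoff: with fewer than 2 rows (blocks) per SM, the grid alone
    // cannot keep the device busy, so use a large block to get more threads
    // in flight per row and hide memory latency.
    if (nrows / nsm < 2) {
        return 512;
    }
    // Otherwise there are plenty of blocks to schedule. Short rows get a
    // single warp (assumed cutoff: ncols < 1024); longer rows get four warps.
    return ncols < 1024 ? 32 : 128;
}
```

For example, under these assumptions a single row of 4096 columns on an 80-SM device would select 512 threads, while 1000 rows of 512 columns would select 32.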

am17an · 2025-12-20T08:28:53Z

It would be better to just link the original PR, which has the complete discussion.

Aadeshveer · 2025-12-20T09:47:16Z

I've replaced the explanation with a link to PR #15132

github-actions bot added the labels "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Dec 20, 2025

loci-dev mentioned this pull request Dec 20, 2025

UPSTREAM PR #18212: ggml : document occupancy heuristics in cuda_op_mean call to reduce_rows kernal auroralabs-loci/llama.cpp#632

Closed

Added comments explaining thread block size selection logic based on row count and column size, derived from historical commit context (473a670)

Aadeshveer force-pushed the docs-reduce-rows-heuristic branch from 0edb2f7 to 473a670 Compare December 20, 2025 09:42

am17an approved these changes Dec 20, 2025

am17an merged commit 10b4f82 into ggml-org:master Dec 20, 2025
71 checks passed

Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026

Added comments explaining thread block size selection logic based on row count and column size, derived from historical commit context (ggml-org#18212) (99d65ec)

blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026

Added comments explaining thread block size selection logic based on row count and column size, derived from historical commit context (#18212) (085fa7a)

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

Added comments explaining thread block size selection logic based on row count and column size, derived from historical commit context (ggml-org#18212) (a15ae11)

am17an merged 1 commit into ggml-org:master from Aadeshveer:docs-reduce-rows-heuristic
