
llama-bench: Fix to reduce very high ± variability #21282

Draft

michaelw9999 wants to merge 1 commit into ggml-org:master from michaelw9999:bench-fix

Conversation

@michaelw9999 (Contributor)

Overview

After PR #19754, llama-bench started to show very high variability (±) for most bench runs. The variability can still be worked around by adding repetitions, e.g. -r 5 or -r 10. This change reduces llama-bench's high variability/noise by adding 4 warmup runs (easy to adjust), which seems to be the sweet spot: the extra runs add little delay while still substantially reducing variability.
There may still be some run-to-run variance caused by system load or other factors, but this new default guards against it and produces more consistent output.
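For illustration only, here is a minimal self-contained sketch of the warmup idea, not the actual diff: the N_WARMUP constant and the bench_run() stand-in are hypothetical names, and the real llama-bench measures full pp/tg passes rather than a toy loop.

```cpp
// Sketch of the warmup idea: run several discarded passes before timing,
// so one-time costs (e.g. CUDA graph capture) do not leak into the first
// measured repetition. Not llama-bench's actual code.
#include <chrono>
#include <cmath>
#include <cstdio>
#include <vector>

static const int N_WARMUP = 4; // the "sweet spot" from the PR description

// stand-in for one full benchmark pass (pp512 / tg128 in the real tool)
static void bench_run() {
    volatile double x = 0.0;
    for (int i = 0; i < 1000000; i++) {
        x = x + std::sqrt((double) i);
    }
}

int main() {
    const int n_reps = 5; // corresponds to llama-bench's -r flag

    // discarded warmup passes: absorb one-time setup costs
    for (int i = 0; i < N_WARMUP; i++) {
        bench_run();
    }

    // only these repetitions are timed and enter the mean / stddev
    std::vector<double> ms;
    for (int i = 0; i < n_reps; i++) {
        auto t0 = std::chrono::steady_clock::now();
        bench_run();
        auto t1 = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    double mean = 0.0;
    for (double v : ms) mean += v;
    mean /= ms.size();
    double var = 0.0;
    for (double v : ms) var += (v - mean) * (v - mean);
    const double sd = std::sqrt(var / ms.size()); // population stddev

    std::printf("%.2f ms ± %.2f\n", mean, sd);
    return 0;
}
```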

Additional information

Example llama-bench output before/after the change, on some models I've tested, run without extra flags:

| Model | Test | Before (t/s) | After (t/s) |
| --- | --- | --- | --- |
| Qwen3.5 0.8B Q4_K_M | pp512 | 25231.96 ± 17262.95 | 42934.98 ± 240.28 |
| Qwen3.5 0.8B Q4_K_M | tg128 | 496.66 ± 2.32 | 496.70 ± 4.56 |
| Qwen3.5 0.8B NVFP4 | pp512 | 27147.01 ± 1477.00 | 46399.96 ± 144.82 |
| Qwen3.5 0.8B NVFP4 | tg128 | 404.41 ± 1.87 | 394.96 ± 1.38 |
| Cascade 31B MXFP4 | pp512 | 8609.84 ± 4706.08 | 10613.82 ± 28.14 |
| Cascade 31B MXFP4 | tg128 | 201.45 ± 9.30 | 205.84 ± 4.34 |
| Cascade 31B NVFP4 | pp512 | 9052.70 ± 4221.99 | 10986.75 ± 22.50 |
| Cascade 31B NVFP4 | tg128 | 195.93 ± 3.98 | 199.45 ± 0.58 |

Requirements

  • I have read and agree with the contributing guidelines: Yes
  • AI usage disclosure: Yes; AI helped locate the best position for the fix.

@michaelw9999 (Contributor, Author)

@JohannesGaessler

@am17an (Contributor) left a comment


Can you add --n-warmup-runs instead of this? On CPUs the variability can be even higher.
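For context, a --n-warmup-runs flag does not exist yet; a minimal sketch of how such an option could be parsed, using a plain argv loop rather than llama-bench's real argument parser, might look like this:

```cpp
// Illustrative sketch of the *proposed* --n-warmup-runs option.
// This is not llama-bench's actual CLI code.
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main(int argc, char ** argv) {
    int n_warmup = 1; // fallback when the flag is absent (placeholder value)
    for (int i = 1; i < argc; i++) {
        if (std::strcmp(argv[i], "--n-warmup-runs") == 0 && i + 1 < argc) {
            n_warmup = std::atoi(argv[++i]); // e.g. --n-warmup-runs 4
        }
    }
    std::printf("warmup runs: %d\n", n_warmup);
    return 0;
}
```

Making the count configurable would let CPU users, who may need more warmup passes, pick a value without recompiling.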

@JohannesGaessler (Contributor)

To clarify: what we are observing is not an issue with variance, it's an issue with bias. The CUDA graph is actually captured during the first benchmark run rather than during the warmup run, so the performance of the first benchmark run is consistently underestimated vs. real-life usage. The correct solution, as far as I'm concerned, is to do 2 warmup runs if and only if the number of tokens and the physical batch size are equal. That should fix the issue and only minimally increase the runtime.
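A minimal sketch of this conditional warmup, with illustrative names (n_tokens, n_ubatch) that are not necessarily llama-bench's exact identifiers:

```cpp
// Sketch of the conditional warmup proposed above; not the actual patch.
#include <cstdio>

static int n_warmup_runs(int n_tokens, int n_ubatch) {
    // Per the comment above: when the number of tokens equals the physical
    // batch size, the CUDA graph would otherwise be captured during the
    // first *measured* run, biasing it low, so do one extra warmup pass.
    return (n_tokens == n_ubatch) ? 2 : 1;
}

int main() {
    std::printf("512 tokens, ubatch 512 -> %d warmup run(s)\n", n_warmup_runs(512, 512));
    std::printf("512 tokens, ubatch 256 -> %d warmup run(s)\n", n_warmup_runs(512, 256));
    return 0;
}
```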

@michaelw9999 (Contributor, Author)

I'm going to switch this to draft, study this further, and see if I can come up with a better solution that also works for CPU and for tg.
I still see too much variation with warmup=2, and even at -r 50 there's sometimes a hiccup and one measurement is way off. Perhaps collect a few extra samples and just discard the outliers (see the sketch below).

Most of the time, even without n=4, tg variability remains low (e.g. 0.9% for tg128 on my first data point). The fix brought pp512's down to 0.55%.
I've tried -r 50 on some of the tiny models, and while it's better with n=4, maybe there is still something else going on.
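As a sketch of the outlier-discarding idea, assuming a simple trimmed mean that drops the lowest and highest samples before averaging (illustrative only, not code from the PR):

```cpp
// Trimmed mean: sort the samples, drop n_trim from each end, average the rest.
// One "hiccup" run then no longer skews the reported result.
#include <algorithm>
#include <cstdio>
#include <vector>

static double trimmed_mean(std::vector<double> samples, size_t n_trim) {
    std::sort(samples.begin(), samples.end());
    double sum = 0.0;
    size_t n = 0;
    for (size_t i = n_trim; i + n_trim < samples.size(); i++) {
        sum += samples[i];
        n++;
    }
    return n > 0 ? sum / n : 0.0;
}

int main() {
    // one hiccup (3000.0) among otherwise stable t/s measurements
    std::vector<double> tps = { 10610.0, 10620.0, 3000.0, 10605.0, 10615.0 };
    std::printf("trimmed mean: %.1f t/s\n", trimmed_mean(tps, 1)); // 10610.0
    return 0;
}
```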

michaelw9999 marked this pull request as draft on April 2, 2026, 19:21
@am17an (Contributor) commented Apr 3, 2026

BTW, that is kind of expected for an extremely small model like the one you're testing (Qwen 0.8B); you should try larger models, which better mirror real-world use cases.

