llama-bench: Fix to reduce very high ± variability #21282
michaelw9999 wants to merge 1 commit into ggml-org:master from
Conversation
am17an left a comment
Can you add `--n-warmup-runs` instead of this? On CPUs the variability can be even higher.
To clarify: what we are observing is not an issue with variance, it's an issue with bias. The CUDA graph is actually captured during the first benchmark run rather than during the warmup run, so the performance of the first benchmark run is consistently underestimated relative to real-life usage. The correct solution, as far as I'm concerned, is to do two warmup runs if and only if the number of tokens equals the physical batch size. That should fix the issue while only minimally increasing the runtime.
I'm going to switch this to draft, study it further, and see if I can come up with a better solution that also works for CPU and for tg. Most of the time, even without n=4, tg variability remains low (e.g., 0.9% for tg128 on my first data point). The fix brought pp512's variability down to 0.55%.
BTW, that is kind of expected for an extremely small model like the one you're testing (Qwen 0.8B); you should try larger models that mirror real-world use cases.
Overview
After the implementation of PR #19754, `llama-bench` started to show very high variability for most bench runs. That can still be avoided by adding repeats, e.g., `-r 5` or `-r 10`. This change fixes llama-bench's high variability/noise by adding 4 warmup runs (easy to adjust), which seems to be the sweet spot: the extra runs add little delay but still substantially reduce variability. There may still be some variance from one run to another due to system load or other factors, but the new default reduces it and makes the output more consistent.
Additional information
Example
`llama-bench` before/after output on some models I've tested, without flags:
Requirements
Yes
Yes - it helped locate the best position for the fix.