server bench: fix bench not waiting for model load#7284

Merged
JohannesGaessler merged 1 commit into ggml-org:master from JohannesGaessler:server-bench-fix-wait
May 15, 2024
Conversation

@JohannesGaessler
Contributor

While working on #6828 I noticed that when using a large static n-gram cache the benchmark would report 0 iterations for the first 8 minutes and then 30 iterations for the last 2 minutes. What seems to be happening is that bench.py doesn't correctly wait for the server to be ready, so the clock starts ticking while the n-gram cache is still being loaded. From what I can tell, loading the model from disk can have the same issue if it's e.g. on an HDD.

This PR makes bench.py wait for an HTTP 200 response (SERVER_STATE_READY) from the health endpoint to check whether the server is actually ready. I'm not sure if there is a better way to implement this than what I did; I'm definitely open to suggestions.
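For reference, the wait described above can be sketched as a simple poll loop. This is a minimal, hedged sketch, not the actual bench.py code: the function name, the timeout values, and the exact health URL are assumptions; the real script's structure differs.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(health_url: str, timeout: float = 600.0, interval: float = 0.5) -> bool:
    """Poll the server's health endpoint until it returns HTTP 200 or the timeout expires.

    HTTP 200 is assumed to correspond to SERVER_STATE_READY, i.e. the model
    (and any n-gram cache) has finished loading.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(health_url, timeout=5) as resp:
                if resp.status == 200:  # server reports it is ready
                    return True
        except (urllib.error.URLError, OSError):
            pass  # connection refused (server not up yet) or still loading
        time.sleep(interval)
    return False
```

Only once this returns True would the benchmark clock start, so model-load time on a slow disk no longer counts against the measured iterations.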

@JohannesGaessler JohannesGaessler requested a review from phymbert May 14, 2024 13:34
@mofosyne mofosyne added the examples, python (python script changes), and Review Complexity : Low (Trivial changes to code that most beginner devs (or those who want a break) can tackle, e.g. UI fix) labels May 14, 2024
@JohannesGaessler JohannesGaessler merged commit 583fd6b into ggml-org:master May 15, 2024
@ggerganov
Member

It looks like this change causes the server benchmark that we run on the self-hosted runner to fail like this:

https://github.com/ggerganov/llama.cpp/actions/runs/9094073377/job/24998422481

I tried to revert it and now the benchmark passes:

https://github.com/ggerganov/llama.cpp/actions/runs/9112533114

I'm not sure why it is causing the error - any ideas how to fix?

@phymbert
Collaborator

Yes, the problem is here:

https://github.com/ggerganov/llama.cpp/blob/9afdffe70ebf3166d429b4434783bb0b7f97bdeb/examples/server/bench/bench.py#L113

The check treats Prometheus as not started, which does not work as expected. It is probably easier to revert this and split the Prometheus check from the llama.cpp server checks in a separate PR?
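The suggested split could look something like the sketch below. All names here are hypothetical and the timeouts are assumptions; the point is only that a missing or slow Prometheus must not be reported as the llama.cpp server failing to start.

```python
import time
import urllib.error
import urllib.request


def _http_ready(url: str, timeout: float, interval: float = 0.5) -> bool:
    # Generic poll-until-200 helper; assumes the service exposes an HTTP
    # endpoint that returns 200 once it is ready.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet
        time.sleep(interval)
    return False


def wait_for_services(
    server_health_url: str,
    prometheus_url: str,
    server_timeout: float = 600.0,
    prom_timeout: float = 30.0,
) -> None:
    # The llama.cpp server check is mandatory for the benchmark to be valid.
    if not _http_ready(server_health_url, timeout=server_timeout):
        raise RuntimeError("llama.cpp server did not become ready")
    # Prometheus is optional: if it is absent, degrade gracefully instead of
    # failing the whole benchmark run.
    if not _http_ready(prometheus_url, timeout=prom_timeout):
        print("warning: prometheus not reachable, metrics will be skipped")
```

With the two checks separated, the CI failure mode described above (the Prometheus check masquerading as a server-readiness failure) cannot occur.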

phymbert added a commit that referenced this pull request May 16, 2024
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026
Labels

examples · python (python script changes) · Review Complexity : Low (Trivial changes to code that most beginner devs (or those who want a break) can tackle, e.g. UI fix)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants