server : add arg for disabling prompt caching #18776
Merged
rgerganov merged 3 commits into ggml-org:master on Jan 12, 2026
Conversation
Disabling prompt caching is useful for clients who are restricted to sending only OpenAI-compat requests and want deterministic responses.
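A minimal sketch of the intended usage, assuming the new flag is spelled --no-cache-prompt (the exact name is defined in this PR's diff, which is not shown here): start the server with prompt caching disabled, then send a strictly OpenAI-compat request with a fixed seed.

```python
# Sketch: strictly OpenAI-compat request against llama-server.
# Assumes the server was started with prompt caching disabled, e.g.:
#   llama-server -m model.gguf --no-cache-prompt   # flag name assumed
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "default",
        "messages": [{"role": "user", "content": "Say hello."}],
        "seed": 42,        # fixed seed: a standard OpenAI-compat field
        "temperature": 0,  # greedy sampling, for determinism
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```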
ngxson approved these changes Jan 12, 2026
Contributor
I think it's ok to add this; I thought we already had this arg, but it turns out we don't.
Nit: it's better to move this arg right before --cache-reuse, and also update the help message of --cache-reuse to say that it depends on --cache-prompt.
Member
Author
@ngxson thanks for the review, let me know if any other docs need updating
ngxson reviewed Jan 12, 2026
gary149 pushed a commit to gary149/llama-agent that referenced this pull request Jan 13, 2026
* server : add arg for disabling prompt caching
  Disabling prompt caching is useful for clients who are restricted to sending only OpenAI-compat requests and want deterministic responses.
* address review comments
* address review comments
dillon-blake pushed a commit to Boxed-Logic/llama.cpp that referenced this pull request Jan 15, 2026
* server : add arg for disabling prompt caching
  Disabling prompt caching is useful for clients who are restricted to sending only OpenAI-compat requests and want deterministic responses.
* address review comments
* address review comments
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* server : add arg for disabling prompt caching
  Disabling prompt caching is useful for clients who are restricted to sending only OpenAI-compat requests and want deterministic responses.
* address review comments
* address review comments
We have a use case where we run end-to-end tests which include llama-server, and we expect all responses to be strictly deterministic. Of course we are setting a fixed seed in the HTTP request, but it turns out this is not enough if prompt caching is enabled. Unfortunately, we cannot use the cache_prompt request option because it is not OpenAI compatible.

This patch adds another command line arg for disabling prompt caching, but I'd be happy to discard it if there is some other way to accomplish this. Even if there is no such way, I don't insist on merging this if maintainers decide it adds more confusion or that our use case is not a valid one.
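For illustration, a sketch of the per-request alternative mentioned above, assuming a local llama-server on the default port: cache_prompt is a llama.cpp-specific extension to the request body, so a client limited to standard OpenAI-compat fields cannot send it, which is what motivates a server-side flag.

```python
# Sketch of the per-request option that is NOT OpenAI compatible.
# cache_prompt is a llama.cpp-specific field; strict OpenAI clients
# (and SDKs that validate request bodies) cannot send it.
import requests

resp = requests.post(
    "http://localhost:8080/completion",  # llama.cpp native endpoint
    json={
        "prompt": "Say hello.",
        "seed": 42,
        "cache_prompt": False,  # disables prompt caching for this request only
    },
)
print(resp.json()["content"])
```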