Fix prompt cache saving and chat-persistent rollover #1678
Merged
ejones merged 2 commits into ggml-org:master on Jun 3, 2023
Conversation
DannyDaemonic (Contributor) approved these changes on Jun 3, 2023 and left a comment:
This is a clever fix. Feel free to merge after the suggested size() to !empty() fix.
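For illustration only, a minimal sketch of the kind of change the reviewer is suggesting; the exact guard condition in main.cpp is an assumption here, not a quote of the PR:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical guard around the session-truncation logic, shown both ways.
bool should_truncate(const std::vector<int> & embd_inp,
                     const std::vector<int> & session_tokens,
                     std::size_t n_matching_session_tokens) {
    // before: emptiness expressed through size()
    // return embd_inp.size() > 0 && n_matching_session_tokens == embd_inp.size() &&
    //        session_tokens.size() > embd_inp.size();

    // after: !empty() states the intent directly
    return !embd_inp.empty() && n_matching_session_tokens == embd_inp.size() &&
           session_tokens.size() > embd_inp.size();
}
```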
ejones (Collaborator, Author) replied: Thanks!
wbruna added a commit to wbruna/llama.cpp that referenced this pull request on Aug 20, 2025:
…gml-org#1678) * Add separate flash attention config for image generation * Add config option for Conv2D Direct
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request on Apr 26, 2026:
* Fix prompt cache saving and chat-persistent rollover (fixes ggml-org#1670) * clang-tidy
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request on Apr 28, 2026:
* Fix prompt cache saving and chat-persistent rollover (fixes ggml-org#1670) * clang-tidy
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request on May 6, 2026:
* Fix prompt cache saving and chat-persistent rollover (fixes ggml-org#1670) * clang-tidy
Fixes #1670 by reworking the original fix for #1585 from #1609.

The original fix examined `embd` to determine whether the prompt had been evaluated, but `embd` is limited to the batch size. In addition, that fix left `session_tokens` in its original state (i.e., the longer, cached prompt), while normal session evaluation truncates it at the first eval. This combination meant that any prompt with a cache hit on just the first batch (512 tokens by default) would begin eval-ing roughly from the second batch, and all of that eval would get appended to the end of the full, original cached prompt. This had the downstream effect of diverging the cache from the prompt and overrunning the context size in the cache, as seen in #1670.

For the fix, I opted to move the re-eval logic to main's initialization rather than at the eval stage.
Here, it transforms `session_tokens` such that it will only match (prompt - 1) tokens.
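A minimal, self-contained sketch of that idea, assuming `embd_inp` holds the tokenized prompt and `session_tokens` holds the loaded cache; the variable names and surrounding logic are assumptions modeled on main.cpp, not the exact patch:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

using llama_token = int; // stand-in for the real llama_token typedef

int main() {
    std::vector<llama_token> embd_inp       = {1, 2, 3, 4};          // new prompt
    std::vector<llama_token> session_tokens = {1, 2, 3, 4, 5, 6, 7}; // loaded cache

    // Count how many cached tokens match the prompt from the start.
    std::size_t n_matching_session_tokens = 0;
    for (llama_token tok : session_tokens) {
        if (n_matching_session_tokens >= embd_inp.size() ||
            tok != embd_inp[n_matching_session_tokens]) {
            break;
        }
        n_matching_session_tokens++;
    }

    // The fix: if the cache covers the whole prompt but is longer than it,
    // truncate the cache so it matches at most (prompt - 1) tokens. This
    // forces re-evaluation of at least the last prompt token, recomputing
    // the logits instead of reusing the stale cached tail.
    if (!embd_inp.empty() && n_matching_session_tokens == embd_inp.size() &&
            session_tokens.size() > embd_inp.size()) {
        session_tokens.resize(embd_inp.size() - 1);
    }

    std::printf("cache now matches %zu tokens\n", session_tokens.size());
    return 0;
}
```

Doing this once at initialization, rather than special-casing the eval loop, keeps `session_tokens` consistent with what evaluation will actually append afterwards.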
Testing:
- For the scenario from #1585 (the prompt in `--prompt-cache` is longer than the new one), applied the Z/joke test and got a joke that did not start with "Z".