
spec : save the dynamic/static ngram cache file #22055

Draft
petersid2022 wants to merge 1 commit into ggml-org:master from petersid2022:self-speculation-save-cache

Conversation

@petersid2022
Contributor

@petersid2022 commented Apr 17, 2026

Overview

  • When the COMMON_SPECULATIVE_TYPE_NGRAM_CACHE speculative implementation is selected, create_state_ngram_cache creates a new common_speculative_state_ngram_cache state, with several of its parameters (e.g., n_draft, save_static and save_dynamic) hardcoded.

  • Instead, this PR extends common_params_speculative with those options so that they can be configured rather than hardcoded.

  • An attempt was also made to implement the save_static / save_dynamic behavior by calling common_ngram_cache_save when the state object is destroyed.
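A minimal sketch of the proposed shape, assuming the options live in a plain struct that is handed to the state's constructor and that saving happens in the destructor (RAII). All names here (ngram_cache_opts, ngram_cache_state, save_fn) are hypothetical stand-ins, not the actual common_params_speculative or common_speculative_state_ngram_cache definitions, and the save routine is injected rather than calling common_ngram_cache_save directly so the example stays self-contained:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Hypothetical option bundle mirroring the fields the PR proposes to move
// into common_params_speculative (illustrative names, not the real API).
struct ngram_cache_opts {
    std::int32_t n_draft      = 0;
    bool         save_static  = false;
    bool         save_dynamic = false;
    std::string  path_static;
    std::string  path_dynamic;
};

// Hypothetical save hook; in llama.cpp this role would be played by
// common_ngram_cache_save, which is not called here.
using save_fn = std::function<void(const std::string & path)>;

// Hypothetical state: options come from the params struct instead of being
// hardcoded, and the caches are written once, when the state is destroyed.
class ngram_cache_state {
public:
    ngram_cache_state(ngram_cache_opts opts, save_fn save)
        : opts_(std::move(opts)), save_(std::move(save)) {
        assert(opts_.n_draft >= 0);
    }

    // Copying would make two objects save the same files on destruction.
    ngram_cache_state(const ngram_cache_state &) = delete;
    ngram_cache_state & operator=(const ngram_cache_state &) = delete;

    ~ngram_cache_state() {
        // Destructors are noexcept by default, so never let the saver throw out.
        try {
            if (opts_.save_static)  { save_(opts_.path_static);  }
            if (opts_.save_dynamic) { save_(opts_.path_dynamic); }
        } catch (...) {
            // A real implementation would probably log the failure here.
        }
    }

    std::int32_t n_draft() const { return opts_.n_draft; }

private:
    ngram_cache_opts opts_;
    save_fn          save_;
};
```

In this sketch the caches are written whenever the state goes out of scope normally, including early returns and exceptions; the flip side is that a destructor cannot report errors to its caller, which is why exceptions from the saver are swallowed.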

Additional information

Add self‑speculative decoding (no draft model required) #18471

Requirements

@petersid2022 force-pushed the self-speculation-save-cache branch from 2e1c956 to 430c0ca April 18, 2026 13:14
@petersid2022 marked this pull request as ready for review April 18, 2026 13:37
@petersid2022 requested a review from a team as a code owner April 18, 2026 13:37
@petersid2022 force-pushed the self-speculation-save-cache branch 2 times, most recently from d5448ea to ba99720 April 20, 2026 05:49
@petersid2022 requested review from a team, CISC, IMbackK, ggerganov and pwilkin as code owners April 20, 2026 05:49
@petersid2022 force-pushed the self-speculation-save-cache branch 4 times, most recently from cf7a308 to 8ae6c04 April 20, 2026 06:59
@CISC removed the review requests for a team, CISC, IMbackK and pwilkin April 20, 2026 08:03
@petersid2022 force-pushed the self-speculation-save-cache branch 2 times, most recently from afc3295 to dc2ab62 April 20, 2026 18:29
@petersid2022 force-pushed the self-speculation-save-cache branch 4 times, most recently from c402b3d to 9da23a4 April 21, 2026 18:34
@petersid2022 changed the title from "spec: save the dynamic/static ngram cache file" to "spec : save the dynamic/static ngram cache file" Apr 21, 2026
@petersid2022 force-pushed the self-speculation-save-cache branch 9 times, most recently from 89b10b8 to 5c5bea4 April 29, 2026 11:12
Review comment on common/common.h (outdated)
 };

-struct common_params_speculative_ngram_cache {
+struct common_params_speculative_ngram_cache : common_params_speculative_ngram_map {
Contributor Author


This is probably the wrong way of going about this, but I am curious whether the same concept of m-gram speculative tokens can be applied to the ngram-cache implementation.

@petersid2022 force-pushed the self-speculation-save-cache branch 4 times, most recently from e3017a4 to 268d95e May 1, 2026 09:55
@ggerganov
Member

The new parameters are never populated. Did you test this change? What is the goal of this PR?

@ggerganov marked this pull request as draft May 1, 2026 10:16
@petersid2022 force-pushed the self-speculation-save-cache branch 5 times, most recently from b4ad275 to 4fe77aa May 2, 2026 17:56
@petersid2022
Contributor Author

First of all, thanks for taking the time to review my PR!

TBH, my initial scope for this PR (after coming across the TODO on line 930 of common/speculative.cpp) was to move the save_[static,dynamic] bools out of create_state_ngram_cache and into the more centralized common.h, as the TODO instructed. That would make those options configurable by the user. However, after thinking about it some more, I decided to drop the booleans and instead decide whether to save each cache based on whether the corresponding path_[static,dynamic] parameter is set.
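A minimal sketch of the path-driven variant, assuming an empty path means "do not persist": the separate booleans disappear and the save decision is derived from the paths supplied via --lookup-cache-static / --lookup-cache-dynamic. The struct and helper names (ngram_cache_paths, should_save, n_files_to_save) are hypothetical:

```cpp
#include <cassert>
#include <string>

// Hypothetical: the paths alone decide whether each cache is persisted,
// replacing separate save_static / save_dynamic flags.
struct ngram_cache_paths {
    std::string path_static;   // empty => static cache is not saved
    std::string path_dynamic;  // empty => dynamic cache is not saved
};

// A cache is saved only when a destination path was provided.
inline bool should_save(const std::string & path) {
    return !path.empty();
}

// Number of cache files that would be written on shutdown.
inline int n_files_to_save(const ngram_cache_paths & p) {
    return static_cast<int>(should_save(p.path_static)) +
           static_cast<int>(should_save(p.path_dynamic));
}
```

In this sketch a single option controls both where and whether a cache is saved, so there is no way to load a cache from a path without also writing it back on shutdown.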

I tested my changes using the command below:

./build/bin/llama-server --port 1234 -m ~/models/Qwen3.5-9B-Q8_0.gguf --spec-type ngram-cache --lookup-cache-static ~/static.bin --lookup-cache-dynamic ~/dynamic.bin

P.S.: In the common_speculative_state_ngram_cache constructor, I noticed that if the common_ngram_cache_load call fails for either file, execution is aborted. Could we instead call common_ngram_cache_save() to create the file?
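The suggested fallback could look roughly like this, assuming the loader reports failure through a return value (if the real common_ngram_cache_load throws instead, the call would need a try/catch). The cache type (cache_stub), the injected load/save callbacks and load_or_create are all hypothetical:

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical cache type standing in for the real ngram cache.
struct cache_stub {
    int n_entries = 0;
};

// Injected I/O so the sketch stays self-contained; in llama.cpp these roles
// would be played by the ngram-cache load/save helpers (assumption).
using load_fn = std::function<bool(const std::string & path, cache_stub & out)>;
using save_fn = std::function<void(const std::string & path, const cache_stub & in)>;

// Instead of aborting when the file cannot be loaded, start from an empty
// cache and write it out so that the file exists on the next run.
inline cache_stub load_or_create(const std::string & path,
                                 const load_fn & load,
                                 const save_fn & save) {
    cache_stub cache;
    if (load(path, cache)) {
        return cache;
    }
    cache = cache_stub{};   // discard any partial state from the failed load
    save(path, cache);      // create the file with an empty cache
    return cache;
}
```

One trade-off: silently creating an empty file also hides a mistyped path, so emitting a warning before creating it may be worth considering.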

@petersid2022 force-pushed the self-speculation-save-cache branch from 4fe77aa to 3f65c81 May 2, 2026 18:11
* fix todo on providing n_draft, save_static and save_dynamic from common/common.h

* implement the functionality by saving the cache at the common_speculative_state_ngram_cache destruction
@petersid2022 force-pushed the self-speculation-save-cache branch from 3f65c81 to 719eb8b May 4, 2026 05:59

Labels

None yet

Projects

None yet


2 participants