
common : do not pass prompt tokens to reasoning budget sampler #22488

Merged
aldehir merged 4 commits into ggml-org:master from aldehir:fix-reasoning-budget
Apr 29, 2026

Conversation

@aldehir
Contributor

@aldehir aldehir commented Apr 28, 2026

Overview

cont: #22323

Do not pass prompt tokens through the reasoning budget sampler, mirroring grammar behavior. Renamed accept_grammar to is_generated to better convey the purpose of this flag.

Also adjusted the prefill logic to pass the generation prompt through the reasoning budget sampler as well. I removed the prefill_tokens parameter, as it required the prefill to match the starting token sequence exactly. Instead, we simply feed each token individually so it gets processed by the state machine.
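The two ideas above — skipping prompt tokens and replaying the prefill token by token — can be sketched as follows. This is a minimal stand-in, not the actual llama.cpp implementation; the names (`ReasoningBudget`, `ThinkState`, `accept`, `replay_prefill`) are illustrative, as are the literal `<think>` markers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative state machine for a reasoning budget. Only generated tokens
// advance it; prompt tokens are skipped, mirroring grammar behavior.
enum class ThinkState { Inactive, Active, Closed };

struct ReasoningBudget {
    ThinkState state = ThinkState::Inactive;
    int tokens_in_think = 0;

    void accept(const std::string & tok, bool is_generated) {
        if (!is_generated) {
            return; // prompt tokens never touch the state machine
        }
        if (tok == "<think>") {
            state = ThinkState::Active;
        } else if (tok == "</think>") {
            state = ThinkState::Closed;
        } else if (state == ThinkState::Active) {
            tokens_in_think++;
        }
    }
};

// Prefill replay: feed each token individually so the state machine processes
// it, instead of requiring the prefill to match a fixed token sequence.
void replay_prefill(ReasoningBudget & rb, const std::vector<std::string> & prefill) {
    for (const auto & tok : prefill) {
        rb.accept(tok, /*is_generated=*/true);
    }
}
```

With this shape, a prompt token marked `is_generated = false` is a no-op, while a replayed generation prompt drives the same transitions as live sampling.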


@aldehir aldehir requested a review from a team as a code owner April 28, 2026 22:15
@aldehir aldehir changed the title Fix reasoning budget common : do not pass prompt tokens to reasoning budget sampler Apr 28, 2026
@aldehir aldehir marked this pull request as draft April 28, 2026 22:17
@aldehir aldehir marked this pull request as ready for review April 28, 2026 22:22
@aldehir
Contributor Author

aldehir commented Apr 28, 2026

@BruceJillis if you have an opportunity, can you check whether this addresses the core issue?

Member

@pwilkin pwilkin left a comment


Nice :)

@BruceJillis
Contributor

@aldehir I like the change! Now the same state machine drives both prefill and generation. I tested with Qwen3.6-27B using the test cases I made for #22323: the activated transition fires during common_sampler_init / prefill replay, and deactivated fires on a natural close. So yes, it addresses the issue, and the refactor looks clean to me.

As an aside: a user flagged that the reasoning budget logs are very noisy on #22323. Do you have a rough timeline for this PR? If it's a while out, I'd like to open a small follow-up that logs the unimportant transitions at DEBUG while keeping budget exhausted / forcing immediately at INFO.

Review thread on common/sampling.h (outdated)
@aldehir aldehir requested a review from ggerganov April 29, 2026 11:38
@aldehir
Contributor Author

aldehir commented Apr 29, 2026

I did another pass and realized that precomputing whether the grammar should accept a token needs to stay; otherwise the check runs against the already-updated reasoning budget state, which is incorrect.
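The ordering concern above can be illustrated with a minimal sketch. These names (`State`, `grammar_would_accept`, `budget_update`) and the toy acceptance rule are hypothetical, not the actual llama.cpp code; the point is only that the grammar decision must be taken against the state as it was before the reasoning budget update mutates it.

```cpp
#include <cassert>

// Illustrative sampler state touched by the reasoning budget update.
struct State {
    bool thinking = false;
};

// Illustrative rule: grammar constraints only apply outside a think block.
bool grammar_would_accept(const State & s) {
    return !s.thinking;
}

// Illustrative reasoning budget update that mutates the state.
void budget_update(State & s) {
    s.thinking = true;
}
```

If the acceptance check ran after `budget_update`, it would observe the new state and reach the opposite decision, which is the bug the precomputation avoids.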

@aldehir
Contributor Author

aldehir commented Apr 29, 2026

@BruceJillis this should land soon, and it will resolve the logging noise by only logging for generated think sequences.

@pwilkin
Member

pwilkin commented Apr 29, 2026

Waiting for CI and will merge.

@aldehir aldehir merged commit d775992 into ggml-org:master Apr 29, 2026
46 checks passed
@aldehir aldehir deleted the fix-reasoning-budget branch April 29, 2026 19:11
tekintian added a commit to tekintian/llama.cpp that referenced this pull request May 1, 2026
* 'master' of github.com:tekintian/llama.cpp: (659 commits)
  ggml-webgpu: Improve performance of mat-vec and mat-mat for MUL_MAT_ID (ggml-org#22464)
  Update llama-mmap to use ftello/fseeko (ggml-org#22497)
  common : check for null getpwuid in hf-cache (ggml-org#22550)
  vulkan: add get/set tensor 2d functions (ggml-org#22514)
  spec: fix argument typo (ggml-org#22552)
  ci : bump ty to 0.0.33 (ggml-org#22535)
  vendor : update cpp-httplib to 0.43.2 (ggml-org#22548)
  CUDA: fix tile FA kernel on Pascal (ggml-org#22541)
  scripts : add wc2wt.sh - create worktree from current HEAD (ggml-org#22513)
  add fast matmul iquants (ggml-org#22504)
  spec : fix draft model checkpoints (ggml-org#22521)
  spec : fix vocab compat checks in spec example (ggml-org#22426)
  common : do not pass prompt tokens to reasoning budget sampler (ggml-org#22488)
  hexagon: make vmem and buffer-size configurable (ggml-org#22487)
  CUDA: fuse SSM_CONV + ADD(bias) + SILU (ggml-org#22478)
  spec : disacard last drafted token with low prob (ggml-org#22506)
  sync : ggml
  ggml : bump version to 0.10.1 (ggml/1469)
  webui: fix slow mic stop and WAV encode (ggml-org#22480)
  ggml-cpu : disable tiled matmul on AIX to fix page boundary segfault (ggml-org#22293)
  ...

# Conflicts:
#	.gitignore
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026


4 participants