server : fix warmup draft cache type by ggerganov · Pull Request #12446 · ggml-org/llama.cpp

ggerganov · 2025-03-18T08:49:43Z

fix #12436

Use the F16 type for the draft model's KV cache during warmup.

ggml-ci
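A minimal sketch of the idea behind the fix. It assumes that the draft model's parameters are derived by copying the main model's parameters, so that a quantized KV cache type chosen for the main model would otherwise be inherited by the draft warmup, while the draft context later runs with F16. `CacheType`, `Params`, `make_draft_params` and `check_draft_params` are hypothetical names used for illustration only. They are not llama.cpp's API, and this is not the PR's diff.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the KV cache element type (not the real ggml enum).
enum class CacheType { F16, Q8_0 };

// Hypothetical model/context parameters; field names are illustrative only.
struct Params {
    std::string model;
    CacheType cache_type_k = CacheType::F16;
    CacheType cache_type_v = CacheType::F16;
};

// Derive the draft-model parameters from the main-model parameters.
// Assumption: the draft context always runs with an F16 KV cache, so the
// parameters used for its warmup are forced to F16 as well instead of
// inheriting whatever cache type was chosen for the main model.
Params make_draft_params(const Params & base, const std::string & draft_model) {
    Params dft = base;                  // start from the main-model settings
    dft.model        = draft_model;
    dft.cache_type_k = CacheType::F16;  // override the inherited K cache type
    dft.cache_type_v = CacheType::F16;  // override the inherited V cache type
    return dft;
}

// Example: a quantized main-model cache must not leak into the draft warmup.
void check_draft_params() {
    Params base;
    base.model        = "main.gguf";    // hypothetical file name
    base.cache_type_k = CacheType::Q8_0;
    base.cache_type_v = CacheType::Q8_0;

    const Params dft = make_draft_params(base, "draft.gguf");
    assert(dft.model == "draft.gguf");
    assert(dft.cache_type_k == CacheType::F16);
    assert(dft.cache_type_v == CacheType::F16);
    assert(base.cache_type_k == CacheType::Q8_0);  // main model is unchanged
}
```

Copying first and overriding afterwards keeps every other inherited setting in sync with the main model while pinning only the cache type, so warmup and the real draft context agree on how the KV cache is laid out.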

server : fix warmup draft cache type

3637435

ggml-ci

ggerganov requested a review from ngxson as a code owner March 18, 2025 08:49

github-actions bot added the examples and server labels Mar 18, 2025

ggerganov mentioned this pull request Mar 18, 2025

Misc. bug: KV Cache seems to be initialized twice for the draft model? #12436 (closed)

ngxson approved these changes Mar 18, 2025


ggerganov merged commit 810e0af into master Mar 18, 2025
47 checks passed

ggerganov deleted the ggg/server-fix-warmup-draft-cache-type branch March 18, 2025 10:05

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025

server : fix warmup draft cache type (ggml-org#12446)

b7417a3

ggml-ci

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

server : fix warmup draft cache type (ggml-org#12446)

8dcc33d

ggml-ci


ggerganov merged 1 commit into master from ggg/server-fix-warmup-draft-cache-type
