Eval bug: qwen35moe always forces a full prompt reprocess after each message, 'failed to truncate' #19690

@timkhronos

Name and Version

version: 8070 (cc45f2a)
built with GNU 13.3.0 for Linux x86_64

Operating systems

Linux

GGML backends

CUDA

Hardware

Ryzen 9950x3d + 5090

Models

The issue presents with several quants of Qwen3.5 397B.

Problem description & steps to reproduce

Every generation forces a full prompt reprocess, whether it is a new reply or a swipe, and whether it is started from the llama.cpp webui, SillyTavern, or another frontend.
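To check how often this happens in a saved server log, a small script along these lines may help. It is only a sketch: the regular expression assumes the exact wording of the "failed to truncate" line quoted under "Relevant log output" in this report, and the function name `find_forced_reprocesses` is made up for illustration.

```python
import re

# Assumption: the message format matches the log line quoted in this report:
# "... | task 427 | failed to truncate tokens with position >= 2188 - clearing the memory"
FAILED_TRUNCATE_RE = re.compile(
    r"task (-?\d+) \| failed to truncate tokens with position >= (\d+)"
)


def find_forced_reprocesses(log_text):
    """Return a list of (task_id, position) pairs, one per failed truncation."""
    return [
        (int(match.group(1)), int(match.group(2)))
        for match in FAILED_TRUNCATE_RE.finditer(log_text)
    ]
```

Each returned pair corresponds to one request where the cached prompt was cleared and had to be processed again from the start.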

launch arguments:

/llama-server -m model/Qwen3.5-397B-A17B-UD-Q5_K_XL-00001-of-00007.gguf -ngl 999 --threads 16 --threads-batch 16 --batch-size 2048 -ub 2048 -ot "blk.(0|1|2).ffn_.=CUDA0" -ot "blk.._exps.=CPU" --ctx-size 96000 --port 15000 --chat-template-kwargs '{"enable_thinking": false}' --mmproj Desktop/model/AI/LLama.cpp/mmproj-F32.gguf --no-mmap

First Bad Commit

It has been present since the commit that added support for qwen3.5moe.

Relevant log output

slot get_availabl: id 3 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 0.838
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> logit-bias -> top-k -> min-p -> ?temp-ext -> adaptive-p
slot launch_slot_: id 3 | task 427 | processing task, is_child = 0
slot update_slots: id 3 | task 427 | new prompt, n_ctx_slot = 96000, n_keep = 0, task.n_tokens = 2189
slot update_slots: id 3 | task 427 | need to evaluate at least 1 token for each active slot (n_past = 2189, task.n_tokens() = 2189)
slot update_slots: id 3 | task 427 | n_past was set to 2188
slot update_slots: id 3 | task 427 | n_tokens = 2188, memory_seq_rm [2188, end)
slot update_slots: id 3 | task 427 | failed to truncate tokens with position >= 2188 - clearing the memory
slot prompt_clear: id 3 | task 427 | clearing prompt with 2188 tokens
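The sequence in the log above can be modelled with a small toy example. This is not llama.cpp code: `ToyRecurrentMemory` and `tokens_to_process` are hypothetical names, and the sketch assumes (suggested by the "failed to truncate" message, but not confirmed in this report) that the memory used for this model cannot drop only the tail of a cached sequence. The cached length of 2612 is an arbitrary illustrative value; any value above 2188 behaves the same way.

```python
class ToyRecurrentMemory:
    """Hypothetical memory that can be cleared, but not partially truncated."""

    def __init__(self, n_cached):
        self.n_cached = n_cached  # number of token positions currently cached

    def seq_rm(self, p0):
        """Try to drop positions >= p0. Returns False if that is not possible."""
        if p0 <= 0:
            self.n_cached = 0  # dropping everything is always possible
            return True
        if p0 >= self.n_cached:
            return True  # nothing to drop
        return False  # assumption: a partial truncation is refused

    def clear(self):
        self.n_cached = 0


def tokens_to_process(memory, n_common, n_prompt):
    """Model the logged path and return how many prompt tokens get evaluated."""
    n_past = min(n_common, memory.n_cached)
    if n_past == n_prompt:
        # "need to evaluate at least 1 token": back off by one position
        n_past -= 1
    if not memory.seq_rm(n_past):
        # "failed to truncate ... clearing the memory"
        memory.clear()
        n_past = 0
    memory.n_cached = n_prompt
    return n_prompt - n_past


memory = ToyRecurrentMemory(n_cached=2612)  # illustrative cached length
print(tokens_to_process(memory, n_common=2189, n_prompt=2189))  # 2189
```

Under these assumptions, a prompt that is already fully cached still ends up being processed in full: the one-token back-off asks for a truncation at position 2188, the truncation is refused, and the fallback clears everything.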
