Name and Version
version: 8070 (cc45f2a)
built with GNU 13.3.0 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
Ryzen 9 9950X3D + RTX 5090
Models
The issue presents with several quants of Qwen3.5 397B.
Problem description & steps to reproduce
Every generation forces a full prompt reprocessing, regardless of the frontend (llama.cpp webui, SillyTavern, or others) and regardless of whether the generation is a new reply or a swipe. See the reproduction sketch after the launch arguments below.
launch arguments:
./llama-server -m model/Qwen3.5-397B-A17B-UD-Q5_K_XL-00001-of-00007.gguf -ngl 999 --threads 16 --threads-batch 16 --batch-size 2048 -ub 2048 -ot "blk.(0|1|2).ffn_.*=CUDA0" -ot "blk..*_exps.*=CPU" --ctx-size 96000 --port 15000 --chat-template-kwargs '{"enable_thinking": false}' --mmproj Desktop/model/AI/LLama.cpp/mmproj-F32.gguf --no-mmap
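
A minimal way to observe this, as a sketch: with the server running on port 15000 as launched above, send the same /completion request twice and compare timings.prompt_n (the number of prompt tokens actually processed). With a working prompt cache the second request should process close to zero prompt tokens; here it reprocesses all of them. The prompt text and n_predict value below are arbitrary, anything long enough works:

```python
# Reproduction sketch: assumes llama-server is listening on port 15000.
import json
import urllib.request

URL = "http://127.0.0.1:15000/completion"
payload = {
    "prompt": "Once upon a time, " * 200,  # arbitrary long-ish prompt
    "n_predict": 16,
    "cache_prompt": True,  # the default, stated explicitly
}

for i in range(2):
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        timings = json.loads(resp.read())["timings"]
    # prompt_n = prompt tokens (re)processed for this request
    print(f"request {i + 1}: prompt_n = {timings['prompt_n']}")
```

This is the same behaviour the frontends above trigger through the chat endpoints; /completion just makes the token counts easy to read.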
First Bad Commit
It has been present since the commit that added qwen3.5moe support.
Relevant log output
slot get_availabl: id 3 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 0.838
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> logit-bias -> top-k -> min-p -> temp-ext -> adaptive-p
slot launch_slot_: id 3 | task 427 | processing task, is_child = 0
slot update_slots: id 3 | task 427 | new prompt, n_ctx_slot = 96000, n_keep = 0, task.n_tokens = 2189
slot update_slots: id 3 | task 427 | need to evaluate at least 1 token for each active slot (n_past = 2189, task.n_tokens() = 2189)
slot update_slots: id 3 | task 427 | n_past was set to 2188
slot update_slots: id 3 | task 427 | n_tokens = 2188, memory_seq_rm [2188, end)
slot update_slots: id 3 | task 427 | failed to truncate tokens with position >= 2188 - clearing the memory
slot prompt_clear: id 3 | task 427 | clearing prompt with 2188 tokens
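
The "failed to truncate tokens" / "clearing prompt" pair above is the mechanism behind the full reprocessing. To quantify the clears from a captured log, a quick sketch (assuming the server's output was redirected to server.log; the filename is illustrative):

```python
# Count cache-clear events in a captured server log.
from collections import Counter

counts = Counter()
with open("server.log") as f:
    for line in f:
        if "failed to truncate tokens" in line:
            counts["truncate failures"] += 1
        if "clearing prompt" in line:
            counts["prompt clears"] += 1

# One pair per generation matches the behaviour described above.
print(dict(counts))
```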