
Eval bug: Qwen3.5-35b-a3b unloads automatically with no errors #20002

@anubhavgupta

Description


Name and Version

version: 8179 (ecbcb7e)

Operating systems

Windows

GGML backends

CUDA

Hardware

RTX 5090 Mobile (24 GB VRAM), CUDA 13.1.

Models

unsloth's Qwen3.5-35B-A3B-UD-Q4_K_M.gguf

Problem description & steps to reproduce

Arguments I use to launch llama.cpp:

```
-m C:\Users\anubh\.lmstudio\models\lmstudio-community\Qwen3.5-35b-a3b-GGUF\Qwen3.5-35B-A3B-UD-Q4_K_M.gguf -ngl 99 -t 23 --temp 0.6 --top-k 20 --top-p 0.95 --parallel 1 --mlock --swa-full -c 200000 -ctk q8_0 -ctv q8_0 -fa on --jinja --reasoning-budget 0 --host 0.0.0.0 -fit off
```

The server stops suddenly without any error logs; the last logged line is always `load: - found better prompt with f_keep`.

This happens very frequently when it is used with Claude Code and less frequently with Opencode.

Repro steps (a scripted approximation of this traffic is sketched after the list):

1. Load the model.
2. Configure Claude Code to run against the model locally.
3. Go into any codebase and delete any existing CLAUDE.md file.
4. Run /init inside Claude Code.
5. Delete the generated CLAUDE.md file.
6. Repeat steps 4 and 5 two or three times.
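For what it's worth, here is a rough scripted approximation of that traffic pattern (not part of my original setup): it sends long, partially overlapping prompts to the server's OpenAI-compatible endpoint, which is what exercises the prompt-cache save/load path seen in the logs. The host, port, prompt contents, and loop count are all placeholders.

```python
# Hypothetical repro sketch: send long, partially overlapping prompts to
# llama-server's OpenAI-compatible endpoint, mimicking repeated /init runs.
import json
import urllib.request

URL = "http://127.0.0.1:8080/v1/chat/completions"  # assumed default host/port

def ask(prompt: str) -> int:
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode()
    req = urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Shared long prefix with a varying tail, so consecutive requests are
# similar but not identical -- analogous to repeated /init runs.
prefix = "You are analyzing a codebase.\n" + ("filler line\n" * 2000)
for i in range(5):
    print(i, ask(prefix + f"Summarize the project, attempt {i}."))
```

If the crash is in the prompt-cache similarity search, a loop like this should eventually hit the same `found better prompt with f_keep` path.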

Expected:

  • The model should keep running.

Actual:

  • The model unloads; the last log line is always like `found better prompt with f_keep ...`.

Please check the logs section below for the attached logs.


[Not sure if it would help in debugging.] Additionally, when I launch the server through a UI wrapper (https://github.com/anubhavgupta/llama-cpp-manager), I also get an exit code:

```
srv   prompt_save:  - saving prompt with length 4207, total state size = 106.513 MiB

srv          load:  - looking for better prompt, base f_keep = 0.001, sim = 0.000
srv          load:  - found better prompt with f_keep = 0.543, sim = 0.994

Server process exited with code 3221225477 // <---------------- THIS
```

A quick ChatGPT search suggested this exit code is 0xC0000005 → STATUS_ACCESS_VIOLATION.
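The decimal-to-hex mapping is easy to verify (this quick check is mine, not from any tooling):

```python
# Verify that the exit code reported by the wrapper is the Windows
# access-violation status: 3221225477 == 0xC0000005.
code = 3221225477
print(hex(code))           # -> 0xc0000005
assert code == 0xC0000005  # STATUS_ACCESS_VIOLATION
```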

First Bad Commit

I have been getting it since the launch day of Qwen3.5-35B-A3B.

Relevant log output

Logs

```
slot print_timing: id  0 | task 15417 |
prompt eval time =     333.66 ms /   208 tokens (    1.60 ms per token,   623.39 tokens per second)
       eval time =   29940.07 ms /  4000 tokens (    7.49 ms per token,   133.60 tokens per second)
      total time =   30273.74 ms /  4208 tokens
slot      release: id  0 | task 15417 | stop processing: n_tokens = 4207, truncated = 0
srv  update_slots: all slots are idle

srv  log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200

srv  params_from_: Chat format: peg-constructed

slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 837581953
srv  get_availabl: updating prompt cache

srv   prompt_save:  - saving prompt with length 4207, total state size = 106.513 MiB

srv          load:  - looking for better prompt, base f_keep = 0.001, sim = 0.000
srv          load:  - found better prompt with f_keep = 0.543, sim = 0.994
```
