Name and Version
version: 8179 (ecbcb7e)
Operating systems
Windows
GGML backends
CUDA
Hardware
RTX 5090 Mobile (24 GB VRAM), CUDA 13.1.
Models
Unsloth's Qwen3.5-35B-A3B-UD-Q4_K_M.gguf
Problem description & steps to reproduce
Args I use to launch llama.cpp:
-m C:\Users\anubh.lmstudio\models\lmstudio-community\Qwen3.5-35b-a3b-GGUF\Qwen3.5-35B-A3B-UD-Q4_K_M.gguf -ngl 99 -t 23 --temp 0.6 --top-k 20 --top-p 0.95 --parallel 1 --mlock --swa-full -c 200000 -ctk q8_0 -ctv q8_0 -fa on --jinja --reasoning-budget 0 --host 0.0.0.0 -fit off
The server stops suddenly without any error log; the last line is always `load: - found better prompt with f_keep`.
This happens very frequently when the server is used with Claude Code and less frequently with OpenCode.
Repro steps:
1. Load the model.
2. Configure Claude Code to use the locally running model.
3. Go inside any codebase and delete any existing CLAUDE.md file.
4. Run /init inside Claude Code.
5. Delete the generated CLAUDE.md file.
6. Repeat steps 4 and 5 two or three times.
Expected:
- The server should keep running.
Actual:
- The server exits, with the last log line always like `found better prompt with f_keep...`.
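If it helps, here is a rough synthetic sketch that tries to exercise the same prompt-cache save/restore path without Claude Code. It is untested; the prompts are placeholders rather than the real Claude Code traffic, and it assumes the server launched with the args above is listening on the default port 8080:

```python
# Untested sketch: alternate between two long prompts that share no prefix,
# so llama-server's prompt cache repeatedly saves and restores states
# ("looking for better prompt" / "found better prompt with f_keep" in the logs).
# Assumption: server is running with the args above on the default port 8080.
import requests

URL = "http://localhost:8080/v1/chat/completions"

# Two long, unrelated prompts to force cache save/restore churn.
prompts = [
    "Summarize the layout of this repository. " + "filler " * 1000,
    "Write a CLAUDE.md for this project. " + "padding " * 1000,
]

for round_no in range(5):
    for p in prompts:
        r = requests.post(
            URL,
            json={
                "messages": [{"role": "user", "content": p}],
                "max_tokens": 128,
            },
            timeout=600,
        )
        print(round_no, r.status_code)
```

Since the crash shows up within 2-3 /init cycles for me, a handful of rounds should be enough to trigger it if this loop hits the same path.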
Please check the logs section below for the attached logs.
[Not sure if it would help in debugging] Additionally, when I launch the server through a UI wrapper (https://github.com/anubhavgupta/llama-cpp-manager), I also get an exit code:
srv prompt_save: - saving prompt with length 4207, total state size = 106.513 MiB
srv load: - looking for better prompt, base f_keep = 0.001, sim = 0.000
srv load: - found better prompt with f_keep = 0.543, sim = 0.994
Server process exited with code 3221225477 // <----------------THIS
A quick ChatGPT lookup suggested this maps to 0xC0000005 → STATUS_ACCESS_VIOLATION.
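For reference, the mapping checks out; the decimal exit code is just the unsigned 32-bit NTSTATUS value (trivial check, nothing llama.cpp-specific):

```python
# Interpret the wrapper's decimal exit code as an unsigned 32-bit NTSTATUS.
exit_code = 3221225477
print(hex(exit_code))  # 0xc0000005 -> STATUS_ACCESS_VIOLATION
```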
First Bad Commit
Unknown; I have been getting it since the launch day of Qwen3.5 35B A3B.
Relevant log output
Logs
```
slot print_timing: id 0 | task 15417 |
prompt eval time = 333.66 ms / 208 tokens ( 1.60 ms per token, 623.39 tokens per second)
eval time = 29940.07 ms / 4000 tokens ( 7.49 ms per token, 133.60 tokens per second)
total time = 30273.74 ms / 4208 tokens
slot release: id 0 | task 15417 | stop processing: n_tokens = 4207, truncated = 0
srv update_slots: all slots are idle
srv log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
srv params_from_: Chat format: peg-constructed
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 837581953
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 4207, total state size = 106.513 MiB
srv load: - looking for better prompt, base f_keep = 0.001, sim = 0.000
srv load: - found better prompt with f_keep = 0.543, sim = 0.994
```