
Eval bug: Qwen3.5-35b-a3b unloads automatically with no errors #20002

@anubhavgupta

Description


Name and Version

version: 8179 (ecbcb7e)

Operating systems

Windows

GGML backends

CUDA

Hardware

RTX 5090 Mobile (24 GB VRAM), CUDA 13.1.

Models

unsloth's Qwen3.5-35B-A3B-UD-Q4_K_M.gguf

Problem description & steps to reproduce

Arguments I use to launch llama.cpp:

```
-m C:\Users\anubh\.lmstudio\models\lmstudio-community\Qwen3.5-35b-a3b-GGUF\Qwen3.5-35B-A3B-UD-Q4_K_M.gguf -ngl 99 -t 23 --temp 0.6 --top-k 20 --top-p 0.95 --parallel 1 --mlock --swa-full -c 200000 -ctk q8_0 -ctv q8_0 -fa on --jinja --reasoning-budget 0 --host 0.0.0.0 -fit off
```

The server stops suddenly without any error logs; the last logged line is always `load: - found better prompt with f_keep`.

This happens very frequently when it is used with Claude Code and less frequently with Opencode.

Repro steps (a scripted approximation of this traffic is sketched after the list):

1. Load the model.
2. Configure Claude Code to run against the model locally.
3. Go into any codebase and delete any existing CLAUDE.md file.
4. Run /init inside Claude Code.
5. Delete the generated CLAUDE.md file.
6. Repeat steps 4 and 5 two or three times.
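For what it's worth, here is a rough scripted approximation of that traffic pattern (not part of my original setup): it sends long, partially overlapping prompts to the server's OpenAI-compatible endpoint, which is what exercises the prompt-cache save/load path seen in the logs. The host, port, prompt contents, and loop count are all placeholders.

```python
# Hypothetical repro sketch: send long, partially overlapping prompts to
# llama-server's OpenAI-compatible endpoint, mimicking repeated /init runs.
import json
import urllib.request

URL = "http://127.0.0.1:8080/v1/chat/completions"  # assumed default host/port

def ask(prompt: str) -> int:
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode()
    req = urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Shared long prefix with a varying tail, so consecutive requests are
# similar but not identical -- analogous to repeated /init runs.
prefix = "You are analyzing a codebase.\n" + ("filler line\n" * 2000)
for i in range(5):
    print(i, ask(prefix + f"Summarize the project, attempt {i}."))
```

If the crash is in the prompt-cache similarity search, a loop like this should eventually hit the same `found better prompt with f_keep` path.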

Expected:

  • The model should keep running.

Actual:

  • The model unloads; the last log line is always like `found better prompt with f_keep ...`.

Please check the logs section below for the attached logs.


[Not sure if it would help in debugging.] Additionally, when I launch the server through a UI wrapper (https://github.com/anubhavgupta/llama-cpp-manager), I also get an exit code:

```
srv   prompt_save:  - saving prompt with length 4207, total state size = 106.513 MiB

srv          load:  - looking for better prompt, base f_keep = 0.001, sim = 0.000
srv          load:  - found better prompt with f_keep = 0.543, sim = 0.994

Server process exited with code 3221225477 // <---------------- THIS
```

A quick ChatGPT search suggested this exit code is 0xC0000005 → STATUS_ACCESS_VIOLATION.
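The decimal-to-hex mapping is easy to verify (this quick check is mine, not from any tooling):

```python
# Verify that the exit code reported by the wrapper is the Windows
# access-violation status: 3221225477 == 0xC0000005.
code = 3221225477
print(hex(code))           # -> 0xc0000005
assert code == 0xC0000005  # STATUS_ACCESS_VIOLATION
```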

First Bad Commit

I have been getting it since the launch day of Qwen3.5-35B-A3B.

Relevant log output

Logs

```
slot print_timing: id  0 | task 15417 |
prompt eval time =     333.66 ms /   208 tokens (    1.60 ms per token,   623.39 tokens per second)
       eval time =   29940.07 ms /  4000 tokens (    7.49 ms per token,   133.60 tokens per second)
      total time =   30273.74 ms /  4208 tokens
slot      release: id  0 | task 15417 | stop processing: n_tokens = 4207, truncated = 0
srv  update_slots: all slots are idle

srv  log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200

srv  params_from_: Chat format: peg-constructed

slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 837581953
srv  get_availabl: updating prompt cache

srv   prompt_save:  - saving prompt with length 4207, total state size = 106.513 MiB

srv          load:  - looking for better prompt, base f_keep = 0.001, sim = 0.000
srv          load:  - found better prompt with f_keep = 0.543, sim = 0.994
```
