Useless compression and buggy contextWindowSize #1924

@AliakseiLasevich

Description

What happened?

I'm not sure whether the problem is in Qwen Code or in my model, but when I limit the context window, compression appears to be useless.
I run llama.cpp locally with unsloth Qwen3-Coder-Next-UD-Q4_K_XL.gguf.
The first compression went from 82k to 25k tokens, which looks normal. The second was effectively a no-op: 81651 to 81273 tokens.
Log from Qwen Code:

ℹ IMPORTANT: This conversation approached the input token limit for unsloth/Qwen3-Coder-Next. A compressed context will be sent for future messages (compressed from: 81651 to 81273 tokens).

Qwen Code then continued executing and finally failed with:

  ✕ [API Error: 400 request (100582 tokens) exceeds the available context size (100096 tokens), try increasing it]

settings.json:

{
  "$version": 3,
  "general": {
    "language": "ru"
  },
  "env": {
    "LOCAL_LLM_API_KEY": "local-llm"
  },
  "modelProviders": {
    "openai": [
      {
        "id": "unsloth/Qwen3-Coder-Next",
        "name": "unsloth/Qwen3-Coder-Next",
        "description": "Local Qwen model via OpenAI-compatible API",
        "baseUrl": "http://192.168.0.33:8001/v1",
        "envKey": "LOCAL_LLM_API_KEY",
        "generationConfig": {
          "contextWindowSize": 95000
        }
      }
    ]
  },
  "security": {
    "auth": {
      "selectedType": "openai"
    }
  },
  "model": {
    "name": "unsloth/Qwen3-Coder-Next",
    "chatCompression": {
      "contextPercentageThreshold": 0.85
    },
    "generationConfig": {
      "timeout": 1200000,
      "maxRetries": 3
    }
  },
  "tools": {
    "approvalMode": "default"
  }
}
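For reference, with these settings compression should trigger at 0.85 × 95000 ≈ 80750 tokens, which matches the ~81k trigger point in the log. The sketch below illustrates the presumed threshold arithmetic (names are mine, not Qwen Code internals) and shows how small the second compression's reduction actually was:

```python
# Illustration of the presumed compression-trigger arithmetic.
# Constant names are hypothetical, not Qwen Code internals.

CONTEXT_WINDOW_SIZE = 95000   # generationConfig.contextWindowSize
THRESHOLD = 0.85              # chatCompression.contextPercentageThreshold

trigger_point = int(CONTEXT_WINDOW_SIZE * THRESHOLD)
print(f"compression triggers above {trigger_point} tokens")  # 80750

# Second compression from the log: 81651 -> 81273 tokens
before, after = 81651, 81273
reduction = (before - after) / before
print(f"reduction: {reduction:.2%}")  # under 0.5%, effectively a no-op

# The request that finally failed exceeded the server-side context,
# which llama.cpp reported as 100096 tokens, not the configured 95000:
failed_request = 100582
print(failed_request > 100096)  # True
```

Note the discrepancy: the configured contextWindowSize is 95000, but the 400 error reports an available context of 100096 tokens, so the client-side limit and the server-side limit disagree.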
  │ Status                                                                                           │
  │                                                                                                  │
  │ Qwen Code                         0.10.5 (135b47db)                                              │
  │ Runtime                           Node.js v24.11.0 / npm 11.6.1                                  │
  │ OS                                darwin arm64 (24.5.0)                                          │
  │                                                                                                  │
  │ Auth                              openai (http://192.168.0.33:8001/v1)                           │
  │ Model                             unsloth/Qwen3-Coder-Next                                       │
  │ Session ID                        178d7455-45b3-4be0-9d2c-716be523109f                           │
  │ Sandbox                           no sandbox                                                     │
  │ Proxy                             no proxy                                                       │
  │ Memory Usage                      361.9 MB                                                       │

Labels: status/needs-triage, type/bug, type/feature-request
