
fix: Add max_tokens to claude-sonnet-4-6 to prevent integration test hangs#2136

Closed
juanmichelini wants to merge 1 commit into main from openhands/fix-claude-sonnet-4-6-hang

Conversation

juanmichelini (Collaborator) commented on Feb 19, 2026

Summary

Fixes #2135

Integration tests were hanging indefinitely when running with the claude-sonnet-4-6 model. This minimal fix adds a max_tokens: 8192 parameter to the model configuration so the model cannot generate excessively long responses, which could drive the agent into repeated tool calls and effectively infinite loops.

Root Cause

The claude-sonnet-4-6 model was causing agents to enter infinite loops during integration tests, particularly in browser-related tests. Without a max_tokens limit, the model could generate responses that triggered repeated tool calls.

Solution

Added max_tokens: 8192 to the claude-sonnet-4-6 configuration in .github/run-eval/resolve_model_config.py. This limits the response length and prevents the hanging behavior while still allowing sufficient tokens for complex tasks.
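
For reference, a minimal sketch of what the change could look like. The field values (model, temperature, max_tokens) come from this PR's diff; the dictionary name, file layout, and helper function are illustrative assumptions, not the actual contents of resolve_model_config.py:

# Illustrative sketch only -- the real resolve_model_config.py may be structured
# differently; the field values below are taken from this PR's diff.
MODEL_CONFIGS = {
    "claude-sonnet-4-6": {
        "model": "litellm_proxy/anthropic/claude-sonnet-4-6",
        "temperature": 0.0,
        # New in this PR: cap response length so a runaway generation
        # cannot keep the integration tests hanging.
        "max_tokens": 8192,
    },
}


def resolve_model_config(model_name: str) -> dict:
    """Return the llm_config dict for a given model name (hypothetical helper)."""
    return MODEL_CONFIGS[model_name]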

Testing

This change will be validated by the integration tests running with claude-sonnet-4-6 in CI.

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this? (Will be tested in CI with actual claude-sonnet-4-6)
  • If there is an example, have you run the example to make sure that it works? (N/A - configuration change)
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works? (N/A - configuration change)
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name? (N/A - configuration change)
  • Is the GitHub CI passing? (Will check after PR is created)

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant   Architectures   Base Image
java      amd64, arm64    eclipse-temurin:17-jdk
python    amd64, arm64    nikolaik/python-nodejs:python3.12-nodejs22
golang    amd64, arm64    golang:1.21-bookworm

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:519bf09-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-519bf09-python \
  ghcr.io/openhands/agent-server:519bf09-python

All tags pushed for this build

ghcr.io/openhands/agent-server:519bf09-golang-amd64
ghcr.io/openhands/agent-server:519bf09-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:519bf09-golang-arm64
ghcr.io/openhands/agent-server:519bf09-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:519bf09-java-amd64
ghcr.io/openhands/agent-server:519bf09-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:519bf09-java-arm64
ghcr.io/openhands/agent-server:519bf09-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:519bf09-python-amd64
ghcr.io/openhands/agent-server:519bf09-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:519bf09-python-arm64
ghcr.io/openhands/agent-server:519bf09-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:519bf09-golang
ghcr.io/openhands/agent-server:519bf09-java
ghcr.io/openhands/agent-server:519bf09-python

About Multi-Architecture Support

  • Each variant tag (e.g., 519bf09-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 519bf09-python-amd64) are also available if needed

Commit message

claude-sonnet-4-6 was causing integration tests to hang indefinitely.
Adding max_tokens: 8192 limits response length and prevents infinite loops.

Fixes #2135

Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot (Collaborator) left a comment

🟡 Acceptable - Pragmatic fix for a real problem. Solves the hanging tests, but feels like treating symptoms rather than understanding the root cause.

"llm_config": {
"model": "litellm_proxy/anthropic/claude-sonnet-4-6",
"temperature": 0.0,
"max_tokens": 8192,
🟡 Suggestion: Why does only claude-sonnet-4-6 need max_tokens when other models (gpt-4, claude-3.5-sonnet, etc.) don't have this set? This suggests we might be treating symptoms of an agent loop bug rather than fixing the root cause.

Questions:

  • Why is this model generating infinite responses? Is there a bug in how the agent handles its outputs?
  • Why 8192 specifically? No other model configs have this limit.
  • Should we add max_tokens to ALL models as a safety measure?

The pragmatic fix is fine - it solves the immediate problem. But understanding WHY this model behaves differently would prevent similar issues in the future.
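
For illustration only (not part of this PR), a blanket safeguard along the lines the reviewer raises could look roughly like this, assuming the per-model configs are plain dicts as sketched earlier; the constant and function names are hypothetical:

# Hypothetical sketch: apply a default max_tokens to every model config
# unless one is already set explicitly.
DEFAULT_MAX_TOKENS = 8192  # assumed default; the PR only sets this for claude-sonnet-4-6


def with_default_max_tokens(configs: dict[str, dict]) -> dict[str, dict]:
    """Return copies of the model configs with max_tokens filled in where missing."""
    return {
        name: {**cfg, "max_tokens": cfg.get("max_tokens", DEFAULT_MAX_TOKENS)}
        for name, cfg in configs.items()
    }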

juanmichelini (Collaborator, Author) commented

This is wrong. The error was solved in #2138


Linked issue: Fix claude-sonnet-4-6 causing integration tests to hang indefinitely (#2135)