
fix: Add max_tokens to claude-sonnet-4-6 to prevent integration test hangs#2136

Closed
juanmichelini wants to merge 1 commit into main from openhands/fix-claude-sonnet-4-6-hang

Conversation

juanmichelini (Collaborator) commented on Feb 19, 2026

Summary

Fixes #2135

Integration tests were hanging indefinitely when running with the claude-sonnet-4-6 model. This minimal fix adds a max_tokens: 8192 parameter to the model configuration so the model cannot generate excessively long responses, which could drive the agent into repeated tool calls and effectively infinite loops.

Root Cause

The claude-sonnet-4-6 model was causing agents to enter infinite loops during integration tests, particularly in browser-related tests. Without a max_tokens limit, the model could generate responses that triggered repeated tool calls.

Solution

Added max_tokens: 8192 to the claude-sonnet-4-6 configuration in .github/run-eval/resolve_model_config.py. This limits the response length and prevents the hanging behavior while still allowing sufficient tokens for complex tasks.
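
For reference, a minimal sketch of what the change could look like. The field values (model, temperature, max_tokens) come from this PR's diff; the dictionary name, file layout, and helper function are illustrative assumptions, not the actual contents of resolve_model_config.py:

# Illustrative sketch only -- the real resolve_model_config.py may be structured
# differently; the field values below are taken from this PR's diff.
MODEL_CONFIGS = {
    "claude-sonnet-4-6": {
        "model": "litellm_proxy/anthropic/claude-sonnet-4-6",
        "temperature": 0.0,
        # New in this PR: cap response length so a runaway generation
        # cannot keep the integration tests hanging.
        "max_tokens": 8192,
    },
}


def resolve_model_config(model_name: str) -> dict:
    """Return the llm_config dict for a given model name (hypothetical helper)."""
    return MODEL_CONFIGS[model_name]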

Testing

This change will be validated by the integration tests running with claude-sonnet-4-6 in CI.

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this? (Will be tested in CI with actual claude-sonnet-4-6)
  • If there is an example, have you run the example to make sure that it works? (N/A - configuration change)
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works? (N/A - configuration change)
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name? (N/A - configuration change)
  • Is the GitHub CI passing? (Will check after PR is created)

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant   Architectures   Base Image
java      amd64, arm64    eclipse-temurin:17-jdk
python    amd64, arm64    nikolaik/python-nodejs:python3.12-nodejs22
golang    amd64, arm64    golang:1.21-bookworm

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:519bf09-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-519bf09-python \
  ghcr.io/openhands/agent-server:519bf09-python

All tags pushed for this build

ghcr.io/openhands/agent-server:519bf09-golang-amd64
ghcr.io/openhands/agent-server:519bf09-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:519bf09-golang-arm64
ghcr.io/openhands/agent-server:519bf09-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:519bf09-java-amd64
ghcr.io/openhands/agent-server:519bf09-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:519bf09-java-arm64
ghcr.io/openhands/agent-server:519bf09-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:519bf09-python-amd64
ghcr.io/openhands/agent-server:519bf09-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:519bf09-python-arm64
ghcr.io/openhands/agent-server:519bf09-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:519bf09-golang
ghcr.io/openhands/agent-server:519bf09-java
ghcr.io/openhands/agent-server:519bf09-python

About Multi-Architecture Support

  • Each variant tag (e.g., 519bf09-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 519bf09-python-amd64) are also available if needed

Commit message

claude-sonnet-4-6 was causing integration tests to hang indefinitely.
Adding max_tokens: 8192 limits response length and prevents infinite loops.

Fixes #2135

Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot (Collaborator) left a comment

🟡 Acceptable - Pragmatic fix for a real problem. Solves the hanging tests, but feels like treating symptoms rather than understanding the root cause.

"llm_config": {
"model": "litellm_proxy/anthropic/claude-sonnet-4-6",
"temperature": 0.0,
"max_tokens": 8192,
🟡 Suggestion: Why does only claude-sonnet-4-6 need max_tokens when other models (gpt-4, claude-3.5-sonnet, etc.) don't have this set? This suggests we might be treating symptoms of an agent loop bug rather than fixing the root cause.

Questions:

  • Why is this model generating infinite responses? Is there a bug in how the agent handles its outputs?
  • Why 8192 specifically? No other model configs have this limit.
  • Should we add max_tokens to ALL models as a safety measure?

The pragmatic fix is fine - it solves the immediate problem. But understanding WHY this model behaves differently would prevent similar issues in the future.
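
For illustration only (not part of this PR), a blanket safeguard along the lines the reviewer raises could look roughly like this, assuming the per-model configs are plain dicts as sketched earlier; the constant and function names are hypothetical:

# Hypothetical sketch: apply a default max_tokens to every model config
# unless one is already set explicitly.
DEFAULT_MAX_TOKENS = 8192  # assumed default; the PR only sets this for claude-sonnet-4-6


def with_default_max_tokens(configs: dict[str, dict]) -> dict[str, dict]:
    """Return copies of the model configs with max_tokens filled in where missing."""
    return {
        name: {**cfg, "max_tokens": cfg.get("max_tokens", DEFAULT_MAX_TOKENS)}
        for name, cfg in configs.items()
    }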

juanmichelini (Collaborator, Author) commented

This is wrong. The error was solved in #2138


Linked issue: Fix claude-sonnet-4-6 causing integration tests to hang indefinitely (#2135)