fix: Add max_tokens to claude-sonnet-4-6 to prevent integration test hangs #2136
Closed
juanmichelini wants to merge 1 commit into main
claude-sonnet-4-6 was causing integration tests to hang indefinitely. Adding max_tokens: 8192 limits response length and prevents infinite loops.
Fixes #2135
Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot (Collaborator) approved these changes on Feb 19, 2026 and left a comment:
🟡 Acceptable - Pragmatic fix for a real problem. Solves the hanging tests, but feels like treating symptoms rather than understanding the root cause.
    "llm_config": {
        "model": "litellm_proxy/anthropic/claude-sonnet-4-6",
        "temperature": 0.0,
        "max_tokens": 8192,
Collaborator
🟡 Suggestion: Why does only claude-sonnet-4-6 need max_tokens when other models (gpt-4, claude-3.5-sonnet, etc.) don't have this set? This suggests we might be treating symptoms of an agent loop bug rather than fixing the root cause.
Questions:
- Why is this model generating infinite responses? Is there a bug in how the agent handles its outputs?
- Why 8192 specifically? No other model configs have this limit.
- Should we add max_tokens to ALL models as a safety measure?
The pragmatic fix is fine - it solves the immediate problem. But understanding WHY this model behaves differently would prevent similar issues in the future.
juanmichelini (Collaborator, Author):
This is wrong. The error was solved in #2138.
Summary
Fixes #2135
Integration tests were hanging indefinitely when running with the claude-sonnet-4-6 model. This minimal fix adds a max_tokens: 8192 parameter to the model configuration to prevent the model from generating excessively long responses that could cause infinite loops.
Root Cause
The claude-sonnet-4-6 model was causing agents to enter infinite loops during integration tests, particularly in browser-related tests. Without a max_tokens limit, the model could generate responses that triggered repeated tool calls.
Solution
Added max_tokens: 8192 to the claude-sonnet-4-6 configuration in .github/run-eval/resolve_model_config.py. This limits the response length and prevents the hanging behavior while still allowing sufficient tokens for complex tasks.
Testing
This change will be validated by the integration tests running with claude-sonnet-4-6 in CI.
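As a rough illustration of the change described under Solution, the sketch below shows a configuration entry carrying the limit, plus a small check that lists entries lacking one. Only the llm_config fields come from the diff in this PR; the MODELS name, its key, and the missing_max_tokens helper are hypothetical and are not taken from resolve_model_config.py.

```python
# Hypothetical sketch: MODELS, its key, and missing_max_tokens are assumed
# names for illustration. Only the llm_config fields mirror the PR diff.
MODELS = {
    "claude-sonnet-4-6": {
        "llm_config": {
            "model": "litellm_proxy/anthropic/claude-sonnet-4-6",
            "temperature": 0.0,
            "max_tokens": 8192,  # the limit added by this PR
        },
    },
}


def missing_max_tokens(models: dict) -> list[str]:
    """Return the ids of entries whose llm_config has no max_tokens key."""
    return [
        model_id
        for model_id, entry in models.items()
        if "max_tokens" not in entry.get("llm_config", {})
    ]


if __name__ == "__main__":
    # Prints the ids of any entries without a limit (none in this sketch).
    print(missing_max_tokens(MODELS))
```

A check of this kind would also give a quick answer to the reviewer's question about which other model entries have no limit set, assuming the real configuration has a comparable shape.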
Checklist
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
• eclipse-temurin:17-jdk
• nikolaik/python-nodejs:python3.12-nodejs22
• golang:1.21-bookworm
Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:519bf09-python
Run
All tags pushed for this build
About Multi-Architecture Support
• Each variant tag (e.g. 519bf09-python) is a multi-arch manifest supporting both amd64 and arm64
• Architecture-specific tags (e.g. 519bf09-python-amd64) are also available if needed