Fix: Claude Sonnet 4.6 rejects requests with both temperature and top_p #2137

@juanmichelini

Problem

Claude Sonnet 4.6 (and other Anthropic models) reject API requests when both temperature AND top_p are specified together.

Evidence from Production Failures

Multiple evaluation jobs are failing, with a 100% failure rate:

  • eval-22184270397-claude-son (failed after 7h32m, swebench)
  • eval-22184282669-claude-son (failed after 7h32m, commit0)
  • eval-22112470055-claude-son (error state, 21h47m, swtbench)

Root Cause

  1. SDK default: top_p=1.0 (set in openhands/sdk/llm/llm.py)
  2. Benchmarks override: temperature=0.1 (set in evaluation configs)
  3. Result: Both parameters sent to Anthropic API → request rejected
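The interaction in the three steps above can be reproduced in isolation. This is a minimal sketch rather than the SDK's actual code: `SDK_DEFAULTS` and `build_request_params` are hypothetical names standing in for the LLM field defaults and for whatever merges a benchmark config onto them.

```python
# Hypothetical stand-ins for the SDK default and a benchmark config override.
SDK_DEFAULTS = {"temperature": None, "top_p": 1.0}


def build_request_params(overrides: dict) -> dict:
    """Merge overrides onto the defaults and drop parameters left as None."""
    merged = {**SDK_DEFAULTS, **overrides}
    return {key: value for key, value in merged.items() if value is not None}


# The benchmark only sets temperature, but top_p rides along from the default.
params = build_request_params({"temperature": 0.1})
print(params)  # → {'temperature': 0.1, 'top_p': 1.0}
```

Neither side asks for both parameters on its own; the conflict only appears once the override is layered on the default.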

From Anthropic's API docs:

You cannot specify both temperature and top_p in the same request.

Current Code

# openhands-sdk/openhands/sdk/llm/llm.py
from pydantic import BaseModel, Field


class LLM(BaseModel):
    top_p: float | None = Field(
        default=1.0,  # ← Sent on every request unless the caller overrides it
        ge=0,
        le=1,
        description="Top-p (nucleus) sampling parameter..."
    )

Proposed Minimal Fix

Option A: Don't send top_p for Claude models when temperature is set

Add logic to skip top_p for Anthropic models when temperature is explicitly provided:

# In model_features.py or similar
def should_send_top_p(model: str, temperature: float | None) -> bool:
    """Return False if model doesn't support both temperature and top_p together."""
    # Anthropic models (Claude) don't accept both parameters
    if 'claude' in model.lower() or 'anthropic' in model.lower():
        return temperature is None
    return True
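One way the Option A check might be applied at the point where request parameters are assembled is sketched below. `build_sampling_kwargs` is a hypothetical helper, not an existing SDK function, and the model names are illustrative placeholders. The check itself is repeated so the example runs on its own.

```python
from typing import Any, Dict, Optional


def should_send_top_p(model: str, temperature: Optional[float]) -> bool:
    """Same check as Option A, repeated so this sketch is self-contained."""
    if "claude" in model.lower() or "anthropic" in model.lower():
        return temperature is None
    return True


def build_sampling_kwargs(
    model: str, temperature: Optional[float], top_p: Optional[float]
) -> Dict[str, Any]:
    """Hypothetical helper: collect only the sampling parameters to send."""
    kwargs: Dict[str, Any] = {}
    if temperature is not None:
        kwargs["temperature"] = temperature
    # top_p is dropped for Claude/Anthropic models whenever temperature is set.
    if top_p is not None and should_send_top_p(model, temperature):
        kwargs["top_p"] = top_p
    return kwargs


# Model names are placeholders, not verified identifiers.
print(build_sampling_kwargs("anthropic/claude-example", 0.1, 1.0))
print(build_sampling_kwargs("anthropic/claude-example", None, 1.0))
print(build_sampling_kwargs("other-provider/example", 0.1, 1.0))
```

With this shape, temperature wins whenever both are present for a Claude model, and other providers keep receiving both parameters as before.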

Option B: Set top_p default to None for Claude models

def get_default_top_p(model: str) -> float | None:
    """Return None for Claude models to avoid conflicts with temperature."""
    if 'claude' in model.lower() or 'anthropic' in model.lower():
        return None
    return 1.0  # Keep existing default for other models
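Option B needs to tell "the caller left top_p unset" apart from "the caller chose a value", otherwise an explicit setting would be overwritten. A minimal sketch of that idea follows, using a plain dataclass instead of the SDK's Pydantic model. `SamplingConfig` and `_UNSET` are hypothetical names, and the model strings are placeholders.

```python
from dataclasses import dataclass
from typing import Any, Optional


def get_default_top_p(model: str) -> Optional[float]:
    """Same rule as Option B, repeated so this sketch is self-contained."""
    if "claude" in model.lower() or "anthropic" in model.lower():
        return None
    return 1.0


_UNSET: Any = object()  # sentinel meaning "the caller did not pass top_p"


@dataclass
class SamplingConfig:
    """Hypothetical stand-in for the SDK's LLM settings, not the real class."""

    model: str
    temperature: Optional[float] = None
    top_p: Any = _UNSET

    def __post_init__(self) -> None:
        # Resolve the default only when the caller left top_p untouched,
        # so an explicitly passed top_p is always kept.
        if self.top_p is _UNSET:
            self.top_p = get_default_top_p(self.model)


claude_cfg = SamplingConfig(model="anthropic/claude-example", temperature=0.1)
other_cfg = SamplingConfig(model="other-provider/example", temperature=0.1)
explicit_cfg = SamplingConfig(model="anthropic/claude-example", top_p=0.9)
print(claude_cfg.top_p, other_cfg.top_p, explicit_cfg.top_p)  # → None 1.0 0.9
```

Note that this only removes the default: a caller who explicitly sets both temperature and top_p for a Claude model would still send both, which Option A guards against and Option B does not.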

Scope

This issue is ONLY about fixing Claude Sonnet 4.6 compatibility.

Impact

Currently blocking production evaluations:

  • 100% failure rate for Claude Sonnet 4.6 evaluations
