Set default temperature to None instead of 0.0 #1989
Conversation
Change get_default_temperature to return None by default instead of 0.0. This allows models to use their provider's default temperature setting rather than forcing temperature=0.0 for all models. Models with specific temperature requirements (kimi-k2-thinking, kimi-k2.5) still explicitly return 1.0.
Fixes #1913
Co-authored-by: openhands <openhands@all-hands.dev>
@OpenHands check the failing CI, merge main, reproduce locally, and fix any issues.

I'm on it! neubig can track my progress at all-hands.dev
Update test to expect temperature=None instead of 0.0, aligning with the PR's change to get_default_temperature() returning None by default. Co-authored-by: openhands <openhands@all-hands.dev>
Summary
I've successfully addressed the failing CI for PR #1989.
Change Made:
```diff
- assert config.temperature == 0.0
+ assert config.temperature is None  # None to use provider defaults
```
This aligns the test with the PR's change to get_default_temperature() returning None by default.
Pushed: The fix has been pushed to the PR branch.
all-hands-bot left a comment:
🟡 Taste Rating: Acceptable
Pragmatic fix to a real problem, but breaks existing behavior without proper user migration guidance.
The core change is simple and solves the claude-4.6 incompatibility issue. However, this is a significant behavior change that moves from deterministic outputs (temp=0.0) to provider-dependent defaults. See inline comments for specific concerns.
VERDICT: ✅ Worth merging after addressing migration documentation
KEY INSIGHT: This is the right technical fix, but users who relied on deterministic behavior need explicit guidance on setting temperature=0.0 in their configs.
Co-authored-by: openhands <openhands@all-hands.dev>
enyst left a comment:
Thank you for this! I think maybe we can then remove the DEFAULT_TEMPERATURE_MODELS too? That’s there because Kimi is a reasoning model that doesn’t accept effort, if I recall correctly, but it needs temperature 1 like reasoning models.
I had the agent test Kimi:
- for 2.5, it worked fine with None
- for 2-thinking, no deployments found (maybe it's legacy now; we may want to remove it from eval proxy, since it doesn't appear to work anymore)
Please see PR:
Addresses review comment asking for more explicit documentation about:
- What None means (provider default)
- How to get deterministic behavior (explicitly set 0.0)
- The typical temperature range
Co-authored-by: openhands <openhands@all-hands.dev>
…emperature-models
Co-authored-by: OpenHands Bot <contact@all-hands.dev>
As per enyst's suggestion on PR #1989:
- Remove DEFAULT_TEMPERATURE_MODELS entirely
- get_default_temperature() now returns None for all models
- This simplifies the codebase and lets all models use their provider's default temperature
- Previously Kimi 2.5 was tested to work fine with None
Co-authored-by: openhands <openhands@all-hands.dev>
FWIW, PR 1994 is a PR to this PR. 😅
all-hands-bot left a comment:
🟡 Taste Rating: Acceptable - Solves a real problem with elegant simplification, but introduces a breaking change that needs careful consideration.
Verdict: Worth merging with minor improvements to address backward compatibility concerns and design consistency.
Key Insight: This trades deterministic defaults for provider flexibility—a pragmatic choice, but one that changes behavior for all existing users.
all-hands-bot left a comment:
🔴 Taste Rating: Needs Rework
This PR simplifies the code (good!) but introduces a breaking change that violates the "never break userspace" principle. Users relying on deterministic outputs (temperature=0.0) will now get unpredictable provider defaults.
[CRITICAL ISSUES] - Must Fix
1. Breaking Change Without Documentation
File: openhands-sdk/openhands/sdk/llm/llm.py (lines 180-185)
🔴 Critical: This description doesn't mention that this is a BREAKING CHANGE. Users who relied on temperature=0.0 for deterministic outputs will now get provider-specific defaults (which vary: OpenAI defaults to 1.0, Anthropic to 1.0, etc.).
You've broken userspace without documenting it. At minimum:
- Clear migration guide: "To preserve deterministic behavior, explicitly set temperature=0.0"
- Deprecation warning in release notes
- Consider a compatibility flag for one release cycle
Suggested fix:
```python
temperature: float | None = Field(
    default=None,
    ge=0,
    description=(
        "Sampling temperature for response generation. "
        "Defaults to None (uses provider default, typically 1.0 for most models). "
        "BREAKING CHANGE: Previously defaulted to 0.0 for deterministic outputs. "
        "To restore deterministic behavior, explicitly set temperature=0.0. "
        "Higher values (0.7-1.0) increase creativity and randomness."
    ),
)
```
2. Unused Parameter - Code Smell
File: openhands-sdk/openhands/sdk/llm/utils/model_features.py (line 183)
🔴 Critical: The _model parameter is now completely ignored (indicated by the underscore prefix). This is a code smell that screams "delete this function entirely."
If the model parameter is never used, why does this function exist? Either:
- Make it a module-level constant: DEFAULT_TEMPERATURE = None
- Delete it and use None directly where needed
- Or explain why we need the indirection
The function signature promises model-specific logic but delivers nothing. That's broken by design.
[IMPROVEMENT OPPORTUNITIES] - Should Fix
3. Misleading Documentation
File: openhands-sdk/openhands/sdk/llm/utils/model_features.py (lines 184-189)
🟠 Important: The docstring says "Return the default temperature for a given model pattern" but the model is completely ignored. This is a lie.
Suggested fix:
```python
def get_default_temperature(_model: str) -> float | None:
    """Return None to allow providers to use their default temperature.

    This function exists for backward compatibility and may be removed
    in future versions. To use provider defaults, pass temperature=None
    when creating LLM instances.
    """
    return None
```
4. Missing Integration Tests
File: tests/sdk/llm/test_model_features.py (lines 342-346)
🟠 Important: These tests only verify the function returns None. They don't test what matters:
- Does passing temperature=None to a provider actually work?
- Does it fix the claude-4.6 issue mentioned in #1913?
- What temperature does each provider actually use when None is passed?
You're testing the implementation, not the behavior. Add tests that:
- Verify LiteLLM/providers accept temperature=None
- Confirm claude-4.6 no longer errors
- Document what "provider default" means for major providers
VERDICT
❌ Needs rework: The simplification is good in principle, but the breaking change and unused parameter issues must be addressed first.
KEY INSIGHT
Simplifying code by removing special cases is "good taste" - BUT you cannot break existing user expectations in the process. Either preserve backward compatibility with explicit opt-in to new behavior, or clearly document the breaking change and provide a migration path.
As per review comment on PR #1989:
- Remove the unused _model parameter since DEFAULT_TEMPERATURE_MODELS was removed
- Function now returns None without needing model info
- Updated tests to use new function signature
Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot left a comment:
Taste Rating: 🟡 Acceptable
This is pragmatic simplification that solves a real problem (claude-4.6 incompatibility), but has a documentation discrepancy and potential breaking change concerns.
The Good: Eliminates model-specific special cases, defers to provider defaults, and simplifies the codebase. This is the right direction.
Key Issues: PR description claims model-specific logic is preserved but the code removes it entirely. Function signature change may break external callers.
all-hands-bot left a comment:
Taste Rating: 🟡 Acceptable
The simplification is good and solves a real problem (claude-4.6 errors). However, this is a breaking change that affects all existing users who relied on deterministic outputs. The code is cleaner after this change, but the user impact needs careful consideration.
Address AI review comments:
- Fix docstring to not mention 'for a given model' since function no longer takes model param
- Explicitly document the breaking change from 0.0 to None default
- Add rationale and guidance for users who need deterministic outputs
Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot left a comment:
Taste Rating: 🟡 Acceptable - The simplification is elegant, but there's a critical breaking change concern.
[CRITICAL ISSUES]
The code previously set temperature=1.0 explicitly for kimi-k2-thinking and kimi-k2.5 models. Removing this special case without explanation violates the "Never Break Userspace" principle. Those special cases existed for a reason - were those models incompatible with temperature=0.0? Do they work correctly with None?
[IMPROVEMENT OPPORTUNITIES]
The data structure simplification (removing DEFAULT_TEMPERATURE_MODELS list) is good taste - it eliminates special cases. However, this isn't just a refactor; it's a behavioral change that needs verification.
[TESTING GAPS]
Tests were updated to expect None, but there's no coverage proving that previously special-cased models (kimi-k2-thinking, kimi-k2.5) still work correctly with None instead of 1.0.
VERDICT: ❌ Needs verification - Core simplification is sound, but the breaking change for kimi models must be validated before merging.
KEY INSIGHT: You're trading special-case complexity for potential runtime breakage - verify the kimi models work with None before shipping.
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment. Feel free to include any additional details that might help me get this PR into a better state.
The agent missed this:
Yes, I tested it; my previous comment #1989 (review) details temperature. In the linked PR, Kimi 2.5 failed because of a similar issue. On another note, I would love to know if Xingyao is okay with this change, if he saw this issue (and/or slack discussion). Maybe we can give it a moment for potential issues to be clarified before merging?
/rerun |
@OpenHands fix failing CI. |
I'm on it! neubig can track my progress at all-hands.dev |
…-default-temperature-to-none
This merges enyst's cleanup changes:
- Remove get_default_temperature function entirely
- Add get_default_top_p for Kimi and HuggingFace models
- Centralize top_p defaults for models that need them
Co-authored-by: openhands <openhands@all-hands.dev>
Summary
I investigated the CI status for PR #1989 and found that no code changes were needed - the CI was already in a passing state.
CI Status: All 24 required checks passed successfully.
Conclusion: The failing CI from the previous comment by @neubig had already been fixed in earlier commits. The branch was merged with main and all tests were passing. No additional changes were required - I simply waited for all the CI jobs (including the longer-running Docker image builds) to complete and verified they all passed.
Like temperature, top_p should default to None to let providers use their own defaults. Only models that require specific values (e.g., Kimi K2.5 needs 0.95) get auto-configured. Co-authored-by: openhands <openhands@all-hands.dev>
Kimi K2.5 works fine with top_p=None (uses provider default), so we don't need to special-case it anymore. The override was only needed when top_p defaulted to 1.0. Co-authored-by: openhands <openhands@all-hands.dev>
Let HuggingFace use its provider default for top_p rather than forcing 0.9. This is consistent with letting providers handle their own defaults. Co-authored-by: openhands <openhands@all-hands.dev>
Remove DEFAULT_TOP_P_MODELS and get_default_top_p function. All models now use provider defaults for top_p (None). Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot left a comment:
🟡 Taste Rating: Acceptable - Elegant simplification that solves a real problem, but introduces a second undocumented breaking change.
The Good: Eliminates special-case logic and model-specific mappings. This is the right direction - simpler data structures, fewer edge cases.
Key Concerns: top_p breaking change not mentioned in PR description, and HuggingFace special-case removal needs verification.
```python
        "or higher values (0.7-1.0) for more creative responses."
    ),
)
top_p: float | None = Field(
```
🟠 Important - Undocumented Breaking Change: The PR title and description only mention temperature changing to None, but top_p is also changing from 1.0 to None. This is a second breaking change that affects ALL models.
Impact: Users who explicitly relied on top_p=1.0 behavior (OpenAI default) will now get provider-specific defaults, which may differ.
Recommendation: Update the PR description to explicitly call out the top_p change alongside the temperature change.
```python
if d.get("top_p", 1.0) == 1.0:
    d["top_p"] = 0.9
# ...
return d
```
🟠 Important - Special Case Removal: The HuggingFace-specific top_p coercion that was removed above this line existed because "HF doesn't support the OpenAI default value for top_p (1)".
Risk: Now that top_p defaults to None instead of 1.0, HuggingFace models might:
- Still reject whatever provider default they receive
- Behave unexpectedly if their provider default differs from 0.9
Question: Was this tested with actual HuggingFace models? If the provider now handles None correctly for HF models, this removal is fine. But if not, we're trading one compatibility issue (claude-4.6) for another (HuggingFace).
Recommendation: Either verify HF models work with top_p=None or preserve this special case.
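If the special case were preserved under the new None default, a defensive version might look like the sketch below. coerce_hf_top_p is a hypothetical name; the real serializer hook in the SDK may differ:

```python
def coerce_hf_top_p(d: dict) -> dict:
    """Rewrite only an explicit top_p=1.0, which HF rejects.

    Absent or None values are left alone so the provider default applies;
    only the known-bad OpenAI default of 1.0 is coerced to 0.9.
    """
    if d.get("top_p") == 1.0:
        d["top_p"] = 0.9
    return d
```

This keeps the HF workaround without reintroducing a forced value for every model, which is the narrowest way to de-risk the removal the review flags.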
Summary
Change get_default_temperature to return None by default instead of 0.0. This allows models to use their provider's default temperature setting rather than forcing temperature=0.0 for all models.
Motivation
As discussed in #1913, always setting temperature=0.0 causes issues with some models (like claude-4.6) that don't accept this value. By returning None, we let the provider use their own default temperature.
Changes
- get_default_temperature() now returns float | None instead of float
- Default changed from 0.0 to None
- Models with specific temperature requirements (kimi-k2-thinking, kimi-k2.5) still explicitly return 1.0
- Updated documentation in the LLM class to reflect the new behavior
- Updated tests to expect None instead of 0.0
Testing
All 583 LLM tests pass. Pre-commit hooks pass on all modified files.
Fixes #1913
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.12-nodejs22
- golang:1.21-bookworm
```shell
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:b84941d-python
```
Run
All tags pushed for this build
About Multi-Architecture Support
- Each variant tag (b84941d-python) is a multi-arch manifest supporting both amd64 and arm64
- Platform-specific tags (b84941d-python-amd64) are also available if needed