Enable configurable context condensation in all benchmarks #429
Open
juanmichelini wants to merge 2 commits into main from
This change enables context condensation in all benchmarks and makes it configurable via config.py files and command-line arguments. The default condenser from software-agent-sdk is now enabled by default with max_size=80 and keep_first=4.

Changes:
- Add condenser configuration fields to EvalMetadata
- Add CONDENSER_DEFAULTS to config.py files in swebench, swtbench, and swebenchmultimodal
- Add command-line arguments for controlling the condenser (--enable-condenser, --disable-condenser, --condenser-max-size, --condenser-keep-first)
- Update agent creation in all benchmarks to use LLMSummarizingCondenser when enabled
- Add comprehensive tests for condenser configuration

Fixes #407

Co-authored-by: openhands <openhands@all-hands.dev>
@OpenHands "condenser_max_size" should be 240 and "condenser_keep_first" should be 2. Change that for all benchmarks.
I'm on it! juanmichelini can track my progress at all-hands.dev
…and Commit0 support

- Update condenser_max_size from 80 to 240 and condenser_keep_first from 4 to 2 across all benchmarks
- Update EvalMetadata defaults in models.py to use the new values (240/2)
- Add condenser configuration to GAIA and Commit0 benchmarks:
  - Add CONDENSER_DEFAULTS to config.py files
  - Add LLMSummarizingCondenser import and agent creation logic to run_infer.py files
- Update tests to reflect the new condenser defaults and add coverage for GAIA and Commit0
- All tests pass and pre-commit checks pass

Co-authored-by: openhands <openhands@all-hands.dev>
Summary of Changes

I've successfully updated the condenser configuration across all benchmarks and added condenser support to GAIA and Commit0 as requested. Here's what was completed:

✅ Checklist - All Requirements Met
Verification
The PR is now updated with all requested changes. All benchmarks (SWE-bench, SWT-bench, SWE-bench Multimodal, GAIA, and Commit0) now use …
Summary
This PR enables context condensation in all benchmarks and makes it configurable via `config.py` files and command-line arguments. The default condenser from software-agent-sdk (`LLMSummarizingCondenser`) is now used by default with `max_size=80` and `keep_first=4`.

Fixes #407
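A plausible reading of how the two parameters interact, sketched under assumptions: once the event history grows past `max_size`, the condenser keeps the first `keep_first` events verbatim, folds the older middle events into a single summary, and keeps the most recent events. In this plain-Python sketch, `condense` and the `summarize` callback are hypothetical stand-ins for the LLM-backed summarization, and shrinking the history to about `max_size // 2` is an assumption, not a documented guarantee of `LLMSummarizingCondenser`.

```python
from typing import Callable, List


def condense(
    events: List[str],
    max_size: int,
    keep_first: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Hypothetical condensation policy, not the SDK's implementation."""
    if len(events) <= max_size:
        return events  # under the limit: nothing to condense

    head = events[:keep_first]  # initial events that are always kept
    target_size = max_size // 2  # assumption: shrink to about half of max_size
    # Keep the newest events, reserving one slot for the summary event.
    tail_len = max(target_size - len(head) - 1, 0)
    cut = len(events) - tail_len
    forgotten = events[keep_first:cut]  # middle events folded into the summary
    return head + [summarize(forgotten)] + events[cut:]


history = [f"event-{i}" for i in range(100)]
condensed = condense(
    history,
    max_size=80,
    keep_first=4,
    summarize=lambda evs: f"<summary of {len(evs)} events>",
)
print(len(condensed))  # 40
print(condensed[4])  # <summary of 61 events>
```

With the 80/4 values above, a 100-event history collapses to 4 pinned events, 1 summary event covering 61 events, and the 35 most recent events.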
Changes
Configuration
`EvalMetadata`: Added three new fields to support condenser configuration:
- `enable_condenser` (bool, default: `True`): Enable/disable the context condenser
- `condenser_max_size` (int, default: 80): Maximum number of events before condensing
- `condenser_keep_first` (int, default: 4): Number of initial events to always keep

Benchmark configs: Added `CONDENSER_DEFAULTS` to:
- `benchmarks/swebench/config.py`
- `benchmarks/swtbench/config.py`
- `benchmarks/swebenchmultimodal/config.py`

Command-Line Arguments
Added new CLI arguments to control condenser behavior:
- `--enable-condenser`: Explicitly enable the condenser
- `--disable-condenser`: Disable the condenser (takes precedence over `--enable-condenser`)
- `--condenser-max-size N`: Set the maximum number of events before condensing
- `--condenser-keep-first N`: Set the number of initial events to always keep

Agent Creation
Updated agent creation in all benchmark evaluation classes to use `LLMSummarizingCondenser` when enabled:
- `benchmarks/swebench/run_infer.py`
- `benchmarks/swtbench/run_infer.py`
- `benchmarks/swebenchmultimodal/run_infer.py`
- `benchmarks/multiswebench/run_infer.py`

Testing
Added comprehensive test coverage in `tests/test_condenser_config.py`.

All tests pass and pre-commit checks (ruff, pycodestyle, pyright) pass.
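To illustrate how the configuration pieces fit together, here is a simplified, dependency-free sketch. `EvalMetadata` is reduced to a dataclass holding only the three condenser fields (the real class may be defined differently), `CONDENSER_DEFAULTS` uses guessed key names with the 240/2 values requested later in this PR, and `condenser_kwargs` is a hypothetical helper expressing the "only when enabled" rule from the agent-creation changes. In the benchmarks, such values would be handed to `LLMSummarizingCondenser` together with an LLM; check the SDK for its exact constructor signature.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical stand-in for a benchmark's config.py; key names mirror the
# EvalMetadata fields, values are the 240/2 defaults requested in this PR.
CONDENSER_DEFAULTS = {
    "enable_condenser": True,
    "condenser_max_size": 240,
    "condenser_keep_first": 2,
}


@dataclass(frozen=True)
class EvalMetadata:
    """Simplified stand-in holding only the three condenser fields."""

    enable_condenser: bool = True
    condenser_max_size: int = 240
    condenser_keep_first: int = 2


def condenser_kwargs(metadata: EvalMetadata) -> Optional[dict]:
    """Return condenser keyword arguments, or None when the condenser is off.

    Hypothetical helper: agent creation would only build a condenser
    from these values when enable_condenser is True.
    """
    if not metadata.enable_condenser:
        return None
    return {
        "max_size": metadata.condenser_max_size,
        "keep_first": metadata.condenser_keep_first,
    }


metadata = EvalMetadata(**CONDENSER_DEFAULTS)
print(condenser_kwargs(metadata))  # {'max_size': 240, 'keep_first': 2}
print(condenser_kwargs(replace(metadata, enable_condenser=False)))  # None
```

Keeping the on/off decision in one helper means every `run_infer.py` can share the same rule instead of repeating the check per benchmark.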
Usage
Default behavior (condenser enabled)
Disable condenser
Custom condenser settings
Notes
- The `--disable-condenser` flag takes precedence over `--enable-condenser` to allow explicit disabling
- … `"condenser"`) to track token usage separately from the main agent
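The precedence rule in the first note can be sketched with `argparse`. This is a minimal, hypothetical resolver, not the benchmarks' actual parsing code: the flag names come from this PR, while `DEFAULTS`, `build_parser`, and `resolve_condenser_settings` are illustrative names, and treating an omitted numeric flag as "use the config default" is an assumption. The calls at the end mirror the three Usage scenarios.

```python
import argparse
from typing import Sequence

# Hypothetical defaults, standing in for a benchmark's CONDENSER_DEFAULTS.
DEFAULTS = {
    "enable_condenser": True,
    "condenser_max_size": 240,
    "condenser_keep_first": 2,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable-condenser", action="store_true")
    parser.add_argument("--disable-condenser", action="store_true")
    # None means "not given on the command line": keep the config default.
    parser.add_argument("--condenser-max-size", type=int, default=None)
    parser.add_argument("--condenser-keep-first", type=int, default=None)
    return parser


def resolve_condenser_settings(argv: Sequence[str]) -> dict:
    args = build_parser().parse_args(argv)
    settings = dict(DEFAULTS)  # start from config defaults
    if args.enable_condenser:
        settings["enable_condenser"] = True
    # Checked after --enable-condenser, so disabling always wins.
    if args.disable_condenser:
        settings["enable_condenser"] = False
    if args.condenser_max_size is not None:
        settings["condenser_max_size"] = args.condenser_max_size
    if args.condenser_keep_first is not None:
        settings["condenser_keep_first"] = args.condenser_keep_first
    return settings


# Default behavior: condenser enabled with the config defaults.
print(resolve_condenser_settings([]))
# Disable condenser: wins even when --enable-condenser is also passed.
print(resolve_condenser_settings(["--enable-condenser", "--disable-condenser"]))
# Custom condenser settings.
print(resolve_condenser_settings(
    ["--condenser-max-size", "120", "--condenser-keep-first", "4"]
))
```

Using two independent `store_true` flags rather than a mutually exclusive group is what lets both be passed at once with a well-defined outcome, which matches the precedence described in the note.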