Scenario
I hit this pitfall directly.
Long agent tasks often involve many rounds of tool calls: browsing web pages, reading files, running commands, editing code, and validating again. Context grows quickly, especially when tool results and thinking/tool_use structures are long. Without automatic compression, the final symptoms are often model 400 errors, empty responses, slowness, or the model starting to forget the current goal.
Current Pain Points
- Simple truncation loses important task chains.
- Trimming only by message count is not enough, because a single tool result may be very large.
- Waiting until the provider reports a context-limit error is too late.
- There is no compression trigger based on call count, context length, or tool-result length.
Suggested Direction
Add a layered compression strategy:
- Triggers: LLM call count, estimated token/character count, tool-result length, and message count.
- Compression targets: old thinking, tool_use, tool_result, web-page output, and long command output.
- Preserved content: recent turns, the original user goal, current plan, unfinished TODOs, key file paths, and diff summaries.
- The compressed output should ideally be structured:
Goal / Completed / Current State / TODO / Key Evidence / Risks.
- If compression fails, fall back to safe truncation and record it in logs.
A lightweight direction is to first compress old <thinking>/<tool_use>/<tool_result> blocks at the tag level, then force deeper compression when the context crosses a threshold. Longer term, this could evolve into a real context engine.
Acceptance Criteria
- A synthetic 50+ tool-call long task should not directly fail with 400 because of context growth.
- After compression, the user goal, current plan, and recent tool results are still preserved.
- Logs show when compression was triggered and the before/after size.
- Compression thresholds are configurable.
Scenario
I hit this pitfall directly.
Long agent tasks often involve many rounds of tool calls: browsing web pages, reading files, running commands, editing code, and validating again. Context grows quickly, especially when tool results and
thinking/tool_usestructures are long. Without automatic compression, the final symptoms are often model 400 errors, empty responses, slowness, or the model starting to forget the current goal.Current Pain Points
Suggested Direction
Add a layered compression strategy:
Goal / Completed / Current State / TODO / Key Evidence / Risks.A lightweight direction is to first compress old
<thinking>/<tool_use>/<tool_result>blocks at the tag level, then force deeper compression when the context crosses a threshold. Longer term, this could evolve into a real context engine.Acceptance Criteria