
feat(view): Incremental view construction and conversation state view property#2141

Open

csmith49 wants to merge 11 commits into main from feat/incremental-view

Conversation


@csmith49 csmith49 commented Feb 19, 2026

Summary

This PR updates the conversation state with a persistent View object that is incrementally updated as events are recorded.

To support this, View creation has been refactored to be incremental. Events are added one at a time using the View.add_event function, which updates the view in-place and handles condensation application, unhandled condensation request tracking, and so on.

This should improve performance (we no longer need to rebuild a view from all events every time the agent takes a step) and make View a first-class object accessible wherever the conversation state is.
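
The incremental pattern described above can be sketched roughly as follows. This is a simplified illustration, not the SDK's implementation: only the names View.add_event and View.from_events come from the PR, and the Event/Condensation stand-ins here are minimal:

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    id: str


@dataclass
class Condensation(Event):
    # IDs of events this condensation forgets (illustrative field name).
    forgotten_ids: frozenset = frozenset()


@dataclass
class View:
    events: list = field(default_factory=list)

    def add_event(self, event):
        """Update the view in place as a single event is recorded."""
        if isinstance(event, Condensation):
            # Apply the condensation: drop the events it forgets.
            self.events = [
                e for e in self.events if e.id not in event.forgotten_ids
            ]
        else:
            self.events.append(event)

    @staticmethod
    def from_events(events):
        """Batch construction: fold add_event over the full event log."""
        view = View()
        for event in events:
            view.add_event(event)
        return view
```

With this shape, batch construction is just a fold over the incremental path, so both stay consistent by definition.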

Design Decisions

Views are maintained by ConversationState objects as a private field, and exposed as a property. This ensures the view is not serialized when the conversation state is saved. Instead, we rebuild the view from the whole list of events when the state is deserialized.

The idea is to avoid accidentally serializing events outside the file system store we maintain, but there is the tradeoff that loading conversation states might be slightly more expensive. This is an easy decision to change should the need arise.
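
A rough sketch of the private-field-plus-property arrangement described above. The names beyond view and add_event are illustrative stand-ins, not the SDK's actual API:

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    id: str


@dataclass
class View:
    events: list = field(default_factory=list)

    def add_event(self, event):
        self.events.append(event)

    @staticmethod
    def from_events(events):
        view = View()
        for event in events:
            view.add_event(event)
        return view


class ConversationState:
    def __init__(self, events=None):
        self.events = list(events or [])
        # Private field: rebuilt from the full event list when a state is
        # deserialized, and never written into the serialized form itself.
        self._view = View.from_events(self.events)

    @property
    def view(self):
        # Exposed read-only as a property, mirroring the PR's design.
        return self._view

    def add_event(self, event):
        """Single mutation path that keeps events and view in sync."""
        self.events.append(event)
        self._view.add_event(event)
```

The rebuild-on-load cost mentioned above is the View.from_events call in the constructor; everything after that is incremental.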

Performance "Benefits"

While reducing the number of View.from_events calls will technically improve performance, in most use cases we expect the gains to be minor.

While working on this PR I had a stack trace sampler monitoring the OpenHands ACP instance that was assisting me. On that trace, something like 99% of the samples from the Agent.step function were API calls to the LLM. Other traces have that number lower (to make room for critic calls and callback handlers), but View construction has never been more than 0.5% of the duration of Agent.step.

Point is, we're hugely network-bound.

The real benefit of this PR is making View a first-class object accessible to more of the system. It exposes precisely the list of events that are converted to messages and sent to the LLM, and so represents the agent's "attention window". That has to be helpful outside the condenser.

Breaking Changes

  • View.condensations field removed.
  • View.enforce_properties changed from a static method to an instance method that modifies the calling view in place.
  • prepare_llm_messages no longer takes a list of events as input. Instead of converting a list of events into a view, it now takes a view directly.

Lastly, because View.enforce_properties is only called when a view is constructed directly from a list of events (instead of being incrementally built), we now only enforce properties when conversation states are loaded from disk.

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---|---|---|---|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

```shell
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:b6e828d-python
```

Run

```shell
docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-b6e828d-python \
  ghcr.io/openhands/agent-server:b6e828d-python
```

All tags pushed for this build

ghcr.io/openhands/agent-server:b6e828d-golang-amd64
ghcr.io/openhands/agent-server:b6e828d-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:b6e828d-golang-arm64
ghcr.io/openhands/agent-server:b6e828d-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:b6e828d-java-amd64
ghcr.io/openhands/agent-server:b6e828d-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:b6e828d-java-arm64
ghcr.io/openhands/agent-server:b6e828d-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:b6e828d-python-amd64
ghcr.io/openhands/agent-server:b6e828d-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:b6e828d-python-arm64
ghcr.io/openhands/agent-server:b6e828d-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:b6e828d-golang
ghcr.io/openhands/agent-server:b6e828d-java
ghcr.io/openhands/agent-server:b6e828d-python

About Multi-Architecture Support

  • Each variant tag (e.g., b6e828d-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., b6e828d-python-amd64) are also available if needed

@csmith49 csmith49 self-assigned this Feb 19, 2026
csmith49 and others added 2 commits February 19, 2026 16:20
…ties

Replace @cached_property with a manually-invalidated cache for
View.manipulation_indices. The cached_property was never invalidated
when add_event or enforce_properties mutated self.events, causing
stale indices to be returned.

Now _invalidate_manipulation_indices() is called in every branch of
add_event and enforce_properties that modifies the events list.

Co-authored-by: openhands <openhands@all-hands.dev>
@csmith49 csmith49 changed the title feat(view): Incremental view construction feat(view): Incremental view construction and conversation state view property Feb 19, 2026
@github-actions

github-actions bot commented Feb 19, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| openhands-sdk/openhands/sdk/agent/agent.py | 231 | 36 | 84% | 94, 98, 238–240, 242, 272–273, 280–281, 313, 366–367, 369, 409, 548–549, 554, 566–567, 572–573, 592–593, 595, 623–624, 631–632, 636, 644–645, 682, 688, 700, 707 |
| openhands-sdk/openhands/sdk/agent/utils.py | 55 | 3 | 94% | 62, 82–83 |
| openhands-sdk/openhands/sdk/context/view/view.py | 72 | 7 | 90% | 82, 93–98 |
| openhands-sdk/openhands/sdk/conversation/state.py | 191 | 8 | 95% | 188, 192, 203, 354, 400–402, 516 |
| openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py | 329 | 18 | 94% | 277, 282, 310, 375, 417, 563–564, 567, 713, 721, 723, 734, 736–738, 920, 927–928 |
| TOTAL | 18332 | 5570 | 69% | |

@csmith49 csmith49 added the integration-test Runs the integration tests and comments the results label Feb 19, 2026
@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@csmith49 csmith49 added condenser-test Triggers a run of all condenser integration tests and removed integration-test Runs the integration tests and comments the results labels Feb 20, 2026
@github-actions

Hi! I started running the condenser tests on your PR. You will receive a comment with the results shortly.

Note: These are non-blocking tests that validate condenser functionality across different LLMs.

@github-actions

Condenser Test Results (Non-Blocking)

These tests validate condenser functionality and do not block PR merges.

🧪 Integration Tests Results

Overall Success Rate: 100.0%
Total Cost: $0.99
Models Tested: 2
Timestamp: 2026-02-20 03:32:37 UTC

📊 Summary

| Model | Overall | Tests Passed | Skipped | Total | Cost | Tokens |
|---|---|---|---|---|---|---|
| litellm_proxy_anthropic_claude_opus_4_5_20251101 | 100.0% | 5/5 | 0 | 5 | $0.90 | 440,917 |
| litellm_proxy_gpt_5.1_codex_max | 100.0% | 2/2 | 3 | 5 | $0.09 | 62,635 |

📋 Detailed Results

litellm_proxy_anthropic_claude_opus_4_5_20251101

  • Success Rate: 100.0% (5/5)
  • Total Cost: $0.90
  • Token Usage: prompt: 424,641, completion: 16,276, cache_read: 374,439, cache_write: 43,738, reasoning: 911
  • Run Suffix: litellm_proxy_anthropic_claude_opus_4_5_20251101_6519856_opus_condenser_run_N5_20260220_032807

litellm_proxy_gpt_5.1_codex_max

  • Success Rate: 100.0% (2/2)
  • Total Cost: $0.09
  • Token Usage: prompt: 58,879, completion: 3,756, cache_read: 18,304, reasoning: 1,792
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_6519856_gpt51_condenser_run_N5_20260220_032804
  • Skipped Tests: 3

Skipped Tests:

  • c01_thinking_block_condenser: Model litellm_proxy/gpt-5.1-codex-max does not support extended thinking or reasoning effort
  • c05_size_condenser: This test stresses long repetitive tool loops to trigger size-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.
  • c04_token_condenser: This test stresses long repetitive tool loops to trigger token-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.

@csmith49 csmith49 marked this pull request as ready for review February 20, 2026 03:41
@all-hands-bot all-hands-bot left a comment

🟡 Acceptable - Solid design making View a first-class object. Manual cache invalidation is correct but fragile. No critical issues.

Key Insight: The real win here isn't performance—it's making View an explicit representation of the agent's attention window, accessible throughout the system. Good architectural move.

See inline comments for improvement opportunities.

```python
for property in ALL_PROPERTIES:
    results &= property.manipulation_indices(self.events)

self._cached_manipulation_indices = results
```

🟠 Important: Manual cache invalidation is fragile. This works now, but it's easy to forget _invalidate_manipulation_indices() when adding new methods that modify self.events.

Suggestion: Consider documenting this invariant clearly, or use a different pattern like:

  • Wrap self.events in a custom list type that auto-invalidates on mutation
  • Use Pydantic's @computed_field(cached=True) if the model config allows it

Not blocking—current implementation is correct—but this is a maintenance risk.
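
The first suggestion could be sketched as a thin `MutableSequence` wrapper that fires a callback on every mutation, so the invalidation call cannot be forgotten. The names here are illustrative, not part of the SDK:

```python
from collections.abc import MutableSequence


class InvalidatingList(MutableSequence):
    """A list wrapper that invokes a callback on every mutation."""

    def __init__(self, items, on_mutate):
        self._items = list(items)
        self._on_mutate = on_mutate

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        self._items[index] = value
        self._on_mutate()

    def __delitem__(self, index):
        del self._items[index]
        self._on_mutate()

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        # MutableSequence derives append/extend/etc. from insert, so every
        # growth path funnels through this one notification point.
        self._items.insert(index, value)
        self._on_mutate()
```

Wiring `on_mutate` to `_invalidate_manipulation_indices` would make the cache impossible to leave stale, at the cost of one extra indirection layer.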

Comment on lines 106 to +127

```diff
     Since enforcement is intended as a fallback to inductively maintaining the
     properties via the associated manipulation indices, any time a property must be
     enforced a warning is logged.

     Modifies the view in-place.
     """
     for property in ALL_PROPERTIES:
-        events_to_forget = property.enforce(current_view_events, all_events)
+        events_to_forget = property.enforce(self.events, all_events)
         if events_to_forget:
             logger.warning(
                 f"Property {property.__class__} enforced, "
                 f"{len(events_to_forget)} events dropped."
             )
-            return View.enforce_properties(
-                [
-                    event
-                    for event in current_view_events
-                    if event.id not in events_to_forget
-                ],
-                all_events,
-            )
-    return current_view_events
+            self.events = [
+                event for event in self.events if event.id not in events_to_forget
+            ]
+            self._invalidate_manipulation_indices()

 @staticmethod
 def from_events(events: Sequence[Event]) -> View:
     """Create a view from a list of events, respecting the semantics of any
     condensation events.
+    # If we've forgotten events to enforce the properties, we'll need to
+    # attempt to apply each property again. Once we get all the way through
+    # the properties without any kind of modification, we can exit the loop.
+    self.enforce_properties(all_events)
```

🟡 Suggestion: The recursive call with break is a bit harder to follow than necessary. The old tail-recursive pattern was clearer.

Consider making the recursion explicit:

```python
def enforce_properties(self, all_events: Sequence[Event]) -> None:
    """Enforce all properties, recursively dropping events until stable."""
    while True:
        changed = False
        for property in ALL_PROPERTIES:
            events_to_forget = property.enforce(self.events, all_events)
            if events_to_forget:
                logger.warning(
                    f"Property {property.__class__} enforced, "
                    f"{len(events_to_forget)} events dropped."
                )
                self.events = [
                    event for event in self.events
                    if event.id not in events_to_forget
                ]
                self._invalidate_manipulation_indices()
                changed = True
                break  # Restart property checks
        if not changed:
            break  # All properties satisfied
```

This makes it clearer that we're iterating until a fixed point.

```python
logger.debug(f"Skipping non-LLMConvertibleEvent of type {type(event)}")

@staticmethod
def from_events(events: Sequence[Event]) -> View:
```

🟢 Nit: This incrementally builds the view, which is good. But there's no test coverage for the case where a non-LLMConvertibleEvent (that's not condensation-related) is added.

Consider adding a test that verifies these are silently skipped:

```python
def test_add_event_skips_non_llm_convertible():
    view = View()
    view.add_event(ConversationStateUpdateEvent(...))
    assert len(view.events) == 0
```

Not critical, just helps document the behavior.

@enyst

enyst commented Feb 20, 2026

@OpenHands Do a /codereview-roasted on this PR. Post your review as a comment on the PR. Note that it will be rendered as markdown.

Once you posted, take a step back and look with fresh eyes: re-read critically your review and the PR, and analyze every point of it, is it actually real, is it minor, is it major compared to what it seemed? Post your re-analysis as a new comment on the PR.

@openhands-ai

openhands-ai bot commented Feb 20, 2026

I'm on it! enyst can track my progress at all-hands.dev


enyst commented Feb 20, 2026

/codereview-roasted

Taste rating: 🔴 Needs improvement

You took a relatively simple data model ("the event log is the source of truth") and replaced it with two mutable sources of truth: EventLog and a cached/derived View that must be kept in sync manually.

That can be fine if you make it impossible to get out of sync. Right now it’s very possible, and the failure mode is the worst kind: silent wrong behavior.


[CRITICAL ISSUES]

  • ConversationState now has a cache without a hard invalidation story

    • Where: openhands-sdk/openhands/sdk/conversation/state.py (view + add_event at ~206–218), openhands-sdk/openhands/sdk/agent/agent.py (uses state.view at ~246)

    • Problem: state.events is still exposed as a mutable EventLog, but only ConversationState.add_event() updates the view. Any code doing the old/obvious thing:

      `state.events.append(e)`

      will now silently fail to update state.view, and therefore the next LLM call (which now uses state.view) will ignore those events.

      This is a behavioral break that won’t crash; it’ll just make agents “forget” things and make debugging miserable.

    • Fix direction: either (a) make events append go through ConversationState.add_event (proxy/wrapper), (b) stop exposing a mutable event log, or (c) compute the view lazily from events (invalidate on append) instead of keeping two independently-mutable structures.

  • View.add_event() can poison the view permanently after a bad Condensation

    • Where: openhands-sdk/openhands/sdk/context/view/view.py (add_event ~130–169, enforcement ~100–129)

    • Problem: add_event() applies Condensation.apply() directly (~153–156) but does not re-run enforce_properties(). Since Condensation.apply() can forget arbitrary IDs, it can violate ToolCallMatchingProperty / BatchAtomicityProperty invariants.

      Previously, rebuilding a view via from_events() + enforcement was an automatic “self-heal” step. Now enforcement only happens on resume (ConversationState.create(... resume ...) rebuilds view). In other words: one malformed condensation can corrupt the runtime view for the rest of the process.

    • Fix direction: enforce properties (or at least validate) after applying a Condensation, or enforce at the ConversationState.add_event() layer when the appended event is a Condensation.
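
Fix direction (c) from the first issue above (compute the view lazily from events and invalidate on append) might look like this minimal sketch. The names are illustrative, and the filter standing in for real view construction is purely a placeholder:

```python
class LazyViewState:
    """One source of truth (the event list) plus a derived, cached view."""

    def __init__(self):
        self._events = []
        self._view_cache = None  # None means "stale, rebuild on next access"

    def append_event(self, event):
        self._events.append(event)
        self._view_cache = None  # any append invalidates the derived view

    @property
    def view(self):
        if self._view_cache is None:
            # Rebuild from the event log. A stand-in for real view
            # construction: here we just hide events prefixed with "_".
            self._view_cache = [e for e in self._events if not e.startswith("_")]
        return self._view_cache
```

Because every append goes through the same method that clears the cache, the "forgot to update the view" failure mode disappears by construction, at the cost of an occasional full rebuild.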


[IMPROVEMENT OPPORTUNITIES]

  • Breaking API churn with no compatibility shim

    • Where: openhands-sdk/openhands/sdk/agent/utils.py (prepare_llm_messages now takes View at ~114–179)
    • Problem: This is a signature-breaking change in a utility function that downstream users will import.
    • Fix direction: accept both View and Sequence[Event] (deprecated path), or provide a transitional helper.
  • Manual caching + public mutability is a trap

    • Where: openhands-sdk/openhands/sdk/context/view/view.py (manipulation_indices cache ~40–60)
    • Problem: you fixed invalidation for the mutations you control (add_event, enforce_properties), but view.events is still a public list. Any direct mutation makes the cache wrong again.
    • Fix direction: either treat events as effectively private (and document that), or stop caching something derived from a mutable list you don’t control.
  • Dead helper left behind

    • Where: View.unhandled_condensation_request_exists (view.py ~84–98)
    • Problem: unused after the incremental refactor. It’s harmless, but it reads like unfinished surgery.
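
The transitional shim suggested for prepare_llm_messages above could look like the following sketch. The View class and the message conversion here are simplified stand-ins, not the real SDK types:

```python
import warnings


class View:
    """Minimal stand-in for the SDK's View."""

    def __init__(self, events):
        self.events = list(events)

    @staticmethod
    def from_events(events):
        return View(events)


def prepare_llm_messages(view_or_events):
    """Accept a View directly, or a legacy sequence of events (deprecated)."""
    if not isinstance(view_or_events, View):
        warnings.warn(
            "Passing a sequence of events is deprecated; pass a View instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        view_or_events = View.from_events(view_or_events)
    # Stand-in for the real event-to-message conversion.
    return [f"message:{e}" for e in view_or_events.events]
```

Downstream callers keep working during a deprecation window, and the warning points them at the new calling convention.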

[TESTING GAPS]

  • The new tests/sdk/conversation/local/test_state_view.py is good coverage for the happy path.
  • What’s missing is a test for the real regression risk introduced by the cache:
    • If state.events.append() is still “allowed”, verify the view stays correct (or explicitly assert it doesn’t and document that add_event() is required).

VERDICT

Needs rework before I’d be comfortable calling this “safer/faster”. The concept (incremental view) is fine; the current implementation relies on “everyone always uses the right mutation path” and that’s not an engineering strategy.

KEY INSIGHT

You didn’t add a View; you added a cache. Caches must have a bulletproof invalidation model, or they become distributed lies.


enyst commented Feb 20, 2026

Re-analysis of my /codereview-roasted (fresh eyes)

Re-reading both the PR and my own comment, here’s what I think is actually important vs what’s more “defense-in-depth / taste”.

Still major / real

  • ConversationState.events vs ConversationState.view divergence is a genuine behavioral trap.
    • This isn’t theoretical: the SDK continues to expose a mutable EventLog via state.events, but Agent.step() now uses state.view to build the LLM context.
    • If any downstream code has ever appended events directly to state.events (which used to be the obvious / only knob), those events will now be omitted from the LLM context silently.
    • Even if maintainers consider state.events.append() “internal”, the current API surface doesn’t communicate that, and the failure mode is non-obvious.

If only one thing gets addressed, it should be this: either make events effectively append-only through ConversationState.add_event() (proxy/wrapper), or make it impossible/explicitly unsupported to mutate events directly.

Probably medium / depends on invariants

  • “Bad Condensation poisons view”: this is more about robustness than an immediate bug.
    • In the intended flow, condensers should only forget along manipulation_indices, so properties should remain satisfied by construction.
    • However, the moment you accept that Condensation.apply() can remove arbitrary IDs (it can), then skipping any enforcement/validation after applying a Condensation means the view can become invalid until restart/resume.

So: I still think a post-condensation sanity/enforcement hook is a good idea, but I’d downgrade this from “guaranteed breakage” to “cheap guardrail that prevents future weirdness / version skew issues”.
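
Such a guardrail could be a cheap re-validation on the condensation branch of add_event. Here is a sketch using string events and a toy property check as a stand-in for the real ALL_PROPERTIES enforcement:

```python
class GuardedView:
    """Toy view that re-validates invariants right after a condensation."""

    def __init__(self):
        self.events = []

    def _enforce_properties(self, all_events):
        # Stand-in for real property enforcement: drop any "call:<id>" event
        # whose matching "result:<id>" event has been forgotten.
        ids = set(self.events)
        self.events = [
            e for e in self.events
            if not (e.startswith("call:") and "result:" + e[5:] not in ids)
        ]

    def add_event(self, event, all_events):
        if event.startswith("condense:"):
            forgotten = set(event.split(":", 1)[1].split(","))
            self.events = [e for e in self.events if e not in forgotten]
            # Cheap guardrail: a condensation may forget arbitrary ids, so
            # re-validate invariants immediately instead of waiting for resume.
            self._enforce_properties(all_events)
        else:
            self.events.append(event)
```

Because enforcement is cheap relative to an LLM call, running it only on the condensation branch adds negligible overhead while closing the "poisoned view until restart" window.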

Probably minor

  • prepare_llm_messages(events -> view) signature break: likely low impact inside this repo (only internal callers), but could break downstream imports. Whether that matters depends on how “public” you consider openhands.sdk.agent.utils.
  • Manual caching + mutable list: real, but typical. If View.events is treated as internal, the cache is fine.
  • Unused helper (unhandled_condensation_request_exists): pure cleanup nit.

Net

My original review wasn’t wrong about the core risk, but I was too harsh in implying the condensation enforcement change is necessarily a runtime bug. The API/invariant mismatch between events and view is the only point I’d call “must address”, because it’s the one that can bite real users without any obvious stacktrace.

@openhands-ai

openhands-ai bot commented Feb 20, 2026

Summary of work completed

  • Retrieved PR feat(view): Incremental view construction and conversation state view property #2141 metadata and inspected the key diffs affecting:

    • View incremental construction (View.add_event, in-place enforce_properties, and manipulation_indices caching)
    • ConversationState gaining a persistent private _view and a view property, plus add_event() to keep _events and _view in sync
    • Agent/conversation paths switching from state.events to state.view for prepare_llm_messages
    • Test updates and new tests covering view synchronization and cache invalidation
  • Posted a /codereview-roasted review as a markdown PR comment focusing on the main engineering risks:

    • Two mutable sources of truth (state.events vs state.view) and the silent divergence hazard if anything appends directly to state.events
    • Potential invariant breakage if a Condensation forgets events in a way that violates view properties and enforcement isn’t re-run at runtime
    • API churn risk (prepare_llm_messages signature change), cache-vs-mutable-list pitfalls, and leftover unused helper
  • Then posted a second PR comment with a fresh-eyes re-analysis, downgrading some points and re-emphasizing the one truly major issue:

    • The events vs view divergence trap remains the “must address” item
    • Condensation enforcement concern reframed as robustness/guardrail rather than guaranteed bug

Compliance checklist (per request)

  • Did a /codereview-roasted on the PR
  • Posted the review as a PR comment (markdown)
  • Re-read critically and posted a separate re-analysis comment

Changes / pushing

  • No code changes were made in the repository as part of this request, so there is nothing to push.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

condenser-test Triggers a run of all condenser integration tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments