Add durable recovery for Think chat RPC #1517

Merged: threepointone merged 2 commits into main from fix-think-chat-recovery on May 13, 2026
@threepointone threepointone commented May 13, 2026

Summary

This PR makes Think.chat() use the same durable execution model as the other Think turn entry points. Sub-agent RPC chat turns are now queued under keepAliveWhile(), wired through AbortRegistry, wrapped in runFiber() when chatRecovery is enabled, and streamed through a dedicated RPC callback helper that stores UIMessage chunks in ResumableStream before forwarding them to the parent callback.
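
The persist-then-forward ordering described above can be sketched roughly as follows. This is a minimal illustration, not the actual `_streamResultToRpcCallback()` implementation: `Chunk`, `ChunkStore`, `InMemoryChunkStore`, and `forwardWithPersistence` are hypothetical names, and the in-memory store merely stands in for ResumableStream.

```typescript
// Hypothetical sketch: none of these names are the @cloudflare/think API.
type Chunk = { type: string; [key: string]: unknown };

interface ChunkStore {
  append(streamId: string, chunk: Chunk): void;
  read(streamId: string): Chunk[];
}

// In-memory stand-in for a durable chunk store (assumption for illustration).
class InMemoryChunkStore implements ChunkStore {
  private rows = new Map<string, Chunk[]>();

  append(streamId: string, chunk: Chunk): void {
    const existing = this.rows.get(streamId) ?? [];
    existing.push(chunk);
    this.rows.set(streamId, existing);
  }

  read(streamId: string): Chunk[] {
    return [...(this.rows.get(streamId) ?? [])];
  }
}

// Store every chunk before handing it to the caller's callback, so a failure
// in the callback (or an interruption right after it) cannot lose the chunk.
function forwardWithPersistence(
  streamId: string,
  chunks: Iterable<Chunk>,
  store: ChunkStore,
  onChunk: (chunk: Chunk) => void
): void {
  for (const chunk of chunks) {
    store.append(streamId, chunk); // persist first
    onChunk(chunk); // then forward to the parent callback
  }
}
```

With this ordering, a chunk that was forwarded is always recoverable from the store, which is the property the recovery lookup relies on.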

It also narrows the chat() API by removing typed support for ChatOptions.tools. Runtime callers that still pass options.tools get a warning, but the tools are ignored. This keeps child-agent capabilities durable and owned by the child through getTools(), session/context tools, extensions, MCP tools, and client tool schemas. Parent-child orchestration docs now point users to agentTool() / runAgentTool() for retained runs, abort bridging, event replay, and UI drill-in.
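
The warn-and-ignore behavior for legacy `options.tools` might look something like the sketch below. `sanitizeChatOptions` and the loose `ChatOptions` alias are hypothetical; the real option type and warning text in Think are not shown in this PR description.

```typescript
// Hypothetical sketch; not the actual Think.chat() implementation.
type ChatOptions = Record<string, unknown>;

// Drop a legacy `tools` key passed by untyped runtime callers and emit a
// warning, while leaving every other option untouched.
function sanitizeChatOptions(
  options: Record<string, unknown> | undefined,
  warn: (message: string) => void = console.warn
): ChatOptions {
  if (!options) return {};
  const { tools, ...rest } = options;
  if (tools !== undefined) {
    warn("chat(): options.tools is no longer supported and will be ignored.");
  }
  return rest;
}
```

Because the key is stripped before the turn starts, the child agent's own tool sources remain the only way capabilities reach the model.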

The recovery path now handles terminal stream metadata more carefully. If a recovered chat fiber already has a completed or errored stream row, it persists recoverable partial output and avoids scheduling another continuation. For durable submissions associated with recovered work, terminal stream recovery now finalizes the submission instead of leaving it running.
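
The terminal-stream rules above can be summarized as a small decision function. This is a sketch under stated assumptions: `RecoveredFiber`, `RecoveryPlan`, and `planRecovery` are hypothetical names, and the behavior for a still-active stream (scheduling a continuation) is assumed for contrast rather than taken from the PR.

```typescript
// Hypothetical sketch of the recovery decision; names are illustrative only.
type StreamStatus = "active" | "completed" | "errored";

interface RecoveredFiber {
  streamStatus: StreamStatus;
  hasPartialOutput: boolean;
  submissionId?: string; // set when a durable submission is associated
}

interface RecoveryPlan {
  persistPartialOutput: boolean;
  scheduleContinuation: boolean;
  finalizeSubmission: boolean;
}

function planRecovery(fiber: RecoveredFiber): RecoveryPlan {
  const terminal = fiber.streamStatus !== "active";
  return {
    // Recoverable partial output is persisted whether or not the stream is terminal.
    persistPartialOutput: fiber.hasPartialOutput,
    // Assumption: only a non-terminal stream gets another continuation.
    scheduleContinuation: !terminal,
    // A terminal stream finalizes its submission instead of leaving it running.
    finalizeSubmission: terminal && fiber.submissionId !== undefined,
  };
}
```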

The documentation now includes a central Think turn API chooser covering:

  • browser chat via useAgentChat
  • saveMessages() for server-triggered turns that can wait
  • submitMessages() for durable acceptance and later inspection
  • raw subAgent(...).chat() for direct streaming RPC
  • agentTool() / runAgentTool() for retained child-agent orchestration
  • persistMessages() for context injection without a model turn
  • continueLastTurn() as an advanced subclass/recovery primitive

Notable changes

  • Wrap Think.chat() RPC turns in chat recovery fibers.
  • Persist chat() stream chunks for recovery lookup and partial-message persistence.
  • Add _streamResultToRpcCallback() to keep RPC streaming behavior separate from WebSocket framing.
  • Remove callerTools / ChatOptions.tools from the typed Think turn path.
  • Warn and ignore legacy runtime options.tools passed to chat().
  • Avoid duplicate recovery continuation for already-terminal chat streams.
  • Finalize recovered durable submissions whose stream is already terminal.
  • Update e2e coverage so helper sub-agent recovery exercises the RPC chat() path directly.
  • Add/adjust unit coverage for durable chat() stash usage, stream chunk persistence, terminal stream recovery, and ignored runtime tools.
  • Update Think docs, README, design notes, and changeset for the new API shape and chooser guidance.

Test plan

  • npm run test --workspace @cloudflare/think -- src/tests/think-session.test.ts
  • npm run test:e2e --workspace @cloudflare/think -- src/e2e-tests/chat-recovery.test.ts
  • npm run build --workspace @cloudflare/think
  • npm run check

Notes

I also synced the corresponding public docs changes in the sibling cloudflare-docs checkout, but this PR intentionally contains only cloudflare/agents changes.

Made with Cursor



Wrap Think.chat() RPC turns in the same chatRecovery fiber machinery used by the other Think turn entry points so interrupted sub-agent turns can recover partial output after Durable Object eviction. This routes chat() through keepAliveWhile(), the turn queue, AbortRegistry signal linking, runFiber(), and a dedicated RPC streaming helper that persists UIMessage chunks through ResumableStream before forwarding them to the caller callback.

Remove typed support for per-call chat(options.tools) because ephemeral caller-provided tools do not fit the durable sub-agent model. Legacy runtime callers still get a warning, but the tools are ignored; child turns now get their capabilities from the child agent itself through getTools(), session/context tools, extensions, MCP, and client tool schemas. The docs now steer parent-child orchestration toward agentTool() / runAgentTool() when retained runs, abort bridging, replay, and UI drill-in are desired.

Tighten recovery edge cases around terminal stream metadata so recovered fibers whose stream already completed or errored do not schedule duplicate continuations, while still persisting recoverable partial output and finalizing any associated durable submissions. The tests cover durable chat() stash usage, stream chunk persistence, completed-stream recovery, ignored runtime tools, and the e2e parent-to-child RPC chat recovery path.

Add a central Think turn API chooser to explain when to use browser chat, saveMessages(), submitMessages(), raw subAgent(...).chat(), agentTool()/runAgentTool(), persistMessages(), and continueLastTurn(). Update the README, user docs, and design docs so the tool ownership, recovery, and durable submission stories are consistent across the feature surface.

Co-authored-by: Cursor <cursoragent@cursor.com>

changeset-bot Bot commented May 13, 2026

🦋 Changeset detected

Latest commit: 72d3002

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package:

Name                Type
@cloudflare/think   Minor


@devin-ai-integration devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 6 additional findings.


pkg-pr-new Bot commented May 13, 2026


agents

npm i https://pkg.pr.new/agents@1517

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1517

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1517

hono-agents

npm i https://pkg.pr.new/hono-agents@1517

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1517

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1517

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1517

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1517

commit: 72d3002

Mark RPC chat streams completed before persisting the finalized assistant message, matching the existing WebSocket stream ordering more closely. This reduces the race window where a recovered fiber could see a fully persisted assistant message while the associated stream still appears active.
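
A toy model of why the ordering matters, assuming a two-step finalization. `TurnState`, `finalizeTurn`, and the `crashAfterFirstStep` flag are hypothetical and exist only to simulate an interruption between the two steps.

```typescript
// Hypothetical sketch of the finalization ordering; not the Think source.
interface TurnState {
  streamStatus: "active" | "completed";
  persistedMessageIds: string[];
}

function finalizeTurn(
  state: TurnState,
  messageId: string,
  crashAfterFirstStep = false // simulation hook, for illustration only
): void {
  // Step 1: mark the stream terminal.
  state.streamStatus = "completed";
  if (crashAfterFirstStep) throw new Error("simulated eviction");
  // Step 2: persist the finalized assistant message.
  state.persistedMessageIds.push(messageId);
}
```

If the process stops between the steps, recovery sees a completed stream with no finalized message, which the terminal-stream path can handle from stored chunks. The reverse ordering could instead expose a persisted message next to a stream that still looks active.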

Add a regression test for the already-persisted completed-stream recovery case. Replaying stored chunks with the original start.messageId should update the existing assistant message rather than appending a duplicate.
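
The update-rather-than-append behavior can be illustrated with an upsert keyed on the replayed `start.messageId`. The chunk and message shapes below are simplified assumptions, and `replayChunks` is a hypothetical helper, not the function exercised by the regression test.

```typescript
// Hypothetical sketch with simplified chunk and message shapes.
interface StoredMessage {
  id: string;
  role: "user" | "assistant";
  text: string;
}

type ReplayChunk =
  | { type: "start"; messageId: string }
  | { type: "text-delta"; delta: string }
  | { type: "finish" };

// Rebuild the assistant message from stored chunks and upsert it by id.
function replayChunks(
  messages: StoredMessage[],
  chunks: ReplayChunk[]
): StoredMessage[] {
  let messageId: string | undefined;
  let text = "";
  for (const chunk of chunks) {
    if (chunk.type === "start") messageId = chunk.messageId;
    else if (chunk.type === "text-delta") text += chunk.delta;
  }
  if (messageId === undefined) return messages;

  const id = messageId;
  const rebuilt: StoredMessage = { id, role: "assistant", text };
  const index = messages.findIndex((m) => m.id === id);
  if (index === -1) return [...messages, rebuilt]; // first time: append

  const next = [...messages];
  next[index] = rebuilt; // already persisted: replace in place, no duplicate
  return next;
}
```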

Co-authored-by: Cursor <cursoragent@cursor.com>
@threepointone threepointone merged commit 449b421 into main May 13, 2026
4 checks passed
@threepointone threepointone deleted the fix-think-chat-recovery branch May 13, 2026 09:38
@github-actions github-actions Bot mentioned this pull request May 13, 2026