Add durable recovery for Think chat RPC #1517
Merged
Wrap Think.chat() RPC turns in the same chatRecovery fiber machinery used by the other Think turn entry points so interrupted sub-agent turns can recover partial output after Durable Object eviction. This routes chat() through keepAliveWhile(), the turn queue, AbortRegistry signal linking, runFiber(), and a dedicated RPC streaming helper that persists UIMessage chunks through ResumableStream before forwarding them to the caller callback.

Remove typed support for per-call chat(options.tools) because ephemeral caller-provided tools do not fit the durable sub-agent model. Legacy runtime callers still get a warning, but the tools are ignored; child turns now get their capabilities from the child agent itself through getTools(), session/context tools, extensions, MCP, and client tool schemas. The docs now steer parent-child orchestration toward agentTool() / runAgentTool() when retained runs, abort bridging, replay, and UI drill-in are desired.

Tighten recovery edge cases around terminal stream metadata so recovered fibers whose stream already completed or errored do not schedule duplicate continuations, while still persisting recoverable partial output and finalizing any associated durable submissions. The tests cover durable chat() stash usage, stream chunk persistence, completed-stream recovery, ignored runtime tools, and the e2e parent-to-child RPC chat recovery path.

Add a central Think turn API chooser to explain when to use browser chat, saveMessages(), submitMessages(), raw subAgent(...).chat(), agentTool()/runAgentTool(), persistMessages(), and continueLastTurn(). Update the README, user docs, and design docs so the tool ownership, recovery, and durable submission stories are consistent across the feature surface.

Co-authored-by: Cursor <cursoragent@cursor.com>
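The persist-then-forward ordering described in the commit message above can be sketched as follows. This is a minimal illustration, not the helper added in this PR: `Chunk`, `ChunkStore`, `MemoryChunkStore`, and `forwardChunks` are hypothetical names, and the sketch is synchronous for brevity, whereas the real helper streams asynchronously through ResumableStream.

```typescript
// Hypothetical stand-ins for illustration only; the real ResumableStream
// and UIMessage chunk types in @cloudflare/think may look different.
type Chunk = { type: string; [key: string]: unknown };

interface ChunkStore {
  append(streamId: string, chunk: Chunk): void;
}

// In-memory store used only to make the sketch runnable.
class MemoryChunkStore implements ChunkStore {
  readonly rows = new Map<string, Chunk[]>();

  append(streamId: string, chunk: Chunk): void {
    const existing = this.rows.get(streamId) ?? [];
    existing.push(chunk);
    this.rows.set(streamId, existing);
  }
}

// Persist each chunk first, then forward it to the caller callback, so that
// any chunk the caller has already seen can also be found in storage.
function forwardChunks(
  streamId: string,
  chunks: Iterable<Chunk>,
  store: ChunkStore,
  onChunk: (chunk: Chunk) => void
): number {
  let forwarded = 0;
  for (const chunk of chunks) {
    store.append(streamId, chunk); // durable write happens before delivery
    onChunk(chunk);
    forwarded++;
  }
  return forwarded;
}
```

Writing before forwarding means an eviction between the two steps can at worst leave a stored chunk the caller never received, which recovery can replay, rather than a delivered chunk that storage never saw.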
🦋 Changeset detected
Latest commit: 72d3002
The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Mark RPC chat streams completed before persisting the finalized assistant message, matching the existing WebSocket stream ordering more closely. This reduces the race window where a recovered fiber could see a fully persisted assistant message while the associated stream still appears active.

Add a regression test for the already-persisted completed-stream recovery case. Replaying stored chunks with the original start.messageId should update the existing assistant message rather than appending a duplicate.

Co-authored-by: Cursor <cursoragent@cursor.com>
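The "update rather than append" behavior that the regression test checks amounts to an upsert keyed by message id. A minimal sketch, assuming a simplified `StoredMessage` shape and a hypothetical `upsertMessage` helper (neither is taken from the Think source):

```typescript
// Hypothetical message shape; the real UIMessage type carries more fields.
interface StoredMessage {
  id: string;
  role: "user" | "assistant";
  text: string;
}

// Insert the message, or replace the stored message that has the same id.
// Returns a new array and leaves the input untouched.
function upsertMessage(
  messages: StoredMessage[],
  next: StoredMessage
): StoredMessage[] {
  const index = messages.findIndex((m) => m.id === next.id);
  if (index === -1) {
    return [...messages, next];
  }
  const copy = messages.slice();
  copy[index] = next;
  return copy;
}
```

Replaying chunks that carry the original message id therefore lands on the existing row, so recovery after an already-persisted message does not grow the history.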
Summary
This PR makes Think.chat() use the same durable execution model as the other Think turn entry points. Sub-agent RPC chat turns are now queued under keepAliveWhile(), wired through AbortRegistry, wrapped in runFiber() when chatRecovery is enabled, and streamed through a dedicated RPC callback helper that stores UIMessage chunks in ResumableStream before forwarding them to the parent callback.

It also narrows the chat() API by removing typed support for ChatOptions.tools. Runtime callers that still pass options.tools get a warning, but the tools are ignored. This keeps child-agent capabilities durable and owned by the child through getTools(), session/context tools, extensions, MCP tools, and client tool schemas. Parent-child orchestration docs now point users to agentTool() / runAgentTool() for retained runs, abort bridging, event replay, and UI drill-in.

The recovery path now handles terminal stream metadata more carefully. If a recovered chat fiber already has a completed or errored stream row, it persists recoverable partial output and avoids scheduling another continuation. For durable submissions associated with recovered work, terminal stream recovery now finalizes the submission instead of leaving it running.
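The terminal-stream rules in the paragraph above can be read as a small decision function. The sketch below is an illustration under assumptions: `RecoveredFiber`, `RecoveryPlan`, and `planRecovery` are hypothetical names, and scheduling a continuation for a stream that is still active is an assumption about the non-terminal case, which the summary does not spell out.

```typescript
// Hypothetical types for illustration; not the actual Think internals.
type StreamStatus = "active" | "completed" | "errored";

interface RecoveredFiber {
  streamStatus: StreamStatus;
  hasPartialOutput: boolean;
  submissionId?: string; // set when a durable submission owns this work
}

interface RecoveryPlan {
  persistPartial: boolean;
  scheduleContinuation: boolean;
  finalizeSubmission: boolean;
}

function planRecovery(fiber: RecoveredFiber): RecoveryPlan {
  // A completed or errored stream row is terminal.
  const terminal = fiber.streamStatus !== "active";
  return {
    // Recoverable partial output is persisted whether or not the stream is terminal.
    persistPartial: fiber.hasPartialOutput,
    // Terminal streams must not schedule another continuation.
    // Assumption: a still-active stream is the case that continues.
    scheduleContinuation: !terminal,
    // A terminal stream finalizes its durable submission instead of
    // leaving it in a running state.
    finalizeSubmission: terminal && fiber.submissionId !== undefined,
  };
}
```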
The documentation now includes a central Think turn API chooser covering:
- browser chat through useAgentChat
- saveMessages() for server-triggered turns that can wait
- submitMessages() for durable acceptance and later inspection
- subAgent(...).chat() for direct streaming RPC
- agentTool() / runAgentTool() for retained child-agent orchestration
- persistMessages() for context injection without a model turn
- continueLastTurn() as an advanced subclass/recovery primitive

Notable changes
- Wrap Think.chat() RPC turns in chat recovery fibers.
- Persist chat() stream chunks for recovery lookup and partial-message persistence.
- Add _streamResultToRpcCallback() to keep RPC streaming behavior separate from WebSocket framing.
- Remove callerTools / ChatOptions.tools from the typed Think turn path.
- Warn about and ignore options.tools passed to chat().
- … chat() path directly.
- Add tests for durable chat() stash usage, stream chunk persistence, terminal stream recovery, and ignored runtime tools.

Test plan
- npm run test --workspace @cloudflare/think -- src/tests/think-session.test.ts
- npm run test:e2e --workspace @cloudflare/think -- src/e2e-tests/chat-recovery.test.ts
- npm run build --workspace @cloudflare/think
- npm run check

Notes
I also synced the corresponding public docs changes in the sibling cloudflare-docs checkout, but this PR intentionally contains only cloudflare/agents changes.

Made with Cursor