Templates: Durable prose-recovery + always-rotate (companion to ai-bridge#425)#204
Conversation
Pins both advanced templates to the ai-bridge PR branch so the long-running agent server crash-resumes in-flight runs via heartbeat + CAS claim. Revert the [tool.uv.sources] entry once that PR merges and a new release is cut.

Also fixes a latent IndexError in agent-openai-advanced's deduplicate_input: when the long-running server re-invokes the handler with input=[] to resume from the session (the agnostic resume contract validated by prototyping), messages[-1] blew up. Now we return [] for empty input — the session already has prior turns, so there is nothing to dedupe. No change to either template's agent.py.
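A minimal sketch of the guard (hypothetical signature — the template's real helper also inspects roles and item shapes, but the empty-input contract is the point):

```python
def deduplicate_input(messages: list, session_items: list) -> list:
    """Decide which request-input turns to forward to the agent.

    Hypothetical helper mirroring the template's deduplicate_input:
    `messages` is the request input, `session_items` is what the SDK
    session already persisted.
    """
    # Resume contract: the server re-invokes with input=[]. The session
    # already holds every prior turn, so there is nothing to dedupe.
    if not messages:
        return []
    # When the session has history, treat it as authoritative and
    # forward only the newest turn; otherwise forward everything.
    if session_items:
        return [messages[-1]]
    return messages
```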
Makes the bundled chat UI durable end-to-end without any client-side
changes. The Express /invocations proxy in e2e-chatbot-app-next now:
- Rewrites streaming POSTs to { ...body, background: true, stream: true },
so every user turn persists each SSE event to Lakebase via
LongRunningAgentServer.
- Sniffs response.id + sequence_number out of the forwarded SSE stream.
- If upstream closes before [DONE] (pod died, lost connection), the proxy
transparently reconnects via
GET /responses/{id}?stream=true&starting_after=N
and resumes emitting events to the still-connected browser client. The
browser sees one continuous stream.
Non-streaming requests and non-POST methods keep the original passthrough
behavior.
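The reconnect flow above can be sketched as a pure generator (illustrative only — the real proxy is Express/TypeScript; `open_stream` stands in for either the initial background-mode POST or the `GET /responses/{id}?stream=true&starting_after=N` resume fetch):

```python
def relay_with_resume(open_stream, max_attempts=3):
    """Yield upstream SSE events to the client, resuming on upstream death.

    `open_stream(response_id, starting_after)` returns an iterator of
    parsed event dicts: with response_id=None it performs the initial
    POST, otherwise the resume GET.
    """
    response_id, last_seq = None, -1
    for _ in range(max_attempts):
        for event in open_stream(response_id, last_seq):
            # Sniff response.id + sequence_number as events flow through.
            response_id = (event.get("response") or {}).get("id") or response_id
            last_seq = event.get("sequence_number", last_seq)
            yield event
            if event.get("type") == "response.completed":
                return  # clean finish: stop relaying
        # Upstream closed before a terminal event: loop and reconnect.
```

The browser-facing side just consumes the generator, so it sees one continuous stream across pod deaths.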
Also points agent-openai-advanced/scripts/start_app.py at the
dhruv0811/durable-execution-templates branch of app-templates so the new
proxy code is actually deployed (override via APP_TEMPLATES_BRANCH env
var). Revert once this lands on main.
… actually fires
Previous attempt left the proxy dead-code: the Node AI SDK honored API_PROXY
verbatim and sent requests straight to http://localhost:8000/invocations
(FastAPI), skipping the Express /invocations handler at :3000 entirely.
Confirmed in logs: requests reached the backend with {"stream": true}
but never with "background": true.
Split the two concerns across env vars:
API_PROXY=http://localhost:3000/invocations (AI SDK -> Express proxy)
AGENT_BACKEND_URL=http://localhost:8000/invocations (Express proxy -> FastAPI)
Express handler prefers AGENT_BACKEND_URL, falls back to API_PROXY for
backwards compat so existing templates don't break.
response_id is buried in the raw backend SSE stream and never surfaces to the browser, because the Vercel AI SDK re-wraps the stream as its own message format before sending to the client. Log it on the server side instead so test instructions can `grep 'background started response_id='` from apps logs. Also distinguish the startup log so it's clear the durable-resume code path is live. No behavior change; pure observability.
app.yaml env vars were overriding databricks.yml at runtime, so the AI SDK was still talking directly to the Python FastAPI backend and the Express /invocations proxy never saw the request. Keep both files in sync.
…RL to FastAPI

The script was unconditionally overwriting API_PROXY with the backend URL right before launching the frontend, which defeated the whole durable-resume-rewrite story: the Node AI SDK bypassed the Express /invocations handler and streamed straight from FastAPI.

Fix: API_PROXY now points at CHAT_APP_PORT (the Express proxy), and we default AGENT_BACKEND_URL (previously unset) to the Python backend. Use os.environ.setdefault for AGENT_BACKEND_URL so operators can still override via databricks.yml or app.yaml.
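The wiring reduces to two lines (port values are the bundled-process topology defaults described above):

```python
import os

# Port the Node chat app listens on; mirrors start_app.py's convention.
chat_app_port = os.getenv("CHAT_APP_PORT", "3000")

# AI SDK -> Express proxy: always route streaming POSTs through Node.
os.environ["API_PROXY"] = f"http://localhost:{chat_app_port}/invocations"

# Express proxy -> FastAPI: setdefault keeps operator overrides from
# databricks.yml / app.yaml intact.
os.environ.setdefault("AGENT_BACKEND_URL", "http://localhost:8000/invocations")
```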
…resp_*

Broadens the response_id parser so it works whether the backend tags frames with top-level response_id (preferred) or the older nested-only shape.
…tally

Matches the [/invocations] prefix so the full story is greppable from apps logs without correlating Node and Python timestamps.
The library logger inherits from root (default WARNING) so INFO-level lifecycle messages from LongRunningAgentServer (heartbeat, claim, resume, stream lifecycle) were being dropped. Set both the ai-bridge logger and the root level to LOG_LEVEL so apps logs carry the full durable-resume story without requiring callers to tune logging themselves.
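The fix is two setLevel calls (logger name `databricks_ai_bridge` assumed from the package name):

```python
import logging
import os

log_level = os.getenv("LOG_LEVEL", "INFO").upper()

# Root logger defaults to WARNING, which swallows library INFO records.
logging.getLogger().setLevel(log_level)
# Raise the bridge's own logger too so INFO lifecycle lines propagate.
logging.getLogger("databricks_ai_bridge").setLevel(log_level)
```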
When a response is killed mid-stream, the partial assistant text that was already rendered to the client kept receiving fresh deltas from attempt 2 — users saw attempt-1-partial + attempt-2-full concatenated in one bubble. The Express /invocations proxy now seals the in-progress assistant message across an attempt boundary:
1. On upstream close without [DONE], immediately append a '(connection interrupted — reconnecting…)' suffix delta to the active message so the user sees something is happening during the ~10s stale window.
2. On the response.resumed sentinel, emit synthetic response.content_part.done + response.output_item.done events for the active message — effectively ending the first assistant bubble at the OpenAI Responses API level.
3. Attempt 2's natural response.output_item.added (with a fresh item_id) then creates a clean second bubble showing the full answer.
Tool calls naturally de-dup by call_id across attempts, so no closure synthesis is needed for them.

Also mirrors the routing + logging fixes previously applied to agent-openai-advanced onto agent-langgraph-advanced so both templates get durable resume with the full [durable] log lifecycle visible:
- app.yaml + databricks.yml: split API_PROXY (-> Express :3000) from AGENT_BACKEND_URL (-> FastAPI :8000).
- scripts/start_app.py: honor AGENT_BACKEND_URL, point API_PROXY at the Express proxy, clone e2e-chatbot-app-next from the durable-execution branch.
- agent_server/start_server.py: raise databricks_ai_bridge + root logger to LOG_LEVEL so [durable] INFO lines surface in apps logs.
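The synthetic closure pair can be sketched as (event type names come from the OpenAI Responses streaming protocol; payload fields trimmed to the ones the UI tracks):

```python
def seal_events(item_id: str, content_index: int = 0) -> list:
    """Synthesize the two events that close an in-flight assistant item.

    Emitted on the response.resumed sentinel so attempt-1's bubble ends
    before attempt-2's fresh output_item.added opens a new one.
    """
    return [
        {"type": "response.content_part.done",
         "item_id": item_id, "content_index": content_index},
        {"type": "response.output_item.done", "item_id": item_id},
    ]
```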
Durable-resume can interrupt the pod between an LLM emitting tool_calls and the SDK finishing the tool executions — the Session is left with function_call items whose matching function_call_output never got written. The next LLM request over that session fails:

400 BAD_REQUEST: An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_xxx, call_yyy, ...

Piggy-back on deduplicate_input (which already touches the session each turn) to inject synthetic function_call_output items for every orphan function_call. The message is plain text, so the LLM sees 'tool X was interrupted, please retry if needed' and can decide whether to re-call or continue. No change to agent.py.
The previous heal added synthetic function_call_output at the END of the session (add_items only appends). When the conversation has a message between the orphan function_call and the synthetic output, the SDK rebuilds the LLM request as an assistant-with-tool_calls message that doesn't have its tool responses right after it, and the API rejects with 'assistant message with tool_calls must be followed by tool messages'.

Also: the Vercel AI SDK client echoes the full conversation back each turn. deduplicate_input drops most of it, but the Runner.run path can still re-persist prior items, leaving DUPLICATE function_call rows for the same call_id.

Replace with a clear+rebuild sanitize pass: dedupe function_call / function_call_output by call_id, inject synthetic outputs immediately after any orphan function_call, clear the session, and re-add the canonical sequence. No-op when already clean.
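A self-contained sketch of that sanitize pass over plain dict items (the real implementation operates on SDK session items and then clears + re-adds them via the session API):

```python
def sanitize_items(items: list) -> list:
    """Dedupe tool items by call_id and pair every orphan function_call."""
    calls_seen, outputs_seen = set(), set()
    deduped = []
    for item in items:
        cid = item.get("call_id")
        if item.get("type") == "function_call":
            if cid in calls_seen:
                continue  # duplicate row re-persisted by a later attempt
            calls_seen.add(cid)
        elif item.get("type") == "function_call_output":
            if cid in outputs_seen:
                continue
            outputs_seen.add(cid)
        deduped.append(item)

    repaired = []
    for item in deduped:
        repaired.append(item)
        if item.get("type") == "function_call" and item["call_id"] not in outputs_seen:
            # Synthetic output placed IMMEDIATELY after the orphan call, so
            # the rebuilt request keeps tool responses adjacent to tool_calls.
            repaired.append({
                "type": "function_call_output",
                "call_id": item["call_id"],
                "output": f"tool {item.get('name', 'call')} was interrupted, "
                          "please retry if needed",
            })
    return repaired
```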
Keep the UI minimal but fix the doubled-text issue: when a mid-stream
kill happens, the AI SDK merges all deltas within one streamText call
into one UIMessage — so our proxy-level seal events were valid but
invisible, and attempt 2's text kept appending to attempt 1's partial.
Minimal solution:
1. Express /invocations proxy already emits response.resumed at the
attempt boundary (unchanged).
2. chat.ts server: detect response.resumed via onChunk and forward it
to the UI stream as { type: 'data-resumed', data: { attempt } }.
3. chat.tsx client: on 'data-resumed', call setMessages to drop all
text parts from the last (assistant) message. Tool call parts stay
because they dedupe by call_id naturally.
Also: fix auto-resume loop burning MAX_RESUME_ATTEMPTS on terminal
errors by exiting early when an error event with code=task_failed or
code=task_timeout comes through the proxy.
No changes to agent.py. Agnosticism tenet intact.
Your 'clean up at end of stream' idea is much more robust than relying on the mid-stream mutation sticking. On data-resumed we now snapshot the attempt-1 text length, and in onFinish we slice exactly that many chars off the front of the last assistant message's text parts. Whatever the AI SDK accumulator did during streaming, the final rendered state contains only attempt 2's content.

The mid-stream mutation wipe stays in place too: when it sticks, the text visibly clears during the 10s stale window, which is nicer UX than waiting for onFinish. When it doesn't stick, onFinish catches it.
PreviewMessage is memoized: while loading it compares prevProps.message to nextProps.message by reference; when not loading it deep-equals the parts array (which short-circuits on identical references). Our previous truncate mutated part.text in place and returned [...prev] — same message + same parts array refs, so the memo skipped the re-render and the old text stuck on screen even though state was technically updated.

Fix: map to NEW part objects with sliced text and wrap them in a NEW message object so both the reference check (loading path) and the deep-equal (done path) see a change and re-render.
State-level wipes were getting clobbered by the AI SDK accumulator — ReactChatState.replaceMessage deep-clones state.message on every write(), and activeTextParts keeps mutating the originals behind the UI's back.

Solution: transform at the VIEW layer instead of fighting the state machine. The Chat component tracks attempt1TextLen per messageId (state, not ref, so it propagates to children). Messages maps each message through a render-time slice that drops the leading attempt-1 chars from text parts before passing to PreviewMessage, creating new message + part objects so the memo's reference check trips and the component re-renders.

onFinish still does the authoritative setMessages truncate so the persisted-to-DB final message reflects only attempt 2. That truncate now also clears attempt1TextLen, so the render-time slice becomes a no-op after completion (state is already truncated).
…cution-templates

# Conflicts:
#   agent-openai-advanced/databricks.yml
Drop the [chat][onData] / [chat][onFinish] / [chat][onChunk] tracing statements used to trace the attempt-1 → attempt-2 flow while tuning the render-time slice and post-stream truncate. The server-side Express proxy still logs the resume lifecycle (background started / resume fetch / terminal error / stream done) since that's operationally useful; the ai-bridge backend's [durable] INFO logs stay as-is.
Co-authored-by: Isaac
Move the per-template workarounds for mid-tool crash-resume into the
databricks-ai-bridge library and wire them in:
- agent-openai-advanced/utils.py: deduplicate_input now calls
session.repair() (new public method on AsyncDatabricksSession) instead
of the 100-line in-template _sanitize_session. Same behavior — dedupe
function_call/function_call_output by call_id, inject synthetic
outputs for orphans — just owned by the library.
- agent-langgraph-advanced/agent.py: before agent.astream, call
build_tool_resume_repair on the checkpointer's messages and apply via
agent.aupdate_state(..., as_node="tools"). The as_node is critical —
without it LangGraph re-evaluates the model→{tools,END} branch from
the updated state and crashes with KeyError: 'model'.
- agent-langgraph-advanced/agent.py: when the checkpointer already has
a thread, only forward the latest user turn from request.input — the
UI client (Vercel AI SDK) re-echoes the full history on every turn,
which can re-inject orphan tool_uses from a previously-interrupted
attempt that the client kept in its buffer.
Both pyproject.toml files now pin databricks-openai / databricks-langchain
to the same ai-bridge branch (subdirectory git sources) so the new
helpers are picked up. Temporary; revert to registry once the bridge PR
merges.
Co-authored-by: Isaac
Library side (databricks-langchain, PR #416):
- New build_tool_resume_repair_middleware() returns an AgentMiddleware whose
before_model hook runs build_tool_resume_repair. Swaps the manual
aget_state / aupdate_state(as_node="tools") surgery in the template for a
one-line `middleware=[...]` arg to create_agent.
- The as_node="tools" footgun (KeyError: 'model' in the model→{tools,END}
conditional branch re-eval) disappears entirely; repair runs inside the
graph's own execution flow, not as external state surgery.
Template (agent-langgraph-advanced):
- init_agent: add middleware=[build_tool_resume_repair_middleware()] to
create_agent. stream_handler drops the 8-line repair block.
- utils.py process_agent_astream_events: skip None node_data (the graph's
updates stream emits {middleware_node: None} when the middleware is a
no-op, which is every turn on the happy path).
UI (e2e-chatbot-app-next):
- On data-resumed from the backend, wipe text parts from the last assistant
message in one setMessages. Tool-call parts are kept as-is (they already
dedupe across attempts by call_id). Dropped:
* attempt1TextLen state + per-message snapshot in onData
* render-time text slice in Messages.tsx
* onFinish authoritative post-stream truncate
The AI SDK's seal-on-resume synthesis (Express proxy) still creates a
fresh output_item_id for attempt 2, so new deltas land in a fresh text
part — our wipe of the old text part is sufficient.
Net: -99 LOC across 4 files. Same behavior for the "delete old text,
leave tools alone" UX; substantially less state-machine choreography.
Co-authored-by: Isaac
setMessages can't wipe mid-stream — the AI SDK's activeResponse.state is a snapshot taken at makeRequest time, and every text-delta calls write() → this.state.replaceMessage(lastIdx, activeResponse.state.message), which overwrites any setMessages we do. Our wipe was visible for a single chunk then reverted.

Fix: snapshot the assistant message's parts.length at data-resumed, and at render time hide text parts at indices BEFORE that cutoff. Tool / step parts render normally at every index. Works for openai and langgraph because it transforms at the view layer rather than fighting the AI SDK state machine.

Removes the server-side debug log. Keeps the minimal delete-old-text UX.
Co-authored-by: Isaac
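Logic-only sketch of the render-time cutoff (the real code is React/TSX; here parts are plain dicts):

```python
def visible_parts(parts: list, resume_cut_index: "int | None") -> list:
    """Hide text parts that landed before the resume cutoff.

    Tool and step parts render at every index; only attempt-1 text
    (indices below the parts.length snapshot taken at data-resumed)
    is suppressed.
    """
    if resume_cut_index is None:
        return parts
    return [
        part for index, part in enumerate(parts)
        if part.get("type") != "text" or index >= resume_cut_index
    ]
```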
…lper

- Removed the "_(connection interrupted — reconnecting…)_" delta block. The render-time slice hides attempt-1 text on resume anyway, so the suffix was invisible past the 10s stale window and too subtle during it.
- Extracted a writeEvent(type, payload) helper; sealActiveMessage went from 45 → 22 lines, no behavior change.
- Removed the readActive() TS-widening helper (no longer needed without the suffix block).
- Inlined the onFirstResponseId helper into its single call site.

Net: 92 lines removed, 36 added in this file.
Co-authored-by: Isaac
Durability mechanics now live entirely in databricks-ai-bridge's LongRunningAgentServer (rotate conv_id on resume + full-history input sanitizer, see ai-bridge PR #416). Templates can drop the explicit repair surface:
- agent-langgraph-advanced/agent.py: drop middleware=[build_tool_resume_repair_middleware()] from create_agent and the unused import. Also drop the stream_handler UI-echo dedupe block — the server sanitizer handles mid-history orphans end-to-end.
- agent-openai-advanced/utils.py: drop await session.repair() from deduplicate_input. session.repair() stays available as a public method for callers who want destructive session cleanup.

Net: agent.py / utils.py in both advanced templates have zero durability-specific lines. The contract becomes "use our checkpointer/session classes with LongRunningAgentServer — durable resume + orphan repair is free."
Co-authored-by: Isaac
Temporarily short-circuit the resumeCutIndex write so attempt-1's text stays visible while attempt-2 streams over it. Lets us see how the server-side inheritance + synthetic-output prompt shape the LLM's mid-turn continuation behavior without the visual wipe hiding what attempt-2 actually emits. Re-enable by uncommenting the block; the rest of the wipe plumbing (state hook, Messages prop threading, render-time slice) is left in place so re-enabling is a 1-line flip. Co-authored-by: Isaac
…les resume

Server-side changes earlier in this branch (prior-attempt tool-event inheritance + partial-stream reassembly in databricks-ai-bridge) make the client-side "wipe attempt-1 text when resume fires" machinery unnecessary: attempt-2's LLM sees attempt-1's work as history and continues seamlessly instead of restarting. The wipe was also hiding the new continuation quality from the user. Turning the wipe off in UI testing confirmed the server-side story is sufficient.

Delete the full stack:
- packages/core/src/types.ts: drop `resumed` from CustomUIDataTypes.
- server/src/routes/chat.ts: drop writerRef + emittedResumedAttempts + the onChunk raw-event branch that emitted data-resumed parts. Trace-extraction stays; only the resume-forwarding path is removed.
- client/src/components/chat.tsx: drop the resumeCutIndex state hook, the data-resumed onData handler (was already commented out), and the prop pass to <Messages/>.
- client/src/components/messages.tsx: drop the resumeCutIndex prop from MessagesProps + its destructuring + the render-time text-part slice.

The server still emits `response.resumed` as a sentinel so the Express proxy's sealActiveMessage() call correctly closes attempt-1's open text part before attempt-2's fresh output_item.added creates a new one. The proxy no longer extracts it into a UI data part.
Co-authored-by: Isaac
Remove everything that isn't strictly required for durable resume with the server-side-only approach in ai-bridge PR #416:
- agent-langgraph-advanced/agent_server/agent.py: revert entirely. The test-scaffolding tools (get_weather, get_stock_price, deep_research) were only for crash-test harnesses; the asyncio import only existed to support them. User-space durability surface for this template is now zero lines.
- agent-openai-advanced/agent_server/agent.py: revert entirely. Drop the test-scaffolding tools (get_weather, get_stock_price, search_best_restaurants, deep_research) and the asyncio import. Same zero-user-space result.
- agent-langgraph-advanced/agent_server/utils.py: revert. The "middleware nodes that no-op return None" guard was defensive against middleware we no longer install.
- agent-openai-advanced/agent_server/utils.py: revert. The empty-input guard was defensive against the old input=[] resume replay that no longer happens — the server always replays the original input.
- e2e-chatbot-app-next/server/src/index.ts: drop the activeMessage / sealActiveMessage / writeEvent machinery. It was synthesizing closure events on response.resumed to seal attempt-1's text part for the UI wipe. The UI wipe is gone; the AI SDK creates parts by item_id, so attempt-2's fresh output_item.added naturally starts a new part and attempt-1's open part finalizes on stream end.
- Plus the earlier UI cleanup (chat.tsx, messages.tsx, types.ts, routes/chat.ts) that removed the data-resumed / resumeCutIndex plumbing.

Remaining essentials:
- agent_server/start_server.py: log-level setup so [durable] logs surface in app logs.
- scripts/start_app.py: API_PROXY / AGENT_BACKEND_URL wiring so the Node AI SDK routes streaming POSTs through the Express background-mode + auto-resume proxy. Clone-from-branch is marked TEMPORARY (revert when ai-bridge ships).
- pyproject.toml: databricks-ai-bridge git source pointer (TEMPORARY).
- e2e-chatbot-app-next/server/src/index.ts: background-mode rewrite + auto-resume proxy for the /invocations route. Co-authored-by: Isaac
Infinite stream-resume loop seen with Claude multi-tool turns via
durable retrieve. Root cause:
- useChat's onStreamPart reset resumeAttemptCountRef on every chunk,
so the 3-retry cap was only enforced when a stream ended empty.
When Claude's provider failed to emit a clean `finish` UIMessageChunk
at the end of the stream, lastPart.type !== 'finish' kept
streamIncomplete = true. Each resume replayed the cached stream,
delivered chunks, reset the counter to 0, onFinish fired without
`finish`, looped.
Fix:
- Remove the per-chunk reset in onStreamPart.
- Reset only in prepareSendMessagesRequest when the last message is a
user message (a genuine new turn). Tool-result continuations
(non-user-message continuations) don't reset.
- Cap stays at 3; after that, fetchChatHistory() pulls the
DB-persisted state so the user sees the final assistant output
instead of spinning forever.
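The counter discipline, reduced to pure logic (class and method names hypothetical; the real code lives in useChat callbacks):

```python
class ResumeGuard:
    """Cap auto-resume attempts; reset only on a genuine new user turn."""

    def __init__(self, cap: int = 3):
        self.cap = cap
        self.attempts = 0

    def on_send(self, last_message_role: str) -> None:
        # prepareSendMessagesRequest equivalent: a trailing user message
        # means a new turn; tool-result continuations do NOT reset.
        if last_message_role == "user":
            self.attempts = 0

    def try_resume(self) -> bool:
        if self.attempts >= self.cap:
            return False  # caller falls back to fetchChatHistory()
        self.attempts += 1
        return True
```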
Co-authored-by: Isaac
Final stable state for durable execution. End-to-end UI-validated
scenarios that now work:
- Multi-tool turn interrupted mid-sequence, durable resume inherits
completed tool pairs + narrative (reordered) + synthetic output
for the interrupted call, agent continues from where it left off.
- Text-only mid-stream crash, partial-text reassembly + Claude
prefill → continuation.
- Cross-turn recall after crash-and-resume (stable thread via read-
time checkpoint repair on LangGraph / session auto-repair on
OpenAI).
- Multi-tool on GPT-5 + openai-agents (single-response-per-turn).
Template fix here: process_agent_stream_events now disambiguates by
(a) item.type bucket for delta routing and (b) call_id bucket for
multiple open function_calls. The original single curr_item_id bucket
worked for GPT-5's strictly serial events but collided on Claude's
interleaved + parallel tool-call events, which produced two items
sharing one id and broke the client's part tracking.
Pairs with databricks-ai-bridge PR #416 changes (rotate + replay +
full-history sanitizer + prior-attempt tool-pair inheritance +
narrative hoist + checkpoint read-time repair + session auto-repair).
Co-authored-by: Isaac
End-to-end UI test on Claude (via deployed agent-openai-advanced with the updated databricks-ai-bridge) confirmed that the bridge-side ordering fix (sanitizer + narrative hoist + tool-pair inheritance + session auto-repair) is sufficient on its own. The two template-side guards added in earlier commits are no longer needed:
- Revert 0ddbd60: `process_agent_stream_events` per-type + per-call-id id tracking. The single-bucket implementation handles Claude's interleaved + parallel tool-call events correctly now that the upstream ordering is clean.
- Revert 5f3c507: `chat.tsx` user-message-only resume-counter reset. Claude now emits a clean `finish` UIMessageChunk through the durable retrieve path, so the per-chunk reset no longer traps the 3-retry cap in an infinite loop.

Keeps the advanced templates lean — durability logic lives entirely in databricks-ai-bridge (LongRunningAgentServer).
Co-authored-by: Isaac
Extract three pure helpers above the route handler so the SSE frame loop reads like prose:
- parseSseFrame(frame): classifies a frame as done / passthrough / data.
- extractResponseId(payload): tolerates FastAPI's three response_id locations (response_id, response.id, top-level id with resp_ prefix).
- isTerminalErrorFrame(payload): detects task_failed / task_timeout so the resume loop can short-circuit.

pumpStream now just drives the reader + forwards bytes; the parsing logic is testable in isolation and the handler body is substantially shorter.
Co-authored-by: Isaac
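Python equivalents of the three helpers (the originals are TypeScript; payload shapes inferred from this description):

```python
import json

def parse_sse_frame(frame: str):
    """Classify one SSE frame: ('done', None), ('data', payload), or
    ('passthrough', frame) for comments / unparseable bodies."""
    for line in frame.splitlines():
        if line.startswith("data:"):
            body = line[len("data:"):].strip()
            if body == "[DONE]":
                return ("done", None)
            try:
                return ("data", json.loads(body))
            except json.JSONDecodeError:
                return ("passthrough", frame)
    return ("passthrough", frame)

def extract_response_id(payload: dict):
    """Tolerate the three response_id locations the backend may use."""
    rid = payload.get("response_id")
    if isinstance(rid, str):
        return rid
    rid = (payload.get("response") or {}).get("id")
    if isinstance(rid, str):
        return rid
    rid = payload.get("id")
    if isinstance(rid, str) and rid.startswith("resp_"):
        return rid
    return None

def is_terminal_error_frame(payload: dict) -> bool:
    """Terminal errors short-circuit the resume loop."""
    code = (payload.get("error") or {}).get("code") or payload.get("code")
    return code in ("task_failed", "task_timeout")
```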
Both advanced templates were setting these env vars to hard-coded
localhost URLs that match the bundled-process topology (Node on 3000,
FastAPI on 8000). The values are fixed by the templates themselves —
a customer deploying the advanced stack can't change them without
breaking the bundle. Making them required in yaml adds noise without
adding configurability.
Push the defaults into the chatbot:
- New ``getApiProxyUrl()`` helper in ``packages/ai-sdk-providers/src/api-proxy.ts``
  resolves the effective proxy URL:
1. explicit ``API_PROXY`` wins,
2. ``DATABRICKS_SERVING_ENDPOINT`` set → direct-endpoint mode, no
proxy,
3. otherwise → ``http://localhost:${CHAT_APP_PORT|PORT|3000}/invocations``
(advanced-template convention).
Used from ``providers-server.ts`` and ``request-context.ts`` so both
agree on proxy activation.
- ``server/src/index.ts`` defaults ``AGENT_BACKEND_URL`` to
``http://localhost:8000/invocations`` when unset. Explicit empty
string still disables the ``/invocations`` proxy route.
- Drop the ``API_PROXY`` / ``AGENT_BACKEND_URL`` block (and its comment)
from both advanced templates' ``app.yaml`` and ``databricks.yml``.
Preserves direct-serving-endpoint CUJs: when
``DATABRICKS_SERVING_ENDPOINT`` is set (basic chatbot deployments), the
AI SDK talks straight to the endpoint and never hits ``/invocations``.
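The resolution order, as a pure function over an env mapping (the real helper reads process.env in TypeScript):

```python
def resolve_api_proxy(env: dict) -> "str | None":
    """Effective proxy URL, or None for direct-serving-endpoint mode."""
    if env.get("API_PROXY"):
        return env["API_PROXY"]                      # 1. explicit wins
    if env.get("DATABRICKS_SERVING_ENDPOINT"):
        return None                                  # 2. no proxy
    port = env.get("CHAT_APP_PORT") or env.get("PORT") or "3000"
    return f"http://localhost:{port}/invocations"    # 3. convention
```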
Co-authored-by: Isaac
Prior cleanup commit dropped ``API_PROXY=http://localhost:8000/invocations`` from the advanced templates' ``app.yaml`` and ``databricks.yml``. That line pre-existed on ``main``; the PR never meant to remove it. Scope of the previous change was only the *newly-added* ``API_PROXY`` + ``AGENT_BACKEND_URL`` block that activated the Node proxy path. Restore the four files to exactly match ``main``. The chatbot-side ``getApiProxyUrl()`` default only fires when ``API_PROXY`` is unset, so users with main's explicit setting keep their existing behavior. Co-authored-by: Isaac
Both helpers answer routing-decision questions for the provider layer (proxy URL + context-injection gate), and the separate file wasn't buying isolation — providers-server.ts already imports from request-context.ts. One file, same logic. Co-authored-by: Isaac
…surface
Companion to databricks-ai-bridge#425 (POC for prose-recovery + always-rotate
durable-resume). Minimal template-side changes:
- agent-{openai,langgraph}-advanced/pyproject.toml: switch the
databricks-ai-bridge / databricks-openai / databricks-langchain branch pins
from `dhruv0811/durable-execution-resume` (the structured-repair PR #416)
to `dhruv0811/durable-execution-prose-recovery` (the new POC).
- agent-langgraph-advanced/agent_server/agent.py: invoke_handler now returns
the resolved `thread_id` in `custom_outputs`. After a crash + resume, the
bridge rotates `context.conversation_id` to `{base}::attempt-N`. Surfacing
it here lets the client pass it back as `custom_inputs.thread_id` on the
next turn, so subsequent turns land on the rotated (clean) checkpointer row
instead of the orphan-poisoned original. The OpenAI template already does
this via `session.session_id` in custom_outputs; LangGraph just didn't.
Status
======
POC for review alongside databricks-ai-bridge#425. Not intended to merge
unless empirical data justifies the trade vs PR #195.
Co-authored-by: Isaac
Companion to databricks-ai-bridge#425 prose-recovery design. The bridge's
always-rotate flow rotates `context.conversation_id` to `{base}::attempt-N`
on every durable-resume and emits the rotated value in the
`response.resumed` SSE event.
This patch:
- Maintains an in-memory `Map<chat_id, rotated_conversation_id>` in the
shared AI-SDK provider's databricksFetch.
- Captures the rotation by sniffing the SSE response for
`response.resumed { conversation_id: ... }` events.
- On subsequent requests for the same chat, swaps the rotated value into
`context.conversation_id` before forwarding.
Net effect: turn N+1 after a crash lands on the rotated (clean) SDK
session instead of the orphan-poisoned original — closing the multi-turn
gap without requiring SDK adapter wrappers in the bridge.
In-memory only (single Express process). A multi-pod deployment would
persist this on the chat row.
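The alias map's contract, sketched (class and method names hypothetical; the real code lives inside databricksFetch):

```python
class RotationAliases:
    """In-memory chat_id -> rotated conversation_id map (single process)."""

    def __init__(self):
        self._by_chat: dict = {}

    def observe(self, chat_id: str, event: dict) -> None:
        # Sniff the response.resumed sentinel out of the SSE stream.
        if event.get("type") == "response.resumed":
            rotated = event.get("conversation_id")
            if rotated:
                self._by_chat[chat_id] = rotated

    def apply(self, chat_id: str, body: dict) -> dict:
        # Swap the rotated id into context before forwarding, if known.
        rotated = self._by_chat.get(chat_id)
        if not rotated:
            return body
        context = {**body.get("context", {}), "conversation_id": rotated}
        return {**body, "context": context}
```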
Co-authored-by: Isaac
…dedup

The previous heuristic compared `session_items >= messages - 1` to decide whether to forward only the latest user message. Under prose-recovery + always-rotate, the rotated session has FEWER items than the chatbot's accumulated UI echo (attempt 2's session is fresh; the UI accumulated events from both attempts), so the heuristic was returning all messages, including duplicates of attempt 2's tool_calls and the orphan from attempt 1. The Runner then combined session+input, producing duplicate function_call items that the OpenAI SDK groups into a malformed assistant.tool_calls block — Anthropic 400 with "tool_call_ids did not have response messages".

Fix: if the session has any items at all, treat it as the authoritative source of cross-turn history and only forward the new user message. The first-turn path (empty session) still returns the full input.
Co-authored-by: Isaac
…or cross-turn dedup

Mirrors the same fix applied to agent-openai-advanced/utils.py (deduplicate_input). When rotated checkpointer state exists for the current thread_id, only forward the latest user message from the chatbot's request input. Without this, the chatbot's full-history echo (including any orphan tool_use AIMessage from a crashed attempt 1 that the rotated checkpointer doesn't have) would be merged into state via `add_messages` and poison the next LLM call with an unpaired tool_use.

Closes the multi-turn gap on the LangGraph side. The bridge (databricks-ai-bridge#425) no longer needs the input sanitizer (`tool_repair.py` + `_sanitize_request_input`) — between this LangGraph dedup and the OpenAI session-as-authoritative dedup, both templates handle the UI echo cleanly.
Co-authored-by: Isaac
Cross-turn echo dedup is now handled SDK-agnostically inside the bridge via _trim_echoed_history (databricks-ai-bridge#425). Both templates' agent.py / utils.py go back to main — no per-SDK calls into session.get_items() / agent.aget_state(), no thread_id surface in custom_outputs. The remaining template-side change for the always-rotate flow is e2e-chatbot-app-next/packages/ai-sdk-providers/src/providers-server.ts (alias capture from response.resumed sentinel + injection on outgoing requests). Co-authored-by: Isaac
Replace `response.body!` with an explicit early-return guard. Functionally identical (the SSE check above already implies a body exists), but satisfies Biome's lint/style/noNonNullAssertion rule introduced by my prior commit.
Per Bryan's review feedback, the test framework's _MANAGED_SCHEMAS list isn't the right layer for handling memory-schema permissions — that crosses a layer boundary into per-template configuration. The right shape is:
* Workspace setup grants USAGE on workspace-managed schemas to the writer role; SPs inherit it automatically.
* Per-template grant_lakebase_permissions.py owns its own table list and grants relative to whichever schema the agent is configured to use (via the LAKEBASE_AGENT_MEMORY_SCHEMA env var).

In the autoscaling test branch we already have:
databricks_writer_16401=UC/... on agent_langgraph_memory
which means new SPs created by the test deploy already have USAGE through role inheritance. Combined with the workspace-side ALTER DATABASE search_path that exposes the schema by default, the deployed app resolves the pgvector type without any test-framework grants on this schema.
Co-authored-by: Isaac
This reverts commit 9d4af20.
Companion to databricks-ai-bridge#425's removal of `_trim_echoed_history`. Per design discussion with Bryan, echo dedup is an agent-layer concern — the agent owns its SDK session/checkpointer and is the right layer to know what's already persisted vs what's a new turn.

agent-openai-advanced/agent_server/utils.py
- Update the `deduplicate_input` heuristic from `len(session_items) >= len(messages) - 1` to `session_items and len(messages) > 1`. The old count-based check broke under prose-recovery + always-rotate (the rotated session has fewer items than the chatbot's accumulated UI echo). The new check trusts the session as authoritative for prior turns whenever it has any items.

agent-langgraph-advanced/agent_server/agent.py
- Add an `aget_state` probe in `stream_handler`. When the checkpointer already has messages for this thread, drop everything in input except the latest user message before passing to `agent.astream`. Without this, `add_messages` would append the chatbot's full-history echo — it dedupes by `id`, but MLflow's `responses_to_cc` doesn't preserve IDs, so dedup never fires across the bridge boundary.

Both: ~10 lines per template, runs at the same point the SDK session read happens, no SDK adapter wrapping.
Co-authored-by: Isaac
dhruv0811
left a comment
Overall this seems really heavy on the UI side. Is there a way to circumvent the whole express proxy thing? I want to simplify this a lot. This is too large and clunky. Can you think through only the minimum required changes based on the final version of the server here: databricks/databricks-ai-bridge#425 to see what is required in the template and in the UI, and how we can simplify the approach.
logging.getLogger("mlflow.utils.autologging_utils").setLevel(logging.ERROR)
sp_workspace_client = WorkspaceClient()

LLM_ENDPOINT_NAME = "databricks-gpt-5-2"
This is a local change for testing, should not be in the PR.
# For on-behalf-of user authentication, pass get_user_workspace_client() to init_agent.
agent = await init_agent(store=store, checkpointer=checkpointer)

# When the checkpointer already has prior turns for this thread,
Can we move this logic to a function that is equivalent to deduplicate_input in the openai template? Location- and usage-wise.
poll_interval_seconds=float(os.getenv("POLL_INTERVAL_SECONDS", "1.0")),
)

log_level = os.getenv("LOG_LEVEL", "INFO")
This feels like a very messy way to propagate logs — can we see if it's possible to clean this up? And change the comments to be bridge-specific, not durable-specific, since all bridge logs will be shown with this.
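One way this could be tidied — the logger namespace below is an assumption, standing in for whatever root the bridge actually logs under:

```python
import logging
import os

# Configure the bridge's logger namespace once at startup instead of
# threading LOG_LEVEL through individual modules; Logger.setLevel
# accepts level names like "INFO" directly.
log_level = os.getenv("LOG_LEVEL", "INFO")
logging.getLogger("databricks_ai_bridge").setLevel(log_level)
```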
| " Option 3 (provisioned): LAKEBASE_INSTANCE_NAME=<your-instance-name>\n" | ||
| ) | ||
|
|
||
| memory_schema = os.getenv("LAKEBASE_AGENT_MEMORY_SCHEMA") or None |
Why is this addition needed? It doesn't seem durable-related??
Summary
Companion to databricks-ai-bridge#425 — chatbot cooperation for the bridge's always-rotate flow + per-template UI-echo dedup.
The bridge keeps its HTTP surface minimal (heartbeat, scan, CAS claim, conversation_id rotation, prose recovery on resume). This PR delivers the two pieces that have to live outside the bridge:
- The chatbot learns the rotated `conversation_id` from the SSE `response.resumed` sentinel, so subsequent turns from the same chat send the rotated value as `context.conversation_id` and land on the clean rotated session (instead of the original orphan-poisoned one).
- Per-template UI-echo dedup in `agent.py`/`utils.py` — when the SDK's session/checkpointer already has prior turns, the agent forwards only the latest user message. Without this, the chatbot's full-history echo would combine with the SDK's own session items and the LLM call would receive duplicates → malformed `assistant.tool_calls` block → 400.

Changes
1. Chatbot alias map (`e2e-chatbot-app-next/packages/ai-sdk-providers/src/providers-server.ts`)

In-memory `Map<chat_id, rotated_conversation_id>` in `databricksFetch`:
- On outgoing requests: look up the alias for the `chat_id` and swap it into `context.conversation_id` before forwarding.
- On forwarded `data:` lines: when a `{type: 'response.resumed', conversation_id: ...}` event lands, update the alias map.

In-memory only (single Express process). A multi-pod chatbot deployment would persist this on the chat row.
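The alias-map mechanics, sketched in Python for brevity (the actual code is TypeScript in `providers-server.ts`; function names here are illustrative):

```python
import json

# chat_id -> rotated conversation_id; in-memory, single process.
alias_map: dict[str, str] = {}

def rewrite_context(chat_id: str, context: dict) -> dict:
    # Outgoing turn: substitute the rotated id when an alias exists.
    rotated = alias_map.get(chat_id)
    return {**context, "conversation_id": rotated} if rotated else context

def observe_sse_line(chat_id: str, line: str) -> None:
    # Forwarded SSE: watch data: lines for the rotation sentinel.
    if not line.startswith("data:"):
        return
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return
    event = json.loads(payload)
    if event.get("type") == "response.resumed" and event.get("conversation_id"):
        alias_map[chat_id] = event["conversation_id"]
```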
2. OpenAI template UI-echo dedup (`agent-openai-advanced/agent_server/utils.py::deduplicate_input`)

When the OpenAI session has any persisted items, the prior turns are already there — forward only the latest user message. The Runner prepends session history on the LLM call automatically.
The OpenAI SDK's own `deduplicate_input_items_preferring_latest` (agents/run_internal/items.py:280) only dedupes by `call_id` for `function_call`/`function_call_output` items — `_dedupe_key` returns `None` for any item with a `role` field (line 224), so user/assistant message echo is not covered by the SDK.

3. LangGraph template UI-echo dedup (`agent-langgraph-advanced/agent_server/agent.py::stream_handler`)

When the checkpointer already has messages for this thread, drop everything in input except the latest user message before passing to `agent.astream`. LangGraph's `add_messages` reducer dedupes by `id`, but MLflow's `responses_to_cc` (mlflow/types/responses.py:315-383) doesn't preserve IDs across the bridge boundary, so the reducer can't dedup automatically.

4. `pyproject.toml` branch pin (both advanced templates)

Each `uv sync` resolves to the branch tip. Revert to registry versions before merge.

Test plan
long_running)
- `dhruv-lg-adv-durable-prose` (LangGraph + Claude Sonnet 4.5) — multi-tool kill mid-`deep_research` + multi-turn ✅
- `dhruv-oai-adv-durable-prose` (OpenAI Agents SDK + GPT-5) — same matrix ✅
- `dhruv-oai-cl-durable-prose` (OpenAI Agents SDK + Claude Sonnet 4.5) — same matrix ✅
- `/_debug/kill_task` route registration verified on all 3 deployed apps

How to run the crash test
Pre-merge checklist
- Revert `pyproject.toml` git-branch pins to registry versions
- Unset the `APP_TEMPLATES_BRANCH` env var (or remove the `--branch` flag in `scripts/start_app.py::clone_frontend_if_needed`) so the chatbot clones from main
- Remove `LONG_RUNNING_ENABLE_DEBUG_KILL=1` from production deploy configs

Text.Only.-.Prose.mov
Tool.Calling.Multiturn.-.Prose.mov