Templates: Durable prose-recovery + always-rotate (companion to ai-bridge#425)#204
Conversation
Pins both advanced templates to the ai-bridge PR branch so the long-running agent server crash-resumes in-flight runs via heartbeat + CAS claim. Revert the [tool.uv.sources] entry once that PR merges and a new release is cut.

Also fixes a latent IndexError in agent-openai-advanced's deduplicate_input: when the long-running server re-invokes the handler with input=[] to resume from the session (the agnostic resume contract validated by prototyping), messages[-1] blew up. Now we return [] for empty input — the session already has prior turns, so there is nothing to dedupe. No change to either template's agent.py.
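A minimal sketch of the guard (hypothetical signature — the template's real helper also inspects roles and item shapes, but the empty-input contract is the point):

```python
def deduplicate_input(messages: list, session_items: list) -> list:
    """Decide which request-input turns to forward to the agent.

    Hypothetical helper mirroring the template's deduplicate_input:
    `messages` is the request input, `session_items` is what the SDK
    session already persisted.
    """
    # Resume contract: the server re-invokes with input=[]. The session
    # already holds every prior turn, so there is nothing to dedupe.
    if not messages:
        return []
    # When the session has history, treat it as authoritative and
    # forward only the newest turn; otherwise forward everything.
    if session_items:
        return [messages[-1]]
    return messages
```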
Makes the bundled chat UI durable end-to-end without any client-side
changes. The Express /invocations proxy in e2e-chatbot-app-next now:
- Rewrites streaming POSTs to { ...body, background: true, stream: true },
so every user turn persists each SSE event to Lakebase via
LongRunningAgentServer.
- Sniffs response.id + sequence_number out of the forwarded SSE stream.
- If upstream closes before [DONE] (pod died, lost connection), the proxy
transparently reconnects via
GET /responses/{id}?stream=true&starting_after=N
and resumes emitting events to the still-connected browser client. The
browser sees one continuous stream.
Non-streaming requests and non-POST methods keep the original passthrough
behavior.
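The reconnect flow above can be sketched as a pure generator (illustrative only — the real proxy is Express/TypeScript; `open_stream` stands in for either the initial background-mode POST or the `GET /responses/{id}?stream=true&starting_after=N` resume fetch):

```python
def relay_with_resume(open_stream, max_attempts=3):
    """Yield upstream SSE events to the client, resuming on upstream death.

    `open_stream(response_id, starting_after)` returns an iterator of
    parsed event dicts: with response_id=None it performs the initial
    POST, otherwise the resume GET.
    """
    response_id, last_seq = None, -1
    for _ in range(max_attempts):
        for event in open_stream(response_id, last_seq):
            # Sniff response.id + sequence_number as events flow through.
            response_id = (event.get("response") or {}).get("id") or response_id
            last_seq = event.get("sequence_number", last_seq)
            yield event
            if event.get("type") == "response.completed":
                return  # clean finish: stop relaying
        # Upstream closed before a terminal event: loop and reconnect.
```

The browser-facing side just consumes the generator, so it sees one continuous stream across pod deaths.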
Also points agent-openai-advanced/scripts/start_app.py at the
dhruv0811/durable-execution-templates branch of app-templates so the new
proxy code is actually deployed (override via APP_TEMPLATES_BRANCH env
var). Revert once this lands on main.
… actually fires
Previous attempt left the proxy dead-code: the Node AI SDK honored API_PROXY
verbatim and sent requests straight to http://localhost:8000/invocations
(FastAPI), skipping the Express /invocations handler at :3000 entirely.
Confirmed in logs: requests reached the backend with {"stream": true}
but never with "background": true.
Split the two concerns across env vars:
API_PROXY=http://localhost:3000/invocations (AI SDK -> Express proxy)
AGENT_BACKEND_URL=http://localhost:8000/invocations (Express proxy -> FastAPI)
Express handler prefers AGENT_BACKEND_URL, falls back to API_PROXY for
backwards compat so existing templates don't break.
response_id is buried in the raw backend SSE stream and never surfaces to the browser, because the Vercel AI SDK re-wraps the stream as its own message format before sending to the client. Log it on the server side instead so test instructions can `grep 'background started response_id='` from apps logs. Also distinguish the startup log so it's clear the durable-resume code path is live. No behavior change; pure observability.
app.yaml env vars were overriding databricks.yml at runtime, so the AI SDK was still talking directly to the Python FastAPI backend and the Express /invocations proxy never saw the request. Keep both files in sync.
…RL to FastAPI

The script was unconditionally overwriting API_PROXY with the backend URL right before launching the frontend, which defeated the whole durable-resume-rewrite story: the Node AI SDK bypassed the Express /invocations handler and streamed straight from FastAPI.

Fix: API_PROXY now points at CHAT_APP_PORT (the Express proxy), and we default AGENT_BACKEND_URL (previously unset) to the Python backend. Use os.environ.setdefault for AGENT_BACKEND_URL so operators can still override via databricks.yml or app.yaml.
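The wiring reduces to two lines (port values are the bundled-process topology defaults described above):

```python
import os

# Port the Node chat app listens on; mirrors start_app.py's convention.
chat_app_port = os.getenv("CHAT_APP_PORT", "3000")

# AI SDK -> Express proxy: always route streaming POSTs through Node.
os.environ["API_PROXY"] = f"http://localhost:{chat_app_port}/invocations"

# Express proxy -> FastAPI: setdefault keeps operator overrides from
# databricks.yml / app.yaml intact.
os.environ.setdefault("AGENT_BACKEND_URL", "http://localhost:8000/invocations")
```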
…resp_*

Broadens the response_id parser so it works whether the backend tags frames with top-level response_id (preferred) or the older nested-only shape.
…tally

Matches the [/invocations] prefix so the full story is greppable from apps logs without correlating Node and Python timestamps.
The library logger inherits from root (default WARNING) so INFO-level lifecycle messages from LongRunningAgentServer (heartbeat, claim, resume, stream lifecycle) were being dropped. Set both the ai-bridge logger and the root level to LOG_LEVEL so apps logs carry the full durable-resume story without requiring callers to tune logging themselves.
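The fix is two setLevel calls (logger name `databricks_ai_bridge` assumed from the package name):

```python
import logging
import os

log_level = os.getenv("LOG_LEVEL", "INFO").upper()

# Root logger defaults to WARNING, which swallows library INFO records.
logging.getLogger().setLevel(log_level)
# Raise the bridge's own logger too so INFO lifecycle lines propagate.
logging.getLogger("databricks_ai_bridge").setLevel(log_level)
```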
When a response is killed mid-stream, the partial assistant text that was already rendered to the client kept receiving fresh deltas from attempt 2 — users saw attempt-1-partial + attempt-2-full concatenated in one bubble. The Express /invocations proxy now seals the in-progress assistant message across an attempt boundary:
1. On upstream close without [DONE], immediately append a '(connection interrupted — reconnecting…)' suffix delta to the active message so the user sees something is happening during the ~10s stale window.
2. On the response.resumed sentinel, emit synthetic response.content_part.done + response.output_item.done events for the active message — effectively ending the first assistant bubble at the OpenAI Responses API level.
3. Attempt 2's natural response.output_item.added (with a fresh item_id) then creates a clean second bubble showing the full answer.
Tool calls naturally de-dup by call_id across attempts, so no closure synthesis is needed for them.

Also mirrors the routing + logging fixes previously applied to agent-openai-advanced onto agent-langgraph-advanced so both templates get durable resume with the full [durable] log lifecycle visible:
- app.yaml + databricks.yml: split API_PROXY (-> Express :3000) from AGENT_BACKEND_URL (-> FastAPI :8000).
- scripts/start_app.py: honor AGENT_BACKEND_URL, point API_PROXY at the Express proxy, clone e2e-chatbot-app-next from the durable-execution branch.
- agent_server/start_server.py: raise databricks_ai_bridge + root logger to LOG_LEVEL so [durable] INFO lines surface in apps logs.
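The synthetic closure pair can be sketched as (event type names come from the OpenAI Responses streaming protocol; payload fields trimmed to the ones the UI tracks):

```python
def seal_events(item_id: str, content_index: int = 0) -> list:
    """Synthesize the two events that close an in-flight assistant item.

    Emitted on the response.resumed sentinel so attempt-1's bubble ends
    before attempt-2's fresh output_item.added opens a new one.
    """
    return [
        {"type": "response.content_part.done",
         "item_id": item_id, "content_index": content_index},
        {"type": "response.output_item.done", "item_id": item_id},
    ]
```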
Durable-resume can interrupt the pod between an LLM emitting tool_calls and the SDK finishing the tool executions — the Session is left with function_call items whose matching function_call_output never got written. The next LLM request over that session fails:

400 BAD_REQUEST: An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_xxx, call_yyy, ...

Piggy-back on deduplicate_input (which already touches the session each turn) to inject synthetic function_call_output items for every orphan function_call. The message is plain text, so the LLM sees 'tool X was interrupted, please retry if needed' and can decide whether to re-call or continue. No change to agent.py.
The previous heal added synthetic function_call_output at the END of the session (add_items only appends). When the conversation has a message between the orphan function_call and the synthetic output, the SDK rebuilds the LLM request as an assistant-with-tool_calls message that doesn't have its tool responses right after it, and the API rejects with 'assistant message with tool_calls must be followed by tool messages'.

Also: the Vercel AI SDK client echoes the full conversation back each turn. deduplicate_input drops most of it, but the Runner.run path can still re-persist prior items, leaving DUPLICATE function_call rows for the same call_id.

Replace with a clear+rebuild sanitize pass: dedupe function_call / function_call_output by call_id, inject synthetic outputs immediately after any orphan function_call, clear the session, and re-add the canonical sequence. No-op when already clean.
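A self-contained sketch of that sanitize pass over plain dict items (the real implementation operates on SDK session items and then clears + re-adds them via the session API):

```python
def sanitize_items(items: list) -> list:
    """Dedupe tool items by call_id and pair every orphan function_call."""
    calls_seen, outputs_seen = set(), set()
    deduped = []
    for item in items:
        cid = item.get("call_id")
        if item.get("type") == "function_call":
            if cid in calls_seen:
                continue  # duplicate row re-persisted by a later attempt
            calls_seen.add(cid)
        elif item.get("type") == "function_call_output":
            if cid in outputs_seen:
                continue
            outputs_seen.add(cid)
        deduped.append(item)

    repaired = []
    for item in deduped:
        repaired.append(item)
        if item.get("type") == "function_call" and item["call_id"] not in outputs_seen:
            # Synthetic output placed IMMEDIATELY after the orphan call, so
            # the rebuilt request keeps tool responses adjacent to tool_calls.
            repaired.append({
                "type": "function_call_output",
                "call_id": item["call_id"],
                "output": f"tool {item.get('name', 'call')} was interrupted, "
                          "please retry if needed",
            })
    return repaired
```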
Keep the UI minimal but fix the doubled-text issue: when a mid-stream
kill happens, the AI SDK merges all deltas within one streamText call
into one UIMessage — so our proxy-level seal events were valid but
invisible, and attempt 2's text kept appending to attempt 1's partial.
Minimal solution:
1. Express /invocations proxy already emits response.resumed at the
attempt boundary (unchanged).
2. chat.ts server: detect response.resumed via onChunk and forward it
to the UI stream as { type: 'data-resumed', data: { attempt } }.
3. chat.tsx client: on 'data-resumed', call setMessages to drop all
text parts from the last (assistant) message. Tool call parts stay
because they dedupe by call_id naturally.
Also: fix auto-resume loop burning MAX_RESUME_ATTEMPTS on terminal
errors by exiting early when an error event with code=task_failed or
code=task_timeout comes through the proxy.
No changes to agent.py. Agnosticism tenet intact.
Your 'clean up at end of stream' idea is much more robust than relying on the mid-stream mutation sticking. On data-resumed we now snapshot the attempt-1 text length, and in onFinish we slice exactly that many chars off the front of the last assistant message's text parts. Whatever the AI SDK accumulator did during streaming, the final rendered state contains only attempt 2's content.

The mid-stream mutation wipe stays in place too: when it sticks, the text visibly clears during the 10s stale window, which is nicer UX than waiting for onFinish. When it doesn't stick, onFinish catches it.
PreviewMessage is memoized: while loading it compares prevProps.message to nextProps.message by reference; when not loading it deep-equals the parts array (which short-circuits on identical references). Our previous truncate mutated part.text in place and returned [...prev] — same message + same parts array refs, so the memo skipped the re-render and the old text stuck on screen even though state was technically updated.

Fix: map to NEW part objects with sliced text and wrap them in a NEW message object so both the reference check (loading path) and the deep-equal (done path) see a change and re-render.
State-level wipes were getting clobbered by the AI SDK accumulator — ReactChatState.replaceMessage deep-clones state.message on every write(), and activeTextParts keeps mutating the originals behind the UI's back.

Solution: transform at the VIEW layer instead of fighting the state machine. The Chat component tracks attempt1TextLen per messageId (state, not ref, so it propagates to children). Messages maps each message through a render-time slice that drops the leading attempt-1 chars from text parts before passing to PreviewMessage, creating new message + part objects so the memo's reference check trips and the component re-renders.

onFinish still does the authoritative setMessages truncate so the persisted-to-DB final message reflects only attempt 2. That truncate now also clears attempt1TextLen, so the render-time slice becomes a no-op after completion (state is already truncated).
…cution-templates

# Conflicts:
#   agent-openai-advanced/databricks.yml
Drop the [chat][onData] / [chat][onFinish] / [chat][onChunk] tracing statements used to trace the attempt-1 → attempt-2 flow while tuning the render-time slice and post-stream truncate. The server-side Express proxy still logs the resume lifecycle (background started / resume fetch / terminal error / stream done) since that's operationally useful; the ai-bridge backend's [durable] INFO logs stay as-is.
Co-authored-by: Isaac
Move the per-template workarounds for mid-tool crash-resume into the
databricks-ai-bridge library and wire them in:
- agent-openai-advanced/utils.py: deduplicate_input now calls
session.repair() (new public method on AsyncDatabricksSession) instead
of the 100-line in-template _sanitize_session. Same behavior — dedupe
function_call/function_call_output by call_id, inject synthetic
outputs for orphans — just owned by the library.
- agent-langgraph-advanced/agent.py: before agent.astream, call
build_tool_resume_repair on the checkpointer's messages and apply via
agent.aupdate_state(..., as_node="tools"). The as_node is critical —
without it LangGraph re-evaluates the model→{tools,END} branch from
the updated state and crashes with KeyError: 'model'.
- agent-langgraph-advanced/agent.py: when the checkpointer already has
a thread, only forward the latest user turn from request.input — the
UI client (Vercel AI SDK) re-echoes the full history on every turn,
which can re-inject orphan tool_uses from a previously-interrupted
attempt that the client kept in its buffer.
Both pyproject.toml files now pin databricks-openai / databricks-langchain
to the same ai-bridge branch (subdirectory git sources) so the new
helpers are picked up. Temporary; revert to registry once the bridge PR
merges.
Co-authored-by: Isaac
Library side (databricks-langchain, PR #416):
- New build_tool_resume_repair_middleware() returns an AgentMiddleware whose
before_model hook runs build_tool_resume_repair. Swaps the manual
aget_state / aupdate_state(as_node="tools") surgery in the template for a
one-line `middleware=[...]` arg to create_agent.
- The as_node="tools" footgun (KeyError: 'model' in the model→{tools,END}
conditional branch re-eval) disappears entirely; repair runs inside the
graph's own execution flow, not as external state surgery.
Template (agent-langgraph-advanced):
- init_agent: add middleware=[build_tool_resume_repair_middleware()] to
create_agent. stream_handler drops the 8-line repair block.
- utils.py process_agent_astream_events: skip None node_data (the graph's
updates stream emits {middleware_node: None} when the middleware is a
no-op, which is every turn on the happy path).
UI (e2e-chatbot-app-next):
- On data-resumed from the backend, wipe text parts from the last assistant
message in one setMessages. Tool-call parts are kept as-is (they already
dedupe across attempts by call_id). Dropped:
* attempt1TextLen state + per-message snapshot in onData
* render-time text slice in Messages.tsx
* onFinish authoritative post-stream truncate
The AI SDK's seal-on-resume synthesis (Express proxy) still creates a
fresh output_item_id for attempt 2, so new deltas land in a fresh text
part — our wipe of the old text part is sufficient.
Net: -99 LOC across 4 files. Same behavior for the "delete old text,
leave tools alone" UX; substantially less state-machine choreography.
Co-authored-by: Isaac
setMessages can't wipe mid-stream — the AI SDK's activeResponse.state is a snapshot taken at makeRequest time, and every text-delta calls write() → this.state.replaceMessage(lastIdx, activeResponse.state.message), which overwrites any setMessages we do. Our wipe was visible for a single chunk then reverted.

Fix: snapshot the assistant message's parts.length at data-resumed, and at render time hide text parts at indices BEFORE that cutoff. Tool / step parts render normally at every index. Works for openai and langgraph because it transforms at the view layer rather than fighting the AI SDK state machine.

Removes the server-side debug log. Keeps the minimal delete-old-text UX.
Co-authored-by: Isaac
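Logic-only sketch of the render-time cutoff (the real code is React/TSX; here parts are plain dicts):

```python
def visible_parts(parts: list, resume_cut_index: "int | None") -> list:
    """Hide text parts that landed before the resume cutoff.

    Tool and step parts render at every index; only attempt-1 text
    (indices below the parts.length snapshot taken at data-resumed)
    is suppressed.
    """
    if resume_cut_index is None:
        return parts
    return [
        part for index, part in enumerate(parts)
        if part.get("type") != "text" or index >= resume_cut_index
    ]
```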
…lper

- Removed the "_(connection interrupted — reconnecting…)_" delta block. The render-time slice hides attempt-1 text on resume anyway, so the suffix was invisible past the 10s stale window and too subtle during it.
- Extracted a writeEvent(type, payload) helper; sealActiveMessage went from 45 → 22 lines, no behavior change.
- Removed the readActive() TS-widening helper (no longer needed without the suffix block).
- Inlined the onFirstResponseId helper into its single call site.

Net: 92 lines removed, 36 added in this file.
Co-authored-by: Isaac
Durability mechanics now live entirely in databricks-ai-bridge's LongRunningAgentServer (rotate conv_id on resume + full-history input sanitizer, see ai-bridge PR #416). Templates can drop the explicit repair surface:
- agent-langgraph-advanced/agent.py: drop middleware=[build_tool_resume_repair_middleware()] from create_agent and the unused import. Also drop the stream_handler UI-echo dedupe block — the server sanitizer handles mid-history orphans end-to-end.
- agent-openai-advanced/utils.py: drop await session.repair() from deduplicate_input. session.repair() stays available as a public method for callers who want destructive session cleanup.

Net: agent.py / utils.py in both advanced templates have zero durability-specific lines. The contract becomes "use our checkpointer/session classes with LongRunningAgentServer — durable resume + orphan repair is free."
Co-authored-by: Isaac
Temporarily short-circuit the resumeCutIndex write so attempt-1's text stays visible while attempt-2 streams over it. Lets us see how the server-side inheritance + synthetic-output prompt shape the LLM's mid-turn continuation behavior without the visual wipe hiding what attempt-2 actually emits. Re-enable by uncommenting the block; the rest of the wipe plumbing (state hook, Messages prop threading, render-time slice) is left in place so re-enabling is a 1-line flip. Co-authored-by: Isaac
…les resume

Server-side changes earlier in this branch (prior-attempt tool-event inheritance + partial-stream reassembly in databricks-ai-bridge) make the client-side "wipe attempt-1 text when resume fires" machinery unnecessary: attempt-2's LLM sees attempt-1's work as history and continues seamlessly instead of restarting. The wipe was also hiding the new continuation quality from the user. Turning the wipe off in UI testing confirmed the server-side story is sufficient.

Delete the full stack:
- packages/core/src/types.ts: drop `resumed` from CustomUIDataTypes.
- server/src/routes/chat.ts: drop writerRef + emittedResumedAttempts + the onChunk raw-event branch that emitted data-resumed parts. Trace-extraction stays; only the resume-forwarding path is removed.
- client/src/components/chat.tsx: drop the resumeCutIndex state hook, the data-resumed onData handler (was already commented out), and the prop pass to <Messages/>.
- client/src/components/messages.tsx: drop the resumeCutIndex prop from MessagesProps + its destructuring + the render-time text-part slice.

The server still emits `response.resumed` as a sentinel so the Express proxy's sealActiveMessage() call correctly closes attempt-1's open text part before attempt-2's fresh output_item.added creates a new one. The proxy no longer extracts it into a UI data part.
Co-authored-by: Isaac
Remove everything that isn't strictly required for durable resume with the server-side-only approach in ai-bridge PR #416:
- agent-langgraph-advanced/agent_server/agent.py: revert entirely. The test-scaffolding tools (get_weather, get_stock_price, deep_research) were only for crash-test harnesses; the asyncio import only existed to support them. User-space durability surface for this template is now zero lines.
- agent-openai-advanced/agent_server/agent.py: revert entirely. Drop the test-scaffolding tools (get_weather, get_stock_price, search_best_restaurants, deep_research) and the asyncio import. Same zero-user-space result.
- agent-langgraph-advanced/agent_server/utils.py: revert. The "middleware nodes that no-op return None" guard was defensive against middleware we no longer install.
- agent-openai-advanced/agent_server/utils.py: revert. The empty-input guard was defensive against the old input=[] resume replay that no longer happens — the server always replays the original input.
- e2e-chatbot-app-next/server/src/index.ts: drop the activeMessage / sealActiveMessage / writeEvent machinery. It was synthesizing closure events on response.resumed to seal attempt-1's text part for the UI wipe. The UI wipe is gone; the AI SDK creates parts by item_id, so attempt-2's fresh output_item.added naturally starts a new part and attempt-1's open part finalizes on stream end.
- Plus the earlier UI cleanup (chat.tsx, messages.tsx, types.ts, routes/chat.ts) that removed the data-resumed / resumeCutIndex plumbing.

Remaining essentials:
- agent_server/start_server.py: log-level setup so [durable] logs surface in app logs.
- scripts/start_app.py: API_PROXY / AGENT_BACKEND_URL wiring so the Node AI SDK routes streaming POSTs through the Express background-mode + auto-resume proxy. Clone-from-branch is marked TEMPORARY (revert when ai-bridge ships).
- pyproject.toml: databricks-ai-bridge git source pointer (TEMPORARY).
- e2e-chatbot-app-next/server/src/index.ts: background-mode rewrite + auto-resume proxy for the /invocations route. Co-authored-by: Isaac
Infinite stream-resume loop seen with Claude multi-tool turns via
durable retrieve. Root cause:
- useChat's onStreamPart reset resumeAttemptCountRef on every chunk,
so the 3-retry cap was only enforced when a stream ended empty.
When Claude's provider failed to emit a clean `finish` UIMessageChunk
at the end of the stream, lastPart.type !== 'finish' kept
streamIncomplete = true. Each resume replayed the cached stream,
delivered chunks, reset the counter to 0, onFinish fired without
`finish`, looped.
Fix:
- Remove the per-chunk reset in onStreamPart.
- Reset only in prepareSendMessagesRequest when the last message is a
user message (a genuine new turn). Tool-result continuations
(non-user-message continuations) don't reset.
- Cap stays at 3; after that, fetchChatHistory() pulls the
DB-persisted state so the user sees the final assistant output
instead of spinning forever.
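The counter discipline, reduced to pure logic (class and method names hypothetical; the real code lives in useChat callbacks):

```python
class ResumeGuard:
    """Cap auto-resume attempts; reset only on a genuine new user turn."""

    def __init__(self, cap: int = 3):
        self.cap = cap
        self.attempts = 0

    def on_send(self, last_message_role: str) -> None:
        # prepareSendMessagesRequest equivalent: a trailing user message
        # means a new turn; tool-result continuations do NOT reset.
        if last_message_role == "user":
            self.attempts = 0

    def try_resume(self) -> bool:
        if self.attempts >= self.cap:
            return False  # caller falls back to fetchChatHistory()
        self.attempts += 1
        return True
```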
Co-authored-by: Isaac
Final stable state for durable execution. End-to-end UI-validated
scenarios that now work:
- Multi-tool turn interrupted mid-sequence, durable resume inherits
completed tool pairs + narrative (reordered) + synthetic output
for the interrupted call, agent continues from where it left off.
- Text-only mid-stream crash, partial-text reassembly + Claude
prefill → continuation.
- Cross-turn recall after crash-and-resume (stable thread via read-
time checkpoint repair on LangGraph / session auto-repair on
OpenAI).
- Multi-tool on GPT-5 + openai-agents (single-response-per-turn).
Template fix here: process_agent_stream_events now disambiguates by
(a) item.type bucket for delta routing and (b) call_id bucket for
multiple open function_calls. The original single curr_item_id bucket
worked for GPT-5's strictly serial events but collided on Claude's
interleaved + parallel tool-call events, which produced two items
sharing one id and broke the client's part tracking.
Pairs with databricks-ai-bridge PR #416 changes (rotate + replay +
full-history sanitizer + prior-attempt tool-pair inheritance +
narrative hoist + checkpoint read-time repair + session auto-repair).
Co-authored-by: Isaac
End-to-end UI test on Claude (via deployed agent-openai-advanced with the updated databricks-ai-bridge) confirmed that the bridge-side ordering fix (sanitizer + narrative hoist + tool-pair inheritance + session auto-repair) is sufficient on its own. The two template-side guards added in earlier commits are no longer needed:
- Revert 0ddbd60: `process_agent_stream_events` per-type + per-call-id id tracking. The single-bucket implementation handles Claude's interleaved + parallel tool-call events correctly now that the upstream ordering is clean.
- Revert 5f3c507: `chat.tsx` user-message-only resume-counter reset. Claude now emits a clean `finish` UIMessageChunk through the durable retrieve path, so the per-chunk reset no longer traps the 3-retry cap in an infinite loop.

Keeps the advanced templates lean — durability logic lives entirely in databricks-ai-bridge (LongRunningAgentServer).
Co-authored-by: Isaac
Extract three pure helpers above the route handler so the SSE frame loop reads like prose:
- parseSseFrame(frame): classifies a frame as done / passthrough / data.
- extractResponseId(payload): tolerates FastAPI's three response_id locations (response_id, response.id, top-level id with resp_ prefix).
- isTerminalErrorFrame(payload): detects task_failed / task_timeout so the resume loop can short-circuit.

pumpStream now just drives the reader + forwards bytes; the parsing logic is testable in isolation and the handler body is substantially shorter.
Co-authored-by: Isaac
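Python equivalents of the three helpers (the originals are TypeScript; payload shapes inferred from this description):

```python
import json

def parse_sse_frame(frame: str):
    """Classify one SSE frame: ('done', None), ('data', payload), or
    ('passthrough', frame) for comments / unparseable bodies."""
    for line in frame.splitlines():
        if line.startswith("data:"):
            body = line[len("data:"):].strip()
            if body == "[DONE]":
                return ("done", None)
            try:
                return ("data", json.loads(body))
            except json.JSONDecodeError:
                return ("passthrough", frame)
    return ("passthrough", frame)

def extract_response_id(payload: dict):
    """Tolerate the three response_id locations the backend may use."""
    rid = payload.get("response_id")
    if isinstance(rid, str):
        return rid
    rid = (payload.get("response") or {}).get("id")
    if isinstance(rid, str):
        return rid
    rid = payload.get("id")
    if isinstance(rid, str) and rid.startswith("resp_"):
        return rid
    return None

def is_terminal_error_frame(payload: dict) -> bool:
    """Terminal errors short-circuit the resume loop."""
    code = (payload.get("error") or {}).get("code") or payload.get("code")
    return code in ("task_failed", "task_timeout")
```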
Both advanced templates were setting these env vars to hard-coded
localhost URLs that match the bundled-process topology (Node on 3000,
FastAPI on 8000). The values are fixed by the templates themselves —
a customer deploying the advanced stack can't change them without
breaking the bundle. Making them required in yaml adds noise without
adding configurability.
Push the defaults into the chatbot:
- New ``getApiProxyUrl()`` helper in ``packages/ai-sdk-providers/src/api-proxy.ts``
  resolves the effective proxy URL:
1. explicit ``API_PROXY`` wins,
2. ``DATABRICKS_SERVING_ENDPOINT`` set → direct-endpoint mode, no
proxy,
3. otherwise → ``http://localhost:${CHAT_APP_PORT|PORT|3000}/invocations``
(advanced-template convention).
Used from ``providers-server.ts`` and ``request-context.ts`` so both
agree on proxy activation.
- ``server/src/index.ts`` defaults ``AGENT_BACKEND_URL`` to
``http://localhost:8000/invocations`` when unset. Explicit empty
string still disables the ``/invocations`` proxy route.
- Drop the ``API_PROXY`` / ``AGENT_BACKEND_URL`` block (and its comment)
from both advanced templates' ``app.yaml`` and ``databricks.yml``.
Preserves direct-serving-endpoint CUJs: when
``DATABRICKS_SERVING_ENDPOINT`` is set (basic chatbot deployments), the
AI SDK talks straight to the endpoint and never hits ``/invocations``.
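The resolution order, as a pure function over an env mapping (the real helper reads process.env in TypeScript):

```python
def resolve_api_proxy(env: dict) -> "str | None":
    """Effective proxy URL, or None for direct-serving-endpoint mode."""
    if env.get("API_PROXY"):
        return env["API_PROXY"]                      # 1. explicit wins
    if env.get("DATABRICKS_SERVING_ENDPOINT"):
        return None                                  # 2. no proxy
    port = env.get("CHAT_APP_PORT") or env.get("PORT") or "3000"
    return f"http://localhost:{port}/invocations"    # 3. convention
```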
Co-authored-by: Isaac
Prior cleanup commit dropped ``API_PROXY=http://localhost:8000/invocations`` from the advanced templates' ``app.yaml`` and ``databricks.yml``. That line pre-existed on ``main``; the PR never meant to remove it. Scope of the previous change was only the *newly-added* ``API_PROXY`` + ``AGENT_BACKEND_URL`` block that activated the Node proxy path. Restore the four files to exactly match ``main``. The chatbot-side ``getApiProxyUrl()`` default only fires when ``API_PROXY`` is unset, so users with main's explicit setting keep their existing behavior. Co-authored-by: Isaac
Both helpers answer routing-decision questions for the provider layer (proxy URL + context-injection gate), and the separate file wasn't buying isolation — providers-server.ts already imports from request-context.ts. One file, same logic. Co-authored-by: Isaac
…surface
Companion to databricks-ai-bridge#425 (POC for prose-recovery + always-rotate
durable-resume). Minimal template-side changes:
- agent-{openai,langgraph}-advanced/pyproject.toml: switch the
databricks-ai-bridge / databricks-openai / databricks-langchain branch pins
from `dhruv0811/durable-execution-resume` (the structured-repair PR #416)
to `dhruv0811/durable-execution-prose-recovery` (the new POC).
- agent-langgraph-advanced/agent_server/agent.py: invoke_handler now returns
the resolved `thread_id` in `custom_outputs`. After a crash + resume, the
bridge rotates `context.conversation_id` to `{base}::attempt-N`. Surfacing
it here lets the client pass it back as `custom_inputs.thread_id` on the
next turn, so subsequent turns land on the rotated (clean) checkpointer row
instead of the orphan-poisoned original. The OpenAI template already does
this via `session.session_id` in custom_outputs; LangGraph just didn't.
Status
======
POC for review alongside databricks-ai-bridge#425. Not intended to merge
unless empirical data justifies the trade vs PR #195.
Co-authored-by: Isaac
Companion to databricks-ai-bridge#425 prose-recovery design. The bridge's
always-rotate flow rotates `context.conversation_id` to `{base}::attempt-N`
on every durable-resume and emits the rotated value in the
`response.resumed` SSE event.
This patch:
- Maintains an in-memory `Map<chat_id, rotated_conversation_id>` in the
shared AI-SDK provider's databricksFetch.
- Captures the rotation by sniffing the SSE response for
`response.resumed { conversation_id: ... }` events.
- On subsequent requests for the same chat, swaps the rotated value into
`context.conversation_id` before forwarding.
Net effect: turn N+1 after a crash lands on the rotated (clean) SDK
session instead of the orphan-poisoned original — closing the multi-turn
gap without requiring SDK adapter wrappers in the bridge.
In-memory only (single Express process). A multi-pod deployment would
persist this on the chat row.
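The alias map's contract, sketched (class and method names hypothetical; the real code lives inside databricksFetch):

```python
class RotationAliases:
    """In-memory chat_id -> rotated conversation_id map (single process)."""

    def __init__(self):
        self._by_chat: dict = {}

    def observe(self, chat_id: str, event: dict) -> None:
        # Sniff the response.resumed sentinel out of the SSE stream.
        if event.get("type") == "response.resumed":
            rotated = event.get("conversation_id")
            if rotated:
                self._by_chat[chat_id] = rotated

    def apply(self, chat_id: str, body: dict) -> dict:
        # Swap the rotated id into context before forwarding, if known.
        rotated = self._by_chat.get(chat_id)
        if not rotated:
            return body
        context = {**body.get("context", {}), "conversation_id": rotated}
        return {**body, "context": context}
```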
Co-authored-by: Isaac
…dedup

The previous heuristic compared `session_items >= messages - 1` to decide whether to forward only the latest user message. Under prose-recovery + always-rotate, the rotated session has FEWER items than the chatbot's accumulated UI echo (attempt 2's session is fresh; the UI accumulated events from both attempts), so the heuristic was returning all messages, including duplicates of attempt 2's tool_calls and the orphan from attempt 1. The Runner then combined session+input, producing duplicate function_call items that the OpenAI SDK groups into a malformed assistant.tool_calls block — Anthropic 400 with "tool_call_ids did not have response messages".

Fix: if the session has any items at all, treat it as the authoritative source of cross-turn history and only forward the new user message. The first-turn path (empty session) still returns the full input.
Co-authored-by: Isaac
…or cross-turn dedup

Mirrors the same fix applied to agent-openai-advanced/utils.py (deduplicate_input). When rotated checkpointer state exists for the current thread_id, only forward the latest user message from the chatbot's request input. Without this, the chatbot's full-history echo (including any orphan tool_use AIMessage from a crashed attempt 1 that the rotated checkpointer doesn't have) would be merged into state via `add_messages` and poison the next LLM call with an unpaired tool_use.

Closes the multi-turn gap on the LangGraph side. The bridge (databricks-ai-bridge#425) no longer needs the input sanitizer (`tool_repair.py` + `_sanitize_request_input`) — between this LangGraph dedup and the OpenAI session-as-authoritative dedup, both templates handle the UI echo cleanly.
Co-authored-by: Isaac
Cross-turn echo dedup is now handled SDK-agnostically inside the bridge via _trim_echoed_history (databricks-ai-bridge#425). Both templates' agent.py / utils.py go back to main — no per-SDK calls into session.get_items() / agent.aget_state(), no thread_id surface in custom_outputs. The remaining template-side change for the always-rotate flow is e2e-chatbot-app-next/packages/ai-sdk-providers/src/providers-server.ts (alias capture from response.resumed sentinel + injection on outgoing requests). Co-authored-by: Isaac
Replace `response.body!` with an explicit early-return guard. Functionally identical (the SSE check above already implies a body exists), but satisfies Biome's lint/style/noNonNullAssertion rule introduced by my prior commit.
Per Bryan's review feedback, the test framework's _MANAGED_SCHEMAS list isn't the right layer for handling memory-schema permissions — that crosses a layer boundary into per-template configuration. The right shape is:
* Workspace setup grants USAGE on workspace-managed schemas to the writer role; SPs inherit it automatically.
* Per-template grant_lakebase_permissions.py owns its own table list and grants relative to whichever schema the agent is configured to use (via the LAKEBASE_AGENT_MEMORY_SCHEMA env var).

In the autoscaling test branch we already have:
databricks_writer_16401=UC/... on agent_langgraph_memory
which means new SPs created by the test deploy already have USAGE through role inheritance. Combined with the workspace-side ALTER DATABASE search_path that exposes the schema by default, the deployed app resolves the pgvector type without any test-framework grants on this schema.
Co-authored-by: Isaac
This reverts commit 9d4af20.
Companion to databricks-ai-bridge#425's removal of `_trim_echoed_history`. Per design discussion with Bryan, echo dedup is an agent-layer concern — the agent owns its SDK session/checkpointer and is the right layer to know what's already persisted vs what's a new turn.

agent-openai-advanced/agent_server/utils.py
- Update the `deduplicate_input` heuristic from `len(session_items) >= len(messages) - 1` to `session_items and len(messages) > 1`. The old count-based check broke under prose-recovery + always-rotate (the rotated session has fewer items than the chatbot's accumulated UI echo). The new check trusts the session as authoritative for prior turns whenever it has any items.

agent-langgraph-advanced/agent_server/agent.py
- Add an `aget_state` probe in `stream_handler`. When the checkpointer already has messages for this thread, drop everything in input except the latest user message before passing to `agent.astream`. Without this, `add_messages` would append the chatbot's full-history echo — it dedupes by `id`, but MLflow's `responses_to_cc` doesn't preserve IDs, so dedup never fires across the bridge boundary.

Both: ~10 lines per template, runs at the same point the SDK session read happens, no SDK adapter wrapping.
Co-authored-by: Isaac
dhruv0811
left a comment
Overall this seems really heavy on the UI side. Is there a way to circumvent the whole express proxy thing? I want to simplify this a lot. This is too large and clunky. Can you think through only the minimum required changes based on the final version of the server here: databricks/databricks-ai-bridge#425 to see what is required in the template and in the UI, and how we can simplify the approach.
logging.getLogger("mlflow.utils.autologging_utils").setLevel(logging.ERROR)
sp_workspace_client = WorkspaceClient()

LLM_ENDPOINT_NAME = "databricks-gpt-5-2"
This is a local change for testing, should not be in the PR.
# For on-behalf-of user authentication, pass get_user_workspace_client() to init_agent.
agent = await init_agent(store=store, checkpointer=checkpointer)

# When the checkpointer already has prior turns for this thread,
Can we move this logic to a function that is equivalent to deduplicate_input in the openai template? Location- and usage-wise.
poll_interval_seconds=float(os.getenv("POLL_INTERVAL_SECONDS", "1.0")),
)

log_level = os.getenv("LOG_LEVEL", "INFO")
This feels like a very messy way to propagate logs — can we see if it's possible to clean this up? And change the comments to be bridge-specific, not durable-specific, since all bridge logs will be shown with this.
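One way this could be tidied — the logger namespace below is an assumption, standing in for whatever root the bridge actually logs under:

```python
import logging
import os

# Configure the bridge's logger namespace once at startup instead of
# threading LOG_LEVEL through individual modules; Logger.setLevel
# accepts level names like "INFO" directly.
log_level = os.getenv("LOG_LEVEL", "INFO")
logging.getLogger("databricks_ai_bridge").setLevel(log_level)
```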
| " Option 3 (provisioned): LAKEBASE_INSTANCE_NAME=<your-instance-name>\n" | ||
| ) | ||
|
|
||
| memory_schema = os.getenv("LAKEBASE_AGENT_MEMORY_SCHEMA") or None |
Why is this addition needed? It doesn't seem durable-related??
Summary
Companion to databricks-ai-bridge#425 — chatbot cooperation for the bridge's always-rotate flow + per-template UI-echo dedup.
The bridge keeps its HTTP surface minimal (heartbeat, scan, CAS claim, conversation_id rotation, prose recovery on resume). This PR delivers the two pieces that have to live outside the bridge:
- The chatbot learns the rotated `conversation_id` from the SSE `response.resumed` sentinel, so subsequent turns from the same chat send the rotated value as `context.conversation_id` and land on the clean rotated session (instead of the original orphan-poisoned one).
- Per-template UI-echo dedup in `agent.py`/`utils.py` — when the SDK's session/checkpointer already has prior turns, the agent forwards only the latest user message. Without this, the chatbot's full-history echo would combine with the SDK's own session items and the LLM call would receive duplicates → malformed `assistant.tool_calls` block → 400.

Changes
1. Chatbot alias map (`e2e-chatbot-app-next/packages/ai-sdk-providers/src/providers-server.ts`)

In-memory `Map<chat_id, rotated_conversation_id>` in `databricksFetch`:
- On outgoing requests: look up the alias for the `chat_id` and swap it into `context.conversation_id` before forwarding.
- On forwarded `data:` lines: when a `{type: 'response.resumed', conversation_id: ...}` event lands, update the alias map.

In-memory only (single Express process). A multi-pod chatbot deployment would persist this on the chat row.
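The alias-map mechanics, sketched in Python for brevity (the actual code is TypeScript in `providers-server.ts`; function names here are illustrative):

```python
import json

# chat_id -> rotated conversation_id; in-memory, single process.
alias_map: dict[str, str] = {}

def rewrite_context(chat_id: str, context: dict) -> dict:
    # Outgoing turn: substitute the rotated id when an alias exists.
    rotated = alias_map.get(chat_id)
    return {**context, "conversation_id": rotated} if rotated else context

def observe_sse_line(chat_id: str, line: str) -> None:
    # Forwarded SSE: watch data: lines for the rotation sentinel.
    if not line.startswith("data:"):
        return
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return
    event = json.loads(payload)
    if event.get("type") == "response.resumed" and event.get("conversation_id"):
        alias_map[chat_id] = event["conversation_id"]
```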
2. OpenAI template UI-echo dedup (`agent-openai-advanced/agent_server/utils.py::deduplicate_input`)

When the OpenAI session has any persisted items, the prior turns are already there — forward only the latest user message. The Runner prepends session history on the LLM call automatically.
The OpenAI SDK's own `deduplicate_input_items_preferring_latest` (agents/run_internal/items.py:280) only dedupes by `call_id` for `function_call`/`function_call_output` items — `_dedupe_key` returns `None` for any item with a `role` field (line 224), so user/assistant message echo is not covered by the SDK.

3. LangGraph template UI-echo dedup (`agent-langgraph-advanced/agent_server/agent.py::stream_handler`)

When the checkpointer already has messages for this thread, drop everything in input except the latest user message before passing to `agent.astream`. LangGraph's `add_messages` reducer dedupes by `id`, but MLflow's `responses_to_cc` (mlflow/types/responses.py:315-383) doesn't preserve IDs across the bridge boundary, so the reducer can't dedup automatically.

4. `pyproject.toml` branch pin (both advanced templates)

Each `uv sync` resolves to the branch tip. Revert to registry versions before merge.

Test plan
long_running)
- `dhruv-lg-adv-durable-prose` (LangGraph + Claude Sonnet 4.5) — multi-tool kill mid-`deep_research` + multi-turn ✅
- `dhruv-oai-adv-durable-prose` (OpenAI Agents SDK + GPT-5) — same matrix ✅
- `dhruv-oai-cl-durable-prose` (OpenAI Agents SDK + Claude Sonnet 4.5) — same matrix ✅
- `/_debug/kill_task` route registration verified on all 3 deployed apps

How to run the crash test
Pre-merge checklist
- Revert `pyproject.toml` git-branch pins to registry versions
- Unset the `APP_TEMPLATES_BRANCH` env var (or remove the `--branch` flag in `scripts/start_app.py::clone_frontend_if_needed`) so the chatbot clones from main
- Remove `LONG_RUNNING_ENABLE_DEBUG_KILL=1` from production deploy configs

Text.Only.-.Prose.mov
Tool.Calling.Multiturn.-.Prose.mov