fix: flush pending spans before HTTP response to prevent loss in hosted sandboxes #46154

Merged

ankitbko merged 2 commits into agentserver/responses from fix/agentserver-flush-spans-before-response on Apr 7, 2026
Conversation


@nagkumar91 nagkumar91 commented Apr 6, 2026

Problem

BatchSpanProcessor exports spans on a background timer (default 5 seconds). In hosted sandbox environments (Azure AI Foundry vNext), the platform may suspend the process immediately after an HTTP response is sent, before the batch timer fires. This causes short-lived spans to be lost.

What gets lost: Per-node invoke_agent spans from LangGraph/LangChain auto-instrumentation (<1ms each) that end just before the response is returned.

What survives without the fix: chat spans (3-8s) and execute_tool spans — these end during graph execution while subsequent LLM calls create enough wall-clock delay for the batch timer to fire.

Before vs After

Before (without force_flush) — 11 spans

request:  invoke_agent naarkalg-langgraph-travel-agent:1       (29s)
  └─ dependency: invoke_agent naarkalg-langgraph-travel-agent  (29s)
       ├─ gen_ai.retriever                                      (0ms)
       ├─ chat gpt-4o                                           (5.1s)
       ├─ chat gpt-4o                                           (2.6s)
       ├─ execute_tool search_flights                           (0ms)
       ├─ execute_tool search_hotels                            (0ms)
       ├─ execute_tool get_destination_weather                  (0ms)
       ├─ execute_tool estimate_trip_cost                       (1ms)
       ├─ chat gpt-4o                                           (9.3s)
       └─ chat gpt-4o                                           (12.3s)

All per-node invoke_agent spans (user_proxy, orchestrator, draft_plan, run_tools, finalize, etc.) are missing — lost in the BatchSpanProcessor buffer when the sandbox suspended.

After (with force_flush) — 19 spans ✅

request:  invoke_agent naarkalg-langgraph-travel-agent:1       (29s)
  └─ dependency: invoke_agent naarkalg-langgraph-travel-agent  (29s)
       ├─ invoke_agent user_proxy                               (0ms)
       ├─ invoke_agent orchestrator                             (0ms)
       ├─ invoke_agent retrieve_context                         (0ms)
       │  └─ gen_ai.retriever                                   (0ms)
       ├─ invoke_agent research_destination                     (0ms)
       ├─ invoke_agent draft_plan                               (5.1s)
       │  └─ chat gpt-4o                                        (5.1s)
       ├─ invoke_agent run_tools                                (11.9s)
       │  ├─ chat gpt-4o                                        (2.6s)
       │  ├─ execute_tool search_flights                        (0ms)
       │  ├─ execute_tool search_hotels                         (0ms)
       │  ├─ execute_tool get_destination_weather               (0ms)
       │  ├─ execute_tool estimate_trip_cost                    (1ms)
       │  └─ chat gpt-4o                                        (9.3s)
       ├─ invoke_agent evaluate_constraints                     (0ms)
       └─ invoke_agent finalize                                 (12.3s)
            └─ chat gpt-4o                                      (12.3s)

Full graph-node hierarchy preserved. For traces with replan loops (budget exceeded), span count grows to 31 with the full replan path visible.

Hosted validation (5 invokes, same agent image)

| Prompt | Total spans | invoke_agent | chat | tool |
|---|---|---|---|---|
| Tokyo 3-day, $3000 | 19 | 10 | 4 | 4 |
| Paris 5-day, $2500 | 31 | 14 | 8 | 8 |
| Rome weekend, $1800 | 31 | 14 | 8 | 8 |
| Seoul 6-day, $5000 | 19 | 10 | 4 | 4 |
| Bali honeymoon, $8000 | 19 | 10 | 4 | 4 |

Paris and Rome trigger a replan loop (budget exceeded → replan → re-run tools), producing 31 spans.

Fix

  • Add flush_spans() to the azure-ai-agentserver-core public API (_tracing.py)
  • Call it in _endpoint_handler.py finally block (covers all non-streaming exit paths)
  • Call it in trace_stream() finally block (covers the streaming path)
  • Gracefully no-ops when OTel SDK is not installed or provider does not support force_flush
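The actual SDK source is not shown in this PR view, but the bullets above imply roughly the following shape for flush_spans(). The function body below is an assumption consistent with those bullets; only the OpenTelemetry calls are real API:

```python
def flush_spans(timeout_millis: int = 10000) -> None:
    """Best-effort flush of pending spans (sketch; not the actual SDK code)."""
    try:
        from opentelemetry import trace  # optional dependency
    except ImportError:
        return  # no-op when the OTel SDK is not installed
    provider = trace.get_tracer_provider()
    # The default ProxyTracerProvider has no force_flush; only the SDK's
    # TracerProvider does, so feature-detect instead of isinstance-checking.
    force_flush = getattr(provider, "force_flush", None)
    if force_flush is None:
        return
    try:
        force_flush(timeout_millis)
    except Exception:
        pass  # telemetry must never fail the request
```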

Changes

| File | Change |
|---|---|
| core/_tracing.py | Add flush_spans() function + call from trace_stream |
| core/__init__.py | Export flush_spans |
| responses/hosting/_endpoint_handler.py | Import + call flush_spans() in finally block |
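Why the finally block is the right hook for the non-streaming path: it runs on every exit (return, exception, cancellation), so spans ended during the request are exported before the response completes. A self-contained sketch; handle_request and the recording flush_spans stand-in are illustrative, not the SDK's actual handler:

```python
import asyncio

flush_calls = []

def flush_spans():
    # Stand-in for azure-ai-agentserver-core's flush_spans(); records each call.
    flush_calls.append("flushed")

async def handle_request(fail: bool = False):
    try:
        if fail:
            raise RuntimeError("agent error")
        return {"status": "ok"}
    finally:
        # Runs on success, error, and cancellation alike, so pending spans
        # are flushed before the HTTP response is finalized.
        flush_spans()

result = asyncio.run(handle_request())
try:
    asyncio.run(handle_request(fail=True))
except RuntimeError:
    pass
print(flush_calls)  # one flush per exit path
```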

Environment

  • Agent: naarkalg-langgraph-travel-agent:1 on hosted-agents-evals-bugbash-wus2
  • Image: hostedagentsevals.azurecr.io/naarkalg-langgraph-travel-agent:20260406112724
  • Tracer: langchain-azure-ai[opentelemetry] from langchain-ai/langchain-azure@main
  • Model: gpt-4o

fix: flush pending spans before HTTP response to prevent loss in hosted sandboxes

BatchSpanProcessor exports spans on a background timer (default 5s).
In hosted sandbox environments the platform may suspend the process
immediately after the HTTP response is sent, before the timer fires.
This causes short-lived spans — such as LangGraph per-node invoke_agent
spans created by third-party tracers — to be lost.

Add flush_spans() to the core public API and call it:
- In _endpoint_handler.py's finally block (covers all non-streaming exits)
- In trace_stream's finally block (covers the streaming path)

Locally verified: same agent code produces 30 spans with flush vs 11
without flush, confirming the BatchSpanProcessor timing issue.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
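The streaming path uses the same mechanism through trace_stream's finally block: when the consumer finishes (or abandons) the stream, spans are flushed before the response stream closes. A stdlib-only sketch; the generator body and event values are illustrative:

```python
import asyncio

flushes = []

def flush_spans():
    flushes.append(True)  # stand-in for the real flush_spans()

async def trace_stream(events):
    # Wrap the event stream so that pending spans are flushed once
    # iteration ends, whether the stream completes or is abandoned.
    try:
        for event in events:
            yield event
    finally:
        flush_spans()

async def main():
    return [event async for event in trace_stream(["delta-1", "delta-2", "done"])]

chunks = asyncio.run(main())
```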
@github-actions github-actions bot added the Hosted Agents sdk/agentserver/* label Apr 6, 2026
AgentServerHost,
create_error_response,
end_span,
flush_spans,

@nagkumar91 (author) replied:

Done! #46181 is the other PR

@nagkumar91 nagkumar91 changed the base branch from agentserver/responses to agentserver/invoke April 7, 2026 14:17
@nagkumar91 nagkumar91 changed the base branch from agentserver/invoke to agentserver/responses April 7, 2026 14:18
@ankitbko ankitbko merged commit 925b491 into agentserver/responses Apr 7, 2026
2 checks passed
@ankitbko ankitbko deleted the fix/agentserver-flush-spans-before-response branch April 7, 2026 19:29
