Python: Fix spans not correctly nested when using streaming by TaoChenOSU · Pull Request #5552 · microsoft/agent-framework

TaoChenOSU · 2026-04-29T00:10:41Z

Motivation and Context

Address: #5528

Description

Before:

After:

Tests are also added to prevent regression.

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

moonbox3 · 2026-04-29T00:13:16Z

Python Test Coverage Report

File	Stmts	Miss	Cover	Missing
packages/core/agent_framework
_types.py	1111	87	92%	59, 68–69, 123, 128, 147, 149, 153, 157, 159, 161, 163, 181, 185, 211, 233, 238, 243, 247, 277, 690–691, 850–851, 1286, 1358, 1393, 1413, 1423, 1475, 1607–1609, 1791, 1894–1899, 1924, 2018, 2026–2028, 2033, 2136, 2159, 2414, 2438, 2537, 2791, 3001, 3060, 3099, 3110, 3112–3116, 3118, 3121–3129, 3139, 3228, 3363, 3368, 3373, 3378, 3382, 3466–3468, 3497, 3574–3578
observability.py	769	80	89%	378, 380–381, 384, 387, 390–391, 396–397, 403–404, 410–411, 418, 420–422, 425–427, 432–433, 439–440, 446–447, 454, 611–612, 740, 744–746, 748, 752–753, 757, 795, 797, 808–810, 812–814, 818, 826, 950–951, 1113, 1355–1356, 1461–1466, 1473–1476, 1480–1488, 1495, 1583, 1623–1624, 1767, 1903, 2100, 2318, 2320
TOTAL	30531	3557	88%

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
6144	30 💤	0 ❌	0 🔥	1m 42s ⏱️

github-actions

Automated Code Review

Reviewers: 4 | Confidence: 80%

✓ Correctness

The PR adds a with_pull_context_manager mechanism to ResponseStream for activating OTel spans around each iterator pull, fixing span parenting for streaming operations. The implementation is sound overall. One asymmetry exists: the chat telemetry streaming path moves super_get_response() after span creation without wrapping it in a try/except (unlike the agent invocation path, which properly calls _close_span() on failure), creating a potential span leak if stream construction throws.

✓ Security Reliability

The PR introduces a well-designed mechanism for attaching OTel spans to streaming iterator pulls via with_pull_context_manager. The _activate_span context manager correctly uses try/finally for detach, and the ExitStack in __anext__ properly scopes the context managers. However, in the chat streaming path, _start_streaming_span() creates a span before super_get_response() is called without a try/except guard — if super_get_response() raises, the span is never ended (leaked). The agent invocation path correctly handles this case by wrapping execute() in try/except with _close_span(), but the chat path lacks equivalent protection.
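The leak described above can be illustrated with a minimal, stdlib-only sketch. `FakeSpan`, `start_streaming_span`, and `open_stream_guarded` are hypothetical names that only loosely mirror `_start_streaming_span()`, `super_get_response()`, and `_close_span()` from the review; this is not the framework's code, and a real span would come from OpenTelemetry rather than this stand-in.

```python
class FakeSpan:
    """Hypothetical stand-in for an OTel span; it only tracks end() and one error."""

    def __init__(self, name):
        self.name = name
        self.ended = False
        self.error = None

    def record_exception(self, exc):
        self.error = exc

    def end(self):
        self.ended = True


def start_streaming_span(name):
    # Assumption: the span is created up front, before the stream exists.
    return FakeSpan(name)


def open_stream_guarded(build_stream, span):
    """Build the stream; if construction raises, end the span before re-raising."""
    try:
        return build_stream()
    except Exception as exc:
        span.record_exception(exc)
        span.end()  # without this call the span would never be closed
        raise


def failing_builder():
    raise RuntimeError("stream construction failed")


span = start_streaming_span("chat")
try:
    open_stream_guarded(failing_builder, span)
except RuntimeError:
    pass  # the caller still sees the original error
```

The guard re-raises, so the caller's error handling is unchanged; the only difference is that the span is ended on the failure path.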

✓ Test Coverage

The PR adds a with_pull_context_manager mechanism to ResponseStream and uses it to properly parent OTel spans during streaming. Four new integration tests validate parent-child span relationships for streaming and non-streaming paths. However, there are notable test coverage gaps: no unit test for the with_pull_context_manager method in isolation (independent of OTel), and no test for the new error-handling path in _trace_agent_invocation where _close_span() is called when execute() fails during streaming.

✓ Design Approach

The new per-pull context manager is the right direction for streaming span parenting, but the implementation is still too narrow: it only wraps ResponseStream.__anext__, while this codebase also uses ResponseStream.__await__ as a first-class way to resolve streaming sources before iteration. Because core streaming/tool code awaits streams before pulling updates, work done during _get_stream() resolution can still run outside the intended parent span, so the fix does not fully address the underlying context-propagation problem.

Automated review by TaoChenOSU's agents

Copilot

Pull request overview

Fixes OpenTelemetry span parent/child nesting for streaming operations by ensuring the “current span” context is active during stream pulls (and initial stream resolution), and adds regression tests to validate correct trace hierarchy (agent → chat → inner spans).

Changes:

Add per-pull context activation support to ResponseStream and use it to activate spans during streaming iteration.
Refactor streaming span creation for chat + agent telemetry to use shared helpers and to attach spans during stream pulls.
Add tests asserting correct span nesting for agent/chat/tool/inner spans in both streaming and non-streaming flows.
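As a rough illustration of per-pull context activation, the sketch below wraps an async iterator and enters a list of context manager factories around every `__anext__` call. The method name `with_pull_context_manager` comes from this PR, but the class `PullContextStream`, the signature, and the `contextvars`-based `activate` helper are assumptions made for the example; the real `ResponseStream` and OTel context handling may differ.

```python
import asyncio
import contextvars
from contextlib import ExitStack, contextmanager

# Hypothetical stand-in for the "current span" slot that OTel keeps in context.
current_span = contextvars.ContextVar("current_span", default=None)


@contextmanager
def activate(name):
    """Make `name` the current span for the duration of the with-block."""
    token = current_span.set(name)
    try:
        yield
    finally:
        current_span.reset(token)


class PullContextStream:
    """Async iterator that enters registered context managers around each pull."""

    def __init__(self, source):
        self._source = source
        self._factories = []

    def with_pull_context_manager(self, factory):
        self._factories.append(factory)
        return self

    def __aiter__(self):
        return self

    async def __anext__(self):
        # The contexts are active only while the next item is being produced.
        with ExitStack() as stack:
            for factory in self._factories:
                stack.enter_context(factory())
            return await self._source.__anext__()


async def updates():
    for i in range(3):
        # Work done while producing an item observes the activated span.
        yield (i, current_span.get())


async def collect():
    stream = PullContextStream(updates()).with_pull_context_manager(
        lambda: activate("chat")
    )
    items = [item async for item in stream]
    return items, current_span.get()


seen, span_after = asyncio.run(collect())
```

Because the span is activated and deactivated inside a single `__anext__` call, nothing stays attached between pulls, which is the property the per-pull approach relies on.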

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.

File	Description
python/packages/core/agent_framework/observability.py	Refactors streaming tracing to create spans up-front and activate them per stream pull to preserve parent/child relationships.
python/packages/core/agent_framework/_types.py	Extends `ResponseStream` with per-pull context manager factories and applies them around iterator pulls (and stream resolution within `__anext__`).
python/packages/core/tests/core/test_observability.py	Adds regression tests validating correct span nesting across streaming/non-streaming, tool execution, and inner spans.
python/uv.lock	Updates lockfile metadata/deps (notably lock revision and wheel entries).

github-actions

Automated Code Review

Reviewers: 4 | Confidence: 89%

✓ Correctness

This PR implements a well-designed mechanism for OpenTelemetry span nesting in streaming operations. It uses pull context managers to activate spans around each anext call, leveraging the lazy resolution of from_awaitable() streams to ensure child spans (chat, HTTP) are correctly parented under outer spans (agent invoke). The error handling for span cleanup on failure is properly addressed. The implementation is correct and the tests are comprehensive.

✓ Security Reliability

This PR adds proper OTel span parenting for streaming operations via a per-pull context manager mechanism, and fixes span leaks when stream construction fails. The implementation correctly uses attach/detach within the same async context to avoid cross-context cleanup issues. Error handling paths properly close spans and reset context variables. The ExitStack in anext ensures symmetrical enter/exit even on exceptions. No significant security or reliability issues found.
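The symmetrical enter/exit behaviour attributed to `ExitStack` can be checked with a small stdlib-only sketch. The helper names `tracked` and `pull_with_contexts` are hypothetical and exist only for this example; the point is that every context entered before a failing pull is exited, in reverse order, before the exception reaches the caller.

```python
from contextlib import ExitStack, contextmanager

events = []


@contextmanager
def tracked(name):
    """Record enter/exit so the ordering can be inspected afterwards."""
    events.append(f"enter {name}")
    try:
        yield
    finally:
        events.append(f"exit {name}")


def pull_with_contexts(factories, pull):
    """Enter every context, run one pull, and unwind even if the pull raises."""
    with ExitStack() as stack:
        for factory in factories:
            stack.enter_context(factory())
        return pull()


def failing_pull():
    raise ValueError("pull failed")


try:
    pull_with_contexts(
        [lambda: tracked("agent"), lambda: tracked("chat")],
        failing_pull,
    )
except ValueError:
    events.append("error propagated")
```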

✓ Test Coverage

The PR adds comprehensive tests for the new span nesting behavior and with_pull_context_manager mechanism. The streaming error paths (chat and agent) are both tested. The main gap is that _resolve_stream_with_pull_contexts is tested only via the __await__ path but not via get_final_response() without prior iteration—a distinct code path that the diff explicitly modifies. All other new behaviors have meaningful test coverage with proper assertions.

✓ Design Approach

The new per-pull context manager is a good direction for spans created during streaming iteration, but the approach is still too narrow: both streaming wrappers leave the synchronous stream-construction phase outside the span context. Since this code explicitly supports direct ResponseStream returns, any child spans or context-sensitive work done before the first pull still won't be parented correctly. Wrapping super_get_response() / execute() in _activate_span(span) as well would address the full lifecycle instead of only the iterator pulls.
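The gap this review points at can be sketched as follows: if a stream does eager work at construction time, that work only sees the span when construction itself runs inside the activation. `activate_span` and `build_stream` are hypothetical stand-ins (the PR's `_activate_span` and `super_get_response()` are not reproduced here), and a `contextvars` variable plays the role of the OTel current-span context.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical stand-in for the OTel "current span" context slot.
current_span = contextvars.ContextVar("current_span", default=None)


@contextmanager
def activate_span(name):
    token = current_span.set(name)
    try:
        yield
    finally:
        current_span.reset(token)


def build_stream():
    # Eager work at construction time records whichever span is current.
    return {"parent_at_construction": current_span.get()}


# Construction outside the span context: the eager work has no parent.
unwrapped = build_stream()

# Construction wrapped in the span context: the eager work is parented.
with activate_span("chat"):
    wrapped = build_stream()
```

Under these assumptions, wrapping only the iterator pulls would leave `unwrapped`-style construction work unparented, which is why the review suggests activating the span around construction as well.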

Automated review by TaoChenOSU's agents

* Python: bump package versions for 1.2.2 release

  PATCH bump (1.2.1 -> 1.2.2) for the released cohort. Five PRs land in this window:

  - agent-framework-openai: fix file_search citations breaking the assistant-message history roundtrip (#5557); drives the released-tier PATCH
  - agent-framework-orchestrations: [BREAKING] standardize orchestration terminal outputs as AgentResponse (#5301)
  - agent-framework-core, agent-framework-declarative: preserve Workflow.run() shared state across calls, accept list[Message] in declarative start executor, and coerce Enum values when serializing PowerFx symbols (#5531)
  - agent-framework-foundry-hosting: add hosted Durable Workflow support (#5531)
  - agent-framework-azure-contentunderstanding: new alpha package, Azure AI Content Understanding context provider (#4829)
  - dependencies: workspace package dependency refresh (#5555)

  Per lockstep convention, all 21 beta packages stamp 1.0.0b260429 and all 4 alpha packages (now including the new contentunderstanding) stamp 1.0.0a260429. Date stamp reflects 2026-04-29 Pacific.

  Every non-core package floor on agent-framework-core is raised to >=1.2.2; the new contentunderstanding package's stale >=1.0.0 floor is brought into line.

  Two follow-on fixes bundled to keep validate-dependency-bounds-test green at lowest-direct resolution:

  - Bump agent-framework-azure-contentunderstanding's azure-ai-contentunderstanding lower bound from >=1.0.0 to >=1.0.1 (1.0.0 ships without proper typing; pyright reports 65 unknown-type errors)
  - Add pyright ignore comments to core/foundry/__init__.pyi for the new alpha package's type-stub imports, since alpha packages are not in core's [all] extra and therefore aren't installed at lowest-direct

* Python: add #5552 to 1.2.2 CHANGELOG

  Add the streaming-span observability fix to the Fixed section. PR is on upstream/main but not yet pulled into origin/main; the code itself will land via the PR merge.
* Python: address PR #5561 review feedback on dependency bounds

  Two packaging fixes flagged in review:

  1. agent-framework-azure-contentunderstanding: add agent-framework-foundry as a runtime dependency. The package's README directs users to `pip install agent-framework-azure-contentunderstanding --pre` and the basic example imports `FoundryChatClient` from `agent_framework.foundry`, so the documented install path was failing with ImportError. Pulling agent-framework-foundry into deps makes the advertised entry path self-contained.
  2. agent-framework-foundry: bump agent-framework-openai lower bound from >=1.1.0 to >=1.2.2,<2. Foundry imports private modules from agent_framework_openai (`_chat_client.py:22`, `_agent.py:34`), so resolvers were free to pair foundry==1.2.2 with older OpenAI versions that lack this release's coordinated Responses/history fix. Lockstep the floor with the released cohort to prevent mismatched installs.

  Both changes pass `validate-dependency-bounds-test` lower + upper at their respective packages.

Fix spans not correctly nested when using streaming

2701e22

TaoChenOSU self-assigned this Apr 29, 2026

Copilot AI review requested due to automatic review settings April 29, 2026 00:10

TaoChenOSU added this to Agent Framework Apr 29, 2026

moonbox3 added the python label Apr 29, 2026

github-actions Bot changed the title ~~Fix spans not correctly nested when using streaming~~ Python: Fix spans not correctly nested when using streaming Apr 29, 2026

Copilot started reviewing on behalf of TaoChenOSU April 29, 2026 00:11

github-actions Bot reviewed Apr 29, 2026

Comment thread python/packages/core/agent_framework/observability.py Outdated

Comment thread python/packages/core/tests/core/test_observability.py

Comment thread python/packages/core/agent_framework/observability.py

Copilot AI reviewed Apr 29, 2026

Comment thread python/packages/core/agent_framework/observability.py Outdated

Comment thread python/packages/core/agent_framework/_types.py

TaoChenOSU added 2 commits April 28, 2026 21:25

fix pre commit

b62ce6d

Address comments

b9926c0

TaoChenOSU marked this pull request as ready for review April 29, 2026 04:44

moonbox3 approved these changes Apr 29, 2026

github-actions Bot reviewed Apr 29, 2026

Comment thread python/packages/core/tests/core/test_observability.py

Comment thread python/packages/core/agent_framework/observability.py

Comment thread python/packages/core/agent_framework/observability.py

eavanvalkenburg approved these changes Apr 29, 2026

moonbox3 mentioned this pull request Apr 29, 2026

Python: Extend streaming span parenting to synchronous stream construction and additional resolution paths #5559

Open

moonbox3 added this pull request to the merge queue Apr 29, 2026

Merged via the queue into main with commit 03e47b5 Apr 29, 2026
35 checks passed

github-project-automation Bot moved this to Done in Agent Framework Apr 29, 2026

This was referenced Apr 29, 2026

Bump the microsoft group with 7 updates razeone/CloudEngAgent#27

Open

Bump the dotnet group with 1 update dotnet/docs#53477

Merged

rickywck mentioned this pull request Apr 29, 2026

Python: [Bug]: executor.process spans intermittently never close, leaving child gen_ai.* spans orphan in App Insights #5577

Open

Python: Fix spans not correctly nested when using streaming #5552
moonbox3 merged 3 commits into main from fix/span-not-correctly-nested-when-sreaming

4 participants
