Python: Fix spans not correctly nested when using streaming#5552
Python: Fix spans not correctly nested when using streaming#5552
Conversation
Python Test Coverage Report •
Python Unit Test Overview
|
|||||||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 80%
✓ Correctness
The PR adds a
with_pull_context_managermechanism toResponseStreamfor activating OTel spans around each iterator pull, fixing span parenting for streaming operations. The implementation is sound overall. One asymetry exists: the chat telemetry streaming path movessuper_get_response()after span creation without wrapping it in a try/except (unlike the agent invocation path which properly calls_close_span()on failure), creating a potential span leak if stream construction throws.
✓ Security Reliability
The PR introduces a well-designed mechanism for attaching OTel spans to streaming iterator pulls via
with_pull_context_manager. The_activate_spancontext manager correctly uses try/finally for detach, and the ExitStack in__anext__properly scopes the context managers. However, in the chat streaming path,_start_streaming_span()creates a span beforesuper_get_response()is called without a try/except guard — ifsuper_get_response()raises, the span is never ended (leaked). The agent invocation path correctly handles this case by wrappingexecute()in try/except with_close_span(), but the chat path lacks equivalent protection.
✓ Test Coverage
The PR adds a
with_pull_context_managermechanism toResponseStreamand uses it to properly parent OTel spans during streaming. Four new integration tests validate parent-child span relationships for streaming and non-streaming paths. However, there are notable test coverage gaps: no unit test for thewith_pull_context_managermethod in isolation (independent of OTel), and no test for the new error-handling path in_trace_agent_invocationwhere_close_span()is called whenexecute()fails during streaming.
✓ Design Approach
The new per-pull context manager is the right direction for streaming span parenting, but the implementation is still too narrow: it only wraps
ResponseStream.__anext__, while this codebase also usesResponseStream.__await__as a first-class way to resolve streaming sources before iteration. Because core streaming/tool code awaits streams before pulling updates, work done during_get_stream()resolution can still run outside the intended parent span, so the fix does not fully address the underlying context-propagation problem.
Automated review by TaoChenOSU's agents
There was a problem hiding this comment.
Pull request overview
Fixes OpenTelemetry span parent/child nesting for streaming operations by ensuring the “current span” context is active during stream pulls (and initial stream resolution), and adds regression tests to validate correct trace hierarchy (agent → chat → inner spans).
Changes:
- Add per-pull context activation support to
ResponseStreamand use it to activate spans during streaming iteration. - Refactor streaming span creation for chat + agent telemetry to use shared helpers and to attach spans during stream pulls.
- Add tests asserting correct span nesting for agent/chat/tool/inner spans in both streaming and non-streaming flows.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| python/packages/core/agent_framework/observability.py | Refactors streaming tracing to create spans up-front and activate them per stream pull to preserve parent/child relationships. |
| python/packages/core/agent_framework/_types.py | Extends ResponseStream with per-pull context manager factories and applies them around iterator pulls (and stream resolution within __anext__). |
| python/packages/core/tests/core/test_observability.py | Adds regression tests validating correct span nesting across streaming/non-streaming, tool execution, and inner spans. |
| python/uv.lock | Updates lockfile metadata/deps (notably lock revision and wheel entries). |
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 89%
✓ Correctness
This PR implements a well-designed mechanism for OpenTelemetry span nesting in streaming operations. It uses pull context managers to activate spans around each anext call, leveraging the lazy resolution of from_awaitable() streams to ensure child spans (chat, HTTP) are correctly parented under outer spans (agent invoke). The error handling for span cleanup on failure is properly addressed. The implementation is correct and the tests are comprehensive.
✓ Security Reliability
This PR adds proper OTel span parenting for streaming operations via a per-pull context manager mechanism, and fixes span leaks when stream construction fails. The implementation correctly uses attach/detach within the same async context to avoid cross-context cleanup issues. Error handling paths properly close spans and reset context variables. The ExitStack in anext ensures symmetrical enter/exit even on exceptions. No significant security or reliability issues found.
✓ Test Coverage
The PR adds comprehensive tests for the new span nesting behavior and
with_pull_context_managermechanism. The streaming error paths (chat and agent) are both tested. The main gap is that_resolve_stream_with_pull_contextsis tested only via the__await__path but not viaget_final_response()without prior iteration—a distinct code path that the diff explicitly modifies. All other new behaviors have meaningful test coverage with proper assertions.
✓ Design Approach
The new per-pull context manager is a good direction for spans created during streaming iteration, but the approach is still too narrow: both streaming wrappers leave the synchronous stream-construction phase outside the span context. Since this code explicitly supports direct
ResponseStreamreturns, any child spans or context-sensitive work done before the first pull still won't be parented correctly. Wrappingsuper_get_response()/execute()in_activate_span(span)as well would address the full lifecycle instead of only the iterator pulls.
Automated review by TaoChenOSU's agents
Add the streaming-span observability fix to the Fixed section. PR is on upstream/main but not yet pulled into origin/main; the code itself will land via the PR merge.
* Python: bump package versions for 1.2.2 release PATCH bump (1.2.1 -> 1.2.2) for the released cohort. Five PRs land in this window: - agent-framework-openai: fix file_search citations breaking the assistant- message history roundtrip (#5557) — drives the released-tier PATCH - agent-framework-orchestrations: [BREAKING] standardize orchestration terminal outputs as AgentResponse (#5301) - agent-framework-core, agent-framework-declarative: preserve Workflow.run() shared state across calls, accept list[Message] in declarative start executor, and coerce Enum values when serializing PowerFx symbols (#5531) - agent-framework-foundry-hosting: add hosted Durable Workflow support (#5531) - agent-framework-azure-contentunderstanding: new alpha package — Azure AI Content Understanding context provider (#4829) - dependencies: workspace package dependency refresh (#5555) Per lockstep convention, all 21 beta packages stamp 1.0.0b260429 and all 4 alpha packages (now including the new contentunderstanding) stamp 1.0.0a260429. Date stamp reflects 2026-04-29 Pacific. Every non-core package floor on agent-framework-core is raised to >=1.2.2; the new contentunderstanding package's stale >=1.0.0 floor is brought into line. Two follow-on fixes bundled to keep validate-dependency-bounds-test green at lowest-direct resolution: - Bump agent-framework-azure-contentunderstanding's azure-ai-content understanding lower bound from >=1.0.0 to >=1.0.1 (1.0.0 ships without proper typing — pyright reports 65 unknown-type errors) - Add pyright ignore comments to core/foundry/__init__.pyi for the new alpha package's type-stub imports, since alpha packages are not in core's [all] extra and therefore aren't installed at lowest-direct * Python: add #5552 to 1.2.2 CHANGELOG Add the streaming-span observability fix to the Fixed section. PR is on upstream/main but not yet pulled into origin/main; the code itself will land via the PR merge. * Python: address PR #5561 review feedback on dependency bounds Two packaging fixes flagged in review: 1. agent-framework-azure-contentunderstanding: add agent-framework-foundry as a runtime dependency. The package's README directs users to `pip install agent-framework-azure-contentunderstanding --pre` and the basic example imports `FoundryChatClient` from `agent_framework.foundry`, so the documented install path was failing with ImportError. Pulling agent-framework-foundry into deps makes the advertised entry path self-contained. 2. agent-framework-foundry: bump agent-framework-openai lower bound from >=1.1.0 to >=1.2.2,<2. Foundry imports private modules from agent_framework_openai (`_chat_client.py:22`, `_agent.py:34`), so resolvers were free to pair foundry==1.2.2 with older OpenAI versions that lack this release's coordinated Responses/history fix. Lockstep the floor with the released cohort to prevent mismatched installs. Both changes pass `validate-dependency-bounds-test` lower + upper at their respective packages.
Motivation and Context
Address: #5528
Description
Before:

After:

Tests are also added to prevent regression.
Contribution Checklist