Python: Fix spans not correctly nested when using streaming #5552

Merged
moonbox3 merged 3 commits into main from fix/span-not-correctly-nested-when-sreaming
Apr 29, 2026

Conversation

@TaoChenOSU
Contributor

Motivation and Context

Address: #5528

Description

Before:
image

After:
image

Tests are also added to prevent regression.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

@TaoChenOSU TaoChenOSU self-assigned this Apr 29, 2026
Copilot AI review requested due to automatic review settings April 29, 2026 00:10
@github-actions github-actions Bot changed the title from "Fix spans not correctly nested when using streaming" to "Python: Fix spans not correctly nested when using streaming" Apr 29, 2026
@moonbox3
Contributor

moonbox3 commented Apr 29, 2026

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| packages/core/agent_framework/_types.py | 1111 | 87 | 92% | 59, 68–69, 123, 128, 147, 149, 153, 157, 159, 161, 163, 181, 185, 211, 233, 238, 243, 247, 277, 690–691, 850–851, 1286, 1358, 1393, 1413, 1423, 1475, 1607–1609, 1791, 1894–1899, 1924, 2018, 2026–2028, 2033, 2136, 2159, 2414, 2438, 2537, 2791, 3001, 3060, 3099, 3110, 3112–3116, 3118, 3121–3129, 3139, 3228, 3363, 3368, 3373, 3378, 3382, 3466–3468, 3497, 3574–3578 |
| packages/core/agent_framework/observability.py | 769 | 80 | 89% | 378, 380–381, 384, 387, 390–391, 396–397, 403–404, 410–411, 418, 420–422, 425–427, 432–433, 439–440, 446–447, 454, 611–612, 740, 744–746, 748, 752–753, 757, 795, 797, 808–810, 812–814, 818, 826, 950–951, 1113, 1355–1356, 1461–1466, 1473–1476, 1480–1488, 1495, 1583, 1623–1624, 1767, 1903, 2100, 2318, 2320 |
| TOTAL | 30531 | 3557 | 88% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 6144 | 30 💤 | 0 ❌ | 0 🔥 | 1m 42s ⏱️ |

@github-actions github-actions Bot left a comment

Automated Code Review

Reviewers: 4 | Confidence: 80%

✓ Correctness

The PR adds a with_pull_context_manager mechanism to ResponseStream for activating OTel spans around each iterator pull, fixing span parenting for streaming operations. The implementation is sound overall. One asymmetry exists: the chat telemetry streaming path moves super_get_response() after span creation without wrapping it in a try/except (unlike the agent invocation path, which properly calls _close_span() on failure), creating a potential span leak if stream construction throws.

✓ Security Reliability

The PR introduces a well-designed mechanism for attaching OTel spans to streaming iterator pulls via with_pull_context_manager. The _activate_span context manager correctly uses try/finally for detach, and the ExitStack in __anext__ properly scopes the context managers. However, in the chat streaming path, _start_streaming_span() creates a span before super_get_response() is called without a try/except guard — if super_get_response() raises, the span is never ended (leaked). The agent invocation path correctly handles this case by wrapping execute() in try/except with _close_span(), but the chat path lacks equivalent protection.
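
The leak described above can be illustrated in isolation. This is a minimal sketch with a stand-in span class, not the framework's code: `FakeSpan`, `start_streaming`, and `failing_builder` are hypothetical names, and the real helpers (`_start_streaming_span()`, `_close_span()`) may behave differently.

```python
class FakeSpan:
    """Hypothetical stand-in for a tracing span; only tracks whether end() ran."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.ended = False

    def end(self) -> None:
        self.ended = True


def start_streaming(build_stream, span: FakeSpan):
    """Build the stream after the span exists; end the span if construction fails."""
    try:
        return build_stream()
    except Exception:
        # Without this guard nothing would ever end the span (a leak).
        span.end()
        raise


def failing_builder():
    raise RuntimeError("stream construction failed")


failed_span = FakeSpan("chat")
try:
    start_streaming(failing_builder, failed_span)
except RuntimeError:
    pass

ok_span = FakeSpan("chat")
stream = start_streaming(lambda: iter([1, 2]), ok_span)

print(failed_span.ended)  # True: closed by the guard
print(ok_span.ended)      # False: still open, to be ended when the stream finishes
```

On the success path the span intentionally stays open, since in a streaming flow it would be ended only once the stream is exhausted or fails.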

✓ Test Coverage

The PR adds a with_pull_context_manager mechanism to ResponseStream and uses it to properly parent OTel spans during streaming. Four new integration tests validate parent-child span relationships for streaming and non-streaming paths. However, there are notable test coverage gaps: no unit test for the with_pull_context_manager method in isolation (independent of OTel), and no test for the new error-handling path in _trace_agent_invocation where _close_span() is called when execute() fails during streaming.

✓ Design Approach

The new per-pull context manager is the right direction for streaming span parenting, but the implementation is still too narrow: it only wraps ResponseStream.__anext__, while this codebase also uses ResponseStream.__await__ as a first-class way to resolve streaming sources before iteration. Because core streaming/tool code awaits streams before pulling updates, work done during _get_stream() resolution can still run outside the intended parent span, so the fix does not fully address the underlying context-propagation problem.
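
One way to picture the concern is a stream whose `__await__` and `__anext__` both go through a shared resolution step that enters the registered context managers. The sketch below uses only the standard library and assumes a `contextvars` variable as a stand-in for the tracer's current span; `LazyStream`, `activate`, and `resolver` are hypothetical names and do not reflect how `ResponseStream` is actually written.

```python
import asyncio
import contextvars
from contextlib import ExitStack, contextmanager

# Hypothetical stand-in for the tracer's "current span" slot.
current_span = contextvars.ContextVar("current_span", default=None)
resolved_under = []


@contextmanager
def activate(span_name):
    """Make span_name current for the block, then restore the previous value."""
    token = current_span.set(span_name)
    try:
        yield
    finally:
        current_span.reset(token)


class LazyStream:
    """Stream that resolves its source lazily, on await or on first pull."""

    def __init__(self, resolver, factories):
        self._resolver = resolver
        self._factories = list(factories)
        self._inner = None

    def _enter_all(self, stack):
        for factory in self._factories:
            stack.enter_context(factory())

    async def _resolve(self):
        # Resolution also runs inside the pull contexts, not only iteration.
        if self._inner is None:
            with ExitStack() as stack:
                self._enter_all(stack)
                self._inner = await self._resolver()
        return self._inner

    def __await__(self):
        return self._resolve().__await__()

    def __aiter__(self):
        return self

    async def __anext__(self):
        inner = await self._resolve()
        with ExitStack() as stack:
            self._enter_all(stack)
            return await inner.__anext__()


async def resolver():
    # Record which span is current while the source is being resolved.
    resolved_under.append(current_span.get())

    async def gen():
        yield current_span.get()

    return gen()


async def main():
    stream = LazyStream(resolver, [lambda: activate("agent")])
    await stream  # resolve before iterating
    return [item async for item in stream]


items = asyncio.run(main())
print(resolved_under)  # ['agent']
print(items)           # ['agent']
```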


Automated review by TaoChenOSU's agents

Comment thread python/packages/core/agent_framework/observability.py Outdated
Comment thread python/packages/core/tests/core/test_observability.py
Comment thread python/packages/core/agent_framework/observability.py
Contributor

Copilot AI left a comment

Pull request overview

Fixes OpenTelemetry span parent/child nesting for streaming operations by ensuring the “current span” context is active during stream pulls (and initial stream resolution), and adds regression tests to validate correct trace hierarchy (agent → chat → inner spans).

Changes:

  • Add per-pull context activation support to ResponseStream and use it to activate spans during streaming iteration.
  • Refactor streaming span creation for chat + agent telemetry to use shared helpers and to attach spans during stream pulls.
  • Add tests asserting correct span nesting for agent/chat/tool/inner spans in both streaming and non-streaming flows.
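
The per-pull activation in the first bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: it uses a `contextvars` variable in place of a real OpenTelemetry span, and `PullContextStream`, `activate`, and `updates` are hypothetical names. Only the method name `with_pull_context_manager` and the use of an `ExitStack` inside `__anext__` come from the review text.

```python
import asyncio
import contextvars
from contextlib import ExitStack, contextmanager

# Hypothetical stand-in for the tracer's "current span" slot.
current_span = contextvars.ContextVar("current_span", default=None)


@contextmanager
def activate(span_name):
    """Make span_name current for the duration of the block, then restore."""
    token = current_span.set(span_name)
    try:
        yield
    finally:
        current_span.reset(token)


class PullContextStream:
    """Async iterator that enters registered context managers around each pull."""

    def __init__(self, source):
        self._source = source
        self._factories = []

    def with_pull_context_manager(self, factory):
        self._factories.append(factory)
        return self

    def __aiter__(self):
        return self

    async def __anext__(self):
        # Enter every context manager for this pull only; exit even on errors.
        with ExitStack() as stack:
            for factory in self._factories:
                stack.enter_context(factory())
            return await self._source.__anext__()


async def updates():
    for i in range(2):
        # Work done while producing an update sees the activated span.
        yield (i, current_span.get())


async def main():
    stream = PullContextStream(updates()).with_pull_context_manager(
        lambda: activate("agent")
    )
    seen = [item async for item in stream]
    return seen, current_span.get()


seen, after = asyncio.run(main())
print(seen)   # [(0, 'agent'), (1, 'agent')]
print(after)  # None
```

Because the span is activated and deactivated within a single pull, nothing stays attached between pulls, which is why the caller sees no current span after iteration.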

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| python/packages/core/agent_framework/observability.py | Refactors streaming tracing to create spans up-front and activate them per stream pull to preserve parent/child relationships. |
| python/packages/core/agent_framework/_types.py | Extends ResponseStream with per-pull context manager factories and applies them around iterator pulls (and stream resolution within __anext__). |
| python/packages/core/tests/core/test_observability.py | Adds regression tests validating correct span nesting across streaming/non-streaming, tool execution, and inner spans. |
| python/uv.lock | Updates lockfile metadata/deps (notably lock revision and wheel entries). |

Comment thread python/packages/core/agent_framework/observability.py Outdated
Comment thread python/packages/core/agent_framework/_types.py
@TaoChenOSU TaoChenOSU marked this pull request as ready for review April 29, 2026 04:44
@github-actions github-actions Bot left a comment

Automated Code Review

Reviewers: 4 | Confidence: 89%

✓ Correctness

This PR implements a well-designed mechanism for OpenTelemetry span nesting in streaming operations. It uses pull context managers to activate spans around each __anext__ call, leveraging the lazy resolution of from_awaitable() streams to ensure child spans (chat, HTTP) are correctly parented under outer spans (agent invoke). The error handling for span cleanup on failure is properly addressed. The implementation is correct and the tests are comprehensive.

✓ Security Reliability

This PR adds proper OTel span parenting for streaming operations via a per-pull context manager mechanism, and fixes span leaks when stream construction fails. The implementation correctly uses attach/detach within the same async context to avoid cross-context cleanup issues. Error handling paths properly close spans and reset context variables. The ExitStack in __anext__ ensures symmetrical enter/exit even on exceptions. No significant security or reliability issues found.
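
The cross-context cleanup issue can be reproduced with the standard library alone. The sketch below assumes the usual setup in which OpenTelemetry's Python context is backed by `contextvars`; it uses a plain `ContextVar` rather than the OTel API, and `current_span` and `attach` are hypothetical names.

```python
import contextvars

# Hypothetical stand-in for the tracer's "current span" slot.
current_span = contextvars.ContextVar("current_span", default=None)


def attach(name):
    """Set the current span and return the token needed to undo it."""
    return current_span.set(name)


# Attach inside one Context...
ctx_a = contextvars.copy_context()
token = ctx_a.run(attach, "chat")

# ...and try to detach from a different one.
ctx_b = contextvars.copy_context()
try:
    ctx_b.run(current_span.reset, token)
    cross_context_error = None
except ValueError as exc:
    # A token can only be reset in the Context that created it.
    cross_context_error = exc

# Detaching in the Context that created the token works.
ctx_a.run(current_span.reset, token)

print(type(cross_context_error).__name__)  # ValueError
print(ctx_a.run(current_span.get))         # None
```

Keeping the set and reset inside one pull, as the per-pull mechanism does, avoids ever holding a token across a boundary where the active Context may differ.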

✓ Test Coverage

The PR adds comprehensive tests for the new span nesting behavior and with_pull_context_manager mechanism. The streaming error paths (chat and agent) are both tested. The main gap is that _resolve_stream_with_pull_contexts is tested only via the __await__ path but not via get_final_response() without prior iteration—a distinct code path that the diff explicitly modifies. All other new behaviors have meaningful test coverage with proper assertions.

✓ Design Approach

The new per-pull context manager is a good direction for spans created during streaming iteration, but the approach is still too narrow: both streaming wrappers leave the synchronous stream-construction phase outside the span context. Since this code explicitly supports direct ResponseStream returns, any child spans or context-sensitive work done before the first pull still won't be parented correctly. Wrapping super_get_response() / execute() in _activate_span(span) as well would address the full lifecycle instead of only the iterator pulls.
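
The suggestion amounts to activating the span around construction as well as around pulls. A minimal sketch of the difference, using a `contextvars` variable as a stand-in for the current span; `activate` and `build_stream` are hypothetical names and not the framework's `_activate_span` or `super_get_response()`:

```python
import contextvars
from contextlib import contextmanager

# Hypothetical stand-in for the tracer's "current span" slot.
current_span = contextvars.ContextVar("current_span", default=None)


@contextmanager
def activate(span_name):
    """Make span_name current for the block, then restore the previous value."""
    token = current_span.set(span_name)
    try:
        yield
    finally:
        current_span.reset(token)


def build_stream():
    """Synchronous construction step that records the span it would parent under."""
    return {"parent_at_construction": current_span.get()}


# Construction outside the span context: no parent is visible.
unwrapped = build_stream()

# Construction wrapped in the activation: the span is visible as the parent.
with activate("chat"):
    wrapped = build_stream()

print(unwrapped["parent_at_construction"])  # None
print(wrapped["parent_at_construction"])    # chat
```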


Automated review by TaoChenOSU's agents

Comment thread python/packages/core/tests/core/test_observability.py
Comment thread python/packages/core/agent_framework/observability.py
Comment thread python/packages/core/agent_framework/observability.py
@moonbox3 moonbox3 added this pull request to the merge queue Apr 29, 2026
Merged via the queue into main with commit 03e47b5 Apr 29, 2026
35 checks passed
moonbox3 added a commit to moonbox3/agent-framework that referenced this pull request Apr 29, 2026
Add the streaming-span observability fix to the Fixed section. PR is on
upstream/main but not yet pulled into origin/main; the code itself will
land via the PR merge.
moonbox3 added a commit that referenced this pull request Apr 29, 2026
* Python: bump package versions for 1.2.2 release

PATCH bump (1.2.1 -> 1.2.2) for the released cohort. Five PRs land in this
window:

- agent-framework-openai: fix file_search citations breaking the assistant-
  message history roundtrip (#5557) — drives the released-tier PATCH
- agent-framework-orchestrations: [BREAKING] standardize orchestration
  terminal outputs as AgentResponse (#5301)
- agent-framework-core, agent-framework-declarative: preserve Workflow.run()
  shared state across calls, accept list[Message] in declarative start
  executor, and coerce Enum values when serializing PowerFx symbols (#5531)
- agent-framework-foundry-hosting: add hosted Durable Workflow support
  (#5531)
- agent-framework-azure-contentunderstanding: new alpha package — Azure AI
  Content Understanding context provider (#4829)
- dependencies: workspace package dependency refresh (#5555)

Per lockstep convention, all 21 beta packages stamp 1.0.0b260429 and all 4
alpha packages (now including the new contentunderstanding) stamp
1.0.0a260429. Date stamp reflects 2026-04-29 Pacific. Every non-core package
floor on agent-framework-core is raised to >=1.2.2; the new
contentunderstanding package's stale >=1.0.0 floor is brought into line.

Two follow-on fixes bundled to keep validate-dependency-bounds-test green
at lowest-direct resolution:
- Bump agent-framework-azure-contentunderstanding's azure-ai-content
  understanding lower bound from >=1.0.0 to >=1.0.1 (1.0.0 ships without
  proper typing — pyright reports 65 unknown-type errors)
- Add pyright ignore comments to core/foundry/__init__.pyi for the new
  alpha package's type-stub imports, since alpha packages are not in
  core's [all] extra and therefore aren't installed at lowest-direct

* Python: add #5552 to 1.2.2 CHANGELOG

Add the streaming-span observability fix to the Fixed section. PR is on
upstream/main but not yet pulled into origin/main; the code itself will
land via the PR merge.

* Python: address PR #5561 review feedback on dependency bounds

Two packaging fixes flagged in review:

1. agent-framework-azure-contentunderstanding: add agent-framework-foundry
   as a runtime dependency. The package's README directs users to
   `pip install agent-framework-azure-contentunderstanding --pre` and the
   basic example imports `FoundryChatClient` from `agent_framework.foundry`,
   so the documented install path was failing with ImportError. Pulling
   agent-framework-foundry into deps makes the advertised entry path
   self-contained.

2. agent-framework-foundry: bump agent-framework-openai lower bound from
   >=1.1.0 to >=1.2.2,<2. Foundry imports private modules from
   agent_framework_openai (`_chat_client.py:22`, `_agent.py:34`), so
   resolvers were free to pair foundry==1.2.2 with older OpenAI versions
   that lack this release's coordinated Responses/history fix. Lockstep the
   floor with the released cohort to prevent mismatched installs.

Both changes pass `validate-dependency-bounds-test` lower + upper at
their respective packages.