Description
Summary
When running a multi-executor workflow with HITL pauses, a dropped-type-mismatch edge group, and per-row asyncio.gather work, MAF's executor
instrumentation intermittently fails to close executor.process spans. The spans are created (their span_ids appear as parent_id of child spans
inside the executor), but are never .end()-ed, never exported via SimpleSpanProcessor, and never appear in App Insights / OTLP collectors.
The visible symptom is broken trace trees: child spans (LLM calls, manually-instrumented gen_ai.* spans inside per-row work) appear in App
Insights with parent_id values pointing at executor.process spans that don't exist anywhere in the exported trace.
Environment
- agent-framework==1.2.2
- agent-framework-foundry==1.2.2
- opentelemetry-sdk==1.40.0
- azure-monitor-opentelemetry (latest as of 2026-04-29)
- Python 3.13
- Linux container (mcr.microsoft.com/devcontainers/python:3.13)
Verified NOT a duplicate of #5552
Tested both agent-framework==1.0.1 and agent-framework==1.2.2 — bug reproduces identically. Upstream fix #5552 ("Fix observability spans not
being correctly nested when using streaming") does not address this.
Reproduction
Workflow shape that triggers the bug:
- 14+ executors built via WorkflowBuilder.
- One edge that produces a dropped type mismatch (an intentional misroute that the delivery handler drops). In our case: prepare_contract_validation → ContractIntelligence (dispatch on a different edge actually delivers; the type-mismatch edge is the one that drops).
- One or more validators that send HITLRequest via ctx.send_message() and then return — the workflow pauses and resumes after an external response.
- Per-row work inside some validators uses asyncio.gather over a Semaphore-bounded set of LLM calls.
- OTel instrumentation: enable_instrumentation() called from the FastAPI lifespan; BatchSpanProcessor to Azure Monitor plus SimpleSpanProcessor to a ConsoleExporter overlay (via an OTEL_DEBUG_CONSOLE-style debug flag).
Run a representative input end-to-end. Inspect stdout for printed executor.process span bodies.
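The per-row fan-out inside the affected validators follows this shape (a minimal sketch with a stubbed LLM call; names like validate_rows and the stub body are illustrative, not the production code):

```python
import asyncio

AGENT4_VALIDATOR_CONCURRENCY = 6  # lowest parallelism that still reproduces

async def validate_rows(rows):
    # Bound the number of concurrent LLM calls with a semaphore,
    # the same pattern the affected validators use.
    sem = asyncio.Semaphore(AGENT4_VALIDATOR_CONCURRENCY)

    async def validate_one(row):
        async with sem:
            # Stand-in for the real LLM call; in production each call
            # emits a gen_ai.* child span under executor.process.
            await asyncio.sleep(0)
            return {"row": row, "status": "ok"}

    # gather preserves input order in its result list
    return await asyncio.gather(*(validate_one(r) for r in rows))

results = asyncio.run(validate_rows(list(range(10))))
```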
Expected behaviour
Every executor that processes a message during the workflow run should emit an executor.process <name> span via SimpleSpanProcessor (i.e.
printed to stdout) before workflow.run ends.
Actual behaviour
Several executor.process spans never appear in stdout despite their child spans being printed with valid parent_id references back to them.
Affected executors observed (varies between runs of the same input):
- validate_unbilled_transactions (sync, no gather)
- route_after_unbilled (sync, no gather)
- validate_timecards (no LLM calls — pure dict aggregation)
- validate_expenses (asyncio.gather + HITL)
- validate_ap_invoices (asyncio.gather + HITL)
Early-pipeline executors (discover_and_prepare_files, extract_documents, persist_and_emit_result, prepare_contract_validation) consistently DO emit their executor.process spans correctly. The break is specifically downstream of the first dropped-type-mismatch edge group.
Evidence
Excerpt from OTEL_DEBUG_CONSOLE=true stdout (parallelism = 6, single workflow run before HITL pause):
{
"name": "chat gpt-5.3-chat",
"context": {"trace_id": "0x185d...", "span_id": "0x89fe...d159"},
"parent_id": "0x27e049d02c35e8b0",
"...": "..."
}
The span_id 0x27e049d02c35e8b0 is the executor.process validate_expenses span. It is referenced as parent_id by 10 LLM-call child spans, but
never appears as a span_id in any printed JSON — neither before nor after workflow.run ends. The span is created and live during the
executor's handle() execution (otherwise child spans couldn't reference it as parent), but is never .end()-ed.
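The orphaned parent can be confirmed mechanically from the ConsoleExporter output: collect every printed span_id and flag any parent_id that never appears in that set. An illustrative helper (not part of MAF):

```python
import json

def find_orphan_parents(span_json_lines):
    """Given one JSON object per exported span, return the parent_ids
    that are referenced by children but were never exported themselves."""
    spans = [json.loads(line) for line in span_json_lines]
    exported = {s["context"]["span_id"] for s in spans}
    return {s["parent_id"] for s in spans
            if s.get("parent_id") and s["parent_id"] not in exported}

# Two children reference 0x27e049d02c35e8b0, but that span was never printed:
lines = [
    '{"name": "chat gpt-5.3-chat", "context": {"trace_id": "0x185d", "span_id": "0x89fe"}, "parent_id": "0x27e049d02c35e8b0"}',
    '{"name": "chat gpt-5.3-chat", "context": {"trace_id": "0x185d", "span_id": "0x90aa"}, "parent_id": "0x27e049d02c35e8b0"}',
]
orphans = find_orphan_parents(lines)
print(orphans)  # {'0x27e049d02c35e8b0'}
```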
Ruled-out alternative explanations:
- Not BSP queue saturation. Tested with the default queue (2048) and a tuned queue (8192); reproduces identically.
- Not OTel context propagation. All children correctly inherit trace_id from the workflow root and reference correct parent_id values.
- Not asyncio.gather context loss. Reproduces in route_after_unbilled and validate_unbilled_transactions, which don't use gather.
- Not parallelism-driven. Reproduces at AGENT4_VALIDATOR_CONCURRENCY=6 (the lowest tested) just as readily as at 10 or 30.
- Not Anthropic/OpenAI SDK retries. The orphan spans are MAF's own executor.process spans, not SDK spans.
Request
- Confirm or deny that this is a known bug.
- If unknown, please investigate the executor.process span lifecycle in MAF's workflow runner — specifically how span closure interacts with: (a) HITL ctx.send_message(HITLRequest) paths that yield control to the runner, (b) executors downstream of a dropped-type-mismatch edge group, and (c) sync (non-async) executors triggered by InternalEdgeGroup messages.
- If reproducible, fix it or document a workaround.
Workaround currently in use
None viable. Application-side wrappers (tracer.start_as_current_span around each Executor.handle()) were considered and rejected — the bug surfaces in executors with diverse shapes (gather, no-gather, sync, async), so a wrapper strategy can't be comprehensive without effectively re-implementing enable_instrumentation(). We accept the trace gaps until an upstream fix lands.
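For reference, the rejected wrapper shape: a decorator that opens a span around each handler already has to branch on sync vs async, and even then only covers handlers it is explicitly applied to. A sketch with a stub span context manager standing in for tracer.start_as_current_span (handler names are illustrative):

```python
import asyncio
import functools
import inspect
from contextlib import contextmanager

ENDED = []  # records which spans were closed, for demonstration

@contextmanager
def span(name):
    # Stand-in for tracer.start_as_current_span(name): the finally
    # block guarantees the .end() equivalent runs even on exceptions.
    try:
        yield
    finally:
        ENDED.append(name)

def traced(handler):
    """Wrap a sync or async Executor.handle-style callable in a span."""
    if inspect.iscoroutinefunction(handler):
        @functools.wraps(handler)
        async def async_wrapper(*args, **kwargs):
            with span(handler.__name__):
                return await handler(*args, **kwargs)
        return async_wrapper

    @functools.wraps(handler)
    def sync_wrapper(*args, **kwargs):
        with span(handler.__name__):
            return handler(*args, **kwargs)
    return sync_wrapper

@traced
async def validate_expenses(rows):
    # per-row fan-out, as in the real validator
    return await asyncio.gather(*(asyncio.sleep(0, result=r) for r in rows))

@traced
def route_after_unbilled(msg):
    # sync routing handler, no gather
    return {"route": "default", "msg": msg}
```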
Code Sample
Error Messages / Stack Traces
Package Versions
agent-framework==1.2.2, agent-framework-foundry==1.2.2
Python Version
Python 3.13
Additional Context
No response