tracing: support tracing also agent steps by schuellerf · Pull Request #13221 · langflow-ai/langflow

schuellerf · 2026-05-19T19:56:55Z

Adds support for tracing (with Phoenix) into agent steps and decisions as well.

Summary by CodeRabbit

New Features
- Enhanced tracing with explicit component-span context management for improved observability
- New callback handler for comprehensive LangChain operation tracing and nested span creation
- Configurable LangChain instrumentation via environment variable control
Tests
- Added comprehensive unit tests for tracing functionality, context management, and span handling
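
As a rough illustration of the environment-variable control mentioned in the summary, the sketch below reads `ARIZE_PHOENIX_USE_INSTRUMENTOR` and treats the instrumentor as disabled unless the flag is explicitly truthy. The helper name `instrumentor_enabled` and the accepted spellings in `_TRUTHY` are assumptions for illustration; the PR may parse the variable differently.

```python
import os

# Assumption: which spellings count as "true" is not stated in the PR.
_TRUTHY = {"1", "true", "yes", "on"}


def instrumentor_enabled(environ=None):
    """Return True only when ARIZE_PHOENIX_USE_INSTRUMENTOR is explicitly truthy.

    Hypothetical helper; an unset or unrecognized value leaves the
    instrumentor disabled, matching the disabled-by-default behavior
    described in the walkthrough.
    """
    env = os.environ if environ is None else environ
    raw = env.get("ARIZE_PHOENIX_USE_INSTRUMENTOR", "")
    return raw.strip().lower() in _TRUTHY
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without patching the process environment.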


coderabbitai · 2026-05-19T19:57:21Z

Walkthrough

This PR enhances Arize Phoenix tracing in Langflow by introducing explicit OpenTelemetry context management through ArizePhoenixTracer, adding PhoenixCallbackHandler to emit nested OTEL spans for LangChain lifecycle events, integrating component span activation into TracingService, and providing unit tests to validate the complete flow.

Changes

OpenTelemetry LangChain Tracing Integration

| Layer / File(s) | Summary |
|---|---|
| ArizePhoenixTracer - OTEL context infrastructure `src/backend/base/langflow/services/tracing/arize_phoenix.py` | Imports OpenTelemetry `context` and `trace` modules, adds instance state for tracking context tokens and active component spans, and implements `activate_component_span()`, `deactivate_component_span()`, `start_langchain_span()`, and `end_langchain_span()` methods to manage nested span lifecycles with proper parent context and attributes. |
| ArizePhoenixTracer - Setup, span parenting, and callback mechanism `src/backend/base/langflow/services/tracing/arize_phoenix.py` | Updates `setup_arize_phoenix()` to conditionally enable LangChain instrumentation based on `ARIZE_PHOENIX_USE_INSTRUMENTOR` environment flag, changes `add_trace()` span parenting to use the root span as parent context, extends `end_trace()` to safely remove child spans and clear component markers, makes `end()` uninstrumentation conditional, and implements `get_langchain_callback()` to return `PhoenixCallbackHandler` when prerequisites are met. |
| PhoenixCallbackHandler - LangChain to OTEL span bridge `src/backend/base/langflow/services/tracing/phoenix_callback.py` | Implements `PhoenixCallbackHandler` to translate LangChain lifecycle events (`on_llm_start`, `on_chain_start`, `on_tool_start`, `on_agent_action`, `on_retriever_start`, etc.) into nested OpenTelemetry spans. Manages run-to-span tracking, resolves parent spans from run hierarchy or explicit parent, assigns span kinds (llm, chain, agent, tool, retriever), extracts structured inputs/outputs, sets LLM model name attributes, and handles completion or error closure. |
| TracingService - Component span activation and lifecycle `src/backend/base/langflow/services/tracing/service.py` | Moves component span initialization to synchronous execution so context attaches on the current task. Adds component span activation via `ArizePhoenixTracer.activate_component_span()` before yielding, and ensures deactivation on both success and exception paths before queuing `_end_component_traces`. Exception path deactivates before re-raising. |
| Unit tests - Fixtures and behavior validation `src/backend/tests/unit/services/tracing/test_arize_phoenix.py` | Provides in-memory OpenTelemetry tracer provider and exporter fixtures and a `phoenix_tracer` fixture that patches environment and mocks setup. Tests verify span parenting to root span, context activation/deactivation correctness, callback handler creation with component context, nested tool and agent span emission with correct span kinds and parentage, and instrumentor disabled-by-default behavior. |
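
The run-to-span tracking summarized for `PhoenixCallbackHandler` can be modeled roughly as follows. This is a sketch under stated assumptions, not the handler's actual code: `FakeSpan` and `RunSpanTracker` are hypothetical names, and a real handler would create OpenTelemetry spans rather than dataclass instances. It shows only the parent-resolution rule (span of the parent run if known, otherwise the explicit parent) and closure on completion or error.

```python
from dataclasses import dataclass, field
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class FakeSpan:
    """Stand-in for a tracing span; records only what the sketch needs."""
    name: str
    kind: str
    parent: Optional["FakeSpan"] = None
    ended: bool = False
    error: Optional[str] = None


@dataclass
class RunSpanTracker:
    """Hypothetical run_id -> span bookkeeping for a callback handler."""
    parent_span: Optional[FakeSpan] = None       # explicit fallback parent
    _spans: dict = field(default_factory=dict)   # run_id -> open FakeSpan

    def start(self, run_id: UUID, name: str, kind: str,
              parent_run_id: Optional[UUID] = None) -> FakeSpan:
        # Prefer the span of the parent run; otherwise fall back to the
        # explicit parent (for example, the component span).
        parent = self._spans.get(parent_run_id) if parent_run_id else None
        if parent is None:
            parent = self.parent_span
        span = FakeSpan(name=name, kind=kind, parent=parent)
        self._spans[run_id] = span
        return span

    def end(self, run_id: UUID, error: Optional[BaseException] = None) -> None:
        span = self._spans.pop(run_id, None)
        if span is None:
            return  # unknown or already-closed run: ignore
        if error is not None:
            span.error = repr(error)
        span.ended = True
```

In this model a tool started with `parent_run_id` set to a chain's run ID nests under the chain span, while a run with no known parent nests directly under the component span.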

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 3 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Test Coverage For New Implementations | ❌ Error | Test coverage is incomplete: 2 new ArizePhoenixTracer methods and 12 of 15 callback methods in PhoenixCallbackHandler lack tests. | Add tests for start_langchain_span, end_langchain_span, and the untested callback methods (on_llm_start, on_chain_start, on_retriever_start, error handlers) to achieve comprehensive coverage. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 31.82% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Test Quality And Coverage | ⚠️ Warning | Only 6 tests cover 15 implemented callback methods with no error/edge case testing. Start/end_langchain_span methods completely untested. Review comment already flagged missing negative scenarios. | Add tests for: error callbacks (on_llm_error, on_chain_error, etc.), agent_finish, retriever methods, start/end_langchain_span, edge cases (non-existent trace_id, end_trace without add_trace), and invalid inputs. |
| Test File Naming And Structure | ⚠️ Warning | Test file follows backend pytest naming and structure correctly with descriptive names and fixtures, but lacks edge cases, error conditions, and negative scenarios (zero pytest.raises tests). | Add negative tests using pytest.raises for: end_trace() without prior add_trace(), activate_component_span() with non-existent trace_id, and callback error handling failures. |

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'tracing: support tracing also agent steps' directly reflects the main change: adding Phoenix tracing support for agent steps/decisions. It is concise, clear, and accurately summarizes the primary objective. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Excessive Mock Usage Warning | ✅ Passed | Only 2 substantive mocks for external dependencies; real OpenTelemetry objects test actual span creation and context behavior without excessive mocking or obscuring logic. |


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/backend/tests/unit/services/tracing/test_arize_phoenix.py (1)
28-62: 🏗️ Heavy lift

Consider reducing manual state setup through dependency injection.

The fixture bypasses ArizePhoenixTracer's normal initialization by patching setup_arize_phoenix and manually setting 10+ internal state variables. While this works for isolating the tests from external Phoenix dependencies, it means tests run against artificially constructed state rather than the real initialization flow. Additionally, mocking propagator with MagicMock() (line 59) might hide real integration issues if propagator is core logic rather than an external dependency.

For improved testability, consider refactoring ArizePhoenixTracer to accept key dependencies (provider, tracer, propagator) through constructor injection, allowing tests to pass in-memory test doubles without manual state manipulation.
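
A minimal sketch of the constructor-injection idea, assuming a much smaller tracer than the real one: `InjectableTracer` and `RecordingTracer` are hypothetical names, and the real `ArizePhoenixTracer` has more state and a different `add_trace` signature. The point is only that a test can pass in a double instead of patching setup and mutating internals.

```python
class RecordingTracer:
    """Test double that records span names instead of exporting them."""

    def __init__(self):
        self.started = []

    def start_span(self, name):
        self.started.append(name)
        return name  # the "span" is just its name in this sketch


class InjectableTracer:
    """Hypothetical tracer that receives its collaborators from the caller."""

    def __init__(self, tracer, propagator=None):
        self.tracer = tracer
        self.propagator = propagator
        self.child_spans = {}
        # Assumption: the tracer is "ready" whenever a backend was injected.
        self.ready = tracer is not None

    def add_trace(self, trace_id, trace_name):
        if not self.ready:
            return  # tracing disabled: do nothing
        self.child_spans[trace_id] = self.tracer.start_span(trace_name)
```

With this shape a fixture would construct the tracer directly with in-memory collaborators, so the tests exercise the same initialization path as production code.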
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/backend/tests/unit/services/tracing/test_arize_phoenix.py` around lines
28 - 62, The phoenix_tracer fixture is bypassing ArizePhoenixTracer's real
initialization by patching setup_arize_phoenix and manually setting many
internals (root_span, child_spans, _context_tokens, _current_component_id,
_langchain_instrumentor_enabled, _ready, propagator, carrier) which couples
tests to internal state; refactor ArizePhoenixTracer to accept key dependencies
via constructor parameters (e.g., tracer_provider/provider, tracer, propagator,
exporter or a config object) so tests can inject the in-memory provider/exporter
and a real or lightweight propagator, then update the phoenix_tracer fixture to
construct ArizePhoenixTracer with those injected dependencies instead of
patching setup_arize_phoenix and mutating internals.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e2754cad-4ec4-48f4-bab3-d074f5a4e8f5

📥 Commits

Reviewing files that changed from the base of the PR and between 500b5e9 and 9352406.

📒 Files selected for processing (4)

src/backend/base/langflow/services/tracing/arize_phoenix.py
src/backend/base/langflow/services/tracing/phoenix_callback.py
src/backend/base/langflow/services/tracing/service.py
src/backend/tests/unit/services/tracing/test_arize_phoenix.py

coderabbitai · 2026-05-19T20:04:28Z

+def test_add_trace_parents_to_root_span(phoenix_tracer):
+    tracer, exporter = phoenix_tracer
+    trace_id = "agent-vertex-1"
+
+    tracer.add_trace(
+        trace_id=trace_id,
+        trace_name="Agent (agent-vertex-1)",
+        trace_type="agent",
+        inputs={"input_value": "hello"},
+    )
+
+    assert trace_id in tracer.child_spans
+    spans = exporter.get_finished_spans()
+    assert len(spans) == 0  # component span still open
+
+    tracer.end_trace(trace_id=trace_id, trace_name="Agent (agent-vertex-1)", outputs={"response": "hi"})
+    finished = exporter.get_finished_spans()
+    assert len(finished) == 1
+    assert finished[0].name == "Agent (agent-vertex-1)"
+    assert finished[0].parent.span_id == tracer.root_span.get_span_context().span_id
+
+
+def test_activate_component_span_sets_current_context(phoenix_tracer):
+    tracer, _exporter = phoenix_tracer
+    trace_id = "agent-vertex-2"
+
+    tracer.add_trace(
+        trace_id=trace_id,
+        trace_name="Agent",
+        trace_type="agent",
+        inputs={},
+    )
+    token = tracer.activate_component_span(trace_id)
+    assert token is not None
+    current = trace.get_current_span()
+    assert current.get_span_context().span_id == tracer.child_spans[trace_id].get_span_context().span_id
+
+    tracer.deactivate_component_span(trace_id)
+    current_after = trace.get_current_span()
+    assert current_after.get_span_context().span_id != tracer.child_spans[trace_id].get_span_context().span_id
+
+
+def test_get_langchain_callback_returns_handler_when_component_context_set(phoenix_tracer):
+    tracer, _exporter = phoenix_tracer
+    trace_id = "agent-vertex-3"
+
+    tracer.add_trace(trace_id=trace_id, trace_name="Agent", trace_type="agent", inputs={})
+    tracer.activate_component_span(trace_id)
+
+    component_context_var.set(
+        ComponentTraceContext(
+            trace_id=trace_id,
+            trace_name="Agent",
+            trace_type="agent",
+            vertex=None,
+            inputs={},
+        )
+    )
+    callback = tracer.get_langchain_callback()
+    assert callback is not None
+    assert isinstance(callback, PhoenixCallbackHandler)
+    assert callback.parent_span is tracer.child_spans[trace_id]
+
+    tracer.deactivate_component_span(trace_id)
+    component_context_var.set(None)
+
+
+def test_phoenix_callback_creates_nested_langchain_spans(phoenix_tracer):
+    tracer, exporter = phoenix_tracer
+    trace_id = "agent-vertex-4"
+
+    tracer.add_trace(trace_id=trace_id, trace_name="Agent", trace_type="agent", inputs={})
+    tracer.activate_component_span(trace_id)
+
+    callback = PhoenixCallbackHandler(tracer, parent_span=tracer.child_spans[trace_id])
+    run_id = uuid4()
+    callback.on_tool_start(
+        serialized={"name": "search"},
+        input_str="query",
+        run_id=run_id,
+        parent_run_id=None,
+    )
+    callback.on_tool_end(output="result", run_id=run_id)
+
+    component_span_id = tracer.child_spans[trace_id].get_span_context().span_id
+    tracer.deactivate_component_span(trace_id)
+    tracer.end_trace(trace_id=trace_id, trace_name="Agent", outputs={})
+
+    finished = exporter.get_finished_spans()
+    tool_spans = [s for s in finished if s.name == "search"]
+    assert len(tool_spans) == 1
+    assert tool_spans[0].attributes.get(SpanAttributes.OPENINFERENCE_SPAN_KIND) == "tool"
+    assert tool_spans[0].parent.span_id == component_span_id
+
+
+def test_on_agent_action_creates_agent_span(phoenix_tracer):
+    tracer, exporter = phoenix_tracer
+    trace_id = "agent-vertex-5"
+
+    tracer.add_trace(trace_id=trace_id, trace_name="Agent", trace_type="agent", inputs={})
+    tracer.activate_component_span(trace_id)
+
+    from langchain_classic.schema import AgentAction
+
+    callback = PhoenixCallbackHandler(tracer, parent_span=tracer.child_spans[trace_id])
+    callback.on_agent_action(
+        AgentAction(tool="search", tool_input="weather", log=""),
+        run_id=uuid4(),
+    )
+
+    tracer.deactivate_component_span(trace_id)
+    tracer.end_trace(trace_id=trace_id, trace_name="Agent", outputs={})
+
+    finished = exporter.get_finished_spans()
+    action_spans = [s for s in finished if "Agent Action" in s.name]
+    assert len(action_spans) == 1
+    assert action_spans[0].attributes.get(SpanAttributes.OPENINFERENCE_SPAN_KIND) == "agent"
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Consider adding negative test scenarios for robustness.

The test suite currently covers only happy-path scenarios. To improve confidence in error handling and edge cases, consider adding tests for:

- Calling `end_trace()` without a prior `add_trace()` for the same `trace_id`
- Calling `activate_component_span()` with a non-existent `trace_id`
- Callback error handling when span creation fails
- Invalid or malformed inputs to trace methods

These additions would help catch regressions in error paths and provide better documentation of expected failure behavior.
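
The first two scenarios might be written along these lines. `StubTracer` is a hypothetical stand-in so the example runs on its own; the real `ArizePhoenixTracer` may raise instead of returning `None`, and its method signatures differ, so the asserted contract here is an assumption to be replaced by whatever behavior the PR documents.

```python
class StubTracer:
    """Hypothetical stand-in; the real tracer may behave differently."""

    def __init__(self):
        self.child_spans = {}

    def add_trace(self, trace_id, trace_name):
        self.child_spans[trace_id] = {"name": trace_name, "ended": False}

    def end_trace(self, trace_id):
        # Tolerate unknown ids instead of raising KeyError.
        span = self.child_spans.pop(trace_id, None)
        if span is not None:
            span["ended"] = True
        return span

    def activate_component_span(self, trace_id):
        # Assumed contract: None signals "nothing to activate".
        return object() if trace_id in self.child_spans else None


def test_end_trace_without_add_trace_is_a_noop():
    tracer = StubTracer()
    assert tracer.end_trace("never-added") is None
    assert tracer.child_spans == {}


def test_activate_unknown_trace_id_returns_none():
    tracer = StubTracer()
    assert tracer.activate_component_span("missing") is None
```

If the real methods raise on bad input, the same tests would instead wrap the call in `pytest.raises` with the documented exception type.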

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/backend/tests/unit/services/tracing/test_arize_phoenix.py` around lines 64 - 181, Add negative/unit tests to src/backend/tests/unit/services/tracing/test_arize_phoenix.py that exercise error paths: verify calling end_trace(trace_id) without a prior add_trace(trace_id) does not crash and returns/handles error as expected; assert activate_component_span(nonexistent_trace_id) returns None or raises the documented exception; simulate PhoenixCallbackHandler usage where span creation fails (e.g., mock tracer.child_spans or tracer.start_span to raise) and assert callback methods (on_tool_start/on_tool_end/on_agent_action) handle the failure gracefully; and add tests for malformed inputs to add_trace (invalid trace_type/empty trace_id) asserting validation behavior. Reference the functions/methods add_trace, end_trace, activate_component_span, PhoenixCallbackHandler, and on_tool_start/on_tool_end/on_agent_action when adding these new test cases.

ringerc · 2026-05-19T20:32:52Z

Shouldn't this be part of the generic trace infrastructure, not specific to Phoenix?

You may want to take a look at my PR #12223 which factors out the common OpenTelemetry support code from the Arize/Phoenix and Langwatch exporters and adds a generic OTLP tracing exporter.

If you adopted that, you could then build your additional tracing capabilities into the common API so all tracers would benefit.

At the very least, IMO, your new API methods `activate_component_span` and `deactivate_component_span` should be provided as no-op stubs for all tracer implementations rather than gated by `hasattr` at the call site, unless there is an unavoidable need, in which case it should be clearly explained.

Also, if this adds lots of additional trace spans, consider guarding it behind a configuration option, especially since people using Arize, Phoenix, Langwatch, etc. typically are not routing their traces through an OpenTelemetry collector that can do trace filtering and downsampling.

Your PR might be easier to review and approve if it showed a before/after trace to help reviewers understand the utility and impact of the desired change.
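
The no-op stub suggestion above could look roughly like this. It is a sketch only: `BaseTracer`, `PhoenixLikeTracer`, `OtherTracer`, and `run_component` are hypothetical names, not Langflow's actual classes.

```python
class BaseTracer:
    """Hypothetical common base; the span hooks default to doing nothing."""

    def activate_component_span(self, trace_id):
        return None

    def deactivate_component_span(self, trace_id):
        return None


class PhoenixLikeTracer(BaseTracer):
    """Overrides the hooks because it tracks an active component span."""

    def __init__(self):
        self.active = []

    def activate_component_span(self, trace_id):
        self.active.append(trace_id)
        return trace_id

    def deactivate_component_span(self, trace_id):
        if trace_id in self.active:
            self.active.remove(trace_id)


class OtherTracer(BaseTracer):
    """Inherits the no-op hooks unchanged."""


def run_component(tracers, trace_id):
    # No hasattr checks: every tracer answers the same calls.
    for tracer in tracers:
        tracer.activate_component_span(trace_id)
    try:
        return "done"
    finally:
        for tracer in tracers:
            tracer.deactivate_component_span(trace_id)
```

The call site stays identical for every tracer, and a tracer that later needs the hooks only has to override them.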

schuellerf · 2026-05-20T08:53:31Z

@ringerc I agree!
For me, tracing into the agent's decisions just makes sense for getting an overview, but it certainly increases the amount of data. Where would you put the configuration? Environment variables? There is no good section in the UI to control this, I'm afraid.
Doing this in common code definitely makes sense. I was just testing Phoenix and didn't connect the dots to make this generic 🫣.
I'll check whether I can work directly on top of your PR (which I saw but, for some reason, didn't want to extend with this).

tracing: support tracing also agent steps

9352406

Adds support to trace (with phoenix) also into agent steps/decisions.

schuellerf force-pushed the support-tracing-of-agent-steps branch from d6ba567 to 9352406 on May 19, 2026 19:57

schuellerf mentioned this pull request May 19, 2026

tracing: support tracing also agent steps #13222

Closed

coderabbitai Bot reviewed May 19, 2026

schuellerf wants to merge 1 commit into langflow-ai:main from schuellerf:support-tracing-of-agent-steps
