tracing: support tracing also agent steps #13221

Open
schuellerf wants to merge 1 commit into langflow-ai:main from schuellerf:support-tracing-of-agent-steps
Conversation

@schuellerf
Contributor

@schuellerf schuellerf commented May 19, 2026

Adds support for tracing agent steps and decisions as well (with Phoenix).

Summary by CodeRabbit

  • New Features

    • Enhanced tracing with explicit component-span context management for improved observability
    • New callback handler for comprehensive LangChain operation tracing and nested span creation
    • Configurable LangChain instrumentation via environment variable control
  • Tests

    • Added comprehensive unit tests for tracing functionality, context management, and span handling

@schuellerf schuellerf force-pushed the support-tracing-of-agent-steps branch from d6ba567 to 9352406 on May 19, 2026 19:57
@coderabbitai
Contributor

coderabbitai Bot commented May 19, 2026

Walkthrough

This PR enhances Arize Phoenix tracing in Langflow by introducing explicit OpenTelemetry context management through ArizePhoenixTracer, adding PhoenixCallbackHandler to emit nested OTEL spans for LangChain lifecycle events, integrating component span activation into TracingService, and providing unit tests to validate the complete flow.
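The run-to-span bridging described in the walkthrough can be pictured with a small stand-alone sketch. It is not the PR's `PhoenixCallbackHandler`: `SketchSpan` and `RunSpanRegistry` are hypothetical names, and plain dataclasses stand in for OpenTelemetry spans. It only illustrates the idea of resolving a parent from the run hierarchy and falling back to an explicit parent (the component span).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from uuid import UUID, uuid4


@dataclass
class SketchSpan:
    """Stand-in for an OTEL span (hypothetical, not an OpenTelemetry type)."""

    name: str
    kind: str
    parent: Optional["SketchSpan"] = None
    ended: bool = False


@dataclass
class RunSpanRegistry:
    """Tracks open spans per LangChain run ID (hypothetical helper)."""

    default_parent: SketchSpan
    _spans: Dict[UUID, SketchSpan] = field(default_factory=dict)

    def start(self, run_id: UUID, parent_run_id: Optional[UUID], name: str, kind: str) -> SketchSpan:
        # Prefer the span of the parent run; otherwise use the component span.
        parent = self._spans.get(parent_run_id) if parent_run_id else None
        span = SketchSpan(name=name, kind=kind, parent=parent or self.default_parent)
        self._spans[run_id] = span
        return span

    def end(self, run_id: UUID) -> Optional[SketchSpan]:
        # Unknown run IDs are ignored instead of raising (an assumption).
        span = self._spans.pop(run_id, None)
        if span is not None:
            span.ended = True
        return span


component = SketchSpan(name="Agent", kind="agent")
registry = RunSpanRegistry(default_parent=component)
chain_id, tool_id = uuid4(), uuid4()
chain = registry.start(chain_id, None, "AgentExecutor", "chain")
tool = registry.start(tool_id, chain_id, "search", "tool")
```

A run without a known parent attaches to the component span, while a nested run attaches to the span of its parent run.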

Changes

OpenTelemetry LangChain Tracing Integration

| Layer / File(s) | Summary |
| --- | --- |
| ArizePhoenixTracer - OTEL context infrastructure (`src/backend/base/langflow/services/tracing/arize_phoenix.py`) | Imports OpenTelemetry context and trace modules, adds instance state for tracking context tokens and active component spans, and implements activate_component_span(), deactivate_component_span(), start_langchain_span(), and end_langchain_span() methods to manage nested span lifecycles with proper parent context and attributes. |
| ArizePhoenixTracer - Setup, span parenting, and callback mechanism (`src/backend/base/langflow/services/tracing/arize_phoenix.py`) | Updates setup_arize_phoenix() to conditionally enable LangChain instrumentation based on ARIZE_PHOENIX_USE_INSTRUMENTOR environment flag, changes add_trace() span parenting to use the root span as parent context, extends end_trace() to safely remove child spans and clear component markers, makes end() uninstrumentation conditional, and implements get_langchain_callback() to return PhoenixCallbackHandler when prerequisites are met. |
| PhoenixCallbackHandler - LangChain to OTEL span bridge (`src/backend/base/langflow/services/tracing/phoenix_callback.py`) | Implements PhoenixCallbackHandler to translate LangChain lifecycle events (on_llm_start, on_chain_start, on_tool_start, on_agent_action, on_retriever_start, etc.) into nested OpenTelemetry spans. Manages run-to-span tracking, resolves parent spans from run hierarchy or explicit parent, assigns span kinds (llm, chain, agent, tool, retriever), extracts structured inputs/outputs, sets LLM model name attributes, and handles completion or error closure. |
| TracingService - Component span activation and lifecycle (`src/backend/base/langflow/services/tracing/service.py`) | Moves component span initialization to synchronous execution so context attaches on the current task. Adds component span activation via ArizePhoenixTracer.activate_component_span() before yielding, and ensures deactivation on both success and exception paths before queuing _end_component_traces. Exception path deactivates before re-raising. |
| Unit tests - Fixtures and behavior validation (`src/backend/tests/unit/services/tracing/test_arize_phoenix.py`) | Provides in-memory OpenTelemetry tracer provider and exporter fixtures and a phoenix_tracer fixture that patches environment and mocks setup. Tests verify span parenting to root span, context activation/deactivation correctness, callback handler creation with component context, nested tool and agent span emission with correct span kinds and parentage, and instrumentor disabled-by-default behavior. |

🎯 4 (Complex) | ⏱️ ~60 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 3 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Coverage For New Implementations | ❌ Error | Test coverage is incomplete: 2 new ArizePhoenixTracer methods and 12 of 15 callback methods in PhoenixCallbackHandler lack tests. | Add tests for start_langchain_span, end_langchain_span, and the untested callback methods (on_llm_start, on_chain_start, on_retriever_start, error handlers) to achieve comprehensive coverage. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 31.82% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Test Quality And Coverage | ⚠️ Warning | Only 6 tests cover 15 implemented callback methods with no error/edge case testing. Start/end_langchain_span methods completely untested. Review comment already flagged missing negative scenarios. | Add tests for: error callbacks (on_llm_error, on_chain_error, etc.), agent_finish, retriever methods, start/end_langchain_span, edge cases (non-existent trace_id, end_trace without add_trace), and invalid inputs. |
| Test File Naming And Structure | ⚠️ Warning | Test file follows backend pytest naming and structure correctly with descriptive names and fixtures, but lacks edge cases, error conditions, and negative scenarios (zero pytest.raises tests). | Add negative tests using pytest.raises for: end_trace() without prior add_trace(), activate_component_span() with non-existent trace_id, and callback error handling failures. |
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'tracing: support tracing also agent steps' directly reflects the main change: adding Phoenix tracing support for agent steps/decisions. It is concise, clear, and accurately summarizes the primary objective. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Excessive Mock Usage Warning | ✅ Passed | Only 2 substantive mocks for external dependencies; real OpenTelemetry objects test actual span creation and context behavior without excessive mocking or obscuring logic. |
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/backend/tests/unit/services/tracing/test_arize_phoenix.py (1)

28-62: 🏗️ Heavy lift

Consider reducing manual state setup through dependency injection.

The fixture bypasses ArizePhoenixTracer's normal initialization by patching setup_arize_phoenix and manually setting 10+ internal state variables. While this works for isolating the tests from external Phoenix dependencies, it means tests run against artificially constructed state rather than the real initialization flow. Additionally, mocking propagator with MagicMock() (line 59) might hide real integration issues if propagator is core logic rather than an external dependency.

For improved testability, consider refactoring ArizePhoenixTracer to accept key dependencies (provider, tracer, propagator) through constructor injection, allowing tests to pass in-memory test doubles without manual state manipulation.
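A minimal sketch of the constructor-injection idea, assuming a simplified tracer: `InjectableTracer`, `TracerLike`, and `RecordingTracer` are hypothetical names, and the constructor shown is not the real `ArizePhoenixTracer` signature. The point is only that a test can hand in a lightweight double instead of patching setup and mutating internals.

```python
from typing import Any, List, Optional, Protocol


class TracerLike(Protocol):
    """The only capability this sketch needs from a tracer."""

    def start_span(self, name: str) -> Any: ...


class RecordingTracer:
    """In-memory test double that records the span names it is asked for."""

    def __init__(self) -> None:
        self.started: List[str] = []

    def start_span(self, name: str) -> str:
        self.started.append(name)
        return name


class InjectableTracer:
    """Hypothetical shape, not the real ArizePhoenixTracer constructor."""

    def __init__(self, project_name: str, tracer: Optional[TracerLike] = None) -> None:
        self.project_name = project_name
        # Build the production tracer only when nothing was injected.
        self.tracer = tracer if tracer is not None else self._build_default_tracer()
        self.root_span = self.tracer.start_span(project_name)

    def _build_default_tracer(self) -> TracerLike:
        # The production setup is intentionally left out of this sketch.
        raise RuntimeError("production setup omitted in this sketch")


double = RecordingTracer()
tracer_under_test = InjectableTracer("demo-project", tracer=double)
```

With this shape a fixture would construct the tracer directly and assert on the double, rather than assigning internal attributes after the fact.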

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/backend/tests/unit/services/tracing/test_arize_phoenix.py` around lines
28 - 62, The phoenix_tracer fixture is bypassing ArizePhoenixTracer's real
initialization by patching setup_arize_phoenix and manually setting many
internals (root_span, child_spans, _context_tokens, _current_component_id,
_langchain_instrumentor_enabled, _ready, propagator, carrier) which couples
tests to internal state; refactor ArizePhoenixTracer to accept key dependencies
via constructor parameters (e.g., tracer_provider/provider, tracer, propagator,
exporter or a config object) so tests can inject the in-memory provider/exporter
and a real or lightweight propagator, then update the phoenix_tracer fixture to
construct ArizePhoenixTracer with those injected dependencies instead of
patching setup_arize_phoenix and mutating internals.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e2754cad-4ec4-48f4-bab3-d074f5a4e8f5

📥 Commits

Reviewing files that changed from the base of the PR and between 500b5e9 and 9352406.

📒 Files selected for processing (4)
  • src/backend/base/langflow/services/tracing/arize_phoenix.py
  • src/backend/base/langflow/services/tracing/phoenix_callback.py
  • src/backend/base/langflow/services/tracing/service.py
  • src/backend/tests/unit/services/tracing/test_arize_phoenix.py

Comment on lines +64 to +181
def test_add_trace_parents_to_root_span(phoenix_tracer):
    tracer, exporter = phoenix_tracer
    trace_id = "agent-vertex-1"

    tracer.add_trace(
        trace_id=trace_id,
        trace_name="Agent (agent-vertex-1)",
        trace_type="agent",
        inputs={"input_value": "hello"},
    )

    assert trace_id in tracer.child_spans
    spans = exporter.get_finished_spans()
    assert len(spans) == 0  # component span still open

    tracer.end_trace(trace_id=trace_id, trace_name="Agent (agent-vertex-1)", outputs={"response": "hi"})
    finished = exporter.get_finished_spans()
    assert len(finished) == 1
    assert finished[0].name == "Agent (agent-vertex-1)"
    assert finished[0].parent.span_id == tracer.root_span.get_span_context().span_id


def test_activate_component_span_sets_current_context(phoenix_tracer):
    tracer, _exporter = phoenix_tracer
    trace_id = "agent-vertex-2"

    tracer.add_trace(
        trace_id=trace_id,
        trace_name="Agent",
        trace_type="agent",
        inputs={},
    )
    token = tracer.activate_component_span(trace_id)
    assert token is not None
    current = trace.get_current_span()
    assert current.get_span_context().span_id == tracer.child_spans[trace_id].get_span_context().span_id

    tracer.deactivate_component_span(trace_id)
    current_after = trace.get_current_span()
    assert current_after.get_span_context().span_id != tracer.child_spans[trace_id].get_span_context().span_id


def test_get_langchain_callback_returns_handler_when_component_context_set(phoenix_tracer):
    tracer, _exporter = phoenix_tracer
    trace_id = "agent-vertex-3"

    tracer.add_trace(trace_id=trace_id, trace_name="Agent", trace_type="agent", inputs={})
    tracer.activate_component_span(trace_id)

    component_context_var.set(
        ComponentTraceContext(
            trace_id=trace_id,
            trace_name="Agent",
            trace_type="agent",
            vertex=None,
            inputs={},
        )
    )
    callback = tracer.get_langchain_callback()
    assert callback is not None
    assert isinstance(callback, PhoenixCallbackHandler)
    assert callback.parent_span is tracer.child_spans[trace_id]

    tracer.deactivate_component_span(trace_id)
    component_context_var.set(None)


def test_phoenix_callback_creates_nested_langchain_spans(phoenix_tracer):
    tracer, exporter = phoenix_tracer
    trace_id = "agent-vertex-4"

    tracer.add_trace(trace_id=trace_id, trace_name="Agent", trace_type="agent", inputs={})
    tracer.activate_component_span(trace_id)

    callback = PhoenixCallbackHandler(tracer, parent_span=tracer.child_spans[trace_id])
    run_id = uuid4()
    callback.on_tool_start(
        serialized={"name": "search"},
        input_str="query",
        run_id=run_id,
        parent_run_id=None,
    )
    callback.on_tool_end(output="result", run_id=run_id)

    component_span_id = tracer.child_spans[trace_id].get_span_context().span_id
    tracer.deactivate_component_span(trace_id)
    tracer.end_trace(trace_id=trace_id, trace_name="Agent", outputs={})

    finished = exporter.get_finished_spans()
    tool_spans = [s for s in finished if s.name == "search"]
    assert len(tool_spans) == 1
    assert tool_spans[0].attributes.get(SpanAttributes.OPENINFERENCE_SPAN_KIND) == "tool"
    assert tool_spans[0].parent.span_id == component_span_id


def test_on_agent_action_creates_agent_span(phoenix_tracer):
    tracer, exporter = phoenix_tracer
    trace_id = "agent-vertex-5"

    tracer.add_trace(trace_id=trace_id, trace_name="Agent", trace_type="agent", inputs={})
    tracer.activate_component_span(trace_id)

    from langchain_classic.schema import AgentAction

    callback = PhoenixCallbackHandler(tracer, parent_span=tracer.child_spans[trace_id])
    callback.on_agent_action(
        AgentAction(tool="search", tool_input="weather", log=""),
        run_id=uuid4(),
    )

    tracer.deactivate_component_span(trace_id)
    tracer.end_trace(trace_id=trace_id, trace_name="Agent", outputs={})

    finished = exporter.get_finished_spans()
    action_spans = [s for s in finished if "Agent Action" in s.name]
    assert len(action_spans) == 1
    assert action_spans[0].attributes.get(SpanAttributes.OPENINFERENCE_SPAN_KIND) == "agent"

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Consider adding negative test scenarios for robustness.

The test suite currently covers only happy-path scenarios. To improve confidence in error handling and edge cases, consider adding tests for:

  • Calling end_trace() without a prior add_trace() for the same trace_id
  • Calling activate_component_span() with a non-existent trace_id
  • Callback error handling when span creation fails
  • Invalid or malformed inputs to trace methods

These additions would help catch regressions in error paths and provide better documentation of expected failure behavior.
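The first two scenarios in the list above might take roughly the following shape. This is only a sketch against a stand-in: `StubTracer` is hypothetical, and its tolerant behavior (ignoring unknown IDs, returning None) is an assumption rather than the documented behavior of `ArizePhoenixTracer`. If the real methods raise instead, the assertions would become `pytest.raises` blocks.

```python
class StubTracer:
    """Stand-in with ASSUMED tolerant behavior; the real tracer may differ."""

    def __init__(self):
        self.child_spans = {}

    def add_trace(self, trace_id, trace_name):
        self.child_spans[trace_id] = trace_name

    def end_trace(self, trace_id):
        # Assumption: ending an unknown trace is ignored rather than raising.
        return self.child_spans.pop(trace_id, None)

    def activate_component_span(self, trace_id):
        # Assumption: no token is returned when no span exists for trace_id.
        return object() if trace_id in self.child_spans else None


def test_end_trace_without_add_trace_is_noop():
    tracer = StubTracer()
    assert tracer.end_trace("missing") is None
    assert tracer.child_spans == {}


def test_activate_unknown_trace_id_returns_none():
    tracer = StubTracer()
    assert tracer.activate_component_span("missing") is None


def test_activate_known_trace_id_returns_token():
    tracer = StubTracer()
    tracer.add_trace("agent-vertex-1", "Agent")
    assert tracer.activate_component_span("agent-vertex-1") is not None
```

In the real suite these would take the `phoenix_tracer` fixture instead of building a stub, so the assertions pin down whatever behavior the tracer actually has.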

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/backend/tests/unit/services/tracing/test_arize_phoenix.py` around lines
64 - 181, Add negative/unit tests to
src/backend/tests/unit/services/tracing/test_arize_phoenix.py that exercise
error paths: verify calling end_trace(trace_id) without a prior
add_trace(trace_id) does not crash and returns/handles error as expected; assert
activate_component_span(nonexistent_trace_id) returns None or raises the
documented exception; simulate PhoenixCallbackHandler usage where span creation
fails (e.g., mock tracer.child_spans or tracer.start_span to raise) and assert
callback methods (on_tool_start/on_tool_end/on_agent_action) handle the failure
gracefully; and add tests for malformed inputs to add_trace (invalid
trace_type/empty trace_id) asserting validation behavior. Reference the
functions/methods add_trace, end_trace, activate_component_span,
PhoenixCallbackHandler, and on_tool_start/on_tool_end/on_agent_action when
adding these new test cases.

@ringerc

ringerc commented May 19, 2026

Shouldn't this be part of the generic trace infrastructure, not specific to Phoenix?

You may want to take a look at my PR #12223 which factors out the common OpenTelemetry support code from the Arize/Phoenix and Langwatch exporters and adds a generic OTLP tracing exporter.

If you adopted that, you could then build your additional tracing capabilities into the common API so all tracers would benefit.

At the very least IMO your new API activate_component_span and deactivate_component_span should be provided as no-op stubs for all tracer implementations rather than gated by hasattr at the call site, unless there is an unavoidable need in which case it should be clearly explained.
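The no-op stub suggestion could look roughly like the sketch below. All class names are hypothetical and Langflow's actual base tracer interface may differ; the sketch only shows how default no-op methods remove the need for `hasattr` checks at the call site.

```python
from typing import Callable, List, Optional


class BaseTracerSketch:
    """Hypothetical base class; Langflow's real base tracer may differ."""

    def activate_component_span(self, trace_id: str) -> Optional[object]:
        """No-op by default; tracers without context support inherit this."""
        return None

    def deactivate_component_span(self, trace_id: str) -> None:
        """No-op by default."""
        return None


class LoggingOnlyTracer(BaseTracerSketch):
    """Inherits the no-op stubs unchanged."""


class ContextAwareTracer(BaseTracerSketch):
    """Overrides the stubs; here it only records calls for illustration."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def activate_component_span(self, trace_id: str) -> Optional[object]:
        self.events.append(f"activate:{trace_id}")
        return trace_id

    def deactivate_component_span(self, trace_id: str) -> None:
        self.events.append(f"deactivate:{trace_id}")


def run_component(tracers: List[BaseTracerSketch], trace_id: str, body: Callable[[], object]) -> object:
    # Every tracer has both methods, so no hasattr gating is needed.
    for tracer in tracers:
        tracer.activate_component_span(trace_id)
    try:
        return body()
    finally:
        # Deactivate on success and on exception, in reverse order.
        for tracer in reversed(tracers):
            tracer.deactivate_component_span(trace_id)


aware = ContextAwareTracer()
result = run_component([LoggingOnlyTracer(), aware], "agent-vertex-1", lambda: "ok")
```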

Also, if this adds lots of additional trace spans, consider guarding it behind a configuration option, especially since people using Arize, Phoenix, Langwatch, etc. typically are not routing their traces through an OpenTelemetry collector that can do trace filtering and downsampling.

Your PR might be easier to review and approve if it showed a before/after trace to help reviewers understand the utility and impact of the desired change.

@schuellerf
Contributor Author

@ringerc I agree!
For me, tracing into the agent's decisions just makes sense to get an overview, but it certainly increases the amount of data. Where would you put the configuration? Environment variables? There is no good section in the UI to control this, I'm afraid.
Doing this in common code definitely makes sense. I was just testing Phoenix and didn't connect the dots to make this generic 🫣.
I'll check whether I can work directly on top of your PR (which I saw but, for some reason, didn't want to extend with this).
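If the environment-variable route raised in this thread were chosen, a guard could be as small as the sketch below. The variable name `LANGFLOW_TRACE_AGENT_STEPS` and the helper `env_flag` are hypothetical and not part of Langflow; the PR summary only mentions an existing `ARIZE_PHOENIX_USE_INSTRUMENTOR` flag. Defaulting to off is an assumption that matches the concern about extra span volume.

```python
import os
from typing import Mapping, Optional

# Values treated as "enabled"; anything else counts as disabled.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read a boolean feature flag from the environment (hypothetical helper)."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


# Hypothetical variable name, used only for illustration.
trace_agent_steps = env_flag("LANGFLOW_TRACE_AGENT_STEPS", environ={"LANGFLOW_TRACE_AGENT_STEPS": "true"})
```

The optional `environ` mapping keeps the helper easy to test without mutating the process environment.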
