tracing: support tracing also agent steps#13221
Conversation
Adds support to trace (with phoenix) also into agent steps/decisions.
d6ba567 to
9352406
Compare
WalkthroughThis PR enhances Arize Phoenix tracing in Langflow by introducing explicit OpenTelemetry context management through ChangesOpenTelemetry LangChain Tracing Integration
🎯 4 (Complex) | ⏱️ ~60 minutes Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 3 warnings)
✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/backend/tests/unit/services/tracing/test_arize_phoenix.py (1)
28-62: 🏗️ Heavy liftConsider reducing manual state setup through dependency injection.
The fixture bypasses
ArizePhoenixTracer's normal initialization by patchingsetup_arize_phoenixand manually setting 10+ internal state variables. While this works for isolating the tests from external Phoenix dependencies, it means tests run against artificially constructed state rather than the real initialization flow. Additionally, mockingpropagatorwithMagicMock()(line 59) might hide real integration issues if propagator is core logic rather than an external dependency.For improved testability, consider refactoring
ArizePhoenixTracerto accept key dependencies (provider, tracer, propagator) through constructor injection, allowing tests to pass in-memory test doubles without manual state manipulation.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/backend/tests/unit/services/tracing/test_arize_phoenix.py` around lines 28 - 62, The phoenix_tracer fixture is bypassing ArizePhoenixTracer's real initialization by patching setup_arize_phoenix and manually setting many internals (root_span, child_spans, _context_tokens, _current_component_id, _langchain_instrumentor_enabled, _ready, propagator, carrier) which couples tests to internal state; refactor ArizePhoenixTracer to accept key dependencies via constructor parameters (e.g., tracer_provider/provider, tracer, propagator, exporter or a config object) so tests can inject the in-memory provider/exporter and a real or lightweight propagator, then update the phoenix_tracer fixture to construct ArizePhoenixTracer with those injected dependencies instead of patching setup_arize_phoenix and mutating internals.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/backend/tests/unit/services/tracing/test_arize_phoenix.py`:
- Around line 64-181: Add negative/unit tests to
src/backend/tests/unit/services/tracing/test_arize_phoenix.py that exercise
error paths: verify calling end_trace(trace_id) without a prior
add_trace(trace_id) does not crash and returns/handles error as expected; assert
activate_component_span(nonexistent_trace_id) returns None or raises the
documented exception; simulate PhoenixCallbackHandler usage where span creation
fails (e.g., mock tracer.child_spans or tracer.start_span to raise) and assert
callback methods (on_tool_start/on_tool_end/on_agent_action) handle the failure
gracefully; and add tests for malformed inputs to add_trace (invalid
trace_type/empty trace_id) asserting validation behavior. Reference the
functions/methods add_trace, end_trace, activate_component_span,
PhoenixCallbackHandler, and on_tool_start/on_tool_end/on_agent_action when
adding these new test cases.
---
Nitpick comments:
In `@src/backend/tests/unit/services/tracing/test_arize_phoenix.py`:
- Around line 28-62: The phoenix_tracer fixture is bypassing
ArizePhoenixTracer's real initialization by patching setup_arize_phoenix and
manually setting many internals (root_span, child_spans, _context_tokens,
_current_component_id, _langchain_instrumentor_enabled, _ready, propagator,
carrier) which couples tests to internal state; refactor ArizePhoenixTracer to
accept key dependencies via constructor parameters (e.g.,
tracer_provider/provider, tracer, propagator, exporter or a config object) so
tests can inject the in-memory provider/exporter and a real or lightweight
propagator, then update the phoenix_tracer fixture to construct
ArizePhoenixTracer with those injected dependencies instead of patching
setup_arize_phoenix and mutating internals.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e2754cad-4ec4-48f4-bab3-d074f5a4e8f5
📒 Files selected for processing (4)
src/backend/base/langflow/services/tracing/arize_phoenix.pysrc/backend/base/langflow/services/tracing/phoenix_callback.pysrc/backend/base/langflow/services/tracing/service.pysrc/backend/tests/unit/services/tracing/test_arize_phoenix.py
| def test_add_trace_parents_to_root_span(phoenix_tracer): | ||
| tracer, exporter = phoenix_tracer | ||
| trace_id = "agent-vertex-1" | ||
|
|
||
| tracer.add_trace( | ||
| trace_id=trace_id, | ||
| trace_name="Agent (agent-vertex-1)", | ||
| trace_type="agent", | ||
| inputs={"input_value": "hello"}, | ||
| ) | ||
|
|
||
| assert trace_id in tracer.child_spans | ||
| spans = exporter.get_finished_spans() | ||
| assert len(spans) == 0 # component span still open | ||
|
|
||
| tracer.end_trace(trace_id=trace_id, trace_name="Agent (agent-vertex-1)", outputs={"response": "hi"}) | ||
| finished = exporter.get_finished_spans() | ||
| assert len(finished) == 1 | ||
| assert finished[0].name == "Agent (agent-vertex-1)" | ||
| assert finished[0].parent.span_id == tracer.root_span.get_span_context().span_id | ||
|
|
||
|
|
||
| def test_activate_component_span_sets_current_context(phoenix_tracer): | ||
| tracer, _exporter = phoenix_tracer | ||
| trace_id = "agent-vertex-2" | ||
|
|
||
| tracer.add_trace( | ||
| trace_id=trace_id, | ||
| trace_name="Agent", | ||
| trace_type="agent", | ||
| inputs={}, | ||
| ) | ||
| token = tracer.activate_component_span(trace_id) | ||
| assert token is not None | ||
| current = trace.get_current_span() | ||
| assert current.get_span_context().span_id == tracer.child_spans[trace_id].get_span_context().span_id | ||
|
|
||
| tracer.deactivate_component_span(trace_id) | ||
| current_after = trace.get_current_span() | ||
| assert current_after.get_span_context().span_id != tracer.child_spans[trace_id].get_span_context().span_id | ||
|
|
||
|
|
||
| def test_get_langchain_callback_returns_handler_when_component_context_set(phoenix_tracer): | ||
| tracer, _exporter = phoenix_tracer | ||
| trace_id = "agent-vertex-3" | ||
|
|
||
| tracer.add_trace(trace_id=trace_id, trace_name="Agent", trace_type="agent", inputs={}) | ||
| tracer.activate_component_span(trace_id) | ||
|
|
||
| component_context_var.set( | ||
| ComponentTraceContext( | ||
| trace_id=trace_id, | ||
| trace_name="Agent", | ||
| trace_type="agent", | ||
| vertex=None, | ||
| inputs={}, | ||
| ) | ||
| ) | ||
| callback = tracer.get_langchain_callback() | ||
| assert callback is not None | ||
| assert isinstance(callback, PhoenixCallbackHandler) | ||
| assert callback.parent_span is tracer.child_spans[trace_id] | ||
|
|
||
| tracer.deactivate_component_span(trace_id) | ||
| component_context_var.set(None) | ||
|
|
||
|
|
||
| def test_phoenix_callback_creates_nested_langchain_spans(phoenix_tracer): | ||
| tracer, exporter = phoenix_tracer | ||
| trace_id = "agent-vertex-4" | ||
|
|
||
| tracer.add_trace(trace_id=trace_id, trace_name="Agent", trace_type="agent", inputs={}) | ||
| tracer.activate_component_span(trace_id) | ||
|
|
||
| callback = PhoenixCallbackHandler(tracer, parent_span=tracer.child_spans[trace_id]) | ||
| run_id = uuid4() | ||
| callback.on_tool_start( | ||
| serialized={"name": "search"}, | ||
| input_str="query", | ||
| run_id=run_id, | ||
| parent_run_id=None, | ||
| ) | ||
| callback.on_tool_end(output="result", run_id=run_id) | ||
|
|
||
| component_span_id = tracer.child_spans[trace_id].get_span_context().span_id | ||
| tracer.deactivate_component_span(trace_id) | ||
| tracer.end_trace(trace_id=trace_id, trace_name="Agent", outputs={}) | ||
|
|
||
| finished = exporter.get_finished_spans() | ||
| tool_spans = [s for s in finished if s.name == "search"] | ||
| assert len(tool_spans) == 1 | ||
| assert tool_spans[0].attributes.get(SpanAttributes.OPENINFERENCE_SPAN_KIND) == "tool" | ||
| assert tool_spans[0].parent.span_id == component_span_id | ||
|
|
||
|
|
||
| def test_on_agent_action_creates_agent_span(phoenix_tracer): | ||
| tracer, exporter = phoenix_tracer | ||
| trace_id = "agent-vertex-5" | ||
|
|
||
| tracer.add_trace(trace_id=trace_id, trace_name="Agent", trace_type="agent", inputs={}) | ||
| tracer.activate_component_span(trace_id) | ||
|
|
||
| from langchain_classic.schema import AgentAction | ||
|
|
||
| callback = PhoenixCallbackHandler(tracer, parent_span=tracer.child_spans[trace_id]) | ||
| callback.on_agent_action( | ||
| AgentAction(tool="search", tool_input="weather", log=""), | ||
| run_id=uuid4(), | ||
| ) | ||
|
|
||
| tracer.deactivate_component_span(trace_id) | ||
| tracer.end_trace(trace_id=trace_id, trace_name="Agent", outputs={}) | ||
|
|
||
| finished = exporter.get_finished_spans() | ||
| action_spans = [s for s in finished if "Agent Action" in s.name] | ||
| assert len(action_spans) == 1 | ||
| assert action_spans[0].attributes.get(SpanAttributes.OPENINFERENCE_SPAN_KIND) == "agent" | ||
|
|
There was a problem hiding this comment.
Consider adding negative test scenarios for robustness.
The test suite currently covers only happy-path scenarios. To improve confidence in error handling and edge cases, consider adding tests for:
- Calling
end_trace()without a prioradd_trace()for the same trace_id - Calling
activate_component_span()with a non-existent trace_id - Callback error handling when span creation fails
- Invalid or malformed inputs to trace methods
These additions would help catch regressions in error paths and provide better documentation of expected failure behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/backend/tests/unit/services/tracing/test_arize_phoenix.py` around lines
64 - 181, Add negative/unit tests to
src/backend/tests/unit/services/tracing/test_arize_phoenix.py that exercise
error paths: verify calling end_trace(trace_id) without a prior
add_trace(trace_id) does not crash and returns/handles error as expected; assert
activate_component_span(nonexistent_trace_id) returns None or raises the
documented exception; simulate PhoenixCallbackHandler usage where span creation
fails (e.g., mock tracer.child_spans or tracer.start_span to raise) and assert
callback methods (on_tool_start/on_tool_end/on_agent_action) handle the failure
gracefully; and add tests for malformed inputs to add_trace (invalid
trace_type/empty trace_id) asserting validation behavior. Reference the
functions/methods add_trace, end_trace, activate_component_span,
PhoenixCallbackHandler, and on_tool_start/on_tool_end/on_agent_action when
adding these new test cases.
|
Shouldn't this be part of the generic trace infrastructure, not specific to Phoenix? You may want to take a look at my PR #12223 which factors out the common OpenTelemetry support code from the Arize/Phoenix and Langwatch exporters and adds a generic OTLP tracing exporter. If you adopted that, you could then build your additional tracing capabilities into the common API so all tracers would benefit. At the very least IMO your new API Also, if this adds lots of additional trace spans, consider guarding it behind a configuration option, especially since people using Arize, Phoenix, Langwatch, etc typically are not routing their traces through an opentelemetry collector that can do trace filtering and downsamping. Your PR might be easier to review and approve if it showed a before/after trace to help reviewers understand the utility and impact of the desired change. |
|
@ringerc I agree! |
Adds support to trace (with phoenix) also into agent steps/decisions.
Summary by CodeRabbit
New Features
Tests