feat: Exporting traces to Traceloop and Instana#8209

Closed
2getsandesh wants to merge 52 commits into
langflow-ai:mainfrom
2getsandesh:main

Conversation

@2getsandesh
Contributor

@2getsandesh 2getsandesh commented May 26, 2025

This PR introduces support for exporting traces to TraceLoop by:

  • Adding a new module, "traceloop.py", for TraceLoop-specific tracing configuration and instrumentation; the module also supports IBM Instana.
  • Initializing TraceLoop tracing from "service.py" so that it is active when LangFlow starts.
  • Including the dependencies required for TraceLoop and OpenTelemetry in "pyproject.toml".
  • Adding tests for the TraceLoop integration.

📁 Changes Made

➕ langflow/src/backend/base/langflow/services/tracing/traceloop.py:
Contains the TraceLoop tracer setup using OpenTelemetry.
🔁 Modified service.py:
Initialized TraceLoop tracer via import and setup during tracing service startup.
🧩 Updated pyproject.toml:
Added dependencies required for TraceLoop integration (e.g., opentelemetry-sdk, opentelemetry-exporter-otlp, etc.).
🧪 Enhanced test_tracing_service.py:
Added unit tests for TraceLoop tracer initialization, configuration validation, and integration with the tracing service.

🎯 Purpose
This update enables LangFlow to export telemetry data (spans, traces) to TraceLoop and Instana for improved observability and debugging, laying the foundation for robust distributed tracing.

Screenshots of the Traceloop and Instana Dashboards:
Screenshot 2025-05-19 at 11 10 32 AM
Screenshot 2025-05-19 at 11 09 48 AM

Future Scope
For provider-specific instrumentation, is it acceptable to include the Traceloop SDK?

Summary by CodeRabbit

  • New Features
    • Added integration with a new tracing system for enhanced trace collection and visualization.
  • Chores
    • Updated dependencies to support OpenTelemetry-based tracing.
  • Tests
    • Expanded test coverage to include the new tracing integration.

@github-actions github-actions Bot added the enhancement New feature or request label May 26, 2025
@2getsandesh 2getsandesh marked this pull request as ready for review May 26, 2025 07:10
@dosubot dosubot Bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label May 26, 2025
@codspeed-hq

codspeed-hq Bot commented May 26, 2025

CodSpeed Performance Report

Merging #8209 will degrade performances by 15.75%

Comparing 2getsandesh:main (1dee256) with main (aaf36c4)

Summary

❌ 2 regressions
✅ 17 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| test_build_flow_invalid_job_id | 8.4 ms | 10 ms | -15.75% |
| test_build_flow_start_with_inputs | 213.6 ms | 240.9 ms | -11.33% |
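
As a rough check on how the Change column relates to the timings, the second row is consistent with computing `BASE / HEAD - 1`. This is an inference from the numbers shown here, not a statement of CodSpeed's documented formula, and `relative_change` is a name made up for this sketch:

```python
def relative_change(base_ms: float, head_ms: float) -> float:
    """Return the change in percent; negative means HEAD is slower than BASE.

    Assumption: change = BASE / HEAD - 1, inferred from the report's numbers.
    """
    return (base_ms / head_ms - 1.0) * 100.0


# Second benchmark row: matches the reported -11.33%.
print(f"{relative_change(213.6, 240.9):.2f}%")

# First benchmark row: gives -16.00% from the displayed values; the report
# shows -15.75%, presumably because the displayed timings are rounded.
print(f"{relative_change(8.4, 10.0):.2f}%")
```

The more familiar `(HEAD - BASE) / BASE` would give about 12.8% for the second row, which does not match the report.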

@2getsandesh
Contributor Author

CI is currently skipped because should-run-ci evaluates to false. Could you please help me look into this? @ogabrielluiz

Comment thread src/backend/base/langflow/services/tracing/traceloop.py Outdated
@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels May 28, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 28, 2025
@2getsandesh 2getsandesh requested a review from nirga May 28, 2025 16:09
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 29, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 29, 2025
@coderabbitai
Contributor

coderabbitai Bot commented May 30, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

"""

Walkthrough

A new tracing integration called "traceloop" was added to the tracing service, including a dedicated tracer implementation using OpenTelemetry. The service and its tests were updated to support this tracer. Four new OpenTelemetry-related dependencies were added to the project configuration to support the new functionality.

Changes

| File(s) | Change Summary |
|---|---|
| pyproject.toml | Added four OpenTelemetry dependencies for tracing: instrumentation-langchain, exporter-otlp, sdk, api. |
| src/backend/base/langflow/services/tracing/service.py | Added traceloop tracer initialization and helper method to the tracing service. |
| src/backend/base/langflow/services/tracing/traceloop.py | Introduced new TraceloopTracer class implementing OpenTelemetry-based tracing integration. |
| src/backend/tests/unit/services/tracing/test_tracing_service.py | Updated tests to mock and expect the new traceloop tracer in the tracing service. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant TracingService
    participant TraceloopTracer
    participant OpenTelemetry

    User->>TracingService: start_tracers()
    TracingService->>TraceloopTracer: _initialize_traceloop_tracer()
    TraceloopTracer->>OpenTelemetry: setup_traceloop()
    TracingService->>TracingService: Register traceloop tracer in context

    User->>TracingService: Trace events (add_trace, end_trace, end)
    TracingService->>TraceloopTracer: add_trace / end_trace / end
    TraceloopTracer->>OpenTelemetry: Create/finish spans, record attributes
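
The call order in the diagram can be sketched in plain Python. The classes below are hypothetical stand-ins (`SketchTracer`, `SketchTracingService`); they record calls in a list instead of creating OpenTelemetry spans and do not reproduce the PR's actual signatures:

```python
from typing import Any


class SketchTracer:
    """Hypothetical stand-in for TraceloopTracer; records calls, creates no spans."""

    def __init__(self) -> None:
        self.events: list[str] = []
        self.ready = self.setup()

    def setup(self) -> bool:
        # In the PR this step would configure OpenTelemetry.
        self.events.append("setup")
        return True

    def add_trace(self, trace_id: str, inputs: dict[str, Any]) -> None:
        self.events.append(f"add_trace:{trace_id}")

    def end_trace(self, trace_id: str, outputs: dict[str, Any]) -> None:
        self.events.append(f"end_trace:{trace_id}")

    def end(self) -> None:
        self.events.append("end")


class SketchTracingService:
    """Hypothetical service that registers ready tracers and fans out events."""

    def __init__(self) -> None:
        self.tracers: dict[str, SketchTracer] = {}

    def start_tracers(self) -> None:
        tracer = SketchTracer()
        if tracer.ready:  # only ready tracers are registered
            self.tracers["traceloop"] = tracer

    def add_trace(self, trace_id: str, inputs: dict[str, Any]) -> None:
        for tracer in self.tracers.values():
            tracer.add_trace(trace_id, inputs)

    def end_trace(self, trace_id: str, outputs: dict[str, Any]) -> None:
        for tracer in self.tracers.values():
            tracer.end_trace(trace_id, outputs)

    def end(self) -> None:
        for tracer in self.tracers.values():
            tracer.end()


service = SketchTracingService()
service.start_tracers()
service.add_trace("component-1", {"text": "hi"})
service.end_trace("component-1", {"text": "hello"})
service.end()
print(service.tracers["traceloop"].events)
```

A tracer whose setup reports not ready is never registered, so none of the later calls reach it.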

Poem

In the warren of code, a tracer hops in,
With Traceloop and friends, let the spans now begin!
OpenTelemetry carrots, dependencies anew,
Our tests count more bunnies—one, two, three, a few!
From root to the leaves, every hop leaves a trail,
With each span and callback, our stories unveil.
🥕
"""

@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 30, 2025
@edwinjosechittilappilly
Collaborator

@coderabbitai Please look into why the PR is affecting the language model component execution.

@coderabbitai
Contributor

coderabbitai Bot commented Jul 2, 2025

🧩 Analysis chain

@edwinjosechittilappilly, I'll investigate the performance impact of the TraceLoop integration on language model component execution. Let me analyze the failing tests and the tracing implementation.


🏁 Script executed:

#!/bin/bash
# First, let's find and examine the failing test files
fd "test_build_flow" --type f --exec cat {} \; | head -100

# Check if there are any build_flow related tests
rg "test_build_flow" -A 20 -B 5

Length of output: 14439


🏁 Script executed:

#!/bin/bash
# Let's examine the TraceLoop implementation to understand the performance impact
cat "src/backend/base/langflow/services/tracing/traceloop.py" | head -50

# Check how tracing is initialized in the service
rg "setup_traceloop\|initialize_traceloop" -A 10 -B 5

Length of output: 1665


🏁 Script executed:

#!/bin/bash
# Let's look at the specific instrumentation of LangChain
rg "opentelemetry-instrumentation-langchain\|LangChainInstrumentor" -A 5 -B 5

# Check if there's any automatic instrumentation happening
rg "instrument\(\)" -A 5 -B 5

Length of output: 111


🏁 Script executed:

#!/bin/bash
# Let's examine what components are being traced and how spans are created
rg "add_trace\|start_span\|create_span" -A 10 -B 5 --type py

# Check for any performance-critical code in the TraceLoop tracer
rg "_convert_to_traceloop\|_safe_json_dumps" -A 10 -B 5

Length of output: 120


🏁 Script executed:

#!/bin/bash
# Let's look at the tracing service initialization to understand when tracing starts
cat "src/backend/base/langflow/services/tracing/service.py" | grep -A 20 -B 5 "start_tracers\|_initialize_traceloop"

# Check if tracing is always enabled or conditional
rg "tracing.*deactivated\|TRACING_ENABLED\|tracing.*enabled" -A 5 -B 5

Length of output: 3795


🏁 Script executed:

#!/bin/bash
# Let's examine the full TraceLoop tracer implementation to understand the performance bottlenecks
cat "src/backend/base/langflow/services/tracing/traceloop.py"

Length of output: 14433


🏁 Script executed:

#!/bin/bash
# Let's check what triggers the tracing in the build flow process
rg "start_tracers\|TracingService" -A 5 -B 5 | head -50

# Let's also see how the build flow works
rg "build.*flow\|flow.*build" -A 5 -B 5 --type py | head -50

Length of output: 120


🏁 Script executed:

#!/bin/bash
# Let's check if there's any LangChain instrumentation in the TraceLoop tracer
rg "LangChainInstrumentor\|instrument" src/backend/base/langflow/services/tracing/traceloop.py -A 10 -B 5

# Let's also check the imports in the file
head -30 src/backend/base/langflow/services/tracing/traceloop.py

Length of output: 1183


🏁 Script executed:

#!/bin/bash
# Let's look at the setup_traceloop method specifically
rg "setup_traceloop" -A 20 -B 5 src/backend/base/langflow/services/tracing/traceloop.py

# Check for any expensive operations in the tracer
rg "json\.dumps\|json\.loads\|_safe_json_dumps\|_convert_to_traceloop" -A 10 -B 5 src/backend/base/langflow/services/tracing/traceloop.py

Length of output: 2432


Based on my investigation of the performance degradation, I can identify several key reasons why this PR is affecting language model component execution:

Root Cause Analysis

1. Automatic LangChain Instrumentation

The primary culprit is in the setup_traceloop() method:

from opentelemetry.instrumentation.langchain import LangchainInstrumentor
LangchainInstrumentor().instrument(tracer_provider=self.tracer_provider, skip_dep_check=True)

This automatically instruments ALL LangChain operations, adding tracing overhead to every language model call, embedding, retriever, or any LangChain component execution.

2. Tracer Initialization Overhead

The TraceLoop tracer initializes on every flow build, which includes:

  • Creating TracerProvider and Resource objects
  • Setting up span processors (BatchSpanProcessor for Traceloop, SimpleSpanProcessor for Instana)
  • Starting root spans and setting multiple attributes
  • The expensive LangChain instrumentation setup

3. Per-Component Tracing Overhead

Each component execution now triggers:

  • Span creation via add_trace()
  • JSON serialization of inputs/outputs via _safe_json_dumps()
  • Data type conversion via _convert_to_traceloop_types()
  • Span completion with metadata processing
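
To make the serialization step concrete: the analysis names `_safe_json_dumps`, but its body is not shown in this thread. The function below is a hypothetical stand-in that illustrates the general pattern (try JSON, fall back to `str`), not the PR's implementation:

```python
import json
from typing import Any


def safe_json_dumps(value: Any) -> str:
    """Serialize value to JSON, falling back to str() where JSON cannot cope.

    Hypothetical sketch; the PR's helper may behave differently.
    """
    try:
        # default=str converts otherwise unsupported objects to their str() form.
        return json.dumps(value, default=str)
    except (TypeError, ValueError):
        # For example unsupported dict keys or circular references.
        return str(value)


print(safe_json_dumps({"prompt": "hello", "tokens": 3}))
print(safe_json_dumps({"value": 1 + 2j}))
```

Work of this kind runs once per traced input and output, so its cost grows with payload size and with the number of components in a flow.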

4. Always-On Tracing

The tracing appears to be enabled by default unless explicitly deactivated via settings, affecting all flow builds.

Performance Impact Breakdown

From the CodSpeed report:

  • test_build_flow_invalid_job_id: 15.75% slower (8.4ms → 10ms)
  • test_build_flow_start_with_inputs: 11.33% slower (213.6ms → 240.9ms)

The language model components are particularly affected because they're being traced at multiple levels - both the LangFlow component level and the underlying LangChain operation level.

Recommended Solutions

  1. Make tracing conditional - Only enable when environment variables are present
  2. Lazy instrumentation - Only instrument LangChain when tracing is actually needed
  3. Optimize data serialization - Reduce JSON processing overhead
  4. Consider using simpler span processors for development/testing environments

The automatic LangChain instrumentation is the most significant contributor to the performance degradation, as it wraps every LangChain operation with tracing logic that wasn't present before this PR.
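
Recommendation 2 could look roughly like the sketch below. `LazyInstrumentation` is a hypothetical helper, and the injected `instrument` callable stands in for the `LangchainInstrumentor().instrument(...)` call quoted earlier, so the sketch runs without any tracing packages installed:

```python
from typing import Callable


class LazyInstrumentation:
    """Run an instrumentation callable at most once, and only when tracing is enabled.

    Hypothetical helper for illustration; it is not part of the PR.
    """

    def __init__(self, instrument: Callable[[], None]) -> None:
        self._instrument = instrument
        self._done = False

    def ensure(self, enabled: bool) -> bool:
        """Return True if instrumentation is active after this call."""
        if self._done:
            return True
        if not enabled:
            return False  # tracing is off, so nothing is wrapped
        self._instrument()
        self._done = True
        return True


calls: list[str] = []
lazy = LazyInstrumentation(lambda: calls.append("instrumented"))

lazy.ensure(enabled=False)  # no tracing requested, nothing happens
lazy.ensure(enabled=True)   # first real use triggers instrumentation
lazy.ensure(enabled=True)   # later uses are no-ops
print(calls)
```

The design choice is to pay the instrumentation cost at the first traced call rather than at service startup, and never when tracing is disabled.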


🧠 Learnings used
Learnt from: 2getsandesh
PR: langflow-ai/langflow#8209
File: src/backend/base/langflow/services/tracing/traceloop.py:184-188
Timestamp: 2025-06-03T09:20:27.781Z
Learning: In LangFlow's TraceLoop integration, setting span.kind attributes with mixed types (SpanKind enum objects for "prompt", lowercase strings for others) works correctly with OpenTelemetry exporters and doesn't cause issues in practice.

Learnt from: 2getsandesh
PR: langflow-ai/langflow#8209
File: src/backend/base/langflow/services/tracing/traceloop.py:184-188
Timestamp: 2025-06-03T08:26:39.319Z
Learning: In LangFlow's tracing integrations (TraceLoop and Arize Phoenix), there is an established pattern of special-casing the "prompt" trace type with different span kind mappings compared to other trace types.

@2getsandesh
Contributor Author

@coderabbitai can you please compare the "traceloop.py" file with "arizephoenix.py" and re-analyse why this PR is affecting language model component execution? I've tried to follow the same approach as "arizephoenix.py" to implement "traceloop.py".

@coderabbitai
Contributor

coderabbitai Bot commented Jul 3, 2025

🧩 Analysis chain

@2getsandesh, let me compare both implementations to understand the differences that might be causing the performance impact.


🏁 Script executed:

#!/bin/bash
# Read the arizephoenix.py file to compare with traceloop.py
echo "=== ARIZE PHOENIX IMPLEMENTATION ==="
cat "src/backend/base/langflow/services/tracing/arizephoenix.py"

echo -e "\n\n=== TRACELOOP IMPLEMENTATION ==="
cat "src/backend/base/langflow/services/tracing/traceloop.py"

Length of output: 14756


🏁 Script executed:

#!/bin/bash
# Search for Phoenix-related tracing files
echo "=== Searching for Phoenix tracing files ==="
fd -t f "phoenix" src/backend/base/langflow/services/tracing/
fd -t f "arize" src/backend/base/langflow/services/tracing/

# List all files in the tracing directory
echo -e "\n=== All tracing files ==="
ls -la src/backend/base/langflow/services/tracing/

# Search for Phoenix references in the codebase
echo -e "\n=== Phoenix references in tracing service ==="
rg -i "phoenix|arize" src/backend/base/langflow/services/tracing/ -A 3 -B 3

Length of output: 20233


🏁 Script executed:

#!/bin/bash
# Read the arize_phoenix.py file to compare with traceloop.py
cat "src/backend/base/langflow/services/tracing/arize_phoenix.py"

Length of output: 15331


@2getsandesh, excellent point! After comparing both implementations, I found the critical difference that explains the performance degradation. You're right that both implementations follow similar patterns, but there's one key difference in the setup logic:

Root Cause: Conditional vs Always-On LangChain Instrumentation

Arize Phoenix (No Performance Impact)

def setup_arize_phoenix(self) -> bool:
    # Check if tracing is actually enabled
    if not (enable_arize_tracing or enable_phoenix_tracing):
        return False  # ✅ EARLY EXIT - No instrumentation happens
    
    # Only proceed with setup and instrumentation if tracing is enabled
    try:
        # ... setup tracer provider ...
        LangChainInstrumentor().instrument(tracer_provider=self.tracer_provider, skip_dep_check=True)

TraceLoop (Causes Performance Impact)

def setup_traceloop(self) -> bool:
    try:
        # ❌ ALWAYS sets up tracer provider and resource
        # ❌ ALWAYS instruments LangChain regardless of env vars
        LangchainInstrumentor().instrument(tracer_provider=self.tracer_provider, skip_dep_check=True)
    except ImportError:
        return False
    return True  # ❌ Always returns True even if no exporters configured

The Problem

TraceLoop always instruments LangChain even when no TRACELOOP_API_KEY or OTEL_EXPORTER_OTLP_ENDPOINT environment variables are set. This means:

  1. Every LangChain operation gets traced (language models, embeddings, retrievers, etc.)
  2. Span creation overhead is added to all operations
  3. JSON serialization happens for all inputs/outputs
  4. The tracer remains "ready" (self._ready = True) even when not needed

Phoenix only instruments when tracing is actually enabled, so in test environments with no environment variables, there's zero tracing overhead.

The Fix

You need to add the same conditional logic as Phoenix:

import os

def setup_traceloop(self) -> bool:
    # Add this early exit logic like Phoenix
    traceloop_api_key = os.getenv("TRACELOOP_API_KEY")
    instana_baseurl = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT")

    if not (traceloop_api_key or instana_baseurl):
        return False  # Early exit - no instrumentation

    try:
        # Only proceed with setup if tracing is actually configured
        # ... rest of setup ...
        pass  # placeholder so the excerpt parses; the real setup goes here
    except ImportError:
        return False
    return True

This simple change should eliminate the 11-15% performance degradation in environments without tracing configuration, making TraceLoop behave exactly like Phoenix.

The performance impact you're seeing is entirely due to unnecessary LangChain instrumentation in test/development environments where no tracing backends are configured.
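
One way to check the guard is a small test that clears the environment and asserts that instrumentation is never invoked. `SketchTraceloopTracer` below is a trimmed, hypothetical stand-in for the PR's tracer: only the early exit is modelled, and the injected `instrument` callable replaces the real `LangchainInstrumentor().instrument(...)` call:

```python
import os
from unittest import mock


class SketchTraceloopTracer:
    """Hypothetical stand-in for the PR's tracer; models only the early-exit guard."""

    def __init__(self, instrument) -> None:
        self._instrument = instrument  # stands in for LangchainInstrumentor().instrument
        self._ready = self.setup_traceloop()

    def setup_traceloop(self) -> bool:
        api_key = os.getenv("TRACELOOP_API_KEY")
        otlp_endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
        if not (api_key or otlp_endpoint):
            return False  # nothing configured: skip all instrumentation
        self._instrument()
        return True


def check_guard() -> None:
    # No tracing variables set: the tracer must stay inactive.
    with mock.patch.dict(os.environ, {}, clear=True):
        instrument = mock.Mock()
        tracer = SketchTraceloopTracer(instrument)
        assert tracer._ready is False
        instrument.assert_not_called()

    # An API key is present: instrumentation runs exactly once.
    with mock.patch.dict(os.environ, {"TRACELOOP_API_KEY": "dummy"}, clear=True):
        instrument = mock.Mock()
        tracer = SketchTraceloopTracer(instrument)
        assert tracer._ready is True
        instrument.assert_called_once()


check_guard()
```

`mock.patch.dict(..., clear=True)` restores the original environment on exit, so the check does not leak settings into other tests.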


🧠 Learnings used
Learnt from: 2getsandesh
PR: langflow-ai/langflow#8209
File: src/backend/base/langflow/services/tracing/traceloop.py:184-188
Timestamp: 2025-06-03T08:26:39.319Z
Learning: In LangFlow's tracing integrations (TraceLoop and Arize Phoenix), there is an established pattern of special-casing the "prompt" trace type with different span kind mappings compared to other trace types.

Learnt from: 2getsandesh
PR: langflow-ai/langflow#8209
File: src/backend/base/langflow/services/tracing/traceloop.py:184-188
Timestamp: 2025-06-03T09:20:27.781Z
Learning: In LangFlow's TraceLoop integration, setting span.kind attributes with mixed types (SpanKind enum objects for "prompt", lowercase strings for others) works correctly with OpenTelemetry exporters and doesn't cause issues in practice.

Learnt from: 2getsandesh
PR: langflow-ai/langflow#8209
File: src/backend/base/langflow/services/tracing/traceloop.py:304-305
Timestamp: 2025-06-05T02:29:01.638Z
Learning: In LangFlow's tracing services, the pattern `traceback.format_exception(error)` followed by inclusion in f-strings is used consistently across all tracer implementations (langsmith.py, arize_phoenix.py, traceloop.py). This should not be flagged as a bug since it's the established codebase pattern.

@2getsandesh
Contributor Author

@mfortman11 Hi, Could you please re-review this PR?

Collaborator

@mfortman11 mfortman11 left a comment

LGTM

@edwinjosechittilappilly
Collaborator

@2getsandesh Please fix the backend tests in this PR.

@sonarqubecloud

Quality Gate failed

Failed conditions
26.0% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

@2getsandesh
Contributor Author

Closing this PR as it is outdated. The same functionality, along with extended metrics support, has been implemented in PR #9317.

Labels

  • enhancement: New feature or request
  • lgtm: This PR has been approved by a maintainer
  • size:L: This PR changes 100-499 lines, ignoring generated files.

5 participants