fix: OpenTelemetry trace context propagation #12962

Merged
erichare merged 11 commits into langflow-ai:release-1.10.0 from ringerc:upstream-otel-trace-context-propagation
May 11, 2026
@ringerc ringerc commented May 4, 2026

Add trace context propagation to Langflow's outbound requests, so that an incoming OpenTelemetry trace context is properly linked to the trace contexts of outbound requests.

Fixes #12961

This fix makes it possible to use a fake Langwatch or Arize endpoint to deliver traces to an otel-collector with working trace context propagation to downstream services.
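
To illustrate the linkage described above, here is a minimal sketch of the W3C Trace Context idea. It is not Langflow's or OpenTelemetry's code, and `child_traceparent` is a hypothetical helper: an outbound request keeps the incoming trace-id and carries a fresh span id in the parent-id field, which is what lets a downstream service join the same trace.

```python
import secrets


def child_traceparent(incoming: str) -> str:
    """Derive an outbound traceparent from an incoming one (hypothetical helper).

    Assumes a well-formed version-00 header: version-traceid-parentid-flags.
    """
    version, trace_id, _parent_id, flags = incoming.split("-")
    # The outbound span gets its own 8-byte id, rendered as 16 hex characters.
    new_span_id = secrets.token_hex(8)
    # The trace-id is carried over unchanged, so both hops share one trace.
    return f"{version}-{trace_id}-{new_span_id}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outbound = child_traceparent(incoming)
```

In the PR itself this header handling is delegated to the OpenTelemetry instrumentors for `requests` and `urllib3` rather than done by hand.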

coderabbitai Bot commented May 4, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@ringerc ringerc changed the base branch from main to release-1.10.0 May 4, 2026 02:14
@ringerc ringerc force-pushed the upstream-otel-trace-context-propagation branch 3 times, most recently from a268ec2 to 3ec806c Compare May 4, 2026 04:39
@Qodo-Free-For-OSS

Hi, both tracers instrument and uninstrument requests/urllib3 per tracer instance even though these instrumentors monkeypatch globally, so ending one trace can disable propagation for other in-flight traces or for other tracer instances that are still running.

Severity: action required | Category: reliability

How to fix: Make instrumentation process-scoped

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

RequestsInstrumentor/URLLib3Instrumentor patch globally, but tracers call instrument()/uninstrument() per trace. If multiple graph runs overlap, the first run to call end() will uninstrument globally and break context propagation for other active runs.

Issue Context

Instrumentation should be enabled once per process (or reference-counted), not tied to a single tracer instance’s lifecycle.

Fix Focus Areas

  • src/backend/base/langflow/services/tracing/langwatch.py[94-130]
  • src/backend/base/langflow/services/tracing/arize_phoenix.py[219-252]
  • src/backend/base/langflow/services/tracing/service.py[264-347]

Implementation notes

  • Introduce a shared instrumentation manager (module-level singleton) with ref-counting:
    • enable(tracer_provider) increments count and instruments only on first enable.
    • disable() decrements count and uninstruments only when count reaches zero.
  • Alternatively, instrument once at application startup (TracingService init) and never uninstrument; rely on OTel context to control propagation.
  • Ensure provider selection doesn’t unexpectedly change if already instrumented (document or keep first provider).

Qodo code review - free for open-source.

@Qodo-Free-For-OSS

Hi, _uninstrument_http_clients suppresses all Exceptions, so failures to remove global monkeypatches are silent and leave the process in an unknown instrumentation state.

Severity: remediation recommended | Category: observability

How to fix: Log unexpected uninstrument errors

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

Uninstrumentation currently suppresses any exception, hiding unexpected failures and making debugging difficult.

Issue Context

It’s fine to suppress ImportError when optional instrumentation packages are missing, but other exceptions should be logged (at least debug/warn).

Fix Focus Areas

  • src/backend/base/langflow/services/tracing/arize_phoenix.py[240-252]
  • src/backend/base/langflow/services/tracing/langwatch.py[117-129]

Implementation notes

  • Replace contextlib.suppress(ImportError, Exception) with:
    • suppress(ImportError) for the import
    • try/except Exception as e around uninstrument() with logger.debug(...) or logger.warning(...) including exception info.
  • Keep behavior non-fatal (do not raise).
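
The notes above could be sketched roughly as follows. This is an illustration under assumptions, not the code in the PR: `uninstrument_safely` and the logger name are hypothetical, and the instrumentor is looked up by module and class name so the snippet runs even when the optional packages are not installed.

```python
import contextlib
import importlib
import logging

# Hypothetical logger name, chosen for this example only.
logger = logging.getLogger("tracing.http_instrumentation")


def uninstrument_safely(module_name: str, class_name: str) -> bool:
    """Call <module>.<class>().uninstrument(); return True on success. Never raises."""
    module = None
    # A missing optional package is expected, so only ImportError is suppressed.
    with contextlib.suppress(ImportError):
        module = importlib.import_module(module_name)
    if module is None:
        return False
    try:
        getattr(module, class_name)().uninstrument()
    except Exception:
        # Unexpected failures stay non-fatal but are logged with the traceback.
        logger.warning(
            "Failed to uninstrument %s.%s", module_name, class_name, exc_info=True
        )
        return False
    return True
```

A caller might use `uninstrument_safely("opentelemetry.instrumentation.requests", "RequestsInstrumentor")` and the same for `opentelemetry.instrumentation.urllib3` with `URLLib3Instrumentor`.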

Found by Qodo code review

ringerc commented May 5, 2026

Those comments look reasonable. I'll look into it.

ringerc and others added 5 commits May 6, 2026 10:08
Add opentelemetry-instrumentation-requests and opentelemetry-instrumentation-urllib3
to enable W3C TraceContext propagation on outgoing HTTP calls.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Instrument requests and urllib3 HTTP clients with the tracer provider
so that traceparent headers are injected on outgoing HTTP requests
(e.g., to LLM APIs). This enables end-to-end distributed tracing.

Both Arize Phoenix and LangWatch tracers now:
- Call _instrument_http_clients() during setup
- Call _uninstrument_http_clients() during cleanup
- Handle missing packages gracefully with debug logging

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Verify that:
- RequestsInstrumentor and URLLib3Instrumentor are called during tracer setup
- Instrumentors are uninstrumented during cleanup
- traceparent headers are injected with valid W3C format on outgoing requests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
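
The "valid W3C format" check mentioned in this commit could look something like the sketch below. It is not the test code from the PR; `is_valid_traceparent` is a hypothetical helper, and it only accepts the version-00 layout (`version-traceid-parentid-flags` in lowercase hex) with the all-zero ids and the `ff` version rejected.

```python
import re

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT_RE = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


def is_valid_traceparent(value: str) -> bool:
    """Check a traceparent header against the version-00 field layout."""
    match = _TRACEPARENT_RE.fullmatch(value)
    if match is None:
        return False
    version, trace_id, parent_id, _flags = match.groups()
    if version == "ff":
        # Version ff is reserved as invalid.
        return False
    # All-zero trace-id or parent-id values are invalid.
    return trace_id != "0" * 32 and parent_id != "0" * 16


ok = is_valid_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
```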
- Use contextlib.suppress instead of try-except-pass in uninstrument methods
- Fix nested with statements using parenthesized context managers
- Fix unused arguments by prefixing with underscore
- Add timeout to requests.get calls
- Skip integration tests that mock at wrong layer (Session.send bypasses OTel)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…f-counting

The OpenTelemetry RequestsInstrumentor and URLLib3Instrumentor monkeypatch
globally, but tracers were calling instrument()/uninstrument() per instance.
If multiple graph runs overlapped, the first run to end() would uninstrument
globally and break context propagation for other active runs.

Changes:
- Add HTTPClientInstrumentationManager singleton with reference counting
- enable() increments count, instruments only on first call
- disable() decrements count, uninstruments only when count reaches zero
- Replace silent exception suppression with proper logging for debugging

Addresses PR review feedback from Qodo.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
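
The reference-counting behaviour described in this commit can be sketched as below. This is not the merged `HTTPClientInstrumentationManager`: the class name `RefCountedInstrumentation` and its constructor are assumptions, and the instrument/uninstrument hooks are injected as plain callables so the example runs without the OpenTelemetry packages.

```python
import threading
from typing import Callable


class RefCountedInstrumentation:
    """Hypothetical process-wide guard around global instrument/uninstrument hooks."""

    def __init__(
        self, instrument: Callable[[], None], uninstrument: Callable[[], None]
    ) -> None:
        self._instrument = instrument
        self._uninstrument = uninstrument
        self._count = 0
        self._lock = threading.Lock()

    def enable(self) -> None:
        with self._lock:
            if self._count == 0:
                self._instrument()  # only the first user patches the clients
            self._count += 1

    def disable(self) -> None:
        with self._lock:
            if self._count == 0:
                return  # unbalanced disable() is ignored
            self._count -= 1
            if self._count == 0:
                self._uninstrument()  # only the last user removes the patches

    @property
    def active(self) -> int:
        with self._lock:
            return self._count


events: list[str] = []
manager = RefCountedInstrumentation(
    instrument=lambda: events.append("instrument"),
    uninstrument=lambda: events.append("uninstrument"),
)
manager.enable()   # first graph run starts
manager.enable()   # overlapping second run
manager.disable()  # first run ends; clients stay instrumented
manager.disable()  # last run ends; clients are uninstrumented
```

In a real setup the two callables would wrap calls such as `RequestsInstrumentor().instrument(tracer_provider=...)` and `RequestsInstrumentor().uninstrument()`.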
@ringerc ringerc force-pushed the upstream-otel-trace-context-propagation branch from 3d81468 to b6dd9dc Compare May 5, 2026 22:08
ringerc commented May 5, 2026

These revisions should address the concerns expressed; see b6dd9dc in particular.

An LLM was used to generate the shared context manager, as Python isn't my strong suit.

@ringerc ringerc changed the title Upstream otel trace context propagation OpenTelemetry trace context propagation May 6, 2026
@erichare erichare self-requested a review May 11, 2026 14:43
@erichare erichare changed the title OpenTelemetry trace context propagation fix: OpenTelemetry trace context propagation May 11, 2026
@github-actions github-actions Bot added the bug Something isn't working label May 11, 2026
…el-trace-context-propagation

# Conflicts:
#	uv.lock
@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels May 11, 2026
@erichare erichare left a comment

LGTM, thanks @ringerc !

@erichare erichare enabled auto-merge May 11, 2026 15:18
@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels May 11, 2026
@erichare erichare disabled auto-merge May 11, 2026 15:21
@erichare erichare merged commit b3f3ba1 into langflow-ai:release-1.10.0 May 11, 2026
29 of 31 checks passed
ringerc commented May 12, 2026

Greatly appreciated @erichare

If you get a chance to look over my related PR adding support for the generic OTLP protocol, that'd be great. I'll rebase it onto this.

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

Langflow does not propagate trace context to downstream requests

3 participants