feat(job_queue): export cancel_stats and active_jobs to OpenTelemetry #13123

Draft
ogabrielluiz wants to merge 1 commit into feat/redis-job-queue-cancel-and-grace from feat/job-queue-otel-instrumentation

@ogabrielluiz
Contributor

Summary

Wires the Redis job queue's cancel_stats and active_jobs into the existing OpenTelemetry instrumentation. When LANGFLOW_PROMETHEUS_ENABLED=true, the new /monitor/job_queue counters now flow to the Prometheus exporter alongside file_uploads / num_files_uploaded. Until now they only existed in the JSON snapshot — usable for ad-hoc debugging, useless for ongoing ops dashboards or alerting.

What's new

Two metrics registered in services/telemetry/opentelemetry.py:

  • langflow_job_queue_cancel_events_total — Counter, label event_type. One Counter covering all 11 cancel_stats keys (published, marker_hit, dispatched_owned, dispatched_foreign, publish_errors, dispatcher_reconnects, dispatcher_internal_errors, polling_watchdog_kills, activity_touch_errors, activity_get_errors, activity_parse_errors).
  • langflow_job_queue_active_jobs — UpDownCounter, label backend (memory or redis). +1 in JobQueueService.create_queue, -1 in cleanup_job.
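
The +1/-1 bookkeeping for langflow_job_queue_active_jobs can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `FakeUpDownCounter` and `SketchJobQueue` are hypothetical stand-ins for the OTel instrument and for JobQueueService, and the guard against double cleanup is an assumption. Only the `create_queue` / `cleanup_job` names and the `backend` label come from the description above.

```python
from __future__ import annotations

from collections import defaultdict


class FakeUpDownCounter:
    """Hypothetical stand-in for an UpDownCounter: sums deltas per label set."""

    def __init__(self) -> None:
        self.values: dict[tuple, int] = defaultdict(int)

    def add(self, amount: int, attributes: dict[str, str]) -> None:
        key = tuple(sorted(attributes.items()))
        self.values[key] += amount


class SketchJobQueue:
    """Illustrative queue service; not the real JobQueueService."""

    def __init__(self, backend: str, active_jobs: FakeUpDownCounter) -> None:
        self.backend = backend  # "memory" or "redis"
        self.active_jobs = active_jobs
        self.queues: dict[str, list] = {}

    def create_queue(self, job_id: str) -> None:
        self.queues[job_id] = []
        self.active_jobs.add(1, {"backend": self.backend})

    def cleanup_job(self, job_id: str) -> None:
        # Assumption: decrement only for jobs that were registered, so a
        # repeated cleanup cannot push the value below zero.
        if self.queues.pop(job_id, None) is not None:
            self.active_jobs.add(-1, {"backend": self.backend})
```

Because the label is taken from the service instance, the in-memory and Redis backends would each report their own series without extra wiring.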

A new _bump_cancel_stat helper on RedisJobQueueService is the sole mutation point for the dict, so the JSON snapshot and Prometheus stay in lockstep — no risk of forgetting to bump one or the other.
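
A minimal sketch of the single-mutation-point idea, assuming a helper signature of `_bump_cancel_stat(key, amount=1)`. `RecordingCounter` and `SketchRedisQueue` are hypothetical names used for illustration; the eleven keys are the ones listed above.

```python
from __future__ import annotations

CANCEL_STAT_KEYS = (
    "published", "marker_hit", "dispatched_owned", "dispatched_foreign",
    "publish_errors", "dispatcher_reconnects", "dispatcher_internal_errors",
    "polling_watchdog_kills", "activity_touch_errors", "activity_get_errors",
    "activity_parse_errors",
)


class RecordingCounter:
    """Hypothetical stand-in for the OTel Counter; records every add() call."""

    def __init__(self) -> None:
        self.calls: list[tuple[int, dict[str, str]]] = []

    def add(self, amount: int, attributes: dict[str, str]) -> None:
        self.calls.append((amount, attributes))


class SketchRedisQueue:
    """Illustrative only; not the real RedisJobQueueService."""

    def __init__(self, counter: RecordingCounter) -> None:
        self._counter = counter
        self.cancel_stats = dict.fromkeys(CANCEL_STAT_KEYS, 0)

    def _bump_cancel_stat(self, key: str, amount: int = 1) -> None:
        # Update the JSON-snapshot dict and emit the metric in one place,
        # so the two views cannot diverge.
        self.cancel_stats[key] += amount
        self._counter.add(amount, {"event_type": key})
```

Routing every increment through one method also makes parity easy to test: the sum of recorded `add()` amounts per `event_type` should equal the dict value for that key.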

Reliability

The OTel emit path is best-effort:

  • Cached lazy resolver fetches the telemetry singleton on first use.
  • contextlib.suppress(Exception) wraps the emit so a broken or unavailable telemetry layer never propagates into the queue hot path.
  • Tests verify silent-failure behavior when the telemetry handle raises.

Stacks on #13084

Depends on the cancel_stats introduced there. Until #13084 merges, this PR's diff will show those commits too — only 89379bb46e is unique to this branch.

Wire the new Redis job queue counters into the existing langflow OTel
instrumentation so they flow to the Prometheus exporter when
LANGFLOW_PROMETHEUS_ENABLED=true. Until now cancel_stats lived only in
the JSON /monitor/job_queue snapshot — fine for ad-hoc inspection,
useless for ongoing ops dashboards or alerting.

Two new metrics are registered:

* langflow_job_queue_cancel_events_total (Counter, label: event_type) —
  a single Counter, labeled by event_type, covering all eleven cancel_stats
  keys (published, marker_hit, dispatched_owned, dispatched_foreign,
  publish_errors, dispatcher_reconnects, dispatcher_internal_errors,
  polling_watchdog_kills, activity_touch_errors, activity_get_errors,
  activity_parse_errors).
* langflow_job_queue_active_jobs (UpDownCounter, label: backend) —
  bumped +1 in JobQueueService.create_queue and -1 in cleanup_job, so
  both the in-memory ("memory") and Redis ("redis") backends report.

A single _bump_cancel_stat helper on RedisJobQueueService is now the
sole mutation point for the cancel_stats dict, guaranteeing the JSON
snapshot and Prometheus stay in lockstep.

The OTel emit path is best-effort: a cached lazy resolver fetches the
telemetry singleton once, and contextlib.suppress wraps the emit so a
broken or unavailable telemetry layer never propagates into the queue
hot path.

Tests cover dict + counter parity, active_jobs delta on create/cleanup,
silent failure when telemetry is unavailable, and that every
cancel_stats key routes through the helper.
@coderabbitai
Contributor

coderabbitai Bot commented May 14, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@github-actions github-actions bot added the enhancement (New feature or request) label on May 14, 2026