
feat: Redis-backed job queue for multi-worker deployments #12588

Merged
jordanrfrazier merged 25 commits into langflow-ai:release-1.10.0 from severfire:feat_redis_job_queue on May 13, 2026

Conversation

@severfire
Contributor

@severfire severfire commented Apr 9, 2026

Summary

Adds an optional Redis-backed job queue. With it, flow build events keep working when multiple Gunicorn/Uvicorn workers run and the start request lands on a different worker process from the later poll/stream requests. The default in-memory asyncio queue is unchanged.

Problem

With workers > 1, the in-memory JobQueueService is per-process, so a job started on one worker can hit JobQueueNotFoundError (or lose events) when the client talks to another worker. This change routes queue traffic through Redis Streams so any worker can consume the same job stream.
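Below is a minimal sketch of why a shared stream fixes this: the producer and consumer share only a stream key, not process memory.

Assumptions:
- `FakeStreams` is a hypothetical in-memory stand-in for Redis, not the redis-py API.
- It only loosely mimics XADD/XREAD. It uses integer IDs instead of Redis `ms-seq` IDs and polls instead of doing a blocking read.
- `produce`/`consume` are illustrative names, not the PR's code.

```python
import asyncio
import itertools


class FakeStreams:
    """Hypothetical stand-in for Redis Streams; NOT the redis-py API."""

    def __init__(self) -> None:
        self._entries: dict[str, list[tuple[int, dict]]] = {}
        self._ids = itertools.count(1)

    def xadd(self, key: str, fields: dict) -> int:
        # Append an entry with a monotonically increasing ID, like XADD with "*".
        entry_id = next(self._ids)
        self._entries.setdefault(key, []).append((entry_id, fields))
        return entry_id

    def xread(self, key: str, last_id: int) -> list[tuple[int, dict]]:
        # Return entries newer than the caller's cursor, like XREAD.
        return [entry for entry in self._entries.get(key, []) if entry[0] > last_id]


async def produce(streams: FakeStreams, job_id: str, events: list[str]) -> None:
    # "Worker A": the build publishes every event to the job's stream.
    for event in events:
        streams.xadd(f"job:{job_id}", {"event": event})
    streams.xadd(f"job:{job_id}", {"event": None})  # end-of-stream marker


async def consume(streams: FakeStreams, job_id: str) -> list[str]:
    # "Worker B": copies stream entries into a local asyncio.Queue, so the
    # caller keeps an asyncio.Queue-like API no matter which worker produced.
    buffer: asyncio.Queue = asyncio.Queue()
    received: list[str] = []
    last_id = 0
    while True:
        entries = streams.xread(f"job:{job_id}", last_id)
        if not entries:
            await asyncio.sleep(0.01)  # a real XREAD would BLOCK here instead
            continue
        for entry_id, fields in entries:
            await buffer.put(fields["event"])
            last_id = entry_id
        while not buffer.empty():
            event = buffer.get_nowait()
            if event is None:
                return received
            received.append(event)


async def main() -> list[str]:
    streams = FakeStreams()  # the shared "server" both workers talk to
    consumer = asyncio.create_task(consume(streams, "42"))
    await produce(streams, "42", ["vertex_built", "end"])
    return await consumer


print(asyncio.run(main()))  # ['vertex_built', 'end']
```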

What changed

  • RedisJobQueueService and RedisQueueWrapper: producer path bridges the existing asyncio.Queue + EventManager to Redis Streams (XADD); consumer path reads via XREAD into a local buffer so build.py keeps an asyncio.Queue-like API.
  • JobQueueServiceFactory: selects implementation from settings — job_queue_type == "redis" → Redis, otherwise existing in-memory service.
  • Settings (lfx): job_queue_type (asyncio | redis) and redis_queue_* (host/port/db/url/ttl), with defaults chosen to keep the queue DB separate from the cache DB.
  • event_delivery validation: multi-worker no longer forces direct delivery when job_queue_type=redis, since cross-worker queue state is shared.
  • Ownership: register_job_owner / get_job_owner can use Redis for cross-worker auth checks.
  • Tests: fakeredis (async) unit tests for the Redis queue wrapper and service behavior without a real Redis.
  • Logging: the experimental-feature warning for RedisCache is emitted once per run (using a cross-process sentinel), avoiding log spam when Redis is used by multiple subsystems.
  • Docs: LANGFLOW_JOB_QUEUE_TYPE, LANGFLOW_REDIS_QUEUE_DB, LANGFLOW_GUNICORN_PRELOAD, and high-load guidance updated (current + versioned docs).
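The selection rule in `JobQueueServiceFactory` might be sketched like this. The PR description does not show the factory's real method names or signatures, so these are hypothetical stand-ins:
- the `Settings` dataclass;
- the empty service classes;
- `create_job_queue_service`.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    # Hypothetical subset of the lfx settings: only the field the factory reads.
    job_queue_type: str = "asyncio"  # "asyncio" | "redis"


class JobQueueService:
    """Stand-in for the existing per-process, in-memory asyncio queue service."""


class RedisJobQueueService:
    """Stand-in for the new Redis Streams-backed service."""


def create_job_queue_service(settings: Settings) -> object:
    # Only an explicit "redis" opts in; any other value keeps the current default,
    # so existing single-worker deployments see no behavior change.
    if settings.job_queue_type == "redis":
        return RedisJobQueueService()
    return JobQueueService()


print(type(create_job_queue_service(Settings(job_queue_type="redis"))).__name__)  # RedisJobQueueService
```

Treating every value other than `"redis"` as the default keeps the opt-in explicit and the fallback safe.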

Configuration (operators)

Variable | Role
--- | ---
LANGFLOW_JOB_QUEUE_TYPE=redis | Enable the Redis-backed queue
LANGFLOW_REDIS_QUEUE_URL or host/port | Redis connection (queue-specific, falling back to the general Redis settings)
LANGFLOW_REDIS_QUEUE_DB | DB index for queue streams (default 1, separate from the typical cache DB 0)
LANGFLOW_REDIS_QUEUE_TTL | TTL for stream/owner keys

Known limitations (documented in code)

  • Cross-worker cancel of a build started on another worker is effectively a no-op / best-effort; true cancel would need an extra Redis signal checked inside the build loop.
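The "extra Redis signal checked inside the build loop" could look roughly like the sketch below. Neither name is Langflow API:
- `CancelRegistry` is a hypothetical stand-in for a shared per-job Redis key. An in-memory set replaces the key so the sketch runs without Redis.
- `build_loop` is a hypothetical simplification of the build.

```python
import asyncio


class CancelRegistry:
    """Hypothetical stand-in for a per-job cancel key that every worker can see."""

    def __init__(self) -> None:
        self._cancelled: set[str] = set()

    async def request_cancel(self, job_id: str) -> None:
        # With Redis this would be something like a SET on a job-scoped key with a TTL.
        self._cancelled.add(job_id)

    async def is_cancelled(self, job_id: str) -> bool:
        # With Redis this would be an EXISTS/GET on the same key.
        return job_id in self._cancelled


async def build_loop(job_id: str, steps: int, registry: CancelRegistry) -> int:
    completed = 0
    for _ in range(steps):
        # Checking the shared signal between steps lets a cancel issued on any
        # worker stop the build at the next step boundary.
        if await registry.is_cancelled(job_id):
            break
        await asyncio.sleep(0)  # placeholder for building one vertex
        completed += 1
    return completed


async def demo() -> tuple[int, int]:
    registry = CancelRegistry()
    finished = await build_loop("job-a", 3, registry)
    await registry.request_cancel("job-b")  # e.g. issued from another worker
    stopped = await build_loop("job-b", 3, registry)
    return finished, stopped


print(asyncio.run(demo()))  # (3, 0)
```

The trade-off is latency versus Redis load. Each check costs one round trip per step, and a cancel only takes effect at a step boundary.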

How to test

uv run pytest src/backend/tests/unit/test_redis_job_queue_service.py

related:
#12364
#12587

@jordanrfrazier

@coderabbitai
Contributor

coderabbitai Bot commented Apr 9, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c3889771-d10e-4989-9d2d-aa02e1b82e94


@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Apr 9, 2026
@severfire
Contributor Author

@ogabrielluiz Hi, what do you think about it?

@severfire severfire changed the base branch from release-1.9.0 to release-1.9.1 April 17, 2026 13:27
@erichare erichare force-pushed the release-1.9.1 branch 2 times, most recently from 981fe5f to dc26d19 Compare April 24, 2026 01:07
@severfire
Contributor Author

@erichare let me know if there is anything I can help here with :-)

@severfire
Contributor Author

@ogabrielluiz would this feature be useful?

@severfire severfire changed the base branch from release-1.9.1 to release-1.10.0 May 5, 2026 09:16
severfire added 3 commits May 7, 2026 11:53
- Refactored the job queue service to support Redis-backed management for cross-worker scaling.
- Added environment variables for configuration:
    - `LANGFLOW_JOB_QUEUE_TYPE=redis`
    - `LANGFLOW_REDIS_QUEUE_DB=1`
- Updated job ownership methods to be asynchronous for improved concurrency handling.
- Enhanced Redis cache service with namespacing via key prefixes.
- Introduced `fakeredis` for in-memory Redis simulation in testing.
- Added comprehensive unit tests for Redis job queue components.
- Introduced a mechanism to emit a one-time warning for the RedisCache experimental feature during server runtime.
- The warning is logged only if no other worker has already emitted it, ensuring clarity for users regarding the experimental status of RedisCache.
- The implementation includes a temporary file check to prevent multiple warnings across different processes.
- Added documentation for LANGFLOW_GUNICORN_PRELOAD to explain preloading for better performance.
- Detailed the use of LANGFLOW_JOB_QUEUE_TYPE for specifying backends (e.g., Redis).
- Included LANGFLOW_REDIS_QUEUE_DB to define the database index for job queues.
- Updated the "High-Load Environments" guide with these optimal configurations.
…ncellation and buffer management

- Introduced a structural protocol `_CancellableQueue` to ensure queues can handle cancellation properly during client disconnects.
- Updated `RedisQueueWrapper` to implement this protocol, allowing for graceful cancellation of background tasks.
- Added a maximum size limit to the internal buffer to prevent unbounded memory usage and ensure backpressure on slow consumers.
- Implemented a done callback to handle unexpected fill task crashes, ensuring consumers are not left hanging indefinitely.
- Enhanced unit tests to verify compliance with the new protocol and the behavior of the buffer under various conditions.
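The bounded buffer and the done-callback safety net described above can be sketched with plain asyncio. The sentinel value, the `_BUFFER_MAXSIZE` value used here and the `BufferedReader` class are assumptions; the real values are not shown in the PR. The class is a simplified stand-in for `RedisQueueWrapper`, with an async iterator replacing the Redis reads.

```python
import asyncio

_SENTINEL = None  # end-of-stream marker (assumed; the real marker isn't shown)
_BUFFER_MAXSIZE = 2  # small bound for the demo; the real value isn't shown


class BufferedReader:
    """Sketch of a bounded buffer fed by a background task, with a done callback."""

    def __init__(self, source) -> None:
        # maxsize makes put() wait when the consumer falls behind: backpressure
        # instead of unbounded memory growth.
        self._buffer: asyncio.Queue = asyncio.Queue(maxsize=_BUFFER_MAXSIZE)
        self._pending: asyncio.Task | None = None
        self._fill_task = asyncio.create_task(self._fill(source))
        self._fill_task.add_done_callback(self._on_fill_done)

    async def _fill(self, source) -> None:
        async for item in source:
            await self._buffer.put(item)
        await self._buffer.put(_SENTINEL)  # normal end of stream

    def _on_fill_done(self, task: asyncio.Task) -> None:
        # A crashed or cancelled fill task never enqueued the sentinel, so do it
        # here; otherwise a consumer could wait on get() forever.
        if task.cancelled() or task.exception() is not None:
            try:
                self._buffer.put_nowait(_SENTINEL)
            except asyncio.QueueFull:
                # Buffer is full: deliver the sentinel once the consumer makes room.
                self._pending = asyncio.get_running_loop().create_task(self._buffer.put(_SENTINEL))

    async def get(self):
        return await self._buffer.get()


async def flaky_source():
    yield "event-1"
    raise RuntimeError("simulated Redis connection loss")


async def main() -> list:
    reader = BufferedReader(flaky_source())
    items = []
    while (item := await reader.get()) is not _SENTINEL:
        items.append(item)
    return items


print(asyncio.run(main()))  # ['event-1']
```

Calling `task.exception()` inside the callback also marks the exception as retrieved, so asyncio does not log "Task exception was never retrieved".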
@severfire
Contributor Author

@ogabrielluiz I made some enhancements.

Contributor

@ogabrielluiz ogabrielluiz left a comment

LGTM — all 10 review threads addressed with dedicated fixes, verified end-to-end against a two-worker harness on real Redis. 23/23 unit tests + 7/7 cross-worker scenarios pass on fbd774a.

The two blockers from the last pass are closed:

  • streaming/direct cross-worker no longer 404s (9cf34bf)
  • early-poll race resolved with the 30s _STARTUP_GRACE_S + _observed_stream flag (88b60cf)
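One plausible reading of the grace-window fix is sketched below as a pure decision function. Only `_STARTUP_GRACE_S` and its 30 s value come from the review. The function name, its parameters, and the exact role given to the "observed stream" flag are assumptions about how `_observed_stream` is used.

```python
_STARTUP_GRACE_S = 30.0  # grace window named in the review above


def keep_polling(stream_exists: bool, observed_stream: bool, started_at: float, now: float) -> bool:
    """Decide whether a poller on another worker should keep waiting for a job.

    Assumed semantics: a poll can arrive before the producing worker has written
    anything, so a missing stream only means "job gone" once we have either seen
    the stream before or waited longer than the grace window.
    """
    if stream_exists:
        return True
    if observed_stream:
        # The stream existed earlier and is now gone (finished or expired by TTL).
        return False
    # Never seen yet: tolerate the race for up to the grace window.
    return now - started_at < _STARTUP_GRACE_S
```

Callers would pass `time.monotonic()` readings for `started_at` and `now`, so wall-clock adjustments cannot shorten or stretch the window.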

The remaining cross-worker passive-disconnect case (worker B sees client leave but worker A keeps producing) is now an explicit, logged limitation rather than a silent no-op — happy to ship that as a follow-up PR with a Redis pubsub side-channel.

Nice work on the iteration!

@severfire
Contributor Author

@ogabrielluiz thanks! I resolved the merge conflicts :-) Ready to go. Thanks!

@ogabrielluiz ogabrielluiz force-pushed the feat_redis_job_queue branch from 244f455 to 86722e2 Compare May 12, 2026 19:08
…l response

- Deliver end-of-stream sentinel on fill-task cancellation (_on_fill_done now
  handles both cancelled and exception paths so consumers are never left hanging)
- Add _error_start time-bound to xread and exists() error loops: after
  _STARTUP_GRACE_S seconds of continuous Redis errors the sentinel is delivered
  instead of retrying forever
- Advance _last_id cursor only after buffer.put() succeeds so cancellation mid-put
  does not silently skip that message in the Redis cursor
- Return False from cancel_flow_build when event_task is None (cross-worker path)
  so the HTTP response correctly reports success=False instead of false success
Collaborator

@jordanrfrazier jordanrfrazier left a comment

@severfire Awesome work again. I skimmed it and briefly tested it manually, and all seems good to me. Going to get this in front of our QA team now.

I also added one commit with some fixes; the reasoning is explained in the commit message. Please take a look, and feel free to revert or update as you see fit.

Did you by chance test whether tracing continues to work as expected?

Comment thread docs/versioned_docs/version-1.8.0/Develop/environment-variables.mdx
@severfire
Contributor Author

severfire commented May 13, 2026

@jordanrfrazier Hmm... To be honest, I manually tested chat and ran Langflow's automated tests. Also, some things might have been fixed in #13084, so I wonder whether the fixes should go there or here. Did you see any issues with tracing?

@jordanrfrazier
Collaborator

Did you see any issues with tracing?

I didn't test myself, it's just the first thing that I thought of that may not have been accounted for in a more distributed world. I'll make sure we get some manual tests done on this.

@severfire
Contributor Author

@jordanrfrazier thank you very much! I will check it as well tonight. Btw, I am starting work on an OAuth accounts manager for Langflow :-) It would be awesome to connect to Google Cloud and other services using OAuth.

@jordanrfrazier jordanrfrazier added this pull request to the merge queue May 13, 2026
@severfire
Contributor Author

@jordanrfrazier - Tracing works on my machine; I tested it with 3 chats running at once :-)

From what I see, as you queued to merge, you tested it as well :-) glad it worked!

Merged via the queue into langflow-ai:release-1.10.0 with commit ea29bcc May 13, 2026
103 of 104 checks passed
@severfire
Contributor Author

oh, I think my update went 3 minutes too late! :-D Pull request merged. LOL.

ogabrielluiz added a commit that referenced this pull request May 14, 2026
Conflicts resolved by taking the PR (HEAD) side for the production-hardened
job_queue service, factory, settings, build/monitor APIs, and tests. The PR's
RedisJobQueueService is a deliberate superset of the version that landed via
#12588: it adds cross-worker cancel via PSUBSCRIBE, cancel-marker fallback,
dispatcher auto-reconnect, polling watchdog, ops metrics endpoint, and
client-disconnect propagation via signal_cancel.

Other resolutions:
- pyproject.toml: keep the Python 3.14 onnxruntime split from release-1.10.0
- docs env-variables.mdx: drop duplicate 'High-load and multi-worker' heading
- uv.lock: regenerated against the merged pyproject.toml
ogabrielluiz added a commit that referenced this pull request May 14, 2026
…rted delivery

- RedisQueueWrapper: restore _BUFFER_MAXSIZE bounded buffer and the
  _on_fill_done done-callback safety net that release-1.10.0 added in #12588,
  so a slow consumer cannot grow the buffer without bound and a crashing or
  cancelled _fill_task cannot leave consumers stuck on await get().
- build.get_flow_events_response: explicit exhaustiveness guard. Unknown
  EventDeliveryType values now return HTTP 400 with the supported set and a
  remediation hint instead of silently falling through to the polling path.
- lfx settings.set_event_delivery: when workers > 1 without a redis queue,
  upgrade the warning to name the requested mode, the forced fallback, and
  the LANGFLOW_JOB_QUEUE_TYPE env var that would preserve the original mode.
- Tests: port the three RedisQueueWrapper safety tests from release-1.10.0
  and add coverage for the new event_delivery guard.
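The `set_event_delivery` fallback in this commit can be sketched as a pure function. The same rule underlies the "event_delivery validation" bullet in the PR summary. The function name and parameters are assumptions, not the lfx settings API. The mode strings "streaming", "polling" and "direct" are also assumed, taken from the delivery types mentioned in this thread.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_event_delivery(requested: str, workers: int, job_queue_type: str) -> str:
    """Return the event delivery mode to actually use (hypothetical helper).

    With more than one worker and a per-process queue, only "direct" delivery
    is safe, because poll/stream requests may reach a worker that never saw
    the job. A Redis-backed queue shares job state, so the request is honored.
    """
    if workers > 1 and job_queue_type != "redis" and requested != "direct":
        logger.warning(
            "event_delivery=%r is not supported with %d workers and an in-process job queue; "
            "falling back to 'direct'. Set LANGFLOW_JOB_QUEUE_TYPE=redis to keep %r.",
            requested,
            workers,
            requested,
        )
        return "direct"
    return requested
```

The warning names three things: the requested mode, the forced fallback, and the env var that would preserve the original mode.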

Labels

enhancement New feature or request

3 participants