perf: Enhance Gunicorn preload functionality for Langflow #12778

jordanrfrazier merged 27 commits into
Conversation
@jordanrfrazier Hey Jordan! Regarding your comment on PR #12587 about a potential blog post: I've just added the memory usage benchmarks to that PR, and I'd love your eyes on PR #12588 when you have a moment. I've documented the Memory Usage Results there, but I'd like to hold off on the post until I finish a few more memory optimizations I've identified. I think we'll have a much stronger narrative if we showcase a comprehensive "Memory & Stability" package.

The Strategy: I'm thinking we align this content with the v1.10 release. We can frame it as a deep dive into Langflow's production readiness, specifically highlighting:

What do you think about bundling this for the 1.10 launch news? Later on I would also like to work a bit on ISO 27001. It also relates to #12615, which is, in my opinion, important :-) Please let me know! Thanks!
@erichare I did some research regarding my memory tests; here are some insights:

Memory Reduction Analysis: v1.8.3 → v1.9.0

Executive Summary

Test results show a dramatic 85% memory reduction from v1.8.3 (20.55 GB) to v1.9.0 (~3 GB) with 30 workers. Additionally, the preload on/off setting makes almost no difference in v1.9.0.

Test Results
Key Observation: Preload on/off differs by only ~400 MB (13% variation), not the multiple-GB difference you'd expect from memory sharing.

Root Cause: LangChain 1.0 Upgrade

The Critical Commit

Commit:

What Changed in LangChain 1.0

Dependencies before (v1.8.3):
- "langchain~=0.3.27",
- "langchain-community>=0.3.28,<1.0.0",
- "langchain-core>=0.3.81,<1.0.0",
Dependencies after (v1.9.0):
+ "langchain~=1.2.0",
+ "langchain-community~=0.4.1",
+ "langchain-core>=1.2.28,<2.0.0",

Key Architectural Changes in LangChain 1.0
Memory Impact Calculation

With 30 workers:
This suggests v1.8.3 was loading significantly more dependencies per worker, likely:
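The reported totals can be sanity-checked with quick arithmetic (assuming memory divides roughly evenly across the 30 workers; real RSS includes shared pages, so these per-worker figures are rough upper bounds):

```python
# Quick arithmetic on the reported totals (assumption: memory divides
# evenly across workers; real RSS shares pages between processes, so
# these are rough per-worker upper bounds).
WORKERS = 30
v183_total_gb = 20.55  # reported v1.8.3 total
v190_total_gb = 3.0    # reported v1.9.0 total (~3 GB)

per_worker_v183_mb = v183_total_gb * 1024 / WORKERS  # ~701 MB per worker
per_worker_v190_mb = v190_total_gb * 1024 / WORKERS  # ~102 MB per worker
reduction_pct = (1 - v190_total_gb / v183_total_gb) * 100  # ~85%
```

This reproduces the ~85% reduction claimed in the summary above.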
Other Contributing Factors

1. Gunicorn Upgrade (v22 → v25)

While no specific memory-related features were documented, the 3-version jump included:
2. SQLModel Upgrade (0.0.22 → 0.0.37)
3. Pydantic Upgrade (2.11.0 → 2.12.5)
4. LFX Upgrade (0.3.3 → 0.4.0)

Relevant commit: This explicitly mentions lazy imports, suggesting LFX v0.4.0 introduced lazy-loading optimizations.

Why Preload Makes Little Difference in v1.9.0

@ogabrielluiz can you confirm, please?
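For context on how lazy imports cut memory: the actual module import is deferred until first attribute access, so dependencies that are never touched never occupy memory. A minimal proxy-based sketch (illustrative only, not LFX's actual implementation):

```python
import importlib


class LazyModule:
    """Proxy that imports the wrapped module only on first attribute
    access, so unused dependencies never occupy memory."""

    def __init__(self, name: str):
        self._name = name
        self._mod = None

    def __getattr__(self, attr):
        # __getattr__ fires only for attributes missing on the proxy
        # itself; the first such access performs the real import.
        if self._mod is None:
            self._mod = importlib.import_module(self._name)
        return getattr(self._mod, attr)


# Nothing is imported at construction time:
json_mod = LazyModule("json")
```

Each worker (or the master, under preload) pays the import cost only for modules it actually uses.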
@severfire This is perhaps a result of branching from release-1.9.1(?), but there are some unrelated commits in this PR. You could maybe do a force push to get rid of them; otherwise they should be cleared out when we merge release-1.9.1 into main and then rebase onto release-1.10.0. Hopefully.
@severfire Can you take a look at what Claude found and see if you agree with these issues?
@jordanrfrazier I will investigate that :-) thank you! |
- Introduced a new preload module to optimize memory usage by running fork-safe initialization in the Gunicorn master process.
- Updated the lifespan management in `main.py` to check if the master has preloaded resources, allowing workers to inherit state and skip redundant setup.
- Adjusted the server loading process to accommodate the new preload logic, ensuring efficient resource management across worker processes.
Force-pushed from 1aa3606 to 8639a23
@jordanrfrazier Okay, I've cleaned up my branch. Now I'll investigate the things you mentioned and try to address them. Thanks!
…et leaks

- Added a `teardown` method to the `RedisCache` class to close the Redis connection, addressing potential socket leaks during process forking.
- Introduced unit tests to verify the functionality of the `teardown` method, ensuring it handles client closure and errors gracefully.
- Tests cover scenarios including normal closure, error handling during closure, and teardown with URL-based connections.
…t leaks

- Added a `teardown` method to the `RedisCache` class, ensuring proper closure of the Redis client connection before forking to avoid socket leaks.
- Created comprehensive unit tests to validate the functionality of the `teardown` method, covering various scenarios including normal operation and error handling.
- Updated existing tests to reflect the new teardown functionality and ensure `RedisCache` is recognized as an instance of `ExternalAsyncBaseCacheService`.
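The teardown pattern these commits describe (close the client before the master forks so no socket is shared across processes) can be sketched as follows; this is a simplified illustration, with a stub standing in for the real `redis.asyncio` client:

```python
import asyncio


class RedisCacheSketch:
    """Simplified sketch of the teardown idea: drop and close the
    client before the master forks, so no socket is shared across
    worker processes."""

    def __init__(self, client):
        self._client = client

    async def teardown(self) -> None:
        # Swap the client out first so a second teardown is a no-op.
        client, self._client = self._client, None
        if client is None:
            return
        try:
            await client.aclose()  # redis.asyncio clients expose aclose()
        except Exception:
            pass  # best-effort: never let cleanup block forking
```

Making the close best-effort matters here: an error while tearing down a cache connection should not abort the preload/fork sequence.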
- Removed exception handling around the DB engine disposal to streamline the process, ensuring that the engine is disposed of without unnecessary error logging. This change enhances code clarity and maintains the intended functionality of resource management during the preload phase.
- Simplified the cache service socket closure process in the master preload function to prevent sharing across forks. This change enhances code clarity by removing unnecessary exception handling while maintaining the intended functionality of resource management during the preload phase.
… management

- Added completion flags in the preload state to track the status of various initialization steps, including profile picture copying, starter project creation, agentic global variable initialization, MCP server configuration, and flow loading.
- Updated the lifespan management in `main.py` to utilize these flags, allowing the system to skip redundant setup tasks if they have already been completed during the preload phase.
- This enhancement improves resource management and ensures that the application behaves correctly in a multi-worker environment.
…irs function

- Updated the lifespan management in `main.py` to utilize the new `get_owned_temp_dirs` function, which encapsulates the logic for determining temp directory ownership based on the process type (master or worker).
- Removed the `is_master` check from the lifespan function, simplifying the code and enhancing clarity regarding temp directory cleanup responsibilities.
- Added the `get_owned_temp_dirs` function in `preload.py` to centralize temp directory ownership logic, ensuring that workers do not attempt to clean up directories owned by the master process.
- Introduced conditional gates in the `get_lifespan` function to manage the initialization of profile pictures, super users, bundles, component types, and starter projects based on their completion status.
- Improved logging to provide clearer insights into which steps are being skipped or executed, enhancing the overall clarity of the initialization process.
- Updated the preload logic in `preload.py` to ensure that agentic global variables and MCP server configuration are only initialized when necessary, maintaining efficient resource management in a multi-worker environment.
…nd add preload tests

- Fix double-call issue: `setup_superuser()` now handles AUTO_LOGIN completely with a file lock
- Add comprehensive unit tests for `preload.py` covering the failure-fallback contract
- Simplify code by doing superuser initialization in `initialize_services()` (called early in both preload and worker startup)
- File lock protects against multi-worker race conditions when preload is disabled
- Tests verify that critical step failures propagate and best-effort steps continue on failure

Made-with: Cursor
…icts

Fixed critical bugs introduced in c0e81a5 that caused preload failures:

1. Missing import: Added DEFAULT_SUPERUSER_PASSWORD to module-level imports
   - Was only imported inside the AUTO_LOGIN block but used when AUTO_LOGIN=false
   - Caused a NameError that crashed preload with "session scope error"
2. Removed agentic variable initialization from `setup_superuser()`
   - Prevents a double-initialization conflict with preload's dedicated step
   - `initialize_agentic_global_variables()` in preload handles all users
3. Made `teardown_superuser()` more robust
   - Now skips deletion instead of raising errors on FK constraints
   - Prevents startup failures when the default superuser has associated flows

Resolves: "An error occurred during the session scope" preload error
Resolves: Ghost thread warnings from incomplete initialization

Made-with: Cursor
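The file-lock protection mentioned in these commits serializes a one-time init step (like superuser creation) across competing worker processes. A stdlib-only sketch of the run-once pattern, using `fcntl` (Unix-only) and hypothetical path names, not Langflow's actual implementation:

```python
import fcntl
import os


def run_once(lock_path: str, done_path: str, action) -> str:
    """Run `action` exactly once across competing processes.

    An exclusive flock serializes entry; a sentinel file records that
    the step already completed, so later callers skip the work instead
    of redoing it (mirroring the superuser-setup race protection).
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until acquired
        try:
            if os.path.exists(done_path):
                return "skipped"
            action()
            with open(done_path, "w") as marker:
                marker.write("done")
            return "ran"
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)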
- Updated the `_PreloadState` class to include `bundles_loaded` and `types_cached` flags for better tracking of initialization steps.
- Modified the `get_lifespan` function to utilize the new state flags, improving the conditional logic for loading bundles and caching component types.
- Implemented a `reset` method in `_PreloadState` to ensure consistent state restoration after preload failures, enhancing reliability in multi-worker environments.
- Simplified the teardown process in `ExternalAsyncBaseCacheService` by making `teardown` an abstract method, allowing direct calls without fallback checks.

This refactor improves clarity and efficiency in the preload and lifespan management processes.
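The flag-and-reset pattern described above can be sketched with a dataclass; the field names here are assumptions for illustration, not the exact Langflow attributes:

```python
from dataclasses import dataclass, fields


@dataclass
class PreloadState:
    """Tracks which init steps the master completed before forking."""

    profile_pictures_copied: bool = False
    starter_projects_created: bool = False
    bundles_loaded: bool = False
    types_cached: bool = False

    def reset(self) -> None:
        # After a preload failure, restore every flag to its default
        # so workers fall back to full initialization themselves.
        for f in fields(self):
            setattr(self, f.name, f.default)
```

The lifespan code can then gate each step on its flag and skip work the master already did.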
@jordanrfrazier I hope it's okay now. I ran memory tests, measured just after loading Langflow:

no preload:
Preload:
@severfire Great. I'll block some time to look into this today and tomorrow! |
Force-pushed from 841d2d7 to 242c6c7
…o-superuser case The `initialized_services` fixture starts with `AUTO_LOGIN=false`, which runs `setup_superuser` through the credentials-fallback path and creates the default superuser. The "raises_when_no_superuser" test then mocked the lock to time out, but the existence check found that pre-created user and returned `AUTO_LOGIN_LOCK_TIMEOUT_SUPERUSER_PRESENT` instead of raising `RuntimeError`. Delete the default superuser before mocking the lock so the no-superuser branch is actually exercised.
78f82ca
@jordanrfrazier Here is the article I have written; please suggest changes: https://docs.google.com/document/d/12vOopCRs896_bJxY2_JtH9iTC--LMhH32NWssq_D-08/edit?tab=t.0#heading=h.pngb50tfb9bo
@severfire Looks awesome, green light from me and our docs writer (@mendonk). Please feel free to post, and coordinate with Mendon on getting it published and linked on the Langflow blog page: https://www.langflow.org/blog. The only notes were to check some paragraph spacing (likely a result of my copy/paste issue) on the

And a question from me: thoughts on reorganizing to move the benchmark results to the top? I see it states the ultimate savings early on, which is good, but you could see how it reads if you started by showing the results at the top and then the explanations of each section (in case we lose some readers through the technical parts). Either way, I think it reads great. Thanks for this.
@jordanrfrazier I think it can be published when 1.10 is released. Does that sound good? We could also add #12588 after we're done with it, as it is also related to reliability, so a small section about it could be written. As for how to publish it: I don't have access to edit/add to the Langflow blog :-) so I guess I'll need @mendonk's help.
@severfire Thanks for the great work. I can handle the blog publication for the 1.10.x release - I'll have a vercel build for you pretty soon. |
@severfire @jordanrfrazier Here's a Vercel preview build of the blog for 1.10.x. It's 1:1 with the google doc right now. Any suggestions? |
I like it! Thank you! @mendonk @jordanrfrazier Before running Langflow I checked RAM usage with htop,
@jordanrfrazier I'll run the double-check testing between versions and will follow up here.
@mendonk Thank you, not now, maybe next time :-D Maybe I will be able to write some tips on optimizations for high-load environments :-)
…igrate_orphaned_mcp_servers_config PR #12778 (Gunicorn preload functionality) added migrate_orphaned_mcp_servers_config to langflow/services/utils.py but the AST-parity guard in test_services_utils_module_structure_unchanged was not updated. The test codifies the current function layout, so adding a function legitimately requires extending the expected list.
Related PRs:
`preload_app` flag for Gunicorn

cc @erichare
Summary
This PR introduces a dedicated preload module that maximizes the memory-saving benefits of Gunicorn's `preload_app` feature. While PRs #12364 and #12587 enabled preload and fixed fork-safety bugs, workers were still duplicating significant initialization work post-fork. This PR moves all fork-safe operations into the master process so workers inherit the result via Linux Copy-on-Write (CoW), dramatically reducing per-worker memory consumption.

What Changed
1. New `preload.py` Module

- Adds a master-only entry point (`preload_master()`) that runs exclusively in the Gunicorn master process
- Preloads the component cache (`lfx.interface.components.component_cache`)

2. Updated `main.py` Lifespan

- Workers skip redundant setup via an `is_preloaded()` check

3. Server Integration

- `LangflowApplication.load()` in `server.py` now calls `preload_master()` when `cfg.preload_app` is enabled
- A `gc.freeze()` call prevents cyclic GC from unsharing CoW pages in workers

Memory Usage Results
Tested with 30 workers on WSL:
Key Findings:
Technical Details
Fork-Safety
`await get_db_service().engine.dispose()`
`gc.collect()` + `gc.freeze()` moves preloaded objects into the permanent generation
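The CoW preparation step the line above describes can be sketched as a small helper run in the master just before workers fork; the function name is an assumption for illustration:

```python
import gc


def prepare_master_for_fork() -> int:
    """Run in the Gunicorn master immediately before forking workers.

    gc.collect() frees cyclic garbage so dead objects don't occupy
    pages, and gc.freeze() moves every surviving object into the
    permanent generation, which the cyclic collector never scans
    again -- so collections in workers don't touch (and thus don't
    unshare) those copy-on-write pages.
    """
    gc.collect()
    gc.freeze()
    return gc.get_freeze_count()  # number of objects now frozen
```

Note that `gc.freeze()` only shields objects from the *cyclic* collector; ordinary reference-count updates can still dirty pages, which is why the savings are large but not total.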
- `is_preloaded()` returns True in any process forked from a master that ran preload
- `is_master()` identifies whether the current process is the original master (for cleanup)
- `get_preloaded_temp_dirs()` returns bundle directories (master-owned)
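State detection of this kind can be as simple as recording the master's PID in a module global before forking: workers inherit the global through `fork()`, but only the original process's PID matches it. A minimal sketch (names mirror the API above, but the mechanism is an assumption):

```python
import os

_master_pid = None  # set by the master before forking


def mark_preloaded() -> None:
    """Call in the Gunicorn master after preload completes."""
    global _master_pid
    _master_pid = os.getpid()


def is_preloaded() -> bool:
    # True in the master and in every forked worker, because module
    # globals are copied into workers by fork().
    return _master_pid is not None


def is_master() -> bool:
    # Only the original process's PID matches the recorded one;
    # forked workers get fresh PIDs.
    return _master_pid == os.getpid()
```

This gives workers a cheap, import-free way to decide whether to skip initialization, and lets the master alone run cleanup.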
With `LANGFLOW_GUNICORN_PRELOAD=false` (the default), behavior is unchanged.

Ghost Safety Analysis
Changes appear to be ghost safe. Here's why:
✅ Fork-Safe Practices Implemented
DB Connection Pool Disposal (preload.py:156-159):
Cache Service Teardown (preload.py:162-175):
Fork-Unsafe Resources Excluded:
Temp Directory Ownership (main.py:266-267):
COW Optimization (preload.py:201-204):
Uses `gc.freeze()` to move preloaded objects into the permanent generation

Idempotent Service Initialization (main.py:200-202):
Workers call `initialize_services()` again, but it's documented as idempotent

✅ Copy-On-Write (COW) Benefits
The commit correctly leverages COW for:
No Ghost State Detected
No dangling references, shared mutexes, or cross-process state that could cause:
Usage
No configuration changes required. Simply set the existing environment variable:
export LANGFLOW_GUNICORN_PRELOAD=true
langflow run --workers 30

The enhanced preload will automatically take effect.
Note: This PR focuses on maximizing CoW memory sharing for Python modules and in-memory state. Future work could explore sharing dynamically-loaded component libraries.