fix: implement weak references for vertex and custom component to pre…#11299
fix: implement weak references for vertex and custom component to pre…#11299ricofurtado wants to merge 2 commits into
Conversation
…vent memory leaks
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughThis PR introduces weak references to break circular dependencies between Changes
Sequence DiagramsequenceDiagram
participant GC as Garbage Collector
participant Graph
participant EventLoop as Event Loop
participant TaskSet as end_trace_tasks
GC->>Graph: __del__() called
activate Graph
Graph->>EventLoop: asyncio.get_running_loop()
alt Loop running
EventLoop-->>Graph: loop instance
Graph->>Graph: cleanup_async_tasks()
activate Graph
Graph->>TaskSet: flush all tasks
Graph->>TaskSet: await with timeout
TaskSet-->>Graph: completion
Graph->>TaskSet: clear()
deactivate Graph
else No running loop
EventLoop-->>Graph: RuntimeError
Graph->>TaskSet: cancel pending tasks
end
deactivate Graph
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 5❌ Failed checks (1 error, 4 warnings)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR improves memory management by introducing weak references to prevent circular reference chains between Graph, Vertex, and CustomComponent objects. It also adds proper cleanup for asynchronous trace tasks to prevent resource leaks during object destruction.
Changes:
- Introduced weak references in
VertexandCustomComponentto break circular dependencies withGraphandVertexrespectively - Added async task cleanup mechanism in
Graph.__del__to handle pending trace tasks during garbage collection - Enhanced
_end_all_traces_asyncto gracefully handle event loop availability
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/lfx/src/lfx/graph/vertex/base.py | Converted strong Graph reference to weak reference with property-based access |
| src/lfx/src/lfx/graph/graph/base.py | Added async task cleanup in destructor and improved event loop handling in trace methods |
| src/lfx/src/lfx/custom/custom_component/custom_component.py | Converted strong Vertex reference to weak reference with property getter/setter |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """Cleanup when graph is destroyed.""" | ||
| if self._end_trace_tasks: | ||
| try: | ||
| loop = asyncio.get_event_loop() |
There was a problem hiding this comment.
Using asyncio.get_event_loop() in __del__ is deprecated in Python 3.10+ and can cause issues. Consider using asyncio.get_running_loop() with a try/except, or check if a loop exists before attempting to use it. The behavior of get_event_loop() in destructors is unreliable.
| def __del__(self): | ||
| """Cleanup when graph is destroyed.""" | ||
| if self._end_trace_tasks: | ||
| try: | ||
| loop = asyncio.get_event_loop() | ||
| if loop.is_running(): | ||
| cleanup_task = loop.create_task(self.cleanup_async_tasks()) | ||
|
|
||
| def _log_cleanup(t: asyncio.Task) -> None: | ||
| try: | ||
| exc = t.exception() | ||
| except Exception: # noqa: BLE001 | ||
| logger.exception("Cleanup task inspection failed") | ||
| return | ||
| if exc: | ||
| logger.warning("Cleanup task failed during __del__", exc_info=exc) | ||
| else: | ||
| logger.info("Cleanup task completed cleanly") | ||
|
|
||
| cleanup_task.add_done_callback(_log_cleanup) | ||
| elif loop.is_closed(): | ||
| for task in list(self._end_trace_tasks): | ||
| task.cancel() | ||
| self._end_trace_tasks.clear() | ||
| else: | ||
| loop.run_until_complete(self.cleanup_async_tasks()) | ||
| except Exception: # noqa: BLE001 | ||
| logger.exception("Error cleaning up async tasks") |
There was a problem hiding this comment.
The __del__ method creates a task at line 675 but doesn't await it when the loop is running. This fire-and-forget task may not complete before the object is fully destroyed, potentially leaving resources uncleaned. Additionally, calling run_until_complete at line 694 when the loop is not running could block if the loop has been stopped but not yet closed, which is problematic in a destructor.
There was a problem hiding this comment.
It is unlikely to happen, but I'll refactor it...
There was a problem hiding this comment.
Having a second thought about this: Graph keeps _end_trace_tasks (set of asyncio Tasks) and each task has a callback bound to self._end_trace_tasks.discard, while the task coroutine itself closes over self. That creates a cycle Graph -> set -> task -> callback -> Graph. Because Graph defines del, any such cycle is non-collectable, so a graph cached in ChatService.set_cache or kept alive by pending trace tasks will leak until the process exits.
There was a problem hiding this comment.
Python’s garbage collection improvements suggest that, in the long term, we should avoid using del, as it can interfere with optimal GC behavior and potentially introduce subtle issues. I’m removing it because the performance gains were negligible.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@src/lfx/src/lfx/custom/custom_component/custom_component.py`:
- Around line 118-125: The _vertex property currently returns None when the
weakref target was garbage-collected, causing downstream AttributeError in
callers like the graph property; change the _vertex getter to mirror
Vertex.graph by detecting when self._vertex_ref is set but self._vertex_ref() is
None and raise ValueError("Vertex has been garbage collected") instead of
returning None, leaving the setter (def _vertex) as-is so callers that
intentionally check `if self._vertex:` still work when no ref exists.
🧹 Nitpick comments (1)
src/lfx/src/lfx/graph/graph/base.py (1)
657-667: Consider reducing log verbosity for normal operation.INFO-level logging for every graph cleanup may produce excessive logs in production. Consider DEBUG level, or make logging conditional on whether there were actually tasks to clean up.
♻️ Suggested adjustment
async def cleanup_async_tasks(self) -> None: """Ensure all async tasks complete before cleanup.""" - logger.info(f"Cleaning up async tasks for run_id: {self._run_id}") + logger.debug(f"Cleaning up async tasks for run_id: {self._run_id}") if self._end_trace_tasks: - logger.info(f"Waiting for {len(self._end_trace_tasks)} trace tasks to complete") + logger.debug(f"Waiting for {len(self._end_trace_tasks)} trace tasks to complete") try: await asyncio.wait_for(asyncio.gather(*self._end_trace_tasks, return_exceptions=True), timeout=30.0) except asyncio.TimeoutError: logger.warning("Some trace tasks did not complete within timeout") finally: self._end_trace_tasks.clear()
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/lfx/src/lfx/custom/custom_component/custom_component.pysrc/lfx/src/lfx/graph/graph/base.pysrc/lfx/src/lfx/graph/vertex/base.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-24T19:46:09.104Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/backend_development.mdc:0-0
Timestamp: 2025-11-24T19:46:09.104Z
Learning: Applies to src/backend/**/*.py : Use `asyncio.create_task()` for background tasks and implement proper cleanup with try/except for `asyncio.CancelledError`
Applied to files:
src/lfx/src/lfx/graph/graph/base.py
📚 Learning: 2025-11-24T19:47:28.997Z
Learnt from: CR
Repo: langflow-ai/langflow PR: 0
File: .cursor/rules/testing.mdc:0-0
Timestamp: 2025-11-24T19:47:28.997Z
Learning: Applies to src/backend/tests/**/*.py : Use async fixtures with proper cleanup using try/finally blocks to ensure resources are properly released after tests complete
Applied to files:
src/lfx/src/lfx/graph/graph/base.py
🧬 Code graph analysis (2)
src/lfx/src/lfx/graph/graph/base.py (2)
src/backend/base/langflow/services/task/backends/anyio.py (1)
cleanup_task(108-117)src/backend/base/langflow/server.py (1)
error(53-61)
src/lfx/src/lfx/custom/custom_component/custom_component.py (1)
src/lfx/src/lfx/graph/vertex/base.py (1)
Vertex(50-885)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Update Starter Projects
🔇 Additional comments (5)
src/lfx/src/lfx/graph/vertex/base.py (2)
7-7: LGTM: Weak reference import for breaking reference cycles.The import aligns with the PR objective to prevent memory leaks by avoiding reference cycles between
VertexandGraph.
118-126: LGTM: Property correctly dereferences weak reference with explicit error handling.The property pattern is well-implemented:
- Safely dereferences the weak reference
- Raises a clear
ValueErrorif the Graph has been garbage collected- Maintains the same public interface
src/lfx/src/lfx/graph/graph/base.py (1)
698-707: LGTM: Correct pattern for async trace scheduling.The implementation correctly:
- Uses
get_running_loop()to detect if we're in an async context- Falls back to synchronous execution via
run_until_completewhen no loop is running- Tracks tasks for later cleanup via
_end_trace_tasks- Uses
add_done_callbackwithdiscardfor automatic cleanupsrc/lfx/src/lfx/custom/custom_component/custom_component.py (2)
4-4: LGTM: Weak reference import for breaking reference cycles.The import aligns with the PR objective to prevent memory leaks.
91-91: LGTM: Weak reference storage for vertex.Correctly uses
weakref.ReferenceType[Vertex]type annotation for the weak reference.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…graph destruction
This pull request introduces improvements to memory management and resource cleanup in the
custom_componentandvertexmodules, and enhances the handling of asynchronous cleanup tasks in thegraphmodule. The main focus is on using weak references to avoid reference cycles and potential memory leaks, and ensuring async tasks are properly cleaned up when objects are destroyed.Memory management improvements:
The applied fixes reduce Langflow backend RSS growth by approximately 30–50% per run, eliminating the primary leak-like behaviour while leaving allocator-driven VSZ growth unchanged.
Tests were run in a 140-component flow, with 25 invocations of a subflow consisting of 12 components.
The flow was provided by Shuhei Kochi
Langflow Backend Memory Regression Report
Original Code vs Fixed Code
Summary
We ran repeated Langflow flow executions via the REST API and measured backend process memory usage after each run. Compared to the original implementation, the fixed version shows a significant reduction in RSS (resident memory) growth per run, indicating reduced memory retention across repeated executions.
While VSZ (virtual memory) growth remains largely unchanged, this behavior is consistent with allocator and arena behavior rather than a true memory leak.
Test Methodology
Flows executed repeatedly using Langflow REST API (
stream=false) within a single backend processMemory collected from
/proc/<PID>/status:RSS (MiB) →
VmRSS(primary signal for real memory retention)VSZ (MiB) →
VmSize(secondary signal; allocator-related)Each test consists of:
Warm-up runs (lazy loading, cache initialization)
A 15-run steady-state block using a fresh
baseline_beforeBoth successful (
200) and failed (500) runs were included to observe failure-path behaviorMemory deltas are computed relative to the
baseline_beforefor each blockData Sets Analyzed
Original Code
Run 1
Baseline RSS:
1327 MiBRSS after run 15:
1649 MiBRun 2
Baseline RSS:
1314 MiBRSS after run 15:
1715 MiBFixed Code
Run 1
Baseline RSS:
1374 MiBRSS after run 15:
1564 MiBRun 2
Baseline RSS:
1335 MiBRSS after run 15:
1617 MiBResults
1️⃣ RSS Growth Reduction (Primary Improvement)
RSS represents actual resident memory and is the strongest indicator of leak-like behavior.
Original Code
Run 1: +322 MiB over 15 runs → ~21.5 MiB/run
Run 2: +401 MiB over 15 runs → ~26.7 MiB/run
Fixed Code
Run 1: +190 MiB over 15 runs → ~12.7 MiB/run
Run 2: +282 MiB over 15 runs → ~18.8 MiB/run
Net Improvement
RSS growth reduced from ~21–27 MiB/run → ~13–19 MiB/run
Equivalent to approximately 30–50% reduction in resident memory growth per run
This strongly indicates that the fix reduced retained memory across repeated flow executions.
2️⃣ VSZ Behavior (Largely Unchanged)
VSZ growth remained similar across original and fixed versions:
Original: ~+5419 to +5476 MiB over 15 runs
(~361–365 MiB/run)
Fixed: ~+5285 to +5386 MiB over 15 runs
(~352–359 MiB/run)
Interpretation:
VSZ growth is consistent with Python allocator behavior (arenas, fragmentation,
mmap)The fact that RSS improved while VSZ did not suggests:
fewer live objects retained
allocator behavior remains the dominant VSZ factor
This does not indicate an active memory leak
3️⃣ Failure-Path Behavior (HTTP 500)
Failures were intentionally included in the run sets:
Version | Run | Failures -- | -- | -- Original | Run 1 | 4 / 15 Original | Run 2 | 4 / 15 Fixed | Run 1 | 5 / 15 Fixed | Run 2 | 4 / 15Even with intermittent
500responses:The fixed version consistently shows lower RSS growth
Failure paths no longer cause disproportionate memory retention
This suggests improved cleanup even under error conditions.
Key Conclusions
Resident memory retention per run improved significantly
~30–50% reduction in RSS growth rate
VSZ growth remains allocator-driven
No evidence of additional live-object leaks
Failure paths are safer
Errors no longer amplify memory growth trends
Recommendations / Next Steps
To further improve stability and observability:
Differentiate allocator behavior from true leaks
Add
tracemallocsnapshots (post-warmup vs post-run 15)Diff top allocation sites
Harden failure paths
Review resource cleanup on
500responsesEnsure request-scoped objects and buffers are released
Standardize benchmarks
Fixed warm-up count
Fixed delay between runs
Consistent baseline capture
One-Line Takeaway