feat: add s3 file storage implementation by jordanrfrazier · Pull Request #10526 · langflow-ai/langflow

jordanrfrazier · 2025-11-06T19:59:19Z

Adds S3 as an optional backing file storage service. Also fixes how the database session scope is used.

Summary by CodeRabbit

Release Notes

New Features
- Added S3 storage backend support with async file operations and streaming capabilities for enterprise deployments.
- Introduced database migration advisory locks to safely support multiple concurrent instances sharing a database.
Improvements
- Enhanced file operation error handling with improved error messages and recovery.
- Optimized database transaction handling for improved reliability and concurrency.
- Streamlined profile picture management with filesystem-based access.
Documentation
- Added configuration documentation for PostgreSQL advisory lock namespace in multi-instance setups.

…egate from langflow to lfx

coderabbitai · 2025-11-06T19:59:39Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.


Walkthrough

This pull request introduces comprehensive changes to database transaction management, storage abstraction, and session lifecycle handling. It replaces explicit commit patterns with flush-based operations and centralizes session management through session_scope. Additionally, it adds S3 storage support alongside local storage, updates file operations with enhanced error handling and cleanup logic, and introduces PostgreSQL migration locking via advisory locks. Dependencies are updated for AWS support.

Changes

Cohort / File(s)	Summary
Documentation `docs/docs/Develop/memory.mdx`	Added documentation for LANGFLOW_MIGRATION_LOCK_NAMESPACE environment variable describing optional PostgreSQL advisory lock namespace for migrations.
Dependencies `pyproject.toml`	Updated langchain-aws from exact version (==0.2.33) to range (>=0.2.33,<1.0.0); added new aioboto3 dependency (>=15.2.0,<16.0.0).
CLI & Migration Management `src/backend/base/langflow/__main__.py`, `src/backend/base/langflow/alembic/env.py`	Modified API key CLI flow to return result directly; enhanced PostgreSQL migrations with advisory lock mechanism (namespace-derived or default), lock timeout configuration, and conditional prepared statement disabling.
Session Management Refactoring `src/backend/base/langflow/api/utils/core.py`, `src/backend/base/langflow/services/deps.py`, `src/backend/base/langflow/services/auth/utils.py`	Replaced get_session dependency with session_scope for auto-commit; introduced DbSessionReadOnly using session_scope_readonly; added deprecation path for get_session with NotImplementedError.
Database Service Architecture `src/backend/base/langflow/services/database/service.py`, `src/backend/base/langflow/services/database/models/.../*`	Renamed with_session to _with_session; introduced async_session_maker; replaced commit patterns with flush in multiple CRUD operations (api_key, message, user, folder utilities).
API Endpoints – Transaction Handling `src/backend/base/langflow/api/v1/chat.py`, `src/backend/base/langflow/api/v1/flows.py`, `src/backend/base/langflow/api/v1/projects.py`, `src/backend/base/langflow/api/v1/users.py`, `src/backend/base/langflow/api/v1/mcp_projects.py`, `src/backend/base/langflow/api/v1/monitor.py`	Systematically replaced session.commit() with session.flush(); converted ORM objects to read schemas (FlowRead, FolderRead) within active sessions to prevent detached instance errors; adjusted return types to expose read models.
Storage Services – Local Implementation `src/backend/base/langflow/services/storage/local.py`, `src/lfx/src/lfx/services/storage/local.py`	Refactored LocalStorageService to delegate to lfx backend; added resolve_component_path, get_file_stream, get_file_size methods; switched to async file I/O operations.
Storage Services – S3 Implementation `src/backend/base/langflow/services/storage/s3.py`, `src/lfx/src/lfx/services/storage/s3.py`	Replaced synchronous boto3 with async aioboto3; refactored method signatures to use flow_id/file_name; added error mapping for FileNotFoundError/PermissionError; implemented get_file_stream, get_file_size, resolve_component_path; added configuration validation and tagging support.
Storage Services – Base Architecture `src/backend/base/langflow/services/storage/service.py`, `src/backend/base/langflow/services/storage/__init__.py`, `src/lfx/src/lfx/services/storage/service.py`	Updated StorageService to inherit from both Service and LfxStorageService; changed constructor to require session_service and settings_service; expanded abstract interface with build_full_path, resolve_component_path, get_file_stream, get_file_size, teardown; exported storage implementations.
File Upload & Download – Enhanced Error Handling `src/backend/base/langflow/api/v1/files.py`, `src/backend/base/langflow/api/v2/files.py`	Replaced storage-backed profile pictures with filesystem references; added FileNotFoundError (404) and PermissionError (403) handling; introduced file size retrieval post-save; added transactional cleanup on DB insert failure; enhanced delete paths with storage failure logging.
Storage-Aware Data Components `src/lfx/src/lfx/base/data/base_file.py`, `src/lfx/src/lfx/base/data/utils.py`, `src/lfx/src/lfx/base/data/storage_utils.py`, `src/lfx/src/lfx/components/data/{csv_to_data, file, json_to_data}.py`	Added storage_utils.py with parse_storage_path, read_file_bytes, read_file_text, get_file_size, file_exists; introduced async parsing (parse_text_file_to_data_async, read_docx_file_async, parse_pdf_to_text_async); updated data components for S3-aware file reading with lazy validation.
LangChain Utilities – Storage-Aware Processing `src/lfx/src/lfx/components/langchain_utilities/{csv_agent, json_agent}.py`, `src/lfx/src/lfx/components/twelvelabs/{split_video, video_file}.py`, `src/lfx/src/lfx/components/vectorstores/local_db.py`, `src/lfx/src/lfx/graph/vertex/param_handler.py`	Added local path resolution for S3 files with temporary download and cleanup; added S3 guards raising ValueError for incompatible components (video processing, local vector stores); replaced path construction with resolve_component_path.
Memory & Task Management `src/backend/base/langflow/memory.py`, `src/backend/base/langflow/services/task/temp_flow_cleanup.py`, `src/backend/base/langflow/services/flow/flow_runner.py`, `src/backend/base/langflow/services/variable/service.py`	Removed batch commits; introduced aadd_messagetables with retry logic for CancelledError; replaced commits with flushes in variable/flow operations; removed explicit rollbacks.
Setup & Initialization `src/backend/base/langflow/initial_setup/setup.py`, `src/backend/base/langflow/services/utils.py`, `src/backend/base/langflow/main.py`	Replaced commits with flushes in folder/flow/project creation; adjusted folder assignment logic; moved tempfile/FileLock imports to module scope; wrapped cleanup tasks in session_scope context.
Settings Configuration `src/lfx/src/lfx/services/settings/base.py`	Added migration_lock_namespace, object_storage_bucket_name, object_storage_prefix, object_storage_tags fields; extended sqlite_pragmas with busy_timeout.
lfx Session Management `src/lfx/src/lfx/services/deps.py`	Implemented proper session_scope with commit/rollback semantics; added session_scope_readonly for read-only operations; deprecated get_session with NotImplementedError; added InvalidRequestError handling.
Backend Tests – Session Refactoring `src/backend/tests/conftest.py`, `src/backend/tests/unit/test_database.py`, `src/backend/tests/unit/api/v1/{test_files, test_mcp_projects}.py`, `src/backend/tests/integration/components/mcp/test_mcp_superuser_flow.py`	Replaced per-session db_manager usage with session_scope; removed explicit commit calls; updated session_scope import source from langflow to lfx where applicable.
Backend Tests – S3 Integration `src/backend/tests/unit/api/test_s3_endpoints.py`, `src/backend/tests/unit/api/v2/test_files.py`, `src/backend/tests/unit/components/data/{test_s3_components, test_s3_uploader_component}.py`, `src/backend/tests/unit/services/storage/{test_local_storage_service, test_s3_storage_service}.py`	Added comprehensive test suites for S3 storage operations including streaming downloads, uploads, deletions, error handling, and metadata retrieval; added fixtures for AWS credential validation; updated S3 uploader test decorator.
Frontend File Handling `src/frontend/src/components/core/parameterRenderComponent/components/inputFileComponent/index.tsx`, `src/frontend/src/controllers/API/queries/file-management/use-post-upload-file.ts`, `src/frontend/src/hooks/files/use-upload-file.ts`	Added null-safety guards for file arrays; added array validation in cache updates; enhanced error message extraction from response data and fallback handling.
lfx Component & Utility Tests `src/lfx/tests/unit/base/data/test_storage_utils.py`, `src/lfx/tests/unit/components/langchain_utilities/{test_csv_agent, test_json_agent}.py`, `src/backend/tests/unit/components/processing/test_save_file_component.py`	Added comprehensive test coverage for storage utilities, CSV/JSON agents with local/S3 paths and temp file cleanup, and SaveFileComponent refactoring with async mocks.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API
    participant SessionScope as session_scope()
    participant DBService
    participant DB as Database

    Client->>API: Request (write operation)
    API->>SessionScope: Enter context
    SessionScope->>DBService: _with_session()
    DBService->>DB: Begin transaction
    DB-->>SessionScope: AsyncSession
    SessionScope-->>API: Yield session

    API->>API: Perform operation
    API->>SessionScope: flush() instead of commit()
    SessionScope->>DB: Send pending changes (no commit)
    DB-->>SessionScope: Changes staged

    API->>API: Additional operations in same transaction
    
    alt Success
        API->>SessionScope: Exit context normally
        SessionScope->>DB: COMMIT
        DB-->>SessionScope: Transaction committed
    else Exception
        API->>SessionScope: Exit context (exception)
        SessionScope->>DB: ROLLBACK
        DB-->>SessionScope: Transaction rolled back
    end
    
    SessionScope-->>Client: Response
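
The commit/rollback flow in the diagram above can be sketched in a few lines. This is a minimal illustration, not the PR's implementation: `FakeSession` and `session_scope_sketch` are hypothetical stand-ins that only record which calls were made, so the example runs without a database.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSession:
    """Hypothetical stand-in for an AsyncSession; records calls only."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    async def flush(self) -> None:
        self.calls.append("flush")

    async def commit(self) -> None:
        self.calls.append("commit")

    async def rollback(self) -> None:
        self.calls.append("rollback")


@asynccontextmanager
async def session_scope_sketch(session: FakeSession):
    """Commit once on normal exit; roll back and re-raise if the body fails."""
    try:
        yield session
        await session.commit()
    except Exception:
        await session.rollback()
        raise


async def write_ok(session: FakeSession) -> None:
    async with session_scope_sketch(session) as s:
        await s.flush()  # pending changes are sent, transaction stays open
        await s.flush()  # more work inside the same transaction


async def write_failing(session: FakeSession) -> None:
    async with session_scope_sketch(session) as s:
        await s.flush()
        raise ValueError("simulated failure after flush")
```

In this sketch the endpoint body never calls `commit()` itself; a failure after a `flush()` leaves nothing committed because the scope rolls the whole transaction back.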

sequenceDiagram
    participant Component as File Component
    participant Settings as Settings Service
    participant Storage as Storage Service
    participant LocalFS as Local Filesystem
    participant S3 as S3 Storage

    Component->>Settings: get_settings_service()
    Settings-->>Component: storage_type (s3 or local)

    alt storage_type == "s3"
        Component->>Storage: resolve_component_path(path)
        Storage->>S3: Parse S3 key
        S3-->>Storage: flow_id/file_name
        Storage-->>Component: S3 key
        
        Component->>Storage: read_file_bytes(s3_path)
        Storage->>S3: GetObject request
        S3-->>Storage: File bytes
        Storage-->>Component: File bytes
    else storage_type == "local"
        Component->>Storage: resolve_component_path(path)
        Storage->>LocalFS: Resolve path
        LocalFS-->>Storage: Local path
        Storage-->>Component: Local path
        
        Component->>Storage: read_file_bytes(local_path)
        Storage->>LocalFS: Read file
        LocalFS-->>Storage: File bytes
        Storage-->>Component: File bytes
    end

    Component->>Component: Process file content
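
As a rough sketch of the branching in the diagram above, the example below picks a backend from a `storage_type` string and reads bytes through one shared interface. Every name here (`LocalBackend`, `InMemoryObjectBackend`, `pick_backend`, `load`) is hypothetical, and the object store is a plain dict standing in for S3, so this shows only the dispatch shape and not the PR's actual services.

```python
import asyncio
import tempfile
from pathlib import Path


class LocalBackend:
    """Hypothetical local backend: logical paths live under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def resolve_path(self, logical_path: str) -> str:
        return str(self.root / logical_path)

    async def read_bytes(self, resolved: str) -> bytes:
        # Run the blocking read in a worker thread to keep the event loop free.
        return await asyncio.to_thread(Path(resolved).read_bytes)


class InMemoryObjectBackend:
    """Hypothetical stand-in for S3: a dict keyed by 'flow_id/file_name'."""

    def __init__(self, objects: dict[str, bytes]) -> None:
        self.objects = objects

    def resolve_path(self, logical_path: str) -> str:
        flow_id, _, file_name = logical_path.partition("/")
        if not flow_id or not file_name:
            raise ValueError(f"expected 'flow_id/file_name', got {logical_path!r}")
        return f"{flow_id}/{file_name}"

    async def read_bytes(self, resolved: str) -> bytes:
        try:
            return self.objects[resolved]
        except KeyError as exc:
            # Surface a missing key as the same error a local read would raise.
            raise FileNotFoundError(resolved) from exc


def pick_backend(storage_type: str, *, root: Path, objects: dict[str, bytes]):
    if storage_type == "s3":
        return InMemoryObjectBackend(objects)
    if storage_type == "local":
        return LocalBackend(root)
    raise ValueError(f"unsupported storage_type: {storage_type!r}")


async def load(backend, logical_path: str) -> bytes:
    return await backend.read_bytes(backend.resolve_path(logical_path))


async def demo() -> tuple[bytes, bytes]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "flow-1").mkdir()
        (root / "flow-1" / "a.txt").write_bytes(b"local bytes")
        objects = {"flow-1/a.txt": b"s3 bytes"}
        local = await load(pick_backend("local", root=root, objects=objects), "flow-1/a.txt")
        remote = await load(pick_backend("s3", root=root, objects=objects), "flow-1/a.txt")
        return local, remote
```

Because both backends expose the same two methods, the caller in `load` does not need to know which storage type is active.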

sequenceDiagram
    participant Migration as Alembic Migration
    participant Env as env.py
    participant Settings as LANGFLOW Settings
    participant Lock as PostgreSQL Advisory Lock
    participant DB as PostgreSQL DB

    Migration->>Env: run_migrations(PostgreSQL)
    Env->>Settings: Check LANGFLOW_MIGRATION_LOCK_NAMESPACE
    
    alt LANGFLOW_MIGRATION_LOCK_NAMESPACE set
        Settings-->>Env: namespace value
        Env->>Env: Compute lock_key = hash(namespace)
    else LANGFLOW_MIGRATION_LOCK_NAMESPACE not set
        Settings-->>Env: None
        Env->>Env: Use default lock_key
    end
    
    Env->>DB: SET lock_timeout to 180s
    DB-->>Env: Configured
    
    Env->>Lock: SELECT pg_advisory_xact_lock(lock_key)
    Lock->>DB: Acquire lock
    DB-->>Lock: Lock acquired
    Lock-->>Env: Lock held
    
    Env->>Migration: Proceed with migration
    Migration->>DB: Run SQL changes
    DB-->>Migration: Changes applied
    
    Migration-->>Env: Complete
    Env->>Lock: Release lock (transaction end)
    Lock->>DB: Lock released
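
The "Compute lock_key = hash(namespace)" step above needs a hash that is stable across processes, since every instance must arrive at the same key. The sketch below is one way to do that and is an assumption, not the PR's code: `advisory_lock_key` and `DEFAULT_LOCK_KEY` are hypothetical names, and SHA-256 is chosen here only because Python's built-in `hash()` is randomized per process for strings.

```python
import hashlib
from typing import Optional

# Hypothetical fallback key used when no namespace is configured.
DEFAULT_LOCK_KEY = 112233


def advisory_lock_key(namespace: Optional[str]) -> int:
    """Map a namespace string to a signed 64-bit integer lock key.

    pg_advisory_xact_lock takes a bigint, so the first 8 bytes of the
    digest are interpreted as a signed big-endian integer.
    """
    if not namespace:
        return DEFAULT_LOCK_KEY
    digest = hashlib.sha256(namespace.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)
```

Two deployments that share a database but use different namespaces would then contend on different keys, while instances of the same deployment serialize on one key.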

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Areas requiring extra attention during review:

- Session & Transaction Management Refactoring (src/backend/base/langflow/services/database/service.py, src/lfx/src/lfx/services/deps.py): Critical architectural change from per-session management to centralized session_scope. Requires verification that all transaction boundaries are preserved, especially around flush vs. commit semantics and rollback error handling (InvalidRequestError).
- Storage Service Abstraction (src/backend/base/langflow/services/storage/s3.py, src/lfx/src/lfx/services/storage/service.py): Complete redesign of storage layer with S3 support. Verify async aioboto3 integration, error mapping correctness, resource cleanup (client context managers), and file size/streaming behavior.
- Database API Endpoint Changes (src/backend/base/langflow/api/v1/flows.py, src/backend/base/langflow/api/v1/projects.py): Multiple endpoints return different types (ORM models converted to read schemas). Verify that in-session conversion prevents detached instance errors, that flush points are correct, and that all transaction paths maintain consistency.
- File Operations & Cleanup (src/backend/base/langflow/api/v2/files.py): Enhanced error handling with new exception types and transactional cleanup on failure. Verify orphaned file cleanup logic, error propagation, and that both success and failure paths clean up correctly.
- Storage-Aware Components (src/lfx/src/lfx/base/data/{base_file, utils, storage_utils}.py, src/lfx/src/lfx/components/langchain_utilities/*): New conditional logic based on storage_type with lazy validation for S3 and eager validation for local. Verify path resolution, temporary file cleanup on S3, and guard conditions preventing incompatible operations.
- PostgreSQL Migration Locking (src/backend/base/langflow/alembic/env.py): New advisory lock mechanism with namespace hashing. Verify lock_key computation, timeout configuration, and that locking doesn't introduce deadlocks when migrations run concurrently.
- Dependency Version Changes (pyproject.toml): Updated langchain-aws to range constraint and added aioboto3. Verify compatibility across versions and that no breaking changes in the boto3/aioboto3 APIs affect these code paths.
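
For the "temporary file cleanup on S3" point in the Storage-Aware Components item, a context manager is one way to guarantee that a downloaded copy is removed on every path. The helper below is a hypothetical sketch (`local_copy` is not a name from the PR) and assumes the object's bytes have already been fetched from storage.

```python
import os
import tempfile
from contextlib import contextmanager, suppress


@contextmanager
def local_copy(data: bytes, suffix: str = ""):
    """Write bytes to a temp file, yield its path, and always delete it."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        yield path
    finally:
        # Runs on success and on exceptions raised inside the with-block.
        with suppress(FileNotFoundError):
            os.remove(path)


def count_csv_rows(data: bytes) -> int:
    """Example consumer that needs a real filesystem path."""
    with local_copy(data, suffix=".csv") as path:
        with open(path, "r", encoding="utf-8") as handle:
            return sum(1 for line in handle if line.strip())
```

A component that requires a local path, such as a CSV or JSON agent, could wrap its work in a block like this so that a failure while parsing does not leave the temporary file behind.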

Possibly related PRs

- feat: introduce lfx package #9133: Introduces and integrates the lfx package, replacing langflow imports with lfx throughout; overlaps with session_scope and service imports in this PR.
- fix: Update dependency versions - pyproject.toml #10028: Broadens dependency version constraints in pyproject.toml; directly related through dependency version updates.
- fix: Run docling processing in subprocess #9541: Modifies FileComponent's Docling processing with subprocess-based workflow; touches the same file and feature as Docling updates in this PR.

Suggested labels

database, storage, session-management, s3-integration, transactions, lfx-integration

Suggested reviewers

erichare
ogabrielluiz

Pre-merge checks and finishing touches

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 4 warnings)

Check name	Status	Explanation	Resolution
Test Coverage For New Implementations	❌ Error	PR adds 7 new test files for S3/storage implementations, but tests contain critical bugs preventing execution and LFX S3StorageService lacks dedicated test coverage.	Fix test implementation bugs: correct module patches in test_storage_utils.py, fix exception types in test_s3_endpoints.py, use AsyncMock for async functions, remove incorrect awaits, replace asyncio.run() with run_until_complete(), add dedicated tests for lfx/services/storage/s3.py.
Docstring Coverage	⚠️ Warning	Docstring coverage is 72.36% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Test Quality And Coverage	⚠️ Warning	Test suite contains critical issues: AsyncMock not used for async functions, incorrect patch targets (langflow vs lfx), wrong error type expectations (FileNotFoundError vs HTTPException), and await on synchronous methods.	Replace MagicMock with AsyncMock for async functions, fix patch targets to lfx.base.data.storage_utils, remove incorrect get_file_size assertions, expect HTTPException 404 instead of FileNotFoundError, remove await from sync methods.
Test File Naming And Structure	⚠️ Warning	Test files contain multiple structural violations: incorrect mock patch module paths (langflow.base instead of lfx.base), missing AsyncMock imports for async patches, improper await calls on sync methods, and incorrect exception type expectations in assertions.	Correct all patch decorators to target proper module paths (lfx.base.data.storage_utils), add AsyncMock imports and use new_callable=AsyncMock for async patches, remove incorrect await calls on sync methods, and update test expectations to match actual FastAPI exception behavior.
Excessive Mock Usage Warning	⚠️ Warning	Test files exhibit excessive mock usage that obscures actual behavior verification. Unit tests mock core logic and dependencies rather than testing real interactions, while using incorrect patch targets and inconsistent async mock patterns.	Replace mocks of core logic with real objects (temp directories, actual components). Fix patch targets to correct module paths (lfx.base not langflow.base). Use AsyncMock for async functions consistently. Remove await calls on synchronous APIs. Reserve mocks for external dependencies only.
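
The AsyncMock point that recurs in the checks above can be reproduced with the standard library alone. In this sketch `load_bytes` and `fetch` are hypothetical names: a plain `MagicMock` returns its `return_value` directly, which cannot be awaited, while `AsyncMock` returns an awaitable and records the await.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def fetch(storage) -> bytes:
    """Code under test: awaits an async storage method (hypothetical name)."""
    return await storage.load_bytes("flow-1", "a.txt")


def run_with_async_mock() -> bytes:
    storage = MagicMock()
    storage.load_bytes = AsyncMock(return_value=b"data")
    result = asyncio.run(fetch(storage))
    # AsyncMock tracks awaits separately from plain calls.
    storage.load_bytes.assert_awaited_once_with("flow-1", "a.txt")
    return result


def run_with_plain_mock() -> str:
    storage = MagicMock()
    storage.load_bytes.return_value = b"data"
    try:
        asyncio.run(fetch(storage))
    except TypeError as exc:
        # bytes is not awaitable, so the await inside fetch() fails.
        return type(exc).__name__
    return "no error"
```

With `patch`, the same effect comes from passing `new_callable=AsyncMock`, as the resolution text above suggests.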

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat: add s3 file storage implementation' clearly and concisely summarizes the main change: adding S3 as a file storage backend.


github-actions · 2025-11-06T20:01:11Z

Frontend Unit Test Coverage Report

Coverage Summary

Lines	Statements	Branches	Functions
	15.3% (4188/27372)	8.5% (1778/20915)	9.6% (579/6029)

Unit Test Results

Tests	Skipped	Failures	Errors	Time
1638	0 💤	0 ❌	0 🔥	20.961s ⏱️

codecov · 2025-11-06T20:01:49Z

Codecov Report

❌ Patch coverage is 47.97297% with 385 lines in your changes missing coverage. Please review.
✅ Project coverage is 32.38%. Comparing base (348b1b8) to head (02f0ee1).
⚠️ Report is 1 commit behind head on main.

Files with missing lines	Patch %	Lines
src/backend/base/langflow/services/storage/s3.py	11.27%	118 Missing ⚠️
src/lfx/src/lfx/base/data/utils.py	17.64%	56 Missing ⚠️
src/backend/base/langflow/api/v2/files.py	69.11%	42 Missing ⚠️
src/lfx/src/lfx/base/data/base_file.py	32.65%	28 Missing and 5 partials ⚠️
src/lfx/src/lfx/services/storage/local.py	21.05%	30 Missing ⚠️
src/lfx/src/lfx/services/deps.py	48.71%	19 Missing and 1 partial ⚠️
...rComponent/components/inputFileComponent/index.tsx	0.00%	12 Missing ⚠️
src/lfx/src/lfx/base/data/storage_utils.py	85.29%	5 Missing and 5 partials ⚠️
...rc/backend/base/langflow/services/storage/local.py	78.04%	9 Missing ⚠️
...backend/base/langflow/services/variable/service.py	58.82%	7 Missing ⚠️
... and 18 more

❌ Your project status has failed because the head coverage (40.17%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #10526      +/-   ##
==========================================
+ Coverage   32.10%   32.38%   +0.28%     
==========================================
  Files        1364     1366       +2     
  Lines       62528    62943     +415     
  Branches     9266     9304      +38     
==========================================
+ Hits        20077    20387     +310     
- Misses      41437    41531      +94     
- Partials     1014     1025      +11

Flag	Coverage Δ
backend	`51.08% <49.75%> (+0.51%)`	⬆️
frontend	`14.14% <10.52%> (-0.01%)`	⬇️
lfx	`40.17% <47.86%> (+0.19%)`	⬆️


Files with missing lines	Coverage Δ
src/backend/base/langflow/api/utils/core.py	`62.44% <100.00%> (+0.17%)`	⬆️
src/backend/base/langflow/api/v1/chat.py	`39.58% <ø> (+0.41%)`	⬆️
src/backend/base/langflow/api/v1/files.py	`66.14% <ø> (ø)`
src/backend/base/langflow/api/v1/users.py	`66.66% <100.00%> (+1.60%)`	⬆️
src/backend/base/langflow/helpers/user.py	`65.00% <100.00%> (ø)`
src/backend/base/langflow/main.py	`65.99% <100.00%> (+8.06%)`	⬆️
src/backend/base/langflow/services/auth/utils.py	`57.14% <100.00%> (-0.82%)`	⬇️
...ase/langflow/services/database/models/user/crud.py	`82.60% <100.00%> (+1.75%)`	⬆️
.../backend/base/langflow/services/storage/service.py	`78.78% <100.00%> (ø)`
...d/base/langflow/services/task/temp_flow_cleanup.py	`61.53% <ø> (+1.53%)`	⬆️
... and 31 more

... and 5 files with indirect coverage changes


coderabbitai

Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (10)

src/backend/base/langflow/api/v1/files.py (1)
128-151: Critical: Path traversal vulnerability allows unauthorized file access.

The function constructs file paths by directly concatenating user-provided folder_name and file_name without validation (lines 138-139). An attacker can use path traversal sequences like ../ to access files outside the intended profile_pictures directory.

Apply this diff to validate that the resolved path stays within the intended directory:
     try:
         # Profile pictures are in the package installation directory
         package_dir = Path(__file__).parent.parent.parent / "initial_setup" / "profile_pictures"
         file_path = package_dir / folder_name / file_name
+        
+        # Prevent path traversal by ensuring resolved path is within package_dir
+        if not file_path.resolve().is_relative_to(package_dir.resolve()):
+            raise HTTPException(status_code=400, detail="Invalid file path")
 
         if not file_path.exists():
             raise HTTPException(status_code=404, detail="Profile picture not found")
Additional issue: Redundant exception handling.

The HTTPException raised at line 142 is caught and re-wrapped at lines 149-150. Consider letting HTTPExceptions propagate naturally.

Apply this diff to fix the redundant error handling:
-    except Exception as e:
+    except HTTPException:
+        raise
+    except Exception as e:
         raise HTTPException(status_code=500, detail=str(e)) from e
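
The containment check suggested above can be tried in isolation. The sketch below is a standalone illustration with a hypothetical helper name (`resolve_inside`); it raises `ValueError` where the endpoint would raise an `HTTPException`, and it relies on `Path.is_relative_to`, which requires Python 3.9 or later.

```python
import tempfile
from pathlib import Path


def resolve_inside(base_dir: Path, *parts: str) -> Path:
    """Join user-supplied parts onto base_dir and reject anything that escapes it."""
    base = base_dir.resolve()
    # resolve() collapses ".." segments, so the comparison sees the real target.
    candidate = base.joinpath(*parts).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError("path escapes base directory")
    return candidate


def demo() -> tuple[str, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "profile_pictures"
        (base / "People").mkdir(parents=True)
        ok = resolve_inside(base, "People", "avatar.svg")
        try:
            resolve_inside(base, "..", "secrets.txt")
        except ValueError:
            rejected = True
        else:
            rejected = False
        return ok.name, rejected
```

An absolute path passed as a part replaces the base during the join, and the same check rejects it after resolution.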
src/backend/base/langflow/services/database/models/folder/utils.py (1)
26-34: Fix folder reassignment filter

Flow.folder_id is None is evaluated immediately by Python, returning False, so the UPDATE never matches any rows. As a result, flows with a null folder_id are never migrated into the default folder, defeating the purpose of this helper. Switch to SQLAlchemy's .is_(None) (or equivalent) so the predicate is rendered in SQL. Suggested fix:
-        await session.exec(
-            update(Flow)
-            .where(
-                and_(
-                    Flow.folder_id is None,
-                    Flow.user_id == user_id,
-                )
-            )
-            .values(folder_id=folder.id)
-        )
+        await session.exec(
+            update(Flow)
+            .where(
+                and_(
+                    Flow.folder_id.is_(None),
+                    Flow.user_id == user_id,
+                )
+            )
+            .values(folder_id=folder.id)
+        )
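
Why `is None` cannot work here is easier to see with a toy class. `ToyColumn` below is a hypothetical stand-in, not SQLAlchemy: it overloads `==` to build an expression string, the way a column object overloads operators to build SQL, and it shows that `is` bypasses that mechanism entirely because Python does not allow the identity operator to be overloaded.

```python
class ToyColumn:
    """Hypothetical column-like object that builds expression strings."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> str:  # type: ignore[override]
        # Overloaded: returns an expression instead of a bool.
        return f"{self.name} = {other!r}"

    def is_(self, other: object) -> str:
        # Explicit method, needed because `is` cannot be overloaded.
        rendered = "NULL" if other is None else repr(other)
        return f"{self.name} IS {rendered}"


folder_id = ToyColumn("folder_id")

# Identity comparison runs immediately in Python and yields a plain bool.
evaluated_in_python = folder_id is None

# The method call produces something a query builder could render.
rendered_predicate = folder_id.is_(None)
```

In this toy, `evaluated_in_python` is simply `False`, which is the constant the original filter ended up passing to `and_`.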
src/backend/tests/unit/api/v1/test_files.py (1)
51-72: Add flush before refresh to ensure object persistence.

At line 63, session.refresh(user) is called immediately after session.add(user) without an intervening flush. The refresh() operation requires the object to be persistent in the database, but without a flush, the INSERT may not have been sent yet.

Apply this diff:
         else:
             session.add(user)
+            await session.flush()
             await session.refresh(user)
The same issue exists in the files_flow fixture at lines 84-85. Apply a similar fix:
     async with session_scope() as session:
         session.add(flow)
+        await session.flush()
         await session.refresh(flow)
src/backend/base/langflow/services/deps.py (1)
152-170: Missing import for asynccontextmanager decorator.

Line 152 uses the @asynccontextmanager decorator, but asynccontextmanager is not imported. This will cause a NameError at runtime.

Add the missing import at the top of the file:
 from __future__ import annotations
 
+from contextlib import asynccontextmanager
 from typing import TYPE_CHECKING
src/backend/tests/conftest.py (1)
515-535: Give the superuser fixture its own username.

Both active_user and active_super_user now create the username "activeuser". If a test requests both fixtures, active_super_user will pick up the user created by active_user, leaving is_superuser=False and breaking the test. Please use a distinct username (or otherwise ensure is_superuser is flipped before yielding) so the superuser fixture always returns a real superuser.
src/backend/base/langflow/services/auth/utils.py (2)
145-150: Fix dependency injection to return AsyncSession.

Depends(session_scope) is pulling in lfx.services.deps.session_scope, which is an @asynccontextmanager. FastAPI will inject the context manager object itself, not an AsyncSession, so every call to get_current_user will pass a _AsyncGeneratorContextManager into downstream CRUD helpers and crash at runtime. Import the backend wrapper that yields the session (e.g., from langflow.services.deps import session_scope) or expose a generator-style function here.
-from lfx.services.deps import session_scope
+from langflow.services.deps import session_scope
586-591: Same dependency bug for MCP path.

The MCP handler still injects the async context manager instead of an AsyncSession, so MCP auth will fail the moment it hits the database. Align this import with the backend wrapper (see comment above) so the dependency actually yields a session.
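
The difference between the two kinds of callable can be shown without FastAPI. In this sketch all names are hypothetical: `session_scope_cm` plays the role of an `@asynccontextmanager` function and `session_dependency` is a generator-style wrapper around it. Calling the first returns a context manager object, while iterating the second yields the session itself, which is the shape a yield-based dependency needs.

```python
import asyncio
import inspect
from contextlib import asynccontextmanager


class FakeSession:
    """Hypothetical stand-in for an AsyncSession."""


@asynccontextmanager
async def session_scope_cm():
    yield FakeSession()


async def session_dependency():
    """Generator-style wrapper: enters the scope and yields the session."""
    async with session_scope_cm() as session:
        yield session


async def demo() -> tuple[bool, bool]:
    # Calling the decorated function gives a context manager, not a session.
    direct = session_scope_cm()
    direct_is_session = isinstance(direct, FakeSession)

    # Driving the wrapper one step yields the session itself.
    gen = session_dependency()
    session = await gen.__anext__()
    wrapped_is_session = isinstance(session, FakeSession)
    await gen.aclose()  # runs the scope's exit logic
    return direct_is_session, wrapped_is_session
```

`inspect.isasyncgenfunction` returns `False` for the decorated function and `True` for the wrapper, which is consistent with the behaviour described in the comment above.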
src/backend/base/langflow/api/v2/files.py (1)
467-501: Fix download streaming flow and HTTP error mapping.

Calling await storage_service.get_file(...) before the streaming branch means we always pull the entire payload into memory—even when the backend supports chunked streaming—so large S3 downloads still read the whole file twice. A missing file now bubbles up as an unhandled FileNotFoundError, which our outer except Exception converts into a 500 instead of the expected 404. The fallback path also does await byte_stream_generator(...); that function is an async generator, so the await raises TypeError whenever we hit a storage backend without true streaming support.

Please restructure this block so we only invoke get_file when it’s actually needed (content return or non-streaming fallback), convert FileNotFoundError/PermissionError into 404/403 immediately, and drop the extra await on byte_stream_generator. For example:
-        # Get file stream
-        file_stream = await storage_service.get_file(flow_id=str(current_user.id), file_name=file_name)
-
-        if file_stream is None:
-            raise HTTPException(status_code=404, detail="File stream not available")
-
-        # If return_content is True, read the file content and return it
-        if return_content:
-            # For content return, get the full file
-            file_content = await storage_service.get_file(flow_id=str(current_user.id), file_name=file_name)
-            if file_content is None:
-                raise HTTPException(status_code=404, detail="File not found")
-            return await read_file_content(file_content, decode=True)
-
-        # For streaming, use the appropriate method based on storage type
-        if hasattr(storage_service, "get_file_stream"):
-            # S3 storage - use streaming method
-            file_stream = storage_service.get_file_stream(flow_id=str(current_user.id), file_name=file_name)
-            byte_stream = file_stream
-        else:
-            # Local storage - get file and convert to stream
-            file_content = await storage_service.get_file(flow_id=str(current_user.id), file_name=file_name)
-            if file_content is None:
-                raise HTTPException(status_code=404, detail="File not found")
-            byte_stream = await byte_stream_generator(file_content)
+        try:
+            if return_content:
+                file_content = await storage_service.get_file(flow_id=str(current_user.id), file_name=file_name)
+                return await read_file_content(file_content, decode=True)
+
+            if callable(getattr(storage_service, "get_file_stream", None)):
+                byte_stream = storage_service.get_file_stream(flow_id=str(current_user.id), file_name=file_name)
+            else:
+                file_content = await storage_service.get_file(flow_id=str(current_user.id), file_name=file_name)
+                byte_stream = byte_stream_generator(file_content)
+        except FileNotFoundError as exc:
+            raise HTTPException(status_code=404, detail=str(exc)) from exc
+        except PermissionError as exc:
+            raise HTTPException(status_code=403, detail=str(exc)) from exc
This keeps streaming efficient, preserves memory, and returns the correct status codes for missing or forbidden files. After this change, the tests asserting a 404 can pass without flakiness.
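
Two of the points above can be checked with a small standalone sketch: an async generator must be iterated rather than awaited, and storage errors map onto HTTP status codes. The names `byte_chunks` and `status_for` are hypothetical and only mirror the roles of `byte_stream_generator` and the suggested `except` clauses.

```python
import asyncio
from collections.abc import AsyncIterator


async def byte_chunks(data: bytes, chunk_size: int = 4) -> AsyncIterator[bytes]:
    """Yield the payload in fixed-size chunks."""
    for start in range(0, len(data), chunk_size):
        yield data[start : start + chunk_size]


async def collect(data: bytes) -> list[bytes]:
    # Correct usage: iterate the generator with `async for`.
    return [chunk async for chunk in byte_chunks(data)]


async def await_generator_by_mistake(data: bytes) -> str:
    try:
        await byte_chunks(data)  # type: ignore[misc]  # deliberate misuse
    except TypeError as exc:
        return type(exc).__name__
    return "no error"


def status_for(exc: Exception) -> int:
    """Map storage exceptions to the status codes named in the review."""
    if isinstance(exc, FileNotFoundError):
        return 404
    if isinstance(exc, PermissionError):
        return 403
    return 500
```

A streaming response would be handed the generator object itself, without an `await`, so that chunks are pulled lazily as the client reads.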
src/backend/tests/unit/api/v2/test_files.py (1)
1-10: Import the modules you use.

json.dumps and uuid.uuid4 are referenced later in this file, but json and uuid are never imported. As soon as the S3 fixtures run, pytest will raise NameError. Please add the missing imports near the top:
-import asyncio
-import os
+import asyncio
+import json
+import os
 import tempfile
+import uuid
 from contextlib import suppress
 from pathlib import Path
src/backend/base/langflow/api/v1/flows.py (1)
270-275: Paginated branch still returns ORM models

When get_all is False we still return the raw Page of Flow ORM instances. That bypasses the new FlowRead.model_validate(..., from_attributes=True) conversion, so the paginated response reintroduces the same detached-instance/serialization problems and no longer matches the declared Page[FlowRead] response model. Please convert the paginated items to FlowRead before returning.
-            return await apaginate(session, stmt, params=params)
+            page = await apaginate(session, stmt, params=params)
+            flow_reads = [FlowRead.model_validate(flow, from_attributes=True) for flow in page.items]
+            page_dict = page.model_dump()
+            page_dict["items"] = flow_reads
+            return Page(**page_dict)

🧹 Nitpick comments (6)

src/backend/base/langflow/api/v1/files.py (1)
153-174: Consider defensive path validation.

While the folder names are currently hardcoded ("People", "Space"), applying the same path validation pattern as recommended for download_profile_picture would provide defense-in-depth against future modifications or directory structure issues.

Additionally, the same redundant exception handling issue exists here. Consider letting HTTPExceptions propagate:
     try:
         # Profile pictures are in the package installation directory
         package_dir = Path(__file__).parent.parent.parent / "initial_setup" / "profile_pictures"
+
+        # Fail fast if the bundled profile pictures directory is missing
+        if not package_dir.exists():
+            raise HTTPException(status_code=500, detail="Profile pictures directory not found")
 
         people_path = package_dir / "People"
         space_path = package_dir / "Space"
 
         # List files from package directory - these are bundled with the container
         people = [f.name for f in people_path.iterdir() if f.is_file()] if people_path.exists() else []
         space = [f.name for f in space_path.iterdir() if f.is_file()] if space_path.exists() else []
-    except Exception as e:
+    except HTTPException:
+        raise
+    except Exception as e:
         raise HTTPException(status_code=500, detail=str(e)) from e
src/frontend/src/hooks/files/use-upload-file.ts (1)
60-65: Preserve the original error for better debugging.

The error message normalization pattern is good and provides a consistent user-facing message. However, re-throwing a new Error discards the original stack trace and error properties, which can complicate debugging.

Apply this diff to preserve the original error context:
-    } catch (e: any) {
+    } catch (e: unknown) {
       const errorMessage =
-        e?.response?.data?.detail ||
-        e?.message ||
+        (e as any)?.response?.data?.detail ||
+        (e as Error)?.message ||
         "An error occurred while uploading the file";
-      throw new Error(errorMessage);
+      throw new Error(errorMessage, { cause: e });
     }
This change:

Uses unknown for better type safety (explicit casting required)

Preserves the original error as cause, maintaining stack traces and debugging context

Keeps the normalized message for user-facing error handling
src/frontend/src/controllers/API/queries/file-management/use-post-upload-file.ts (1)
45-53: Make type annotation consistent.

Line 45 uses any for the old parameter, while lines 33 and 60 use FileType[]. For consistency and better type safety, consider using FileType[] here as well, or use unknown if you need to handle potentially non-array values before the guard clause.

Apply this diff to make the type annotation consistent:
-                queryClient.setQueryData(["useGetFilesV2"], (old: any) => {
+                queryClient.setQueryData(["useGetFilesV2"], (old: FileType[]) => {
src/backend/base/langflow/services/database/models/user/crud.py (1)

30-32: Consider removing commented-out code.

The commented-out username uniqueness check appears to be dead code. If the validation is no longer needed (perhaps enforced at the database level or elsewhere), consider removing these lines to improve code clarity.
src/lfx/src/lfx/components/twelvelabs/video_file.py (1)
146-149: Consider extracting the duplicated error message.

The same error message appears in both process_files (line 103) and load_files (line 148). Consider defining it as a class constant to ensure consistency and easier maintenance.
+    S3_NOT_SUPPORTED_MSG = "Video processing is not supported in S3 mode. Use local storage mode to enable this component."
+
     def process_files(self, file_list: list[BaseFileComponent.BaseFile]) -> list[BaseFileComponent.BaseFile]:
         """Process video files."""
         settings = get_settings_service().settings
         if settings.storage_type == "s3":
-            msg = "Video processing is not supported in S3 mode. Use local storage mode to enable this component."
-            raise ValueError(msg)
+            raise ValueError(self.S3_NOT_SUPPORTED_MSG)
src/backend/base/langflow/api/utils/core.py (1)

13-13: Well-designed session dependency separation.

The introduction of DbSessionReadOnly alongside DbSession is a good architectural improvement. The read-only variant avoids unnecessary locking and commit overhead for read operations.

Consider using DbSessionReadOnly in read-heavy endpoints (like GET operations) to improve performance and reduce database contention.

Also applies to: 38-41

github-actions · 2025-11-24T23:36:13Z

Build successful! ✅
Deploying docs draft.
Deploy successful! View draft

jordanrfrazier · 2025-11-25T05:14:03Z

@CodeRabbit review PR for issues

coderabbitai · 2025-11-25T05:15:38Z


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for potential session.flush() without proper transaction context
rg -n 'await session\.flush\(\)' --type=py -B 3 -A 3 src/backend/base/langflow/ | head -150

Length of output: 12504

🏁 Script executed:

#!/bin/bash
# Look for S3 client usage patterns and potential resource leaks
rg -n '_get_client|get_client' --type=py -A 10 src/backend/base/langflow/services/storage/s3.py

Length of output: 4067

🏁 Script executed:

#!/bin/bash
# Check for missing error handling in file operations
rg -n 'async def (save_file|get_file|delete_file)' --type=py -A 20 src/backend/base/langflow/services/storage/s3.py

Length of output: 4315

🏁 Script executed:

#!/bin/bash
# Look for potential issues with database session scope usage in endpoints
rg -n 'DbSession|session_scope' --type=py -C 5 src/backend/base/langflow/api/v1/flows.py | head -100

Length of output: 3004

🏁 Script executed:

#!/bin/bash
# Check if there are any missing await keywords
rg -n 'session\.(commit|flush|rollback|refresh)\(\)' --type=py src/backend/base/langflow/ | grep -v await | head -50

Length of output: 119

🏁 Script executed:

#!/bin/bash
# Check S3 initialization and bucket validation
rg -n 'def __init__|bucket_name|aioboto3' --type=py -C 5 src/backend/base/langflow/services/storage/s3.py

Length of output: 7368

🏁 Script executed:

#!/bin/bash
# Look for potential race conditions in session management
rg -n 'session_scope|_with_session' --type=py -C 3 src/backend/base/langflow/services/database/service.py

Length of output: 1729

🏁 Script executed:

#!/bin/bash
# Check for proper cleanup in async generators
ast-grep --pattern 'async def $FUNC($$$) -> AsyncIterator[$$$]:
  $$$'

Length of output: 23996

@jordanrfrazier I've completed a comprehensive review of the PR. Here are my findings:

🔴 CRITICAL BUG: S3 Streaming Connection Lifecycle Issue

Location: src/backend/base/langflow/services/storage/s3.py:185-223 - get_file_stream() method

Problem: The async generator has a fundamental context manager lifecycle bug. When the generator yields its first chunk, control returns to the caller, but the async with self._get_client() context manager exits immediately, closing the S3 client before the generator finishes yielding all chunks:

async def get_file_stream(self, flow_id: str, file_name: str, chunk_size: int = 8192) -> AsyncIterator[bytes]:
    key = self.build_full_path(flow_id, file_name)
    
    try:
        async with self._get_client() as s3_client:  # ❌ Context exits after first yield!
            response = await s3_client.get_object(Bucket=self.bucket_name, Key=key)
            body = response["Body"]
            
            try:
                async for chunk in body.iter_chunks(chunk_size):
                    yield chunk  # Function returns here, context manager exits
            finally:
                if hasattr(body, "close"):
                    with contextlib.suppress(Exception):
                        await body.close()

When FastAPI's StreamingResponse consumes this generator, the S3 connection will already be closed, causing incomplete downloads or connection errors.

Fix: Restructure to keep the aioboto3 session alive during the entire generator lifecycle:

@asynccontextmanager
async def _get_streaming_client(self):
    """Get S3 client that stays alive for streaming operations."""
    async with self.session.client("s3") as client:
        yield client

async def get_file_stream(self, flow_id: str, file_name: str, chunk_size: int = 8192) -> AsyncIterator[bytes]:
    key = self.build_full_path(flow_id, file_name)
    
    try:
        # Keep client alive for entire generator lifecycle
        async with self._get_streaming_client() as s3_client:
            response = await s3_client.get_object(Bucket=self.bucket_name, Key=key)
            body = response["Body"]
            
            try:
                # Now the context stays open during iteration
                async for chunk in body.iter_chunks(chunk_size):
                    yield chunk
            finally:
                if hasattr(body, "close"):
                    with contextlib.suppress(Exception):
                        await body.close()
                        
        logger.debug(f"File {file_name} streamed successfully from S3: s3://{self.bucket_name}/{key}")
        
    except Exception as e:
        if hasattr(e, "response") and e.response.get("Error", {}).get("Code") == "NoSuchKey":
            await logger.awarning(f"File {file_name} not found in S3 flow {flow_id}")
            msg = f"File not found: {file_name}"
            raise FileNotFoundError(msg) from e
        
        logger.exception(f"Error streaming file {file_name} from S3 in flow {flow_id}")
        raise

This ensures the aioboto3 session and S3 client remain active throughout the entire streaming operation.
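The lifecycle claim can be checked without S3. The sketch below uses a hypothetical `fake_client` context manager in place of `_get_client()` and records when it is entered and exited relative to each yield. In plain asyncio, an `async with` block inside an async generator stays open while the generator is suspended at a `yield` and exits only once the generator is exhausted or closed. Whether the reported failure occurs in practice therefore likely depends on how `_get_client()` itself is implemented, which this sketch does not model.

```python
import asyncio
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager

events: list[str] = []


@asynccontextmanager
async def fake_client() -> AsyncIterator[str]:
    """Hypothetical stand-in for the S3 client context manager."""
    events.append("enter")
    try:
        yield "client"
    finally:
        events.append("exit")


async def stream_chunks() -> AsyncIterator[bytes]:
    """Async generator that yields while the client context is open."""
    async with fake_client():
        for chunk in (b"a", b"b", b"c"):
            events.append(f"yield {chunk!r}")
            yield chunk


async def main() -> list[bytes]:
    return [chunk async for chunk in stream_chunks()]


chunks = asyncio.run(main())
```

After the run, `events` starts with `"enter"` and ends with `"exit"`, with all three yields in between. A consumer that abandons the generator early should close it (for example with `aclose()`) so that the `finally` blocks run promptly.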

🟡 Medium Priority Issues

1. Migration Lock Collision Risk

Location: src/backend/base/langflow/alembic/env.py:49-50

Using only 16 hex characters from SHA-256 increases collision probability:

lock_key = int(hashlib.sha256(namespace.encode()).hexdigest()[:16], 16) % (2**63 - 1)

Recommendation: Use full 8 bytes for better distribution:

hash_bytes = hashlib.sha256(namespace.encode()).digest()[:8]
lock_key = int.from_bytes(hash_bytes, 'big') % (2**63 - 1)
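Before adopting the change, it may help to compare the two expressions directly. The sketch below assumes a few hypothetical namespace strings. Since 16 hex characters encode exactly the first 8 bytes of the digest, both forms should produce the same key, so the recommendation looks like a readability change and not a change in distribution.

```python
import hashlib

MAX_KEY = 2**63 - 1  # modulus used by both expressions above


def lock_key_from_hex(namespace: str) -> int:
    """Key derived from the first 16 hex characters of the digest."""
    return int(hashlib.sha256(namespace.encode()).hexdigest()[:16], 16) % MAX_KEY


def lock_key_from_bytes(namespace: str) -> int:
    """Key derived from the first 8 raw bytes of the digest."""
    hash_bytes = hashlib.sha256(namespace.encode()).digest()[:8]
    return int.from_bytes(hash_bytes, "big") % MAX_KEY


# Hypothetical namespaces, for illustration only
namespaces = ["langflow", "langflow-staging", "tenant-42"]
keys_match = all(lock_key_from_hex(n) == lock_key_from_bytes(n) for n in namespaces)
```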

2. Missing Exception Handling in Read-Only Sessions

Location: src/lfx/src/lfx/services/deps.py:59-64 - session_scope_readonly()

No rollback on exceptions, which could leave locks held:

Recommendation:

@asynccontextmanager
async def session_scope_readonly() -> AsyncGenerator[AsyncSession, None]:
    db_service = get_db_service()
    async with db_service._with_session() as session:
        try:
            yield session
        except Exception:
            # Rollback even for read-only to release locks
            if session.is_active:
                try:
                    await session.rollback()
                except InvalidRequestError:
                    pass
            raise
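
The rollback-on-exception pattern recommended above can be exercised without a database. This is a sketch under stated assumptions: `FakeSession` and `readonly_scope` are hypothetical stand-ins for the real session and `session_scope_readonly()`, and the `InvalidRequestError` guard is omitted.

```python
import asyncio
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager


class FakeSession:
    """Hypothetical stand-in that only records whether rollback was called."""

    def __init__(self) -> None:
        self.is_active = True
        self.rolled_back = False

    async def rollback(self) -> None:
        self.rolled_back = True


@asynccontextmanager
async def readonly_scope(session: FakeSession) -> AsyncIterator[FakeSession]:
    """Yield the session; roll back and re-raise if the body fails."""
    try:
        yield session
    except Exception:
        if session.is_active:
            await session.rollback()
        raise


async def run(fail: bool) -> FakeSession:
    session = FakeSession()
    try:
        async with readonly_scope(session):
            if fail:
                raise RuntimeError("query failed")
    except RuntimeError:
        pass  # the error still propagates out of the scope; swallowed here for the demo
    return session


failed = asyncio.run(run(fail=True))
succeeded = asyncio.run(run(fail=False))
```

Only the failing run triggers a rollback, and the original exception still reaches the caller.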

🟢 Low Priority Observations

Temporary File Cleanup in Components

Location: src/lfx/src/lfx/components/langchain_utilities/csv_agent.py, json_agent.py

Components create temp files for S3 with try/finally cleanup, but process crashes will orphan files. Consider using tempfile.TemporaryDirectory or a background cleanup task.
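
A minimal sketch of the `tempfile.TemporaryDirectory` suggestion follows. `count_csv_rows` and its inputs are hypothetical, and the row count stands in for whatever processing the component does on the local copy.

```python
import tempfile
from pathlib import Path


def count_csv_rows(content: bytes, file_name: str) -> tuple[int, Path]:
    """Write downloaded bytes to a temporary directory, process them, then clean up."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Keep only the base name so a storage key cannot escape tmp_dir
        local_path = Path(tmp_dir) / Path(file_name).name
        local_path.write_bytes(content)
        row_count = len(local_path.read_text().splitlines())
    # The directory and everything inside it are removed when the block exits
    return row_count, local_path


rows, stale_path = count_csv_rows(b"a,b\n1,2\n3,4\n", "flow-1/data.csv")
```

The context manager removes the directory on normal exit and on exceptions. It does not by itself cover a hard process kill, so a periodic cleanup task may still be worth considering.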

✅ Verified Working Correctly

✅ Session scope commit/flush patterns - properly wrapped in session_scope() dependency
✅ Database transaction management - changes flushed inside a session_scope() context are committed when the context exits
✅ S3 error handling for standard operations - proper exception mapping
✅ File upload cleanup on DB failure - deletes uploaded file if DB insert fails
✅ No missing await keywords detected
✅ S3 initialization and configuration validation

Summary

The S3 streaming bug is critical and will cause production issues with file downloads from S3. This must be fixed before merging. The session management refactoring looks solid - the commit → flush pattern correctly relies on session_scope() to handle final commits.

The medium-priority issues are improvements but not blockers. Overall, the PR implements a significant architectural change well, aside from the streaming lifecycle bug.

🧠 Learnings used

Learnt from: edwinjosechittilappilly
Repo: langflow-ai/langflow PR: 8504
File: src/backend/base/langflow/initial_setup/starter_projects/Image Sentiment Analysis.json:391-393
Timestamp: 2025-06-12T15:25:01.072Z
Learning: The repository owner prefers CodeRabbit not to review or comment on JSON files because they are autogenerated.

jordanrfrazier added 7 commits November 4, 2025 15:41

Beginning cherry-pick of changes

02633e1

more updates

88e68d7

Add local and s3 storage services, modified base storage service, del…

af82131

…egate from langflow to lfx

more change

9200ad2

remove all instances of with session

2a2c228

test s3

61788c4

add tests and fix fe

9a44b59

jordanrfrazier requested review from erichare and ogabrielluiz November 6, 2025 19:59

github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Nov 6, 2025

[autofix.ci] apply automated fixes

a7102ca

[autofix.ci] apply automated fixes (attempt 2/3)

ddd3dba

coderabbitai Bot reviewed Nov 6, 2025

fix diamond inheritance and some flush / reset db patterns

cbe4bee

Merge branch 'main' into s3-file-store

c6dcb92

Flush after delete in upload_user_file

641d5cc

jordanrfrazier added 2 commits November 24, 2025 21:30

use current path in loop csv test

20bce1c

ruff

3c30fa3

ruff

6b56845

Updates the sqlite pragma test to use sqlite3 directly

00a46b7

ruff

fa2bbb9

ensure pragma test db is in wal mode

96e08f3

Adds back some relevant tests

aa9543d

ruff

3f6115e

jordanrfrazier added 2 commits November 25, 2025 00:49

Add more integration s3 tests

415d0a7

ruff

3aaf4b2

component index

02f0ee1

This was referenced Nov 28, 2025

feat: Add configurable API key validation source (db/env) #10783

Merged

feat: S3 file size and associations to flows #10819

Open
