Skip to content

feat: Changing location, name and functions of SaveToFileComponent.#9868

Merged
erichare merged 37 commits into
langflow-ai:mainfrom
Empreiteiro:write-file
Sep 30, 2025
Merged

feat: Changing location, name and functions of SaveToFileComponent.#9868
erichare merged 37 commits into
langflow-ai:mainfrom
Empreiteiro:write-file

Conversation

@Empreiteiro
Copy link
Copy Markdown
Collaborator

@Empreiteiro Empreiteiro commented Sep 15, 2025

Summary by CodeRabbit

  • New Features

    • Added a Save to File component with a UI to store outputs to Local, AWS S3, or Google Drive.
    • Supports saving DataFrame/Data/Message in multiple formats (CSV, Excel, JSON, Markdown, Text) and auto-adjusts file extensions.
    • Dynamically shows required fields per storage option and can create Google Docs/Slides.
    • Returns file paths or shareable links after upload.
  • Refactor

    • Replaced the previous save-to-file component and relocated it under data components for a more streamlined workflow and validation.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Sep 15, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds a new SaveToFileComponent under data/ with support for Local, AWS S3, and Google Drive destinations and various formats; removes the previous processing/ SaveToFileComponent. The new component dynamically configures UI fields per storage location and routes save operations to location-specific handlers, then uploads the saved file via the existing upload pipeline.

Changes

Cohort / File(s) Summary
New data SaveToFile component
src/lfx/src/lfx/components/data/save_file.py
Adds SaveToFileComponent with UI inputs for storage location (Local/AWS/Google Drive), format selection, credentials/paths, and output Message. Implements save_to_file control flow, type/format inference, local save logic, S3 upload via temporary file, Google Drive and Google Apps creation, content extraction, path/extension handling, and post-save upload.
Removed legacy processing SaveToFile
src/lfx/src/lfx/components/processing/save_file.py
Removes prior SaveToFileComponent that handled only local saves with format options and upload; functionality superseded by the new data component supporting multiple storage backends.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant UI as Component UI
  participant C as SaveToFileComponent
  participant L as Local FS
  participant S3 as AWS S3
  participant GD as Google Drive
  participant U as Upload Service

  User->>UI: Configure inputs (location, format, path/credentials)
  UI->>C: save_to_file()
  alt Detect input type and default format
    C->>C: _get_input_type / _get_default_format
  end
  opt Adjust path
    C->>C: _adjust_file_path_with_format
  end
  alt Location = Local
    C->>L: Write file (DataFrame/Data/Message)
    C->>U: _upload_file(final_path)
  else Location = AWS S3
    C->>C: Serialize to temp file
    C->>S3: Upload to bucket/prefix
    C->>U: _upload_file(temp or reference)
  else Location = Google Drive
    C->>GD: Create file or Google Doc/Slides
    C->>U: _upload_file(reference or exported content)
  end
  C-->>UI: Message(File Path / reference)
  UI-->>User: Display result
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title describes relocating and changing SaveToFileComponent, which matches the changeset that removes the component from components/processing and adds an expanded implementation under components/data; therefore it is related to the main change. The title's mention of a "name" change appears misleading because the class name remains SaveToFileComponent in the diff. Overall the title is concise and conveys the primary intent but contains one inaccurate detail.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🧪 Early access (Sonnet 4.5): enabled

We are currently testing the Sonnet 4.5 model, which is expected to improve code review quality. However, this model may lead to increased noise levels in the review comments. Please disable the early access features if the noise level causes any inconvenience.

Note:

  • Public repositories are always opted into early access features.
  • You can enable or disable early access features from the CodeRabbit UI or by updating the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the enhancement New feature or request label Sep 15, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 15, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 15, 2025
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🧹 Nitpick comments (6)
src/lfx/src/lfx/components/data/save_file.py (6)

388-391: Validate AWS credentials securely before use

The AWS credential validation only checks if the values exist but doesn't verify their format or validity. Invalid credentials will only fail during the boto3 call, potentially after creating temporary files.

Consider adding basic validation for AWS credentials format:

 # Validate AWS credentials
 if not getattr(self, 'aws_access_key_id', None):
     raise ValueError("AWS Access Key ID is required for S3 storage")
 if not getattr(self, 'aws_secret_access_key', None):
     raise ValueError("AWS Secret Key is required for S3 storage")
 if not getattr(self, 'bucket_name', None):
     raise ValueError("S3 Bucket Name is required for S3 storage")
+
+# Basic format validation
+if len(self.aws_access_key_id) < 16:
+    raise ValueError("AWS Access Key ID appears to be invalid")
+if len(self.aws_secret_access_key) < 20:
+    raise ValueError("AWS Secret Key appears to be invalid")

362-365: Path creation without proper error handling

Creating parent directories with mkdir could fail with permissions errors or disk space issues, but these aren't caught specifically.

Add specific error handling for directory creation:

 # Prepare file path
 file_path = Path(self.file_name).expanduser()
 if not file_path.parent.exists():
-    file_path.parent.mkdir(parents=True, exist_ok=True)
+    try:
+        file_path.parent.mkdir(parents=True, exist_ok=True)
+    except PermissionError as e:
+        raise ValueError(f"Cannot create directory {file_path.parent}: Permission denied") from e
+    except OSError as e:
+        raise ValueError(f"Cannot create directory {file_path.parent}: {e}") from e

463-465: Improve JSON parsing error message for service account key

The error message for invalid JSON could be more helpful by suggesting common issues.

Enhance the error message:

 try:
     credentials_dict = json.loads(self.service_account_key)
 except json.JSONDecodeError as e:
-    raise ValueError(f"Invalid JSON in service account key: {str(e)}")
+    raise ValueError(
+        f"Invalid JSON in service account key: {str(e)}. "
+        "Ensure you've pasted the complete JSON content from your service account key file."
+    )

248-250: Consider using a mapping for file extensions

The logic for adjusting file paths with extensions could be simplified using a mapping.

 def _adjust_file_path_with_format(self, path: Path, fmt: str) -> Path:
     """Adjust the file path to include the correct extension."""
+    FORMAT_EXTENSIONS = {
+        "excel": "xlsx",
+        # Add other format mappings if needed
+    }
     file_extension = path.suffix.lower().lstrip(".")
-    if fmt == "excel":
-        return Path(f"{path}.xlsx").expanduser() if file_extension not in ["xlsx", "xls"] else path
-    return Path(f"{path}.{fmt}").expanduser() if file_extension != fmt else path
+    
+    target_ext = FORMAT_EXTENSIONS.get(fmt, fmt)
+    if fmt == "excel" and file_extension in ["xlsx", "xls"]:
+        return path
+    if file_extension != target_ext:
+        return Path(f"{path}.{target_ext}").expanduser()
+    return path

397-401: Import boto3 at module level

Dynamic imports inside methods can cause unexpected delays and make dependency management harder.

Move the import to the top of the file and handle ImportError gracefully:

# At the top of the file
try:
    import boto3
    from botocore.exceptions import ClientError, NoCredentialsError
    HAS_BOTO3 = True
except ImportError:
    HAS_BOTO3 = False

# In the method
if not HAS_BOTO3:
    msg = "boto3 is not installed. Please install it using `uv pip install boto3`."
    raise ImportError(msg)

450-459: Same issue with Google API imports

The Google API libraries should also be imported at module level.

Apply the same pattern as suggested for boto3:

# At the top of the file
try:
    from googleapiclient.discovery import build
    from googleapiclient.http import MediaFileUpload
    from google.oauth2 import service_account
    HAS_GOOGLE_API = True
except ImportError:
    HAS_GOOGLE_API = False
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 974cbe0 and ae2c2ab.

📒 Files selected for processing (2)
  • src/lfx/src/lfx/components/data/save_file.py (1 hunks)
  • src/lfx/src/lfx/components/processing/save_file.py (0 hunks)
💤 Files with no reviewable changes (1)
  • src/lfx/src/lfx/components/processing/save_file.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/lfx/src/lfx/components/data/save_file.py (3)
src/backend/base/langflow/api/v2/files.py (1)
  • upload_user_file (83-177)
src/backend/base/langflow/services/database/models/user/crud.py (1)
  • get_user_by_id (19-23)
src/backend/base/langflow/services/deps.py (3)
  • get_settings_service (111-124)
  • get_storage_service (88-96)
  • session_scope (151-173)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Update Starter Projects
  • GitHub Check: Run Ruff Check and Format
  • GitHub Check: Ruff Style Check (3.13)
🔇 Additional comments (3)
src/lfx/src/lfx/components/data/save_file.py (3)

226-233: Good use of exact type checking

The implementation correctly uses type() instead of isinstance() to avoid inheritance issues with Message/Data types. The comment clearly explains the reasoning.


148-193: Clean implementation of dynamic UI configuration

The update_build_config method provides a clean implementation for dynamically showing/hiding fields based on storage location selection. The logic is clear and maintainable.


569-583: Verify async iterator handling in _extract_content_for_upload

ripgrep returned no matches for Message async-iterator usage; unable to confirm whether Message.text can be an async iterator—manual verification required. If Message.text can be async, convert _extract_content_for_upload to async and handle async/sync iterables; suggested change:

-def _extract_content_for_upload(self) -> str:
+async def _extract_content_for_upload(self) -> str:
     """Extract content from input for upload to cloud services."""
     if self._get_input_type() == "DataFrame":
         return self.input.to_csv(index=False)
     elif self._get_input_type() == "Data":
         if hasattr(self.input, 'data') and self.input.data:
             if isinstance(self.input.data, dict):
                 import json
                 return json.dumps(self.input.data, indent=2, ensure_ascii=False)
             return str(self.input.data)
         return str(self.input)
     elif self._get_input_type() == "Message":
-        return str(self.input.text) if self.input.text else str(self.input)
+        if self.input.text is None:
+            return str(self.input)
+        # async iterable
+        if hasattr(self.input.text, "__aiter__"):
+            parts = []
+            async for item in self.input.text:
+                parts.append(str(item))
+            return " ".join(parts)
+        # sync iterable (exclude str/bytes)
+        if hasattr(self.input.text, "__iter__") and not isinstance(self.input.text, (str, bytes)):
+            return " ".join(str(item) for item in self.input.text)
+        return str(self.input.text)
     else:
         return str(self.input)

Also verify and update all callers to await this function if it becomes async (e.g., _save_message).

Comment thread src/lfx/src/lfx/components/data/save_file.py Outdated
Comment thread src/lfx/src/lfx/components/data/save_file.py
Comment thread src/lfx/src/lfx/components/data/save_file.py
Comment thread src/lfx/src/lfx/components/data/save_file.py
Comment thread src/lfx/src/lfx/components/data/save_file.py Outdated
Comment thread src/lfx/src/lfx/components/data/save_file.py
Comment thread src/lfx/src/lfx/components/data/save_file.py Outdated
Comment thread src/lfx/src/lfx/components/data/save_file.py
Comment thread src/lfx/src/lfx/components/data/save_file.py Outdated
Comment thread src/lfx/src/lfx/components/data/save_file.py
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 15, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 15, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 22, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 22, 2025
@github-actions github-actions Bot removed the enhancement New feature or request label Sep 22, 2025
@github-actions github-actions Bot added the enhancement New feature or request label Sep 30, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 30, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 30, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 30, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 30, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 30, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 30, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 30, 2025
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Sep 30, 2025
@sonarqubecloud
Copy link
Copy Markdown

@erichare erichare added this pull request to the merge queue Sep 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request needs-docs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants