feat: Changing location, name and functions of SaveToFileComponent.#9868
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughAdds a new SaveToFileComponent under data/ with support for Local, AWS S3, and Google Drive destinations and various formats; removes the previous processing/ SaveToFileComponent. The new component dynamically configures UI fields per storage location and routes save operations to location-specific handlers, then uploads the saved file via the existing upload pipeline. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant UI as Component UI
participant C as SaveToFileComponent
participant L as Local FS
participant S3 as AWS S3
participant GD as Google Drive
participant U as Upload Service
User->>UI: Configure inputs (location, format, path/credentials)
UI->>C: save_to_file()
alt Detect input type and default format
C->>C: _get_input_type / _get_default_format
end
opt Adjust path
C->>C: _adjust_file_path_with_format
end
alt Location = Local
C->>L: Write file (DataFrame/Data/Message)
C->>U: _upload_file(final_path)
else Location = AWS S3
C->>C: Serialize to temp file
C->>S3: Upload to bucket/prefix
C->>U: _upload_file(temp or reference)
else Location = Google Drive
C->>GD: Create file or Google Doc/Slides
C->>U: _upload_file(reference or exported content)
end
C-->>UI: Message(File Path / reference)
UI-->>User: Display result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Pre-merge checks and finishing touches✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🧪 Early access (Sonnet 4.5): enabledWe are currently testing the Sonnet 4.5 model, which is expected to improve code review quality. However, this model may lead to increased noise levels in the review comments. Please disable the early access features if the noise level causes any inconvenience. Note:
Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (6)
src/lfx/src/lfx/components/data/save_file.py (6)
388-391: Validate AWS credentials securely before useThe AWS credential validation only checks if the values exist but doesn't verify their format or validity. Invalid credentials will only fail during the boto3 call, potentially after creating temporary files.
Consider adding basic validation for AWS credentials format:
# Validate AWS credentials if not getattr(self, 'aws_access_key_id', None): raise ValueError("AWS Access Key ID is required for S3 storage") if not getattr(self, 'aws_secret_access_key', None): raise ValueError("AWS Secret Key is required for S3 storage") if not getattr(self, 'bucket_name', None): raise ValueError("S3 Bucket Name is required for S3 storage") + +# Basic format validation +if len(self.aws_access_key_id) < 16: + raise ValueError("AWS Access Key ID appears to be invalid") +if len(self.aws_secret_access_key) < 20: + raise ValueError("AWS Secret Key appears to be invalid")
362-365: Path creation without proper error handlingCreating parent directories with
mkdircould fail with permissions errors or disk space issues, but these aren't caught specifically.Add specific error handling for directory creation:
# Prepare file path file_path = Path(self.file_name).expanduser() if not file_path.parent.exists(): - file_path.parent.mkdir(parents=True, exist_ok=True) + try: + file_path.parent.mkdir(parents=True, exist_ok=True) + except PermissionError as e: + raise ValueError(f"Cannot create directory {file_path.parent}: Permission denied") from e + except OSError as e: + raise ValueError(f"Cannot create directory {file_path.parent}: {e}") from e
463-465: Improve JSON parsing error message for service account keyThe error message for invalid JSON could be more helpful by suggesting common issues.
Enhance the error message:
try: credentials_dict = json.loads(self.service_account_key) except json.JSONDecodeError as e: - raise ValueError(f"Invalid JSON in service account key: {str(e)}") + raise ValueError( + f"Invalid JSON in service account key: {str(e)}. " + "Ensure you've pasted the complete JSON content from your service account key file." + )
248-250: Consider using a mapping for file extensionsThe logic for adjusting file paths with extensions could be simplified using a mapping.
def _adjust_file_path_with_format(self, path: Path, fmt: str) -> Path: """Adjust the file path to include the correct extension.""" + FORMAT_EXTENSIONS = { + "excel": "xlsx", + # Add other format mappings if needed + } file_extension = path.suffix.lower().lstrip(".") - if fmt == "excel": - return Path(f"{path}.xlsx").expanduser() if file_extension not in ["xlsx", "xls"] else path - return Path(f"{path}.{fmt}").expanduser() if file_extension != fmt else path + + target_ext = FORMAT_EXTENSIONS.get(fmt, fmt) + if fmt == "excel" and file_extension in ["xlsx", "xls"]: + return path + if file_extension != target_ext: + return Path(f"{path}.{target_ext}").expanduser() + return path
397-401: Import boto3 at module levelDynamic imports inside methods can cause unexpected delays and make dependency management harder.
Move the import to the top of the file and handle ImportError gracefully:
# At the top of the file try: import boto3 from botocore.exceptions import ClientError, NoCredentialsError HAS_BOTO3 = True except ImportError: HAS_BOTO3 = False # In the method if not HAS_BOTO3: msg = "boto3 is not installed. Please install it using `uv pip install boto3`." raise ImportError(msg)
450-459: Same issue with Google API importsThe Google API libraries should also be imported at module level.
Apply the same pattern as suggested for boto3:
# At the top of the file try: from googleapiclient.discovery import build from googleapiclient.http import MediaFileUpload from google.oauth2 import service_account HAS_GOOGLE_API = True except ImportError: HAS_GOOGLE_API = False
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/lfx/src/lfx/components/data/save_file.py(1 hunks)src/lfx/src/lfx/components/processing/save_file.py(0 hunks)
💤 Files with no reviewable changes (1)
- src/lfx/src/lfx/components/processing/save_file.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/lfx/src/lfx/components/data/save_file.py (3)
src/backend/base/langflow/api/v2/files.py (1)
upload_user_file(83-177)src/backend/base/langflow/services/database/models/user/crud.py (1)
get_user_by_id(19-23)src/backend/base/langflow/services/deps.py (3)
get_settings_service(111-124)get_storage_service(88-96)session_scope(151-173)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Update Starter Projects
- GitHub Check: Run Ruff Check and Format
- GitHub Check: Ruff Style Check (3.13)
🔇 Additional comments (3)
src/lfx/src/lfx/components/data/save_file.py (3)
226-233: Good use of exact type checkingThe implementation correctly uses
type()instead ofisinstance()to avoid inheritance issues with Message/Data types. The comment clearly explains the reasoning.
148-193: Clean implementation of dynamic UI configurationThe
update_build_configmethod provides a clean implementation for dynamically showing/hiding fields based on storage location selection. The logic is clear and maintainable.
569-583: Verify async iterator handling in _extract_content_for_uploadripgrep returned no matches for Message async-iterator usage; unable to confirm whether Message.text can be an async iterator—manual verification required. If Message.text can be async, convert _extract_content_for_upload to async and handle async/sync iterables; suggested change:
-def _extract_content_for_upload(self) -> str: +async def _extract_content_for_upload(self) -> str: """Extract content from input for upload to cloud services.""" if self._get_input_type() == "DataFrame": return self.input.to_csv(index=False) elif self._get_input_type() == "Data": if hasattr(self.input, 'data') and self.input.data: if isinstance(self.input.data, dict): import json return json.dumps(self.input.data, indent=2, ensure_ascii=False) return str(self.input.data) return str(self.input) elif self._get_input_type() == "Message": - return str(self.input.text) if self.input.text else str(self.input) + if self.input.text is None: + return str(self.input) + # async iterable + if hasattr(self.input.text, "__aiter__"): + parts = [] + async for item in self.input.text: + parts.append(str(item)) + return " ".join(parts) + # sync iterable (exclude str/bytes) + if hasattr(self.input.text, "__iter__") and not isinstance(self.input.text, (str, bytes)): + return " ".join(str(item) for item in self.input.text) + return str(self.input.text) else: return str(self.input)Also verify and update all callers to await this function if it becomes async (e.g., _save_message).
|



Summary by CodeRabbit
New Features
Refactor