fix: Knowledge Base fixes for 1.6 #9657
…stion component - Reorganized import statements for better clarity. - Enhanced formatting of lists and function parameters for improved readability. - Removed unused parameters and streamlined the column configuration in the Knowledge Bases tab. - Updated JSON configuration for Knowledge Ingestion to reflect changes in code structure. These changes aim to enhance maintainability and readability of the codebase.
Important: Review skipped
Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI.
Walkthrough
Frontend: removed inline rename support for knowledge bases, updated column config to disable editing and most sorting, and changed the createKnowledgeBaseColumns API to accept only onDelete.
Backend: minor formatting and label text changes in ingestion.py without logic changes.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Codecov Report
❌ Your patch status has failed because the patch coverage (13.04%) is below the target coverage (40.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## main #9657 +/- ##
=======================================
Coverage 21.63% 21.63%
=======================================
Files 1074 1074
Lines 39650 39649 -1
Branches 5418 5417 -1
=======================================
+ Hits 8578 8580 +2
+ Misses 30928 30925 -3
Partials 144 144
…guration in knowledge base - Removed the extraneous flag from the `@clack/prompts` dependency in `package-lock.json`. - Updated the `editable` property in the knowledge base columns configuration to `false`, enhancing the integrity of the data structure.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/backend/base/langflow/components/knowledge_bases/ingestion.py (2)
551-572: Fix unbound embedding_model/api_key when metadata is missing or decryption fails
embedding_model and/or api_key can be undefined if embedding_metadata.json is absent or lacks api_key, or if decryption fails and no override is provided. This leads to an UnboundLocalError at Line 571 and brittle error handling.
Apply this diff to initialize, guard, and validate before use:
@@
-        # If the API key is not provided, try to read it from the metadata file
-        if metadata_path.exists():
-            settings_service = get_settings_service()
-            metadata = json.loads(metadata_path.read_text())
-            embedding_model = metadata.get("embedding_model")
-            try:
-                api_key = decrypt_api_key(metadata["api_key"], settings_service)
-            except (InvalidToken, TypeError, ValueError) as e:
-                logger.error(f"Could not decrypt API key. Please provide it manually. Error: {e}")
+        # Initialize defaults
+        embedding_model: str | None = None
+        api_key: str | None = None
+
+        # If the API key is not provided, try to read it from the metadata file
+        if metadata_path.exists():
+            settings_service = get_settings_service()
+            metadata = json.loads(metadata_path.read_text())
+            embedding_model = metadata.get("embedding_model")
+            enc_key = metadata.get("api_key")
+            if enc_key:
+                try:
+                    api_key = decrypt_api_key(enc_key, settings_service)
+                except (InvalidToken, TypeError, ValueError) as e:
+                    logger.error(f"Could not decrypt API key. Please provide it manually. Error: {e}")
@@
-        # Check if a custom API key was provided, update metadata if so
-        if self.api_key:
-            api_key = self.api_key
-            self._save_embedding_metadata(
-                kb_path=kb_path,
-                embedding_model=embedding_model,
-                api_key=api_key,
-            )
+        # Check if a custom API key was provided, update metadata if so
+        if self.api_key:
+            api_key = self.api_key
+            self._save_embedding_metadata(
+                kb_path=kb_path,
+                embedding_model=embedding_model or "",
+                api_key=api_key,
+            )
+
+        # Validate required params before proceeding
+        if not embedding_model:
+            raise ValueError("Embedding model not configured. Create a knowledge base or provide metadata first.")
+        provider = self._get_embedding_provider(embedding_model)
+        if provider in {"OpenAI", "Cohere"} and not api_key:
+            raise ValueError(f"{provider} API key is required for the selected embedding model.")
@@
-        await self._create_vector_store(df_source, config_list, embedding_model=embedding_model, api_key=api_key)
+        await self._create_vector_store(df_source, config_list, embedding_model=embedding_model, api_key=api_key)
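The initialize, guard, and validate pattern in the diff above can be tried in isolation. The following is a minimal sketch, not the component's actual code: load_embedding_settings and its parameters are hypothetical names, and decryption is left out, so the stored key is treated as plain text.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Optional, Tuple


def load_embedding_settings(
    metadata_path: Path, override_key: Optional[str] = None
) -> Tuple[str, Optional[str]]:
    """Return (embedding_model, api_key); raise ValueError if no model is configured.

    Hypothetical helper for illustration only.
    """
    # Bind both names up front so neither can be unbound further down.
    embedding_model: Optional[str] = None
    api_key: Optional[str] = None

    if metadata_path.exists():
        metadata = json.loads(metadata_path.read_text())
        embedding_model = metadata.get("embedding_model")
        # .get() avoids a KeyError when the file has no "api_key" entry.
        # Assumption: the real component decrypts this value; the sketch does not.
        api_key = metadata.get("api_key")

    # A key supplied by the caller takes precedence over the stored one.
    if override_key:
        api_key = override_key

    # Validate before use instead of failing later with an unrelated error.
    if not embedding_model:
        raise ValueError("Embedding model not configured.")
    return embedding_model, api_key
```

With a missing metadata file the function raises ValueError rather than UnboundLocalError, which gives the caller a message it can act on.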
654-661: Catch asyncio.TimeoutError from wait_for, not builtin TimeoutError
asyncio.wait_for raises asyncio.TimeoutError. On Python 3.10 and earlier this is a separate exception class, so catching the builtin TimeoutError misses it; since Python 3.11 the two names refer to the same class.
-        except TimeoutError as e:
+        except asyncio.TimeoutError as e:
             msg = "Embedding validation timed out. Please verify network connectivity and key."
             raise ValueError(msg) from e
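A small, self-contained sketch of the suggested handler follows. validate_with_timeout is a hypothetical name, and asyncio.sleep stands in for the real embedding validation call. Catching asyncio.TimeoutError behaves the same on Python 3.10 and on newer versions.

```python
import asyncio


async def validate_with_timeout(delay: float, timeout: float) -> str:
    """Run a stand-in validation step and translate a timeout into ValueError."""
    try:
        # asyncio.sleep is a placeholder for the real (hypothetical) validation coroutine.
        await asyncio.wait_for(asyncio.sleep(delay), timeout=timeout)
    except asyncio.TimeoutError as e:
        msg = "Embedding validation timed out."
        raise ValueError(msg) from e
    return "ok"
```

Calling asyncio.run(validate_with_timeout(1.0, 0.01)) raises ValueError with the original timeout attached as its cause.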
🧹 Nitpick comments (4)
src/backend/base/langflow/components/knowledge_bases/ingestion.py (3)
399-404: Avoid blocking the event loop while adding documents
Chroma.add_documents is CPU/network-bound and synchronous. Wrap it in to_thread to keep the async pipeline responsive.
-        if documents:
-            chroma.add_documents(documents)
+        if documents:
+            await asyncio.to_thread(chroma.add_documents, documents)
             self.log(f"Added {len(documents)} documents to vector store '{self.knowledge_base}'")
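The effect of asyncio.to_thread can be seen in a short sketch. add_documents_blocking is a hypothetical stand-in for the synchronous vector-store write, and the heartbeat task exists only to show that the event loop keeps running while the write is in progress.

```python
import asyncio
import time
from typing import List


def add_documents_blocking(store: List[str], documents: List[str]) -> None:
    """Hypothetical synchronous write; time.sleep simulates slow I/O."""
    time.sleep(0.2)
    store.extend(documents)


async def ingest(store: List[str], documents: List[str]) -> int:
    """Write documents in a worker thread and count heartbeat ticks meanwhile."""
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    try:
        # The blocking call runs in a thread, so the loop can still schedule heartbeat().
        await asyncio.to_thread(add_documents_blocking, store, documents)
    finally:
        beat.cancel()
        try:
            await beat
        except asyncio.CancelledError:
            pass
    return ticks
```

If add_documents_blocking were called directly inside ingest, the heartbeat would not tick at all until the write returned.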
416-426: Loading entire collection to dedupe does not scale
chroma.get() fetches all docs and metadatas; on large KBs this is O(N) memory/time. Prefer targeted existence checks (e.g., per-hash where filters) or batched retrieval of just ids/metadata if supported, or maintain a lightweight local index (e.g., a JSON Lines file of known _id hashes).
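One of the options mentioned, a local JSON Lines index of known hashes, might look like the sketch below. The file layout, the "_id" key, and both function names are assumptions made for illustration; nothing here calls the Chroma API. The index still grows with the number of documents, but it holds only 64-character digests rather than full documents and metadata.

```python
import hashlib
import json
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import List, Set


def load_known_hashes(index_path: Path) -> Set[str]:
    """Read the set of already-ingested hashes from a JSON Lines file."""
    if not index_path.exists():
        return set()
    with index_path.open() as fh:
        return {json.loads(line)["_id"] for line in fh if line.strip()}


def filter_new(texts: List[str], index_path: Path) -> List[str]:
    """Return texts not seen before and append their hashes to the index."""
    known = load_known_hashes(index_path)
    fresh: List[str] = []
    with index_path.open("a") as fh:
        for text in texts:
            digest = hashlib.sha256(text.encode()).hexdigest()
            if digest in known:
                continue  # duplicate within this batch or from an earlier run
            known.add(digest)
            fh.write(json.dumps({"_id": digest}) + "\n")
            fresh.append(text)
    return fresh
```

Only the texts returned by filter_new would then be passed on to the vector store.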
444-469: Clarify variables and ensure hash input uses identifiers when available
The variable name identifier_parts is used for content_cols first, then reused for identifiers, which is confusing. Also make hash input explicit: identifiers if present, otherwise content.
-        # Build content text from identifier columns using list comprehension
-        identifier_parts = [str(row[col]) for col in content_cols if col in row and pd.notna(row[col])]
-
-        # Join all parts into a single string
-        page_content = " ".join(identifier_parts)
+        # Build content text from vectorized columns
+        content_parts = [str(row[col]) for col in content_cols if col in row and pd.notna(row[col])]
+        page_content = " ".join(content_parts)
@@
-        # Add identifier columns if they exist
-        if identifier_cols:
-            identifier_parts = [str(row[col]) for col in identifier_cols if col in row and pd.notna(row[col])]
-            page_content = " ".join(identifier_parts)
+        # Build identifier text (used for stable deduplication)
+        id_parts = [str(row[col]) for col in identifier_cols if col in row and pd.notna(row[col])]
+        id_text = " ".join(id_parts)
@@
-        # Hash the page_content for unique ID
-        page_content_hash = hashlib.sha256(page_content.encode()).hexdigest()
+        # Hash identifiers when available, otherwise hash content
+        hash_input = id_text if id_parts else page_content
+        page_content_hash = hashlib.sha256(hash_input.encode()).hexdigest()
src/frontend/src/pages/MainPage/pages/filesPage/config/knowledgeBaseColumns.tsx (1)
36-45: Align field with displayed value for filtering consistency
The column displays embedding_model via valueGetter but uses field: "embedding_provider". Text filtering will operate on embedding_provider, not the shown embedding_model. Recommend aligning the field.
-    field: "embedding_provider",
+    field: "embedding_model",
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
src/frontend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (3)
src/backend/base/langflow/components/knowledge_bases/ingestion.py (5 hunks)
src/frontend/src/pages/MainPage/pages/filesPage/components/KnowledgeBasesTab.tsx (2 hunks)
src/frontend/src/pages/MainPage/pages/filesPage/config/knowledgeBaseColumns.tsx (8 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
src/backend/base/langflow/components/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
src/backend/base/langflow/components/**/*.py: Add new backend components to the appropriate subdirectory under src/backend/base/langflow/components/
Implement async component methods using async def and await for asynchronous operations
Use asyncio.create_task for background work in async components and ensure proper cleanup on cancellation
Use asyncio.Queue for non-blocking queue operations in async components and handle timeouts appropriately
Files:
src/backend/base/langflow/components/knowledge_bases/ingestion.py
{src/backend/**/*.py,tests/**/*.py,Makefile}
📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)
{src/backend/**/*.py,tests/**/*.py,Makefile}: Run make format_backend to format Python code before linting or committing changes
Run make lint to perform linting checks on backend Python code
Files:
src/backend/base/langflow/components/knowledge_bases/ingestion.py
src/backend/**/components/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/icons.mdc)
In your Python component class, set the icon attribute to a string matching the frontend icon mapping exactly (case-sensitive).
Files:
src/backend/base/langflow/components/knowledge_bases/ingestion.py
src/frontend/src/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.cursor/rules/frontend_development.mdc)
src/frontend/src/**/*.{ts,tsx,js,jsx}: All frontend TypeScript and JavaScript code should be located under src/frontend/src/ and organized into components, pages, icons, stores, types, utils, hooks, services, and assets directories as per the specified directory layout.
Use React 18 with TypeScript for all UI components in the frontend.
Format all TypeScript and JavaScript code using the make format_frontend command.
Lint all TypeScript and JavaScript code using the make lint command.
Files:
src/frontend/src/pages/MainPage/pages/filesPage/components/KnowledgeBasesTab.tsx
src/frontend/src/pages/MainPage/pages/filesPage/config/knowledgeBaseColumns.tsx
🧬 Code graph analysis (3)
src/backend/base/langflow/components/knowledge_bases/ingestion.py (1)
src/backend/base/langflow/services/deps.py (3)
get_settings_service (111-124)
get_variable_service (99-108)
session_scope (151-173)
src/frontend/src/pages/MainPage/pages/filesPage/components/KnowledgeBasesTab.tsx (1)
src/frontend/src/pages/MainPage/pages/filesPage/config/knowledgeBaseColumns.tsx (1)
createKnowledgeBaseColumns (10-122)
src/frontend/src/pages/MainPage/pages/filesPage/config/knowledgeBaseColumns.tsx (1)
src/frontend/src/pages/MainPage/pages/filesPage/utils/knowledgeBaseUtils.ts (1)
formatAverageChunkSize (11-13)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: Lint Backend / Run Mypy (3.10)
- GitHub Check: Run Frontend Unit Tests / Frontend Jest Unit Tests
- GitHub Check: Lint Backend / Run Mypy (3.12)
- GitHub Check: Run Frontend Tests / Determine Test Suites and Shard Distribution
- GitHub Check: Lint Backend / Run Mypy (3.13)
- GitHub Check: Lint Backend / Run Mypy (3.11)
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 5
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 3
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
- GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 2
- GitHub Check: Run Backend Tests / Integration Tests - Python 3.10
- GitHub Check: Test Starter Templates
- GitHub Check: Ruff Style Check (3.13)
- GitHub Check: Run Ruff Check and Format
- GitHub Check: Update Starter Projects
🔇 Additional comments (5)
src/frontend/src/pages/MainPage/pages/filesPage/config/knowledgeBaseColumns.tsx (3)
18-25: Disable sorting and editing on Name: LGTM
Consistent with removing in-grid rename. Checkbox selection is preserved.
47-60: Sorting disabled on metrics columns: LGTM
Disabling sort avoids heavy client-side work on large datasets.
10-13: Signature update validated: no onRename references remain
Only usage in KnowledgeBasesTab.tsx passes the delete handler; no call sites still pass onRename.
src/frontend/src/pages/MainPage/pages/filesPage/components/KnowledgeBasesTab.tsx (2)
1-1: Type-only ag-grid imports: LGTM
Keeps runtime bundle clean.
156-156: Updated column factory usage: LGTM
Matches new signature createKnowledgeBaseColumns(handleDelete).
…modal handling - Refactored FlowToolbar to replace openCodeModal with openApiModal for better clarity in modal management. - Updated FlowToolbarOptions to accept openApiModal and setOpenApiModal props, enhancing the component's flexibility. - Adjusted PublishDropdown to utilize the new API modal state, ensuring consistent behavior across the toolbar. - Cleaned up import statements for better organization and readability.
…guration - Reorganized import statements in KnowledgeBasesTab and knowledgeBaseColumns for improved clarity and consistency. - Removed unused parameters from the createKnowledgeBaseColumns function, simplifying its signature. - Adjusted column flex properties for better layout in the knowledge base table. - Enhanced overall readability and maintainability of the codebase.
Build successful! ✅
e427027 to 7a3ff3a
Don't mean to butt in but since this touches KBs, shouldn't this be merged into



This pull request introduces several frontend and backend changes focused on improving code clarity, component structure, and user experience in knowledge base management and flow toolbar interactions. The most significant updates include refactoring React component props for better state management, removing knowledge base renaming functionality from the UI, and disabling column sorting/editing in the knowledge base table.
Frontend: Flow Toolbar and Dropdown Refactor
… FlowToolbarOptions and PublishDropdown components to receive modal state and setters as props, improving state management and modularity. This allows parent components to control modal visibility and simplifies component logic.
… FlowToolbar for clarity and consistency, and replaced the code modal state with API modal state.
Frontend: Knowledge Base Table Updates
Backend: Minor Code Style Improvements
… ingestion.py for improved readability and consistency.
Other Minor Updates
… package-lock.json for cleaner dependency management.
These changes collectively streamline the UI, clarify component responsibilities, and ensure the knowledge base table is read-only, enhancing both developer experience and end-user clarity.