fix: Knowledge Base fixes for 1.6#9657

Closed
deon-sanchez wants to merge 9 commits into main from lfoss-2047

@deon-sanchez deon-sanchez commented Sep 2, 2025

This pull request introduces several frontend and backend changes focused on improving code clarity, component structure, and user experience in knowledge base management and flow toolbar interactions. The most significant updates include refactoring React component props for better state management, removing knowledge base renaming functionality from the UI, and disabling column sorting/editing in the knowledge base table.

Frontend: Flow Toolbar and Dropdown Refactor

  • Refactored FlowToolbarOptions and PublishDropdown components to receive modal state and setters as props, improving state management and modularity. This allows parent components to control modal visibility and simplifies component logic.
  • Updated keyboard shortcut handlers in FlowToolbar for clarity and consistency, and replaced the code modal state with API modal state.

Frontend: Knowledge Base Table Updates

  • Removed knowledge base renaming functionality from the UI: the rename handler and editable column logic were deleted, so knowledge base names can no longer be edited inline.
  • Disabled sorting and editing for all columns in the knowledge base table for a more consistent and read-only display.

Backend: Minor Code Style Improvements

  • Reformatted import statements and lists in ingestion.py for improved readability and consistency.
  • Changed the display name for the embedding model dropdown to "Choose Embedding" for clearer UI labeling.

Other Minor Updates

  • Removed an extraneous property from the frontend package-lock.json for cleaner dependency management.
  • Cleaned up unused imports in several frontend files for better maintainability.

These changes collectively streamline the UI, clarify component responsibilities, and ensure the knowledge base table is read-only, enhancing both developer experience and end-user clarity.

…stion component

- Reorganized import statements for better clarity.
- Enhanced formatting of lists and function parameters for improved readability.
- Removed unused parameters and streamlined the column configuration in the Knowledge Bases tab.
- Updated JSON configuration for Knowledge Ingestion to reflect changes in code structure.

These changes aim to enhance maintainability and readability of the codebase.

coderabbitai Bot commented Sep 2, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Frontend: removed inline rename support for knowledge bases, updated column config to disable editing and most sorting, and changed the createKnowledgeBaseColumns API to accept only onDelete. Backend: minor formatting and label text changes in ingestion.py without logic changes.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Backend formatting & labels: src/backend/base/langflow/components/knowledge_bases/ingestion.py | Reformatted imports/lists, adjusted UI label to "Choose Embedding", multiline field_order and function params; no logic/control-flow changes. |
| Frontend page wiring (KB tab): src/frontend/src/pages/MainPage/pages/filesPage/components/KnowledgeBasesTab.tsx | Removed in-grid rename flow and related types/handlers; updated invocation to createKnowledgeBaseColumns(handleDelete). Delete flow unchanged. |
| Frontend column config (KB table): src/frontend/src/pages/MainPage/pages/filesPage/config/knowledgeBaseColumns.tsx | API change: createKnowledgeBaseColumns(onDelete) only; removed onRename and editability. Made Name non-editable; disabled sorting on multiple columns including actions; kept delete action using onDelete. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested labels

bug, refactor, lgtm

Suggested reviewers

  • edwinjosechittilappilly
  • carlosrcoelho
  • mfortman11


github-actions Bot commented Sep 2, 2025

Frontend Unit Test Coverage Report

Coverage Summary

| Lines | Statements | Branches | Functions |
| --- | --- | --- | --- |
| Coverage: 6% | 6.47% (1680/25937) | 3.51% (690/19619) | 3.47% (194/5584) |

Unit Test Results

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 682 | 0 💤 | 0 ❌ | 0 🔥 | 11.676s ⏱️ |


codecov Bot commented Sep 2, 2025

Codecov Report

❌ Patch coverage is 13.04348% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 21.63%. Comparing base (aaaaed1) to head (ce08d40).
⚠️ Report is 10 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...src/components/core/flowToolbarComponent/index.tsx | 0.00% | 8 Missing ⚠️ |
| ...olbarComponent/components/flow-toolbar-options.tsx | 0.00% | 6 Missing ⚠️ |
| ...lowToolbarComponent/components/deploy-dropdown.tsx | 0.00% | 4 Missing ⚠️ |
| ...e/pages/filesPage/components/KnowledgeBasesTab.tsx | 0.00% | 1 Missing ⚠️ |
| ...ge/pages/filesPage/config/knowledgeBaseColumns.tsx | 0.00% | 1 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (13.04%) is below the target coverage (40.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (5.81%) is below the target coverage (10.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #9657   +/-   ##
=======================================
  Coverage   21.63%   21.63%           
=======================================
  Files        1074     1074           
  Lines       39650    39649    -1     
  Branches     5418     5417    -1     
=======================================
+ Hits         8578     8580    +2     
+ Misses      30928    30925    -3     
  Partials      144      144           
| Flag | Coverage Δ |
| --- | --- |
| backend | 46.91% <100.00%> (+0.01%) ⬆️ |
| frontend | 5.81% <0.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...e/langflow/components/knowledge_bases/ingestion.py | 76.12% <100.00%> (ø) |
| ...e/pages/filesPage/components/KnowledgeBasesTab.tsx | 0.00% <0.00%> (ø) |
| ...ge/pages/filesPage/config/knowledgeBaseColumns.tsx | 0.00% <0.00%> (ø) |
| ...lowToolbarComponent/components/deploy-dropdown.tsx | 0.00% <0.00%> (ø) |
| ...olbarComponent/components/flow-toolbar-options.tsx | 0.00% <0.00%> (ø) |
| ...src/components/core/flowToolbarComponent/index.tsx | 0.00% <0.00%> (ø) |

... and 1 file with indirect coverage changes

autofix-ci Bot and others added 4 commits September 2, 2025 18:46
…guration in knowledge base

- Removed the extraneous flag from the `@clack/prompts` dependency in `package-lock.json`.
- Updated the `editable` property in the knowledge base columns configuration to `false`, enhancing the integrity of the data structure.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/backend/base/langflow/components/knowledge_bases/ingestion.py (2)

551-572: Fix unbound embedding_model/api_key when metadata is missing or decryption fails

embedding_model and/or api_key can be undefined if embedding_metadata.json is absent or lacks api_key, or if decryption fails and no override is provided. This leads to an UnboundLocalError at Line 571 and brittle error handling.

Apply this diff to initialize, guard, and validate before use:

@@
-            # If the API key is not provided, try to read it from the metadata file
-            if metadata_path.exists():
-                settings_service = get_settings_service()
-                metadata = json.loads(metadata_path.read_text())
-                embedding_model = metadata.get("embedding_model")
-                try:
-                    api_key = decrypt_api_key(metadata["api_key"], settings_service)
-                except (InvalidToken, TypeError, ValueError) as e:
-                    logger.error(f"Could not decrypt API key. Please provide it manually. Error: {e}")
+            # Initialize defaults
+            embedding_model: str | None = None
+            api_key: str | None = None
+
+            # If the API key is not provided, try to read it from the metadata file
+            if metadata_path.exists():
+                settings_service = get_settings_service()
+                metadata = json.loads(metadata_path.read_text())
+                embedding_model = metadata.get("embedding_model")
+                enc_key = metadata.get("api_key")
+                if enc_key:
+                    try:
+                        api_key = decrypt_api_key(enc_key, settings_service)
+                    except (InvalidToken, TypeError, ValueError) as e:
+                        logger.error(f"Could not decrypt API key. Please provide it manually. Error: {e}")
@@
-            # Check if a custom API key was provided, update metadata if so
-            if self.api_key:
-                api_key = self.api_key
-                self._save_embedding_metadata(
-                    kb_path=kb_path,
-                    embedding_model=embedding_model,
-                    api_key=api_key,
-                )
+            # Check if a custom API key was provided, update metadata if so
+            if self.api_key:
+                api_key = self.api_key
+                self._save_embedding_metadata(
+                    kb_path=kb_path,
+                    embedding_model=embedding_model or "",
+                    api_key=api_key,
+                )
+
+            # Validate required params before proceeding
+            if not embedding_model:
+                raise ValueError("Embedding model not configured. Create a knowledge base or provide metadata first.")
+            provider = self._get_embedding_provider(embedding_model)
+            if provider in {"OpenAI", "Cohere"} and not api_key:
+                raise ValueError(f"{provider} API key is required for the selected embedding model.")
@@
-            await self._create_vector_store(df_source, config_list, embedding_model=embedding_model, api_key=api_key)
+            await self._create_vector_store(df_source, config_list, embedding_model=embedding_model, api_key=api_key)
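The failure mode described in this comment can be reproduced in isolation. The following is a minimal sketch, not the component's code: the function names and the plain dict standing in for embedding_metadata.json are hypothetical, and decryption is left out entirely.

```python
from __future__ import annotations


def load_credentials_unsafe(metadata: dict | None) -> tuple:
    """Mirrors the original shape: names are bound only inside the `if`."""
    if metadata is not None:
        embedding_model = metadata.get("embedding_model")
        api_key = metadata["api_key"]
    # When metadata is None, neither name was ever assigned,
    # so this line raises UnboundLocalError.
    return embedding_model, api_key


def load_credentials_safe(metadata: dict | None, override_key: str | None = None) -> tuple:
    """Initialize defaults, guard the optional key, validate before use."""
    embedding_model = None
    api_key = None

    if metadata is not None:
        embedding_model = metadata.get("embedding_model")
        api_key = metadata.get("api_key")  # tolerate a missing key

    # An explicitly supplied key takes precedence over stored metadata.
    if override_key:
        api_key = override_key

    if not embedding_model:
        raise ValueError("Embedding model not configured.")
    return embedding_model, api_key
```

With the safe variant, a missing metadata file surfaces as a descriptive ValueError instead of an UnboundLocalError raised far from its cause.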

654-661: Catch asyncio.TimeoutError from wait_for, not builtin TimeoutError

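asyncio.wait_for raises asyncio.TimeoutError. On Python 3.10 and earlier this is a separate class from the builtin TimeoutError, so `except TimeoutError` does not catch it; from Python 3.11 onward the two names are aliases. Because the test matrix includes Python 3.10, catch asyncio.TimeoutError explicitly.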

-                except TimeoutError as e:
+                except asyncio.TimeoutError as e:
                     msg = "Embedding validation timed out. Please verify network connectivity and key."
                     raise ValueError(msg) from e
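A small standalone sketch of the suggested handler, assuming a hypothetical validate_with_timeout coroutine in which asyncio.sleep stands in for the real embedding validation call. Catching asyncio.TimeoutError behaves the same on every Python version, which is why it is the safer spelling.

```python
import asyncio


async def validate_with_timeout(delay: float, timeout: float) -> str:
    """Hypothetical stand-in: `asyncio.sleep(delay)` plays the validation call."""
    try:
        await asyncio.wait_for(asyncio.sleep(delay), timeout=timeout)
    except asyncio.TimeoutError as e:
        # Convert the timeout into the user-facing error used in the diff above.
        msg = "Embedding validation timed out. Please verify network connectivity and key."
        raise ValueError(msg) from e
    return "ok"
```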
🧹 Nitpick comments (4)
src/backend/base/langflow/components/knowledge_bases/ingestion.py (3)

399-404: Avoid blocking the event loop while adding documents

Chroma.add_documents is synchronous and does CPU- and network-bound work. Wrap it in asyncio.to_thread to keep the async pipeline responsive.

-            if documents:
-                chroma.add_documents(documents)
+            if documents:
+                await asyncio.to_thread(chroma.add_documents, documents)
                 self.log(f"Added {len(documents)} documents to vector store '{self.knowledge_base}'")
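To illustrate why offloading matters, here is a minimal sketch in which a blocking function (a hypothetical stand-in for a synchronous vector-store write, not Chroma itself) runs through asyncio.to_thread while a heartbeat task keeps ticking on the event loop. It requires Python 3.9 or later for asyncio.to_thread.

```python
import asyncio
import time


def add_documents_blocking(documents: list) -> int:
    """Hypothetical stand-in for a synchronous vector-store write."""
    time.sleep(0.1)  # simulates blocking I/O or CPU work
    return len(documents)


async def ingest(documents: list) -> tuple:
    ticks = 0

    async def heartbeat() -> None:
        # Counts how often the event loop gets a chance to run.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1

    task = asyncio.create_task(heartbeat())
    # The blocking call runs in a worker thread; the loop stays free.
    added = await asyncio.to_thread(add_documents_blocking, documents)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return added, ticks
```

Calling add_documents_blocking directly inside the coroutine would instead leave ticks at zero for the duration of the write, because nothing else on the loop could run.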

416-426: Loading entire collection to dedupe does not scale

chroma.get() fetches all docs and metadatas; on large KBs this is O(N) memory/time. Prefer targeted existence checks (e.g., per-hash where filters) or batched retrieval of just ids/metadata if supported, or maintain a lightweight local index (e.g., a JSON Lines file of known _id hashes).
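One of the options mentioned, a lightweight local index, might look like the sketch below. It is an assumption-laden illustration: the file layout (one JSON object per line with an `_id` field), the function names, and the use of raw text as hash input are all hypothetical and not taken from the component.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def load_known_hashes(index_path: Path) -> set:
    """Read previously ingested hashes from a JSON Lines index file."""
    if not index_path.exists():
        return set()
    with index_path.open() as fh:
        return {json.loads(line)["_id"] for line in fh if line.strip()}


def filter_new(texts: list, index_path: Path) -> list:
    """Return only texts whose hash is not yet indexed, and record them."""
    known = load_known_hashes(index_path)
    fresh = []
    with index_path.open("a") as fh:
        for text in texts:
            digest = hashlib.sha256(text.encode()).hexdigest()
            if digest in known:
                continue  # already ingested, or duplicated within this batch
            known.add(digest)
            fresh.append(text)
            fh.write(json.dumps({"_id": digest}) + "\n")
    return fresh
```

The index is still read in full on each call, but it holds only fixed-size hashes rather than documents and metadata, so its memory cost grows far more slowly than loading the whole collection.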


444-469: Clarify variables and ensure hash input uses identifiers when available

The variable name identifier_parts is used for content_cols first, then reused for identifiers, which is confusing. Also make hash input explicit: identifiers if present, otherwise content.

-            # Build content text from identifier columns using list comprehension
-            identifier_parts = [str(row[col]) for col in content_cols if col in row and pd.notna(row[col])]
-
-            # Join all parts into a single string
-            page_content = " ".join(identifier_parts)
+            # Build content text from vectorized columns
+            content_parts = [str(row[col]) for col in content_cols if col in row and pd.notna(row[col])]
+            page_content = " ".join(content_parts)
@@
-            # Add identifier columns if they exist
-            if identifier_cols:
-                identifier_parts = [str(row[col]) for col in identifier_cols if col in row and pd.notna(row[col])]
-                page_content = " ".join(identifier_parts)
+            # Build identifier text (used for stable deduplication)
+            id_parts = [str(row[col]) for col in identifier_cols if col in row and pd.notna(row[col])]
+            id_text = " ".join(id_parts)
@@
-            # Hash the page_content for unique ID
-            page_content_hash = hashlib.sha256(page_content.encode()).hexdigest()
+            # Hash identifiers when available, otherwise hash content
+            hash_input = id_text if id_parts else page_content
+            page_content_hash = hashlib.sha256(hash_input.encode()).hexdigest()
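The intent of this suggestion, hashing identifiers when present and falling back to content otherwise, can be sketched on plain dicts. The function name stable_row_id is hypothetical, and `is not None` stands in for the pd.notna check used in the component.

```python
import hashlib


def stable_row_id(row: dict, content_cols: list, identifier_cols: list) -> str:
    """Hash identifier columns if any yield a value, else hash the content."""

    def join(cols: list) -> str:
        # Skip columns that are absent or None (pd.notna in the real code).
        return " ".join(str(row[c]) for c in cols if row.get(c) is not None)

    page_content = join(content_cols)
    id_text = join(identifier_cols)
    hash_input = id_text if id_text else page_content
    return hashlib.sha256(hash_input.encode()).hexdigest()
```

With this rule, editing a row's text does not change its ID as long as its identifier columns stay the same, which is what makes deduplication stable across re-ingestion.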
src/frontend/src/pages/MainPage/pages/filesPage/config/knowledgeBaseColumns.tsx (1)

36-45: Align field with displayed value for filtering consistency

The column displays embedding_model via valueGetter but uses field: "embedding_provider". Text filtering will operate on embedding_provider, not the shown embedding_model. Recommend aligning the field.

-      field: "embedding_provider",
+      field: "embedding_model",
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aaaaed1 and a4fa769.

⛔ Files ignored due to path filters (1)
  • src/frontend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (3)
  • src/backend/base/langflow/components/knowledge_bases/ingestion.py (5 hunks)
  • src/frontend/src/pages/MainPage/pages/filesPage/components/KnowledgeBasesTab.tsx (2 hunks)
  • src/frontend/src/pages/MainPage/pages/filesPage/config/knowledgeBaseColumns.tsx (8 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
src/backend/base/langflow/components/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)

src/backend/base/langflow/components/**/*.py: Add new backend components to the appropriate subdirectory under src/backend/base/langflow/components/
Implement async component methods using async def and await for asynchronous operations
Use asyncio.create_task for background work in async components and ensure proper cleanup on cancellation
Use asyncio.Queue for non-blocking queue operations in async components and handle timeouts appropriately

Files:

  • src/backend/base/langflow/components/knowledge_bases/ingestion.py
{src/backend/**/*.py,tests/**/*.py,Makefile}

📄 CodeRabbit inference engine (.cursor/rules/backend_development.mdc)

{src/backend/**/*.py,tests/**/*.py,Makefile}: Run make format_backend to format Python code before linting or committing changes
Run make lint to perform linting checks on backend Python code

Files:

  • src/backend/base/langflow/components/knowledge_bases/ingestion.py
src/backend/**/components/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/icons.mdc)

In your Python component class, set the icon attribute to a string matching the frontend icon mapping exactly (case-sensitive).

Files:

  • src/backend/base/langflow/components/knowledge_bases/ingestion.py
src/frontend/src/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/frontend_development.mdc)

src/frontend/src/**/*.{ts,tsx,js,jsx}: All frontend TypeScript and JavaScript code should be located under src/frontend/src/ and organized into components, pages, icons, stores, types, utils, hooks, services, and assets directories as per the specified directory layout.
Use React 18 with TypeScript for all UI components in the frontend.
Format all TypeScript and JavaScript code using the make format_frontend command.
Lint all TypeScript and JavaScript code using the make lint command.

Files:

  • src/frontend/src/pages/MainPage/pages/filesPage/components/KnowledgeBasesTab.tsx
  • src/frontend/src/pages/MainPage/pages/filesPage/config/knowledgeBaseColumns.tsx
🧬 Code graph analysis (3)
src/backend/base/langflow/components/knowledge_bases/ingestion.py (1)
src/backend/base/langflow/services/deps.py (3)
  • get_settings_service (111-124)
  • get_variable_service (99-108)
  • session_scope (151-173)
src/frontend/src/pages/MainPage/pages/filesPage/components/KnowledgeBasesTab.tsx (1)
src/frontend/src/pages/MainPage/pages/filesPage/config/knowledgeBaseColumns.tsx (1)
  • createKnowledgeBaseColumns (10-122)
src/frontend/src/pages/MainPage/pages/filesPage/config/knowledgeBaseColumns.tsx (1)
src/frontend/src/pages/MainPage/pages/filesPage/utils/knowledgeBaseUtils.ts (1)
  • formatAverageChunkSize (11-13)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Lint Backend / Run Mypy (3.10)
  • GitHub Check: Run Frontend Unit Tests / Frontend Jest Unit Tests
  • GitHub Check: Lint Backend / Run Mypy (3.12)
  • GitHub Check: Run Frontend Tests / Determine Test Suites and Shard Distribution
  • GitHub Check: Lint Backend / Run Mypy (3.13)
  • GitHub Check: Lint Backend / Run Mypy (3.11)
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 4
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 5
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 3
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 1
  • GitHub Check: Run Backend Tests / Unit Tests - Python 3.10 - Group 2
  • GitHub Check: Run Backend Tests / Integration Tests - Python 3.10
  • GitHub Check: Test Starter Templates
  • GitHub Check: Ruff Style Check (3.13)
  • GitHub Check: Run Ruff Check and Format
  • GitHub Check: Update Starter Projects
🔇 Additional comments (5)
src/frontend/src/pages/MainPage/pages/filesPage/config/knowledgeBaseColumns.tsx (3)

18-25: Disable sorting and editing on Name: LGTM

Consistent with removing in-grid rename. Checkbox selection is preserved.


47-60: Sorting disabled on metrics columns: LGTM

Disabling sort avoids heavy client-side work on large datasets.


10-13: Signature update validated—no onRename references remain
Only usage in KnowledgeBasesTab.tsx passes the delete handler; no call sites still pass onRename.

src/frontend/src/pages/MainPage/pages/filesPage/components/KnowledgeBasesTab.tsx (2)

1-1: Type-only ag-grid imports: LGTM

Keeps runtime bundle clean.


156-156: Updated column factory usage: LGTM

Matches new signature createKnowledgeBaseColumns(handleDelete).

…modal handling

- Refactored FlowToolbar to replace openCodeModal with openApiModal for better clarity in modal management.
- Updated FlowToolbarOptions to accept openApiModal and setOpenApiModal props, enhancing the component's flexibility.
- Adjusted PublishDropdown to utilize the new API modal state, ensuring consistent behavior across the toolbar.
- Cleaned up import statements for better organization and readability.
@ogabrielluiz ogabrielluiz changed the title from "bug: Knowledge Base fixes for 1.6" to "fix: Knowledge Base fixes for 1.6" Sep 2, 2025
@github-actions github-actions Bot added the bug Something isn't working label Sep 2, 2025
@deon-sanchez deon-sanchez self-assigned this Sep 2, 2025
@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Sep 3, 2025
Comment thread src/backend/base/langflow/components/knowledge_bases/ingestion.py
@github-actions github-actions Bot added the lgtm This PR has been approved by a maintainer label Sep 3, 2025
…guration

- Reorganized import statements in KnowledgeBasesTab and knowledgeBaseColumns for improved clarity and consistency.
- Removed unused parameters from the createKnowledgeBaseColumns function, simplifying its signature.
- Adjusted column flex properties for better layout in the knowledge base table.
- Enhanced overall readability and maintainability of the codebase.
@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Sep 3, 2025
@deon-sanchez deon-sanchez changed the base branch from main to release-1.6.0 September 3, 2025 14:46
@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Sep 3, 2025
@deon-sanchez deon-sanchez enabled auto-merge (squash) September 3, 2025 14:57
@github-actions github-actions Bot removed the bug Something isn't working label Sep 3, 2025
@github-actions github-actions Bot added the bug Something isn't working label Sep 3, 2025

github-actions Bot commented Sep 3, 2025

Build successful! ✅
Deploying docs draft.
Deploy successful! View draft

@deon-sanchez deon-sanchez changed the base branch from release-1.6.0 to main September 3, 2025 15:28
@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Sep 3, 2025

sonarqubecloud Bot commented Sep 3, 2025

Quality Gate failed

Failed conditions
6.5% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

@deon-sanchez deon-sanchez added the DO NOT MERGE Don't Merge this PR label Sep 3, 2025

TejasQ commented Sep 4, 2025

Don't mean to butt in but since this touches KBs, shouldn't this be merged into release-1.6.0 instead of main? @carlosrcoelho

Labels

  • bug: Something isn't working
  • DO NOT MERGE: Don't Merge this PR
  • lgtm: This PR has been approved by a maintainer
