
fix: Import and Statistics fixes for Knowledge Bases#12446

Merged
carlosrcoelho merged 18 commits into release-1.9.0 from fix-kb-chroma-attribute
Apr 2, 2026

Conversation

@erichare
Collaborator

@erichare erichare commented Apr 1, 2026

Summary

This PR fixes two correctness issues related to knowledge base ingestion and custom code execution.

For knowledge bases, component-based ingestion could leave embedding_metadata.json with stale metrics, especially when the file reported chunks = 0 even though Chroma data had already been written. This change makes new ingestions persist accurate chunk/text metrics and also allows existing stale knowledge bases to self-heal when metadata is read.
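The self-healing read described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `read_metadata_with_self_heal` and the `recount` callback are hypothetical names, and treating any file other than `embedding_metadata.json` as a vector store artifact is an assumption made to keep the example free of a Chroma dependency.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def read_metadata_with_self_heal(kb_dir: Path, recount: Callable[[], dict]) -> dict:
    """Return KB metadata, recounting and persisting it when it looks stale."""
    metadata_path = kb_dir / "embedding_metadata.json"
    metadata = json.loads(metadata_path.read_text()) if metadata_path.exists() else {}

    # Assumption: any entry besides the metadata file counts as a store artifact.
    has_artifacts = any(p.name != metadata_path.name for p in kb_dir.iterdir())

    # Stale case: metadata claims zero chunks, yet store data exists on disk.
    if metadata.get("chunks", 0) == 0 and has_artifacts:
        metadata.update(recount())
        metadata_path.write_text(json.dumps(metadata, indent=2))
    return metadata


# Usage: a stale file next to a placeholder artifact is corrected and persisted.
with tempfile.TemporaryDirectory() as tmp:
    kb_dir = Path(tmp)
    (kb_dir / "embedding_metadata.json").write_text(json.dumps({"chunks": 0}))
    (kb_dir / "store.bin").write_bytes(b"placeholder")
    healed = read_metadata_with_self_heal(kb_dir, lambda: {"chunks": 3, "words": 12})
    persisted = json.loads((kb_dir / "embedding_metadata.json").read_text())
```

Passing the recount step in as a callback keeps the staleness check independent of how the store is actually queried.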

It also fixes dotted import handling in lfx.custom.validate so custom code follows normal Python import semantics for statements like import urllib.request and import urllib.request as request.
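For reference, the binding rule that the fix aims to match can be emulated in a few lines. This is a sketch under stated assumptions rather than the actual `lfx.custom.validate` code: `bind_import` is a hypothetical helper, and only the standard library's `importlib.import_module` is used.

```python
import importlib
from typing import Optional


def bind_import(module_name: str, alias: Optional[str], namespace: dict) -> None:
    """Bind a module into `namespace` the way an `import` statement would."""
    module = importlib.import_module(module_name)
    if alias:
        # `import a.b as x` binds x directly to the submodule a.b.
        namespace[alias] = module
    else:
        # `import a.b` binds only the top-level package `a`;
        # the submodule stays reachable as an attribute of it.
        top_level = module_name.split(".")[0]
        namespace[top_level] = importlib.import_module(top_level)


scope: dict = {}
bind_import("urllib.request", None, scope)       # like: import urllib.request
bind_import("urllib.request", "request", scope)  # like: import urllib.request as request
```

After both calls, `scope["urllib"]` is the top-level package and `scope["request"]` is the `urllib.request` submodule, so `scope["urllib"].request is scope["request"]` holds.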

What changed

  • Updated KnowledgeIngestionComponent to recompute and persist the following metrics after documents are written to Chroma:
    • chunks
    • words
    • characters
    • avg_chunk_size
    • size
  • Updated KBAnalysisHelper.get_metadata(..., fast=True) to detect stale zero-chunk metadata when Chroma artifacts exist on disk, recount metrics from Chroma, and persist the corrected metadata.
  • Fixed dotted import binding in lfx.custom.validate so non-aliased dotted imports bind the top-level package like normal Python imports, while aliased imports continue to work as expected.
  • Regenerated component_index.json to keep the embedded component asset in sync with the ingestion changes.
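As an illustration of the recomputed metrics, the text-based fields could be derived from the stored chunk texts along these lines. The definitions here are assumptions, not taken from the PR: words are counted by whitespace splitting, `avg_chunk_size` is taken as characters per chunk, and `size` is left out because the description does not say how it is measured. `compute_text_metrics` is a hypothetical name.

```python
def compute_text_metrics(documents: list) -> dict:
    """Derive chunk/text metrics from a list of chunk strings."""
    chunks = len(documents)
    characters = sum(len(doc) for doc in documents)
    # Assumption: a "word" is any whitespace-separated token.
    words = sum(len(doc.split()) for doc in documents)
    # Assumption: average chunk size is measured in characters; guard against zero chunks.
    avg_chunk_size = round(characters / chunks, 2) if chunks else 0.0
    return {
        "chunks": chunks,
        "words": words,
        "characters": characters,
        "avg_chunk_size": avg_chunk_size,
    }


metrics = compute_text_metrics(["hello world", "foo bar baz"])
# metrics == {"chunks": 2, "words": 5, "characters": 22, "avg_chunk_size": 11.0}
```

Computing the metrics from what is actually stored, rather than from what the ingestion step intended to write, is what keeps the persisted file consistent after partial failures.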

Why

This closes the gap between what is actually stored in Chroma and what Langflow reports in knowledge base metadata. Without this fix, the KB modal/API could show misleading zeroed metrics after successful ingestion or partial failures.

It also makes custom code execution more predictable by aligning dotted import behavior with standard Python semantics.

Test coverage

  • Added unit coverage for persisted KB metric updates after ingestion.
  • Added backend API coverage for recounting stale zero-chunk metadata in fast metadata reads.
  • Added unit coverage for dotted imports, aliased dotted imports, and global scope preparation in custom validation.
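A check of the dotted import behavior can use Python's own `exec` as the reference for which names a statement binds. The snippet below is only a sketch of that idea; `names_bound_by` is a hypothetical helper and is not part of the tests added in this PR.

```python
def names_bound_by(statement: str) -> set:
    """Run an import statement in a fresh scope and report the names it binds."""
    scope: dict = {}
    exec(statement, scope)
    # exec() injects __builtins__ into the globals dict; ignore it.
    return {name for name in scope if name != "__builtins__"}


plain = names_bound_by("import urllib.request")
aliased = names_bound_by("import urllib.request as request")
# plain == {"urllib"}; aliased == {"request"}
```

Custom code execution that follows standard semantics should expose the same names for the same statements.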

@erichare erichare requested a review from ogabrielluiz April 1, 2026 19:20
@coderabbitai
Contributor

coderabbitai Bot commented Apr 1, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f062e7a5-161d-464c-b691-95ab8cf0c516


@github-actions github-actions Bot added the bug Something isn't working label Apr 1, 2026
@codecov

codecov Bot commented Apr 1, 2026

Codecov Report

❌ Patch coverage is 81.81818% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.36%. Comparing base (b0a8662) to head (89cb662).
⚠️ Report is 1 commit behind head on release-1.9.0.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/backend/base/langflow/api/utils/kb_helpers.py | 80.00% | 2 Missing ⚠️ |
| src/lfx/src/lfx/custom/validate.py | 83.33% | 1 Missing and 1 partial ⚠️ |

❌ Your project status has failed because the head coverage (48.00%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@                Coverage Diff                @@
##           release-1.9.0   #12446      +/-   ##
=================================================
+ Coverage          49.32%   49.36%   +0.04%     
=================================================
  Files               1924     1924              
  Lines             170395   170412      +17     
  Branches           24839    24841       +2     
=================================================
+ Hits               84043    84131      +88     
+ Misses             85348    85272      -76     
- Partials            1004     1009       +5     

| Flag | Coverage Δ |
|---|---|
| backend | 55.64% <80.00%> (+0.10%) ⬆️ |
| frontend | 48.23% <ø> (+<0.01%) ⬆️ |
| lfx | 48.00% <83.33%> (+0.13%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
|---|---|
| src/backend/base/langflow/api/utils/kb_helpers.py | 77.44% <80.00%> (+6.69%) ⬆️ |
| src/lfx/src/lfx/custom/validate.py | 57.40% <83.33%> (+12.20%) ⬆️ |

... and 24 files with indirect coverage changes


@github-actions
Contributor

github-actions Bot commented Apr 1, 2026

Frontend Unit Test Coverage Report

Coverage Summary

| Lines | Statements | Branches | Functions |
|---|---|---|---|
| Coverage: 28% | 28.01% (29184/104184) | 64.76% (3717/5739) | 30.03% (689/2294) |

Unit Test Results

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 3014 | 0 💤 | 0 ❌ | 0 🔥 | 4m 31s ⏱️ |

@erichare erichare requested a review from Copilot April 2, 2026 03:24
Contributor

Copilot AI left a comment


Pull request overview

This PR improves correctness in two areas: (1) knowledge base metric accuracy when persisted metadata becomes stale relative to Chroma storage, and (2) runtime code execution/import handling so dotted imports bind the same names Python would normally bind.

Changes:

  • Recount KB chunk/text metrics when embedding_metadata.json reports chunks=0 but Chroma data exists, and persist corrected metrics back to disk.
  • Update KnowledgeIngestionComponent to recompute and persist chunk/word/character/size metrics immediately after ingestion.
  • Fix dotted import binding behavior in custom code validation/execution and add unit tests for dotted/aliased imports.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| src/lfx/tests/unit/custom/component/test_validate.py | Adds tests covering dotted imports and aliased dotted imports for custom code execution. |
| src/lfx/src/lfx/custom/validate.py | Adjusts import binding to keep top-level package names for dotted imports (matching Python semantics). |
| src/lfx/src/lfx/components/files_and_knowledge/ingestion.py | Adds persisted metrics update after ingestion and returns Chroma from vector store creation for metric recounting. |
| src/lfx/src/lfx/_assets/component_index.json | Updates embedded component code hash/content to match ingestion changes. |
| src/backend/tests/unit/test_knowledge_bases_api.py | Adds coverage ensuring stale zero-chunk metadata triggers a recount in fast metadata mode. |
| src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py | Adds coverage for persisting chunk/text metrics from a mocked Chroma collection. |
| .secrets.baseline | Updates baseline line numbers/timestamp due to test file changes. |


Comment thread src/lfx/src/lfx/components/files_and_knowledge/ingestion.py Outdated
Comment thread src/lfx/src/lfx/custom/validate.py
@github-actions github-actions Bot added the lgtm This PR has been approved by a maintainer label Apr 2, 2026
@erichare erichare added this pull request to the merge queue Apr 2, 2026
@erichare erichare removed this pull request from the merge queue due to a manual request Apr 2, 2026
@erichare erichare enabled auto-merge April 2, 2026 17:37
@erichare erichare disabled auto-merge April 2, 2026 17:54
@carlosrcoelho carlosrcoelho enabled auto-merge April 2, 2026 18:07
@carlosrcoelho carlosrcoelho added this pull request to the merge queue Apr 2, 2026
Merged via the queue into release-1.9.0 with commit 4bdc87d Apr 2, 2026
178 of 181 checks passed
@carlosrcoelho carlosrcoelho deleted the fix-kb-chroma-attribute branch April 2, 2026 20:00
Adam-Aghili pushed a commit that referenced this pull request Apr 15, 2026
* fix: Access the appropriate attribute for chroma

* Fix display of chunk metadata

* [autofix.ci] apply automated fixes

* Add some unit tests

* Update ingestion.py

* [autofix.ci] apply automated fixes

* Review updates

* Update component_index.json

* Fix bug with ingestion

* [autofix.ci] apply automated fixes

* Update test_ingestion.py

* Update test_ingestion.py

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Coelho <80289056+carlosrcoelho@users.noreply.github.com>

Labels

bug Something isn't working lgtm This PR has been approved by a maintainer


4 participants