fix: Import and Statistics fixes for Knowledge Bases #12446
Codecov Report
❌ Patch coverage is
❌ Your project status has failed because the head coverage (48.00%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## release-1.9.0 #12446 +/- ##
=================================================
+ Coverage 49.32% 49.36% +0.04%
=================================================
Files 1924 1924
Lines 170395 170412 +17
Branches 24839 24841 +2
=================================================
+ Hits 84043 84131 +88
+ Misses 85348 85272 -76
- Partials 1004 1009 +5
Pull request overview
This PR improves correctness in two areas: (1) knowledge base metric accuracy when persisted metadata becomes stale relative to Chroma storage, and (2) runtime code execution/import handling so dotted imports bind the same names Python would normally bind.
Changes:
- Recount KB chunk/text metrics when `embedding_metadata.json` reports `chunks=0` but Chroma data exists, and persist corrected metrics back to disk.
- Update `KnowledgeIngestionComponent` to recompute and persist chunk/word/character/size metrics immediately after ingestion.
- Fix dotted import binding behavior in custom code validation/execution and add unit tests for dotted/aliased imports.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/lfx/tests/unit/custom/component/test_validate.py` | Adds tests covering dotted imports and aliased dotted imports for custom code execution. |
| `src/lfx/src/lfx/custom/validate.py` | Adjusts import binding to keep top-level package names for dotted imports (matching Python semantics). |
| `src/lfx/src/lfx/components/files_and_knowledge/ingestion.py` | Adds persisted metrics update after ingestion and returns Chroma from vector store creation for metric recounting. |
| `src/lfx/src/lfx/_assets/component_index.json` | Updates embedded component code hash/content to match ingestion changes. |
| `src/backend/tests/unit/test_knowledge_bases_api.py` | Adds coverage ensuring stale zero-chunk metadata triggers a recount in fast metadata mode. |
| `src/backend/tests/unit/components/files_and_knowledge/test_ingestion.py` | Adds coverage for persisting chunk/text metrics from a mocked Chroma collection. |
| `.secrets.baseline` | Updates baseline line numbers/timestamp due to test file changes. |
* fix: Access the appropriate attribute for chroma
* Fix display of chunk metadata
* [autofix.ci] apply automated fixes
* Add some unit tests
* Update ingestion.py
* [autofix.ci] apply automated fixes
* Review updates
* Update component_index.json
* Fix bug with ingestion
* [autofix.ci] apply automated fixes
* Update test_ingestion.py
* Update test_ingestion.py

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Coelho <80289056+carlosrcoelho@users.noreply.github.com>
Summary
This PR fixes two correctness issues related to knowledge base ingestion and custom code execution.
For knowledge bases, component-based ingestion could leave `embedding_metadata.json` with stale metrics, especially when the file reported `chunks = 0` even though Chroma data had already been written. This change makes new ingestions persist accurate chunk/text metrics and also allows existing stale knowledge bases to self-heal when metadata is read.
It also fixes dotted import handling in `lfx.custom.validate` so custom code follows normal Python import semantics for statements like `import urllib.request` and `import urllib.request as request`.
What changed
- Updated `KnowledgeIngestionComponent` to recompute and persist the following metrics after documents are written to Chroma:
  - `chunks`
  - `words`
  - `characters`
  - `avg_chunk_size`
  - `size`
- Updated `KBAnalysisHelper.get_metadata(..., fast=True)` to detect stale zero-chunk metadata when Chroma artifacts exist on disk, recount metrics from Chroma, and persist the corrected metadata.
- Fixed `lfx.custom.validate` so non-aliased dotted imports bind the top-level package like normal Python imports, while aliased imports continue to work as expected.
- Updated `component_index.json` to keep the embedded component asset in sync with the ingestion changes.
Why
This closes the gap between what is actually stored in Chroma and what Langflow reports in knowledge base metadata. Without this fix, the KB modal/API could show misleading zeroed metrics after successful ingestion or partial failures.
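The self-healing read described above can be illustrated roughly as follows. This is a minimal sketch, not the code in `KBAnalysisHelper`: the function names `compute_metrics` and `read_metadata`, the `load_chunks` callback (standing in for a query against Chroma), and the "any other file in the directory" staleness check are all assumptions made for illustration. The `size` metric is left out.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def compute_metrics(chunks: Iterable[str]) -> dict:
    """Recompute chunk/text metrics from the stored chunk texts."""
    texts = list(chunks)
    characters = sum(len(t) for t in texts)
    return {
        "chunks": len(texts),
        "words": sum(len(t.split()) for t in texts),
        "characters": characters,
        "avg_chunk_size": characters / len(texts) if texts else 0.0,
    }


def read_metadata(kb_dir: Path, load_chunks: Callable[[Path], list[str]]) -> dict:
    """Return KB metadata, recounting when it reports zero chunks but store files exist."""
    meta_path = kb_dir / "embedding_metadata.json"
    metadata = json.loads(meta_path.read_text()) if meta_path.exists() else {}

    # Hypothetical staleness check: any file other than the metadata file
    # is treated as evidence that the vector store holds data.
    has_store_files = any(p.is_file() and p != meta_path for p in kb_dir.rglob("*"))

    if metadata.get("chunks", 0) == 0 and has_store_files:
        # Recount from the store (via the assumed callback) and persist the fix.
        metadata.update(compute_metrics(load_chunks(kb_dir)))
        meta_path.write_text(json.dumps(metadata, indent=2))
    return metadata
```

Passing the chunk loader as a callback keeps the sketch independent of any particular Chroma client call; in practice it would be replaced by whatever retrieves the stored documents for the knowledge base.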
It also makes custom code execution more predictable by aligning dotted import behavior with standard Python semantics.
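For reference, the binding rule in question is standard Python behavior: `import a.b` binds only the top-level name `a`, while `import a.b as c` binds `c` to the submodule `a.b`. The sketch below reproduces that rule with `ast` and `importlib`. The helper name `bind_imports` is hypothetical and this is not the implementation in `lfx.custom.validate`, only an illustration of the semantics it is meant to match.

```python
import ast
import importlib


def bind_imports(source: str, namespace: dict) -> dict:
    """Bind names from plain `import` statements the way Python's import statement does."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Import):
            continue
        for alias in node.names:
            # Importing the full dotted path also loads the parent packages.
            module = importlib.import_module(alias.name)
            if alias.asname:
                # `import a.b as c` binds c to the submodule a.b.
                namespace[alias.asname] = module
            else:
                # `import a.b` binds only the top-level name a.
                top_level = alias.name.split(".")[0]
                namespace[top_level] = importlib.import_module(top_level)
    return namespace
```

With this helper, `bind_imports("import urllib.request", {})` yields a namespace containing `urllib` (with `urllib.request` reachable as an attribute) and no bare `request` name, whereas the aliased form binds only `request`.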
Test coverage