feat: Better support for advanced parser in File Component (#10048)
Walkthrough
Updates starter project JSONs and the core FileComponent to enable Advanced Parser by default, integrate Docling processing via a subprocess, expand UI inputs/visibility logic, adjust multi-file advanced handling, and bump a dependency in News Aggregator. Core file.py now supports advanced processing across multiple files when all are compatible.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor U as User/UI
participant FC as FileComponent
participant DL as Docling Subprocess
participant MAP as Result Mapper
participant OUT as Outputs
U->>FC: Provide file paths + Advanced Parser options
alt All files Docling-compatible AND advanced_mode
FC->>DL: Invoke subprocess with args (pipeline, ocr, markdown, ...)
DL-->>FC: JSON results (structured/markdown/errors)
FC->>MAP: Parse and assemble Data/DataFrame
MAP-->>OUT: Structured/Markdown/Raw outputs (aggregated)
else Standard path
FC-->>OUT: Raw Content / File Path via standard loaders
end
sequenceDiagram
autonumber
participant FC as FileComponent(process_files)
participant CHK as Compatibility Check
participant DL as Docling Subprocess
participant AGG as Aggregator
FC->>CHK: Verify all files not *.csv/*.xlsx/*.parquet
alt Compatible
FC->>DL: Process all files via Docling
DL-->>FC: Per-file JSON results
FC->>AGG: Collect into final list
AGG-->>FC: final_return
else Incompatible
FC-->>FC: Fall back to standard multi-file processing
end
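The compatibility branch in the second diagram can be sketched as follows. This is a minimal illustration rather than the code in file.py: the helper names `all_docling_compatible` and `choose_path` are hypothetical, and the check assumes that "compatible" means only "not one of the tabular extensions named in the diagram". The real component may apply further criteria.

```python
from pathlib import Path

# Extensions the diagram routes away from Docling (tabular formats).
TABULAR_EXTENSIONS = {".csv", ".xlsx", ".parquet"}


def all_docling_compatible(file_paths):
    """Return True only if there is at least one file and none is tabular."""
    return bool(file_paths) and all(
        Path(p).suffix.lower() not in TABULAR_EXTENSIONS for p in file_paths
    )


def choose_path(file_paths, advanced_mode):
    """Pick the processing path: Docling only when every file qualifies."""
    if advanced_mode and all_docling_compatible(file_paths):
        return "docling"
    # A single incompatible file sends the whole batch down the standard path.
    return "standard"
```

Under these assumptions, one tabular file in the selection is enough to fall back to standard multi-file processing for every file, which matches the "Incompatible" branch of the diagram.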
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 error, 3 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
Codecov Report
✅ All modified and coverable lines are covered by tests.
❌ Your project check has failed because the head coverage (47.20%) is below the target coverage (55.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## fix-docling-vlm #10048 +/- ##
===================================================
+ Coverage 24.20% 24.21% +0.01%
===================================================
Files 1091 1091
Lines 40038 40037 -1
Branches 5543 5542 -1
===================================================
+ Hits 9690 9694 +4
+ Misses 30177 30172 -5
Partials 171 171
@erichare If time permits, should we add tests that check what happens when a file is not Docling-compatible?
* fix: Proper support for VLM in Docling
* [autofix.ci] apply automated fixes
* [autofix.ci] apply automated fixes (attempt 2/3)
* Update file.py
* [autofix.ci] apply automated fixes
* Update pyproject.toml
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Update uv.lock
* Fix project specs
* Add jpg as accepted file type
* [autofix.ci] apply automated fixes
* Update dep structure
* One more attempt at getting this right
* And again
* Add docling core
* Update pyproject.toml
* Update deps
* [autofix.ci] apply automated fixes
* Update knowledge_bases.py
* Package version bumps
* Add pytest tests for advanced mode
* Update test_file_component.py
* Update test_file_component.py
* Make pipeline a visible option in advanced mode
* Feat tool mode files (#10107)
* feat: Tool Mode Support for File Components
* [autofix.ci] apply automated fixes
---------
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* feat: Better support for advanced parser in File Component (#10048)
* feat: Better support for advanced parser in files
* [autofix.ci] apply automated fixes
* Add docling mocked tests
* Update file.py
* Update test_file_component.py
* [autofix.ci] apply automated fixes
* Update News Aggregator.json
* [autofix.ci] apply automated fixes
---------
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* Update file.py
* [autofix.ci] apply automated fixes
---------
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>



This pull request updates the advanced document processing ("Docling") feature in the FileComponent to support processing multiple files at once, as long as all selected files are compatible. Previously, advanced processing was limited to a single file. The changes update both the UI logic and the backend processing to reflect this expanded capability.

Advanced document processing enhancements:
* The advanced_mode option is now shown in the UI even when multiple files are selected, as long as all files are compatible with Docling. Previously, it was only available for a single file.
* update_build_config now enables advanced processing if all selected files are non-tabular and Docling-compatible, rather than requiring exactly one file.

Backend processing updates:
* process_files now allows advanced processing for multiple compatible files; its docstring and logic are updated accordingly.
* process_file_standard now checks that all files are Docling-compatible before enabling advanced processing. It then processes each file individually in a subprocess and aggregates the results.
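The "one subprocess per file, then aggregate" flow described above can be sketched as below. This is an illustration under stated assumptions, not the implementation in file.py: `run_one`, `process_all`, and `CHILD_SCRIPT` are hypothetical names, the `"standard"` pipeline value is a placeholder, and the child script is a stand-in that only echoes its input as JSON instead of calling Docling.

```python
import json
import subprocess
import sys

# Stand-in worker (assumption): echoes its config back as JSON.
# A real worker would run Docling here and emit the parsed document.
CHILD_SCRIPT = r"""
import json, sys
cfg = json.loads(sys.stdin.read())
print(json.dumps({"ok": True, "file_path": cfg["file_path"], "pipeline": cfg["pipeline"]}))
"""


def run_one(file_path, pipeline="standard"):
    """Process a single file in its own subprocess and return a result dict."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        input=json.dumps({"file_path": file_path, "pipeline": pipeline}),
        capture_output=True,
        text=True,
        check=False,
    )
    if proc.returncode != 0:
        # Keep the failure as data so one bad file does not abort the batch.
        return {"ok": False, "file_path": file_path, "error": proc.stderr.strip()}
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        return {"ok": False, "file_path": file_path, "error": f"invalid JSON: {exc}"}


def process_all(file_paths, pipeline="standard"):
    """Run each file separately and aggregate the per-file results in order."""
    return [run_one(path, pipeline) for path in file_paths]
```

Returning failures as entries in the list, rather than raising, is a design choice of this sketch: it lets the caller see which files succeeded when the batch is only partly parseable.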