Feat/kb integration v2.2.0 #17

Open
racmac57 wants to merge 10 commits into main from feat/kb-integration-v2.2.0

@racmac57
Owner

Summary

Changes

Test

  • CI green

Checklist

  • Docs updated if needed
  • No secrets in diff

racmac57 and others added 9 commits November 9, 2025 21:58
- Added batch processing (100 files per cycle) to prevent system overload
- Implemented stability skip optimization (files older than 10 minutes bypass stability checks)
- Enhanced parallel processing with optional multiprocessing and fallback
- Added archive reprocessing script (reprocess_output.py)
- Added OneDrive migration script (migrate_to_onedrive.py)
- Refactored department configuration with 18 domain-specific departments
- Added auto-archival of old output sessions (>90 days)
- Implemented long path handling for Windows MAX_PATH limits
- Added version conflict resolution for sidecars and manifests
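
The batch limit and the stability skip described in the list above could be combined roughly as follows. This is a minimal sketch, not the code in watcher_splitter.py: the names `is_stable` and `next_batch`, the size-probe approach, and the 0.5 second probe delay are all assumptions made for illustration. Only the 100-file batch size and the 10-minute threshold come from the commit message.

```python
import time
from pathlib import Path

BATCH_SIZE = 100              # files per cycle, per the commit message
STABILITY_SKIP_SECONDS = 600  # files older than 10 minutes skip the check


def is_stable(path: Path, now: float, probe_delay: float = 0.5) -> bool:
    """Return True if the file looks fully written (hypothetical helper)."""
    stat = path.stat()
    if now - stat.st_mtime > STABILITY_SKIP_SECONDS:
        # Old enough: assume the writer finished long ago and skip the probe.
        return True
    # Recent file: wait briefly and confirm the size did not change.
    time.sleep(probe_delay)
    return path.stat().st_size == stat.st_size


def next_batch(candidates, now=None, batch_size=BATCH_SIZE):
    """Pick at most batch_size stable files, preserving input order."""
    now = time.time() if now is None else now
    batch = []
    for path in candidates:
        if len(batch) >= batch_size:
            break
        if is_stable(path, now):
            batch.append(path)
    return batch
```

Capping the batch keeps each watcher cycle bounded, and the age shortcut avoids paying the probe delay for every file in a large backlog.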

Performance: Reduced processing time for 6,500 files from ~3.5 hours to ~53 minutes (about a 75% reduction, roughly 4x faster)

Updated README.md, SUMMARY.md, and CHANGELOG.md with v2.1.9 improvements
…(v2.1.9 - 2025-11-19)

- Added analyze_failed_files.py script for comprehensive failed file analysis
  - Analyzes file types, sizes, time patterns, and reprocessing potential
  - Identifies files that might succeed with updated code
  - Saves analysis results to JSON for review

- Updated config.json to use OneDrive path for failed directory
  - Added failed_dir: %OneDriveCommercial%\\KB_Shared\\03_archive\\failed
  - Ensures consistency with archive and output directories

- Enhanced watcher_splitter.py load_cfg() function
  - Added failed_dir to environment variable expansion list
  - Ensures proper path resolution for failed directory
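
A rough sketch of what expanding `%OneDriveCommercial%` in `failed_dir` might look like. It is not the actual `load_cfg()` implementation: the helper names, the regex-based expansion, and the `output_dir` and `archive_dir` key names are assumptions. Only `failed_dir` and its value appear in the commit message. The environment is passed in explicitly so the behavior is the same on any platform.

```python
import re

# Hypothetical key list; only failed_dir is named in the commit message.
PATH_KEYS = ("output_dir", "archive_dir", "failed_dir")

_PERCENT_VAR = re.compile(r"%([A-Za-z_][A-Za-z0-9_]*)%")


def expand_percent_vars(value, env):
    """Replace %NAME% with env[NAME]; unknown names are left untouched."""
    return _PERCENT_VAR.sub(lambda m: env.get(m.group(1), m.group(0)), value)


def expand_cfg_paths(cfg, env, keys=PATH_KEYS):
    """Return a copy of cfg with the listed path keys expanded."""
    expanded = dict(cfg)
    for key in keys:
        if isinstance(expanded.get(key), str):
            expanded[key] = expand_percent_vars(expanded[key], env)
    return expanded
```

In real use the `env` argument would typically be `os.environ`. Leaving unknown variables unexpanded makes a misconfigured path easy to spot in logs.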

- Added comprehensive HANDOFF_PROMPT.md
  - Complete project context for AI assistants
  - Current system state and findings
  - Recommendations and next steps

- Updated documentation (README, SUMMARY, CHANGELOG)
  - Added failed file analysis tools section
  - Updated v2.1.9 changes for November 19
  - Documented OneDrive failed directory configuration
…ring (v2.2.0)

- Automatic KB insertion: Chunks automatically inserted into ChromaDB during processing
- Enterprise retry logic: @backoff decorator with exponential backoff (3 retries)
- Duplicate prevention: Pre-insertion checks prevent duplicate chunks
- Graceful degradation: Errors logged but processing continues
- Real-time monitoring dashboard: Streamlit app with live metrics, charts, and RAG search
- Comprehensive testing: Integration and unit tests with mocked embeddings
- Updated PowerShell scripts: Start/Stop scripts now manage both watcher and dashboard
- Configuration options: New KB config keys for easy tuning
- Documentation: Complete KB integration guide and updated README/CHANGELOG/SUMMARY
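
The retry, duplicate-prevention, and graceful-degradation behavior listed above can be sketched as below. This is an illustration under stated assumptions, not `insert_chunks_to_kb()` itself: it swaps in a hand-rolled stdlib retry loop in place of the `backoff` library, uses a hypothetical in-memory `InMemoryStore` instead of the ChromaDB client, and reads "3 retries" as three attempts in total.

```python
import time


class InMemoryStore:
    """Hypothetical stand-in for a vector store; not the ChromaDB API."""

    def __init__(self):
        self.docs = {}

    def existing_ids(self, ids):
        return {i for i in ids if i in self.docs}

    def add(self, ids, documents):
        for i, doc in zip(ids, documents):
            self.docs[i] = doc


def with_retries(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func, retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1.0s, ...


def insert_chunks(store, chunks, sleep=time.sleep, log=print):
    """Insert chunks not already stored; return the number inserted."""
    try:
        known = store.existing_ids([c["id"] for c in chunks])
        fresh = [c for c in chunks if c["id"] not in known]
        if fresh:
            ids = [c["id"] for c in fresh]
            texts = [c["text"] for c in fresh]
            with_retries(lambda: store.add(ids, texts), sleep=sleep)
        return len(fresh)
    except Exception as exc:
        # Graceful degradation: log the failure and keep processing.
        log(f"KB insertion failed, continuing: {exc}")
        return 0
```

Checking for existing IDs before the insert means a reprocessed file does not create duplicate chunks, and catching the final failure keeps a store outage from stopping the watcher.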

Key files:
- watcher_splitter.py: Added insert_chunks_to_kb() with retry and duplicate checking
- dashboard_kb_monitoring.py: New Streamlit monitoring dashboard
- rag_integration.py: Fixed metadata compatibility (tags/keywords as JSON strings)
- config.json: Added auto_kb_insertion, kb_insertion_batch_size, kb_insertion_retry_attempts
- requirements.txt: Added backoff, plotly, pytest-asyncio
- tests/: Comprehensive test coverage for KB integration
- scripts/: Updated PowerShell scripts for dual-service management
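
The rag_integration.py entry above mentions storing tags and keywords as JSON strings. A minimal sketch of that idea follows, assuming the store accepts only scalar metadata values; the function names `flatten_metadata` and `restore_list` are hypothetical and not taken from the repository.

```python
import json


def flatten_metadata(meta):
    """Serialize list, tuple, and dict values to JSON strings; keep scalars."""
    flat = {}
    for key, value in meta.items():
        if isinstance(value, (list, tuple, dict)):
            flat[key] = json.dumps(value)
        else:
            flat[key] = value
    return flat


def restore_list(flat, key):
    """Decode a JSON-string field back to a list; missing keys give []."""
    return json.loads(flat.get(key, "[]"))
```

For example, `flatten_metadata({"tags": ["a", "b"]})` stores the tags as the string `'["a", "b"]'`, and `restore_list` recovers the original list when results are read back.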
@racmac57 racmac57 requested a review from hy5guy as a code owner November 23, 2025 23:15
@racmac57 racmac57 enabled auto-merge (squash) November 23, 2025 23:16
hy5guy
hy5guy previously approved these changes Nov 23, 2025
- nltk is required by watcher_splitter.py and chunker_core.py
- Fixes ModuleNotFoundError in CI test suite (8 failing tests)

2 participants