docs: v2.2.1 docs update - directory cleanup and background mode #22

Open
racmac57 wants to merge 13 commits into main from docs/update-20251221-2041

@racmac57
Owner

Changes in v2.2.1

Directory Cleanup & Background Mode (2025-12-21)

  • Background mode support for watcher and dashboard services
  • Directory organization with requirements, logs, and runtime files in dedicated directories
  • File cleanup scripts for maintaining project structure
  • Updated script references to use new file locations

File Processing Workflow & Stability Fixes (2025-11-29)

  • File processing workflow scripts
  • Database lock resolution
  • File extension handling improvements
  • Process management enhancements

Checklist

  • CHANGELOG.md updated
  • README.md updated
  • SUMMARY.md updated
  • Version synced to v2.2.1

See CHANGELOG.md for full details.

racmac57 and others added 12 commits November 9, 2025 21:58
- Added batch processing (100 files per cycle) to prevent system overload
- Implemented stability skip optimization (files >10 min bypass checks)
- Enhanced parallel processing with optional multiprocessing and fallback
- Added archive reprocessing script (reprocess_output.py)
- Added OneDrive migration script (migrate_to_onedrive.py)
- Refactored department configuration with 18 domain-specific departments
- Added auto-archival of old output sessions (>90 days)
- Implemented long path handling for Windows MAX_PATH limits
- Added version conflict resolution for sidecars and manifests
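
The batch limit and the stability skip listed above can be sketched as follows. This is a minimal illustration and not the code in watcher_splitter.py: the names `batches`, `needs_stability_check`, `BATCH_SIZE`, and `STABILITY_SKIP_SECONDS` are hypothetical, and using the file modification time as the age signal is an assumption.

```python
import time
from pathlib import Path
from typing import Iterable, Iterator, List, Optional

BATCH_SIZE = 100               # files per cycle, per the commit message
STABILITY_SKIP_SECONDS = 600   # files older than 10 minutes bypass checks


def needs_stability_check(mtime: float, now: Optional[float] = None) -> bool:
    """Return True if the file changed recently enough that it may still be written.

    Assumption: age is measured from the modification time (hypothetical choice).
    """
    now = time.time() if now is None else now
    return (now - mtime) <= STABILITY_SKIP_SECONDS


def batches(paths: Iterable[Path], size: int = BATCH_SIZE) -> Iterator[List[Path]]:
    """Yield successive lists of at most `size` paths."""
    batch: List[Path] = []
    for path in paths:
        batch.append(path)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch
```

A watcher loop could take one batch per cycle and call `needs_stability_check(path.stat().st_mtime)` to decide whether a file still needs the slower stability check.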

Performance: Reduced processing time for 6,500 files from ~3.5 hours to ~53 minutes (about a 75% reduction, roughly 4x faster)

Updated README.md, SUMMARY.md, and CHANGELOG.md with v2.1.9 improvements
…(v2.1.9 - 2025-11-19)

- Added analyze_failed_files.py script for comprehensive failed file analysis
  - Analyzes file types, sizes, time patterns, and reprocessing potential
  - Identifies files that might succeed with updated code
  - Saves analysis results to JSON for review

- Updated config.json to use OneDrive path for failed directory
  - Added failed_dir: %OneDriveCommercial%\\KB_Shared\\03_archive\\failed
  - Ensures consistency with archive and output directories

- Enhanced watcher_splitter.py load_cfg() function
  - Added failed_dir to environment variable expansion list
  - Ensures proper path resolution for failed directory
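
The environment variable expansion described above might look roughly like this. It is a sketch under stated assumptions, not the actual `load_cfg()` body: `expand_percent_vars`, `expand_config_paths`, and `PATH_KEYS` are hypothetical names, and only `failed_dir` is confirmed as a config key by the commit message. A regular expression is used because `os.path.expandvars` expands the `%NAME%` form only on Windows.

```python
import os
import re
from typing import Any, Dict, Iterable, Mapping, Optional

# Only "failed_dir" is named in the commit message; the other two keys
# are assumed here for illustration.
PATH_KEYS = ("archive_dir", "output_dir", "failed_dir")

_PERCENT_VAR = re.compile(r"%([^%]+)%")


def expand_percent_vars(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Expand Windows-style %NAME% references; unknown names are left untouched."""
    env = os.environ if env is None else env
    # A function replacement is inserted literally, so backslashes in
    # Windows paths are not treated as escape sequences.
    return _PERCENT_VAR.sub(lambda m: env.get(m.group(1), m.group(0)), value)


def expand_config_paths(
    cfg: Mapping[str, Any],
    keys: Iterable[str] = PATH_KEYS,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Return a copy of cfg with the listed string-valued keys expanded."""
    expanded = dict(cfg)
    for key in keys:
        if isinstance(expanded.get(key), str):
            expanded[key] = expand_percent_vars(expanded[key], env)
    return expanded
```

Keys outside the list are returned unchanged, which is why a newly added path key such as `failed_dir` has to be added to the list before it resolves.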

- Added comprehensive HANDOFF_PROMPT.md
  - Complete project context for AI assistants
  - Current system state and findings
  - Recommendations and next steps

- Updated documentation (README, SUMMARY, CHANGELOG)
  - Added failed file analysis tools section
  - Updated v2.1.9 changes for November 19
  - Documented OneDrive failed directory configuration
…ring (v2.2.0)

- Automatic KB insertion: Chunks automatically inserted into ChromaDB during processing
- Enterprise retry logic: @backoff decorator with exponential backoff (3 retries)
- Duplicate prevention: Pre-insertion checks prevent duplicate chunks
- Graceful degradation: Errors logged but processing continues
- Real-time monitoring dashboard: Streamlit app with live metrics, charts, and RAG search
- Comprehensive testing: Integration and unit tests with mocked embeddings
- Updated PowerShell scripts: Start/Stop scripts now manage both watcher and dashboard
- Configuration options: New KB config keys for easy tuning
- Documentation: Complete KB integration guide and updated README/CHANGELOG/SUMMARY
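
The retry and duplicate-prevention behavior listed above can be illustrated with a standard-library stand-in. The commit uses the `backoff` package; the decorator below is a hand-rolled substitute so the example has no third-party dependency. `retry_with_backoff` and `filter_new_chunks` are hypothetical names, chunks are assumed to be dicts with an `"id"` key, and "3 retries" is read as three retries after the first attempt.

```python
import functools
import time
from typing import Any, Callable, Dict, Iterable, List, Tuple, Type


def retry_with_backoff(
    max_retries: int = 3,
    base_delay: float = 1.0,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], Any] = time.sleep,
) -> Callable:
    """Retry the wrapped call, doubling the wait after each failure."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        raise  # retries exhausted: surface the last error
                    sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator


def filter_new_chunks(
    chunks: Iterable[Dict[str, Any]], existing_ids: Iterable[str]
) -> List[Dict[str, Any]]:
    """Drop chunks whose id is already stored or repeated within the batch."""
    seen = set(existing_ids)
    fresh = []
    for chunk in chunks:
        if chunk["id"] not in seen:
            seen.add(chunk["id"])
            fresh.append(chunk)
    return fresh
```

For graceful degradation, a caller would wrap the decorated insert in a `try`/`except`, log the final error, and move on to the next file.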

Key files:
- watcher_splitter.py: Added insert_chunks_to_kb() with retry and duplicate checking
- dashboard_kb_monitoring.py: New Streamlit monitoring dashboard
- rag_integration.py: Fixed metadata compatibility (tags/keywords as JSON strings)
- config.json: Added auto_kb_insertion, kb_insertion_batch_size, kb_insertion_retry_attempts
- requirements.txt: Added backoff, plotly, pytest-asyncio
- tests/: Comprehensive test coverage for KB integration
- scripts/: Updated PowerShell scripts for dual-service management
- nltk is required by watcher_splitter.py and chunker_core.py
- Fixes ModuleNotFoundError in CI test suite (8 failing tests)
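
The metadata compatibility fix noted for rag_integration.py (tags/keywords as JSON strings) can be sketched as below. This assumes the vector store accepts only scalar metadata values, which is why list fields are serialized. `flatten_metadata` and `restore_list` are hypothetical helpers, not functions from the repository.

```python
import json
from typing import Any, Dict, List, Mapping

SCALAR_TYPES = (str, int, float, bool)


def flatten_metadata(metadata: Mapping[str, Any]) -> Dict[str, Any]:
    """Serialize non-scalar values (e.g. tag lists) to JSON strings; drop None."""
    flat: Dict[str, Any] = {}
    for key, value in metadata.items():
        if value is None:
            continue
        flat[key] = value if isinstance(value, SCALAR_TYPES) else json.dumps(value)
    return flat


def restore_list(flat: Mapping[str, Any], key: str) -> List[Any]:
    """Decode a field known to hold a JSON-encoded list; return [] if absent."""
    raw = flat.get(key)
    return json.loads(raw) if isinstance(raw, str) else []
```

`restore_list` should be applied only to fields that were serialized by `flatten_metadata`, since plain string fields are not valid JSON.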
…dling

- Fixed database lock errors by detecting and stopping duplicate watcher processes
- Added file processing workflow scripts (move/copy with source tracking)
- Fixed file extension detection for files without extensions
- Documented exclude pattern behavior and provided workarounds
- Improved watcher process management and cleanup
- Updated README, SUMMARY, and CHANGELOG with recent improvements
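
One common way to keep duplicate watcher processes from contending for the same database is a single-instance lock file. The commit describes detecting and stopping duplicates; the sketch below instead refuses to start a second instance, which is a different technique swapped in for illustration. `acquire_single_instance_lock`, `release_single_instance_lock`, and `AlreadyRunning` are hypothetical names.

```python
import os
from pathlib import Path
from typing import Union


class AlreadyRunning(RuntimeError):
    """Raised when another instance appears to hold the lock."""


def acquire_single_instance_lock(lock_path: Union[str, Path]) -> None:
    """Create the lock file atomically and record this process's PID in it."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(str(lock_path), os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise AlreadyRunning(f"lock file exists: {lock_path}") from None
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))


def release_single_instance_lock(lock_path: Union[str, Path]) -> None:
    """Remove the lock file; a missing file is not an error."""
    Path(lock_path).unlink(missing_ok=True)
```

A crashed process leaves a stale lock behind, so a real implementation would also need to check whether the recorded PID is still alive before refusing to start.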
…p and background mode

Refs: none

Affects: CHANGELOG.md, README.md, SUMMARY.md
@racmac57 racmac57 requested a review from hy5guy as a code owner December 22, 2025 01:46
@racmac57
Owner Author

Documentation (CHANGELOG, README, SUMMARY) is updated for v2.2.1. Directory cleanup and background mode features are correctly implemented. Note: CI tests are failing due to a missing sentence-transformers dependency in requirements.txt. CC @hy5guy for code owner review.

2 participants