Fix: Enhanced codebase index recovery and reuse ('Start Indexing' button now reuses existing Qdrant index) #8588
- Added hasIndexedData() method to check if collection has points
- Modified startIndexing() to skip full scan if data exists
- Updated webview handler to mimic extension startup behavior
- Extension now recognizes active Qdrant container and reuses index

Fixes issue where clicking 'Start Indexing' always rebuilt the index when Roo initialized without qdrant container active.
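The check described in this commit can be sketched roughly as follows. This is a minimal illustration, not the extension's code: `VectorStoreLike`, `InMemoryStore`, and `planScan` are hypothetical names, and the in-memory store only stands in for a Qdrant collection. Only `hasIndexedData()` is named in the commit itself.

```typescript
// Hypothetical, simplified stand-in for the vector store (not the real IVectorStore).
interface VectorStoreLike {
  hasIndexedData(): Promise<boolean>;
}

// In-memory fake: "has data" means the collection holds at least one point,
// mirroring the points check this commit describes.
class InMemoryStore implements VectorStoreLike {
  private pointsCount: number;

  constructor(pointsCount: number) {
    this.pointsCount = pointsCount;
  }

  async hasIndexedData(): Promise<boolean> {
    return this.pointsCount > 0;
  }
}

type ScanPlan = "full-scan" | "reuse-existing-index";

// Decide whether a "Start Indexing" request needs a full scan (assumed helper).
async function planScan(store: VectorStoreLike): Promise<ScanPlan> {
  return (await store.hasIndexedData()) ? "reuse-existing-index" : "full-scan";
}
```

With this shape, an empty collection leads to a full scan, while a collection that already holds points is reused.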
LGTM! The implementation correctly addresses the issue where clicking 'Start Indexing' would rebuild an existing index when the Qdrant container was initially inactive. The changes properly check for existing indexed data and skip the full scan when appropriate, matching the extension startup behavior.
- Added markIndexingComplete() to store completion metadata in Qdrant
- Updated hasIndexedData() to check for completion marker, not just points_count
- Completion marker uses deterministic UUID generated via uuidv5
- Fixes issue where interrupted indexing was incorrectly reported as complete

Addresses edge case where closing workspace mid-index left partial data that was treated as a complete index on reopening.
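To illustrate why a deterministic ID suits the completion marker, here is a minimal sketch of a version 5 (name-based, SHA-1) UUID built with Node's `crypto` module. The PR uses `uuidv5`; the hand-rolled `uuidV5` helper, the namespace constant, and the `indexing-metadata:` name prefix below are assumptions for illustration only.

```typescript
import { createHash } from "node:crypto";

// Name-based (version 5) UUID per RFC 4122: SHA-1 over namespace bytes + name.
function uuidV5(name: string, namespace: string): string {
  const nsBytes = Buffer.from(namespace.replace(/-/g, ""), "hex");
  const hash = createHash("sha1").update(nsBytes).update(name, "utf8").digest();
  const bytes = hash.subarray(0, 16);
  bytes[6] = (bytes[6] & 0x0f) | 0x50; // set version nibble to 5
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // set RFC 4122 variant bits
  const hex = bytes.toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}

// Assumption: the RFC 4122 DNS namespace is used here purely as an example;
// the namespace and name the PR actually uses are not shown in this thread.
const EXAMPLE_NAMESPACE = "6ba7b810-9dad-11d1-80b4-00c04fd430c8";

// Hypothetical helper: the same collection name always maps to the same point ID.
function metadataPointId(collectionName: string): string {
  return uuidV5(`indexing-metadata:${collectionName}`, EXAMPLE_NAMESPACE);
}
```

Because the ID is derived from a fixed name, writing the marker again targets the same point rather than creating a new one on every run.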
…al recovery

- Track indexing state with `indexingStarted` flag set after successful Qdrant connection
- Preserve cache when Qdrant unavailable at startup (enables incremental scan on recovery)
- Clear cache only if indexing started but failed mid-process (prevents cache-Qdrant mismatch)
- Run incremental scan on resume instead of skipping scan entirely
- Add `markIndexingComplete()` to store completion metadata preventing partial indexes from appearing complete

Fixes edge cases:
1. Qdrant inactive on startup → cache preserved for incremental recovery
2. Files added while workspace closed → incremental scan catches new files
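The cache-preservation rule above can be sketched like this. It is a simplified model under stated assumptions: `runIndexing`, `onIndexingFailure`, and the `connect`/`scan` callbacks are hypothetical names, and only the `indexingStarted` flag comes from the commit.

```typescript
// Hypothetical outcome labels for illustration.
type FailureAction = "preserve-cache" | "clear-cache";

// If the store was never reached, nothing was written, so the local cache still
// matches whatever Qdrant holds and can drive an incremental scan on recovery.
// If the scan had started, cache and collection may disagree, so the cache is cleared.
function onIndexingFailure(indexingStarted: boolean): FailureAction {
  return indexingStarted ? "clear-cache" : "preserve-cache";
}

async function runIndexing(
  connect: () => Promise<void>, // stands in for the Qdrant connection step
  scan: () => Promise<void>,    // stands in for the full or incremental scan
): Promise<FailureAction | "ok"> {
  let indexingStarted = false;
  try {
    await connect();
    indexingStarted = true; // set only after the connection succeeded
    await scan();
    return "ok";
  } catch {
    return onIndexingFailure(indexingStarted);
  }
}
```

The design choice is that the flag flips only after the connection step, so a container that is down at startup never triggers cleanup.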
- Add markIndexingIncomplete() to set state at start of indexing
- Call markIndexingIncomplete() at start of both full and incremental scans
- Call markIndexingComplete() after successful full AND incremental scans
- Update hasIndexedData() with backward compatibility fallback for old indexes

Indexing lifecycle now properly tracked:
- incomplete → complete on successful scan
- incomplete state persists if scan interrupted
- Old indexes without marker fall back to points_count > 0 check

Fixes:
- Incremental scans now properly mark completion (was missing before)
- Interrupted scans correctly identified as incomplete
- Backward compatibility ensures existing indexes work without rebuild
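The lifecycle and fallback described in this commit could look roughly like the sketch below. `FakeCollection` is a hypothetical in-memory stand-in for a Qdrant collection, and the payload shape beyond `indexing_complete` is assumed; the three method names match those in the commit, but the bodies are illustrative only.

```typescript
// Assumed payload shape of the metadata marker point.
interface IndexingMetadata {
  indexing_complete: boolean;
}

// Hypothetical in-memory model of a collection plus its optional marker.
class FakeCollection {
  pointsCount = 0;
  marker: IndexingMetadata | undefined = undefined;

  // Called at the start of a full or incremental scan.
  markIndexingIncomplete(): void {
    this.marker = { indexing_complete: false };
  }

  // Called only after a scan finishes successfully.
  markIndexingComplete(): void {
    this.marker = { indexing_complete: true };
  }

  hasIndexedData(): boolean {
    // New-style index: trust the marker, so interrupted scans stay "incomplete".
    if (this.marker !== undefined) return this.marker.indexing_complete;
    // Old index without a marker: fall back to the points_count > 0 check.
    return this.pointsCount > 0;
  }
}
```

An interrupted scan leaves the marker at `indexing_complete: false`, so partial data is not mistaken for a finished index, while older indexes without a marker are still recognized.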
I have updated the PR writeup to reflect the increased scope of this fix.
…gate orchestrator cleanup behind indexingStarted
- Merge must_not: [{ key: "type", match: { value: "metadata" } }] into QdrantVectorStore.search() filter ([QdrantVectorStore.search()](src/services/code-index/vector-store/qdrant-client.ts:383))
- Normalize constants import path in [qdrant-client.ts](src/services/code-index/vector-store/qdrant-client.ts:8)
- Only invoke clearCollection()/cache clear after initialize() succeeds in [CodeIndexOrchestrator.startIndexing()](src/services/code-index/orchestrator.ts:291)
test: update qdrant-client search expectations and add orchestrator error-path gating test
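Merging the metadata exclusion into a search filter might look like the following sketch. The `Filter` and `Condition` types are simplified assumptions rather than the Qdrant client's own types, and `withMetadataExcluded` is a hypothetical helper name; the `must_not` condition itself is the one quoted in the commit.

```typescript
// Simplified, assumed filter types (not the Qdrant client's own typings).
type Condition = Record<string, unknown>;

interface Filter {
  must?: Condition[];
  should?: Condition[];
  must_not?: Condition[];
}

// The condition quoted in the commit: hide the metadata marker point from results.
const EXCLUDE_METADATA: Condition = { key: "type", match: { value: "metadata" } };

// Return a new filter that keeps the caller's clauses and appends the exclusion.
function withMetadataExcluded(filter?: Filter): Filter {
  return {
    ...filter,
    must_not: [...(filter?.must_not ?? []), EXCLUDE_METADATA],
  };
}
```

The helper copies rather than mutates the incoming filter, so callers that reuse a filter object across searches are unaffected.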
Code Review Complete
I've completed a thorough review of this PR. The implementation successfully addresses the issue where clicking "Start Indexing" would rebuild an existing Qdrant index when the container was temporarily unavailable.
Review Summary
No issues found. The implementation is well-designed with:
- ✅ Metadata-based completion tracking - Properly tracks indexing state with backward compatibility for existing indexes
- ✅ Smart scan selection - Correctly chooses between incremental and full scans based on existing data
- ✅ Intelligent cache preservation - Distinguishes between connection failures (preserves cache) and mid-scan failures (clears cache)
- ✅ Comprehensive test coverage - Includes tests for error path gating and cleanup logic
- ✅ Proper error handling - All error paths are handled correctly with appropriate cleanup

The PR is ready for merge.
Related GitHub Issue
Closes: #8129
Description
This PR fixes a critical issue where clicking the "Start Indexing" button would completely rebuild an existing Qdrant index when the Qdrant docker container was temporarily unavailable during extension startup. The implementation introduces intelligent index state tracking and incremental recovery mechanisms.
Key Changes
Implementation Details
1. Metadata-Based Completion Tracking (qdrant-client.ts:549-652)
- Adds `hasIndexedData()`, `markIndexingComplete()`, and `markIndexingIncomplete()` methods
- Uses a deterministic UUID (`uuidv5`) to store indexing metadata as a special point in Qdrant
- Tracks an `indexing_complete` boolean and timestamps
2. Smart Scan Selection (orchestrator.ts:123-287)
3. Intelligent Cache Preservation (orchestrator.ts:299-315)
- Uses the `indexingStarted` flag to distinguish connection failures from mid-scan failures
4. Startup-Consistent Button Behavior (webviewMessageHandler.ts:2663-2686)
- Calls `initialize()` first to check Qdrant availability and existing collections
- Calls `startIndexing()` only when in "Standby" or "Error" states
Test Procedure
Manual testing steps:
- `Error - Failed during initial scan: fetch failed` is visible in the codebase indexing UI
- `[CodeIndexOrchestrator] Collection already has indexed data. Skipping full scan and starting file watcher.`
Improvements
Performance:
Data Integrity:
Resilience:
User Experience:
These changes make Roo Code's codebase indexing significantly more robust for users with containerized Qdrant setups, where container restarts are common during development workflows.
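The startup-consistent button behavior (item 4 under Implementation Details) can be sketched as follows. This is an illustration under assumptions: `IndexManagerLike` and `handleStartIndexingClick` are hypothetical names, and the `"Indexing"` and `"Indexed"` states are assumed; only `"Standby"`, `"Error"`, `initialize()`, and `startIndexing()` appear in the description.

```typescript
// "Standby" and "Error" come from the PR description; the other states are assumed.
type SystemState = "Standby" | "Indexing" | "Indexed" | "Error";

// Hypothetical, minimal view of the code index manager.
interface IndexManagerLike {
  state: SystemState;
  initialize(): Promise<void>;
  startIndexing(): Promise<void>;
}

// Handle a "Start Indexing" click the same way extension startup would:
// initialize first, then start indexing only if that left the manager idle or failed.
async function handleStartIndexingClick(manager: IndexManagerLike): Promise<boolean> {
  await manager.initialize(); // checks Qdrant availability and existing collections
  if (manager.state === "Standby" || manager.state === "Error") {
    await manager.startIndexing();
    return true;
  }
  return false; // initialization already moved the manager into another state
}
```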
Pre-Submission Checklist
Documentation Updates
Get in Touch
Discord: @ocean.smith
Important
Optimize 'Start Indexing' by checking for existing Qdrant index data before full scan, reducing unnecessary rebuilds.
- `startIndexing()` in `orchestrator.ts` now checks for existing indexed data using `hasIndexedData()` before a full scan.
- `webviewMessageHandler.ts` updated to initialize and start indexing only in "Standby" or "Error" states.
- Added `hasIndexedData()` to `IVectorStore` and implemented in `QdrantVectorStore`.
- `orchestrator.ts` when skipping full scan due to existing data.
This description was created for 253e843.