Fix: Enhanced codebase index recovery and reuse ('Start Indexing' button now reuses existing Qdrant index) #8588

heyseth · 2025-10-10T00:05:26Z

Related GitHub Issue

Closes: #8129

Description

This PR fixes a critical issue where clicking the "Start Indexing" button would completely rebuild an existing Qdrant index if the Qdrant Docker container had been unavailable when the extension started. The fix adds index completion tracking stored in Qdrant and an incremental recovery path that reuses the existing index.

Key Changes

- 253e843 - Initial fix to reuse existing Qdrant index
- 55c254a - Prevent partial indexes from being treated as complete
- 54d9abb - Preserve cache on Qdrant connection failures
- 9ddaa58 - Track indexing completion state throughout scan lifecycle

Implementation Details

1. Metadata-Based Completion Tracking (qdrant-client.ts:549-652)

- Added `hasIndexedData()`, `markIndexingComplete()`, and `markIndexingIncomplete()` methods
- Uses a deterministic UUID (via uuidv5) to store indexing metadata as a special point in Qdrant
- Metadata point tracks an `indexing_complete` boolean and timestamps
- Provides backward compatibility for existing indexes without metadata markers
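To illustrate the completion check described above, here is a minimal sketch using an in-memory stand-in for a Qdrant collection. It is not the code from this PR: the `InMemoryStore` class, the `METADATA_ID` constant (a placeholder for the uuidv5-derived ID), and the `started_at` / `completed_at` field names are assumptions made for illustration, and the methods are synchronous for brevity.

```typescript
// Hypothetical in-memory stand-in for a Qdrant collection (illustration only).
type Payload = Record<string, unknown>;

// Placeholder for the deterministic ID that the PR derives via uuidv5.
const METADATA_ID = "indexing-metadata-marker";

class InMemoryStore {
  private points = new Map<string, Payload>();

  upsert(id: string, payload: Payload): void {
    this.points.set(id, payload);
  }

  // Written at the start of a scan so an interrupted scan stays detectable.
  markIndexingIncomplete(): void {
    this.upsert(METADATA_ID, { type: "metadata", indexing_complete: false, started_at: Date.now() });
  }

  // Written only after a scan finishes successfully.
  markIndexingComplete(): void {
    this.upsert(METADATA_ID, { type: "metadata", indexing_complete: true, completed_at: Date.now() });
  }

  hasIndexedData(): boolean {
    const marker = this.points.get(METADATA_ID);
    if (marker !== undefined) {
      // A marker exists, so trust it over the raw point count.
      return marker.indexing_complete === true;
    }
    // Backward compatibility: indexes created before the marker existed
    // count as indexed when they contain any points.
    return this.points.size > 0;
  }
}
```

Because the marker always lives under the same ID, each upsert overwrites the previous state instead of accumulating metadata points.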

2. Smart Scan Selection (orchestrator.ts:123-287)

- After a successful Qdrant connection, checks for existing indexed data
- Incremental scan when data exists: only processes new/changed files detected by cache comparison
- Full scan when no data exists or the collection was just created: processes the entire workspace
- Both scan types mark the index incomplete at start and complete at end, so partial scans are detectable
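The selection and lifecycle rules above can be sketched as follows. This is a simplified, synchronous illustration rather than the orchestrator's actual code: the `CompletionTracker` interface, `chooseScan`, and `runScan` are hypothetical names, and the real calls are asynchronous.

```typescript
type ScanKind = "full" | "incremental";

// Hypothetical subset of the vector store used by this sketch.
interface CompletionTracker {
  hasIndexedData(): boolean;
  markIndexingIncomplete(): void;
  markIndexingComplete(): void;
}

// Incremental only when usable data exists and the collection was not just created.
function chooseScan(hasData: boolean, collectionCreated: boolean): ScanKind {
  return hasData && !collectionCreated ? "incremental" : "full";
}

function runScan(
  store: CompletionTracker,
  collectionCreated: boolean,
  scan: (kind: ScanKind) => void,
): ScanKind {
  // Read the existing state before overwriting the marker.
  const kind = chooseScan(store.hasIndexedData(), collectionCreated);
  store.markIndexingIncomplete();
  // If scan() throws, the marker stays incomplete and the next run can detect it.
  scan(kind);
  store.markIndexingComplete();
  return kind;
}
```

The ordering matters: the existing-data check has to run before the incomplete marker is written, otherwise every run would see an incomplete index.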

3. Intelligent Cache Preservation (orchestrator.ts:299-315)

- Tracks an `indexingStarted` flag to distinguish connection failures from mid-scan failures
- Preserves the cache if the Qdrant connection failed (enables future incremental recovery)
- Clears the cache only when indexing started but failed midway (prevents cache-Qdrant inconsistency)
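A minimal sketch of the gating described above, assuming a hypothetical `IndexingDeps` interface in place of the real vector store, scanner, and cache manager. It is synchronous for brevity and is not the orchestrator's actual error handler.

```typescript
// Hypothetical dependencies; the real orchestrator uses async services.
interface IndexingDeps {
  initialize(): void; // throws when Qdrant is unreachable
  scan(): void;       // throws when a scan fails midway
  clearCache(): void;
}

function startIndexing(deps: IndexingDeps): "indexed" | "error" {
  let indexingStarted = false;
  try {
    deps.initialize();
    // Only set once the Qdrant connection has succeeded.
    indexingStarted = true;
    deps.scan();
    return "indexed";
  } catch {
    if (indexingStarted) {
      // Mid-scan failure: cache and Qdrant may disagree, so drop the cache.
      deps.clearCache();
    }
    // Connection failure: keep the cache so a later run can scan incrementally.
    return "error";
  }
}
```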

4. Startup-Consistent Button Behavior (webviewMessageHandler.ts:2663-2686)

- "Start Indexing" button now mimics the extension startup flow
- Always calls `initialize()` first to check Qdrant availability and existing collections
- Only triggers `startIndexing()` when in "Standby" or "Error" states
- Prevents redundant re-indexing when already indexed
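The button flow above could look roughly like this. The `IndexManager` interface and `handleStartIndexingClick` are hypothetical names for this sketch; "Standby" and "Error" come from the description, while the other state names are assumptions, and the real handler is asynchronous.

```typescript
// "Indexing" and "Indexed" are assumed state names for this sketch.
type IndexState = "Standby" | "Indexing" | "Indexed" | "Error";

// Hypothetical manager interface, not the extension's actual API.
interface IndexManager {
  state: IndexState;
  initialize(): void;
  startIndexing(): void;
}

// Returns true when a new indexing run was triggered.
function handleStartIndexingClick(manager: IndexManager): boolean {
  // Same first step as extension startup: check Qdrant and existing collections.
  manager.initialize();
  // initialize() may already have moved the manager out of Standby/Error,
  // in which case nothing more needs to be started.
  if (manager.state === "Standby" || manager.state === "Error") {
    manager.startIndexing();
    return true;
  }
  return false;
}
```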

Test Procedure

Manual testing steps:

1. Set up code indexing with a local Qdrant Docker container
2. Ensure that the Qdrant Docker container is not running
3. Open a workspace in Roo that has already been fully indexed
4. Verify that the error message `Error - Failed during initial scan: fetch failed` is visible in the codebase indexing UI
5. Start the Qdrant Docker container
6. Click the "Start Indexing" button
7. Verify that the extension recognizes the existing index and does not rebuild it
8. Check the output logs for the message: `[CodeIndexOrchestrator] Collection already has indexed data. Skipping full scan and starting file watcher.`
9. Make a file change
10. Verify that the file watcher picks up and indexes the change

Improvements

Performance:

- Eliminates unnecessary full workspace scans when index data already exists
- Incremental scans only process changed files, which reduces startup time after Qdrant recovery

Data Integrity:

- Explicit metadata markers prevent partial indexes from being treated as complete
- Cache preservation enables accurate incremental recovery after connection failures

Resilience:

- Graceful degradation when Qdrant is temporarily unavailable
- Recovery reuses as much existing index and cache data as possible

User Experience:

- "Start Indexing" button becomes a true recovery mechanism rather than a full rebuild trigger
- Reduced wait times when recovering from temporary Qdrant outages

These changes make Roo Code's codebase indexing significantly more robust for users with containerized Qdrant setups, where container restarts are common during development workflows.

Pre-Submission Checklist

- Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
- Scope: My changes are focused on the linked issue (one major feature/fix per PR).
- Self-Review: I have performed a thorough self-review of my code.
- Testing: New and/or updated tests have been added to cover my changes (if applicable).
- Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
- Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Documentation Updates

No documentation updates are required.

Get in Touch

Discord: @ocean.smith

Important

Optimize 'Start Indexing' by checking for existing Qdrant index data before full scan, reducing unnecessary rebuilds.

Behavior:
- startIndexing() in orchestrator.ts now checks for existing indexed data using hasIndexedData() before a full scan.
- If data exists, skips full scan and starts file watcher.
- webviewMessageHandler.ts updated to initialize and start indexing only in "Standby" or "Error" states.
Interfaces:
- Added hasIndexedData() to IVectorStore and implemented in QdrantVectorStore.
Logging:
- Logs message in orchestrator.ts when skipping full scan due to existing data.

This description was generated automatically for 253e843. You can customize this summary. It will automatically update as commits are pushed.

- Added hasIndexedData() method to check if collection has points
- Modified startIndexing() to skip full scan if data exists
- Updated webview handler to mimic extension startup behavior
- Extension now recognizes active Qdrant container and reuses index

Fixes issue where clicking 'Start Indexing' always rebuilt the index when Roo initialized without the Qdrant container active.

roomote

LGTM! The implementation correctly addresses the issue where clicking 'Start Indexing' would rebuild an existing index when the Qdrant container was initially inactive. The changes properly check for existing indexed data and skip the full scan when appropriate, matching the extension startup behavior.

- Added markIndexingComplete() to store completion metadata in Qdrant
- Updated hasIndexedData() to check for completion marker, not just points_count
- Completion marker uses deterministic UUID generated via uuidv5
- Fixes issue where interrupted indexing was incorrectly reported as complete

Addresses edge case where closing workspace mid-index left partial data that was treated as a complete index on reopening.

src/services/code-index/vector-store/qdrant-client.ts

…al recovery

- Track indexing state with `indexingStarted` flag set after successful Qdrant connection
- Preserve cache when Qdrant unavailable at startup (enables incremental scan on recovery)
- Clear cache only if indexing started but failed mid-process (prevents cache-Qdrant mismatch)
- Run incremental scan on resume instead of skipping scan entirely
- Add `markIndexingComplete()` to store completion metadata preventing partial indexes from appearing complete

Fixes edge cases:

1. Qdrant inactive on startup → cache preserved for incremental recovery
2. Files added while workspace closed → incremental scan catches new files

src/services/code-index/vector-store/qdrant-client.ts

src/services/code-index/orchestrator.ts

- Add markIndexingIncomplete() to set state at start of indexing
- Call markIndexingIncomplete() at start of both full and incremental scans
- Call markIndexingComplete() after successful full AND incremental scans
- Update hasIndexedData() with backward compatibility fallback for old indexes

Indexing lifecycle now properly tracked:

- incomplete → complete on successful scan
- incomplete state persists if scan interrupted
- Old indexes without marker fall back to points_count > 0 check

Fixes:

- Incremental scans now properly mark completion (was missing before)
- Interrupted scans correctly identified as incomplete
- Backward compatibility ensures existing indexes work without rebuild

src/services/code-index/orchestrator.ts

heyseth · 2025-10-11T03:16:08Z

I have updated the PR writeup to reflect the increased scope of this fix.

…gate orchestrator cleanup behind indexingStarted

- Merge must_not: [{ key: "type", match: { value: "metadata" } }] into QdrantVectorStore.search() filter ([QdrantVectorStore.search()](src/services/code-index/vector-store/qdrant-client.ts:383))
- Normalize constants import path in [qdrant-client.ts](src/services/code-index/vector-store/qdrant-client.ts:8)
- Only invoke clearCollection()/cache clear after initialize() succeeds in [CodeIndexOrchestrator.startIndexing()](src/services/code-index/orchestrator.ts:291)

test: update qdrant-client search expectations and add orchestrator error-path gating test
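For readers unfamiliar with the filter change in this commit, the sketch below shows one way a `must_not` clause can be merged into an existing search filter so the metadata point never appears in search results. The `Condition` and `Filter` types and the `withMetadataExcluded` helper are hypothetical simplifications; only the `must_not` clause itself is taken from the commit message, and the `language` key in the usage example is made up.

```typescript
// Simplified, hypothetical filter types for this sketch.
interface Condition {
  key: string;
  match: { value: string };
}

interface Filter {
  must?: Condition[];
  must_not?: Condition[];
}

// The clause quoted in the commit message.
const EXCLUDE_METADATA: Condition = { key: "type", match: { value: "metadata" } };

// Returns a new filter; the caller's filter object is left unchanged.
function withMetadataExcluded(filter: Filter = {}): Filter {
  return {
    ...filter,
    // Keep any exclusions the caller already supplied.
    must_not: [...(filter.must_not ?? []), EXCLUDE_METADATA],
  };
}

// Usage: an existing "must" clause survives the merge.
const searchFilter = withMetadataExcluded({
  must: [{ key: "language", match: { value: "typescript" } }],
});
```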

roomote · 2025-10-30T19:52:00Z

Code Review Complete

I've completed a thorough review of this PR. The implementation successfully addresses the issue where clicking "Start Indexing" would rebuild an existing Qdrant index when the container was temporarily unavailable.

Review Summary

No issues found. The implementation is well-designed with:

✅ Metadata-based completion tracking - Properly tracks indexing state with backward compatibility for existing indexes

✅ Smart scan selection - Correctly chooses between incremental and full scans based on existing data

✅ Intelligent cache preservation - Distinguishes between connection failures (preserves cache) and mid-scan failures (clears cache)

✅ Comprehensive test coverage - Includes tests for error path gating and cleanup logic

✅ Proper error handling - All error paths are handled correctly with appropriate cleanup

The PR is ready for merge.

heyseth requested review from cte, jr and mrubens as code owners October 10, 2025 00:05

github-project-automation bot added this to Roo Code Roadmap and Roo Code Roadmap Oct 10, 2025

github-project-automation bot moved this to Triage in Roo Code Roadmap Oct 10, 2025

github-project-automation bot moved this to New in Roo Code Roadmap Oct 10, 2025

dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels Oct 10, 2025

roomote bot reviewed Oct 10, 2025

hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Oct 10, 2025

roomote bot reviewed Oct 10, 2025

src/services/code-index/vector-store/qdrant-client.ts (resolved)

roomote bot reviewed Oct 10, 2025

src/services/code-index/vector-store/qdrant-client.ts (outdated, resolved)

roomote bot reviewed Oct 10, 2025

src/services/code-index/orchestrator.ts (resolved)

roomote bot reviewed Oct 11, 2025

src/services/code-index/orchestrator.ts (resolved)

heyseth changed the title ~~Fix: 'Start Indexing' button now reuses existing Qdrant index~~ Fix: Enhanced Codebase Index Recovery and Reuse ('Start Indexing' button now reuses existing Qdrant index) Oct 11, 2025

heyseth changed the title ~~Fix: Enhanced Codebase Index Recovery and Reuse ('Start Indexing' button now reuses existing Qdrant index)~~ Fix: Enhanced codebase index recovery and reuse ('Start Indexing' button now reuses existing Qdrant index) Oct 11, 2025

daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Oct 28, 2025

hannesrudolph added PR - Needs Preliminary Review and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Oct 29, 2025

dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Oct 30, 2025

daniel-lxs approved these changes Oct 30, 2025

dosubot bot added the lgtm This PR has been approved by a maintainer label Oct 30, 2025

mrubens approved these changes Oct 30, 2025

mrubens merged commit f9d6fe7 into RooCodeInc:main Oct 30, 2025
15 checks passed

github-project-automation bot moved this from New to Done in Roo Code Roadmap Oct 30, 2025

github-project-automation bot moved this from PR [Needs Prelim Review] to Done in Roo Code Roadmap Oct 30, 2025

rossdonald mentioned this pull request Oct 31, 2025

[BUG] Bad request: Index required but not found for "type" of one of the following types: [keyword] code indexing #8963

Closed
