

@heyseth
Contributor

@heyseth heyseth commented Oct 10, 2025

Related GitHub Issue

Closes: #8129

Description

This PR fixes a critical issue where clicking the "Start Indexing" button rebuilt an existing Qdrant index from scratch whenever the Qdrant Docker container had been temporarily unavailable during extension startup. The implementation adds persistent tracking of index completion state and an incremental recovery path.

Key Changes

  1. 253e843 - Initial fix to reuse existing Qdrant index
  2. 55c254a - Prevent partial indexes from being treated as complete
  3. 54d9abb - Preserve cache on Qdrant connection failures
  4. 9ddaa58 - Track indexing completion state throughout scan lifecycle

Implementation Details

1. Metadata-Based Completion Tracking (qdrant-client.ts:549-652)

  • Added hasIndexedData(), markIndexingComplete(), and markIndexingIncomplete() methods
  • Uses a deterministic UUID (via uuidv5) to store indexing metadata as a special point in Qdrant
  • The metadata point stores an indexing_complete boolean and timestamps
  • Falls back to a points_count > 0 check for existing indexes that lack the metadata marker (backward compatibility)
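A minimal sketch of how a deterministic marker id can be derived, assuming a name-based (version 5) UUID. To stay dependency-free it hand-rolls UUIDv5 with Node's `crypto` module instead of importing `uuidv5` from the `uuid` package the PR uses. The namespace, the marker name `__indexing_metadata__`, `makeMarker`, and every payload field other than `indexing_complete` are illustrative assumptions, not the PR's actual values.

```typescript
import { createHash } from "crypto"

// RFC 4122 DNS namespace, used here only as an example namespace (assumption).
const EXAMPLE_NAMESPACE = "6ba7b810-9dad-11d1-80b4-00c04fd430c8"

// Name-based UUID, version 5: SHA-1 over (namespace bytes || name bytes).
function uuidV5(name: string, namespace: string): string {
	const nsBytes = Buffer.from(namespace.replace(/-/g, ""), "hex")
	const digest = createHash("sha1").update(nsBytes).update(name, "utf8").digest()
	const bytes = Buffer.from(digest.subarray(0, 16)) // copy the first 16 bytes of the hash
	bytes[6] = (bytes[6] & 0x0f) | 0x50 // set the version nibble to 5
	bytes[8] = (bytes[8] & 0x3f) | 0x80 // set the RFC 4122 variant bits
	const hex = bytes.toString("hex")
	return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join("-")
}

// Hypothetical payload shape for the marker point; only indexing_complete is named in the PR text.
interface IndexingMetadataPayload {
	type: "metadata"
	indexing_complete: boolean
	updated_at: number // epoch millis (assumed field name)
}

// Same input, same id: upserting this point overwrites the previous marker instead of adding another one.
const METADATA_POINT_ID = uuidV5("__indexing_metadata__", EXAMPLE_NAMESPACE)

function makeMarker(complete: boolean): { id: string; payload: IndexingMetadataPayload } {
	return {
		id: METADATA_POINT_ID,
		payload: { type: "metadata", indexing_complete: complete, updated_at: Date.now() },
	}
}
```

Because the id is a pure function of the namespace and name, every run addresses the same point, so marking complete or incomplete is a simple overwrite.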

2. Smart Scan Selection (orchestrator.ts:123-287)

  • After successful Qdrant connection, checks for existing indexed data
  • Incremental scan when data exists: Only processes new/changed files detected by cache comparison
  • Full scan when no data exists or collection just created: Processes entire workspace
  • Both scan types mark the index incomplete at the start and complete at the end, so an interrupted scan remains detectable
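The selection and bracketing logic can be sketched roughly as follows. `CompletionTrackingStore`, `runScan`, `ScanKind`, and the in-memory `FakeStore` are hypothetical names; only `initialize()`, `hasIndexedData()`, `markIndexingIncomplete()`, and `markIndexingComplete()` come from the PR description, and their real signatures (in particular, `initialize()` reporting whether the collection was just created) are assumptions.

```typescript
type ScanKind = "full" | "incremental"

// Hypothetical slice of the vector-store interface; real signatures may differ.
interface CompletionTrackingStore {
	initialize(): Promise<boolean> // assumed: resolves true when the collection was just created
	hasIndexedData(): Promise<boolean>
	markIndexingIncomplete(): Promise<void>
	markIndexingComplete(): Promise<void>
}

async function runScan(
	store: CompletionTrackingStore,
	scan: (kind: ScanKind) => Promise<void>,
): Promise<ScanKind> {
	const justCreated = await store.initialize()
	// Reuse existing data only if the collection pre-existed and reports usable indexed data.
	const kind: ScanKind = !justCreated && (await store.hasIndexedData()) ? "incremental" : "full"
	await store.markIndexingIncomplete() // an interrupted scan leaves this marker in place
	await scan(kind)
	await store.markIndexingComplete() // reached only if the scan resolved
	return kind
}

// In-memory fake for illustration: `complete` is undefined when no marker point exists.
class FakeStore implements CompletionTrackingStore {
	exists: boolean
	points: number
	complete: boolean | undefined
	constructor(exists: boolean, points: number, complete?: boolean) {
		this.exists = exists
		this.points = points
		this.complete = complete
	}
	async initialize(): Promise<boolean> {
		const created = !this.exists
		this.exists = true
		return created
	}
	async hasIndexedData(): Promise<boolean> {
		return this.complete ?? this.points > 0 // marker wins; legacy indexes fall back to point count
	}
	async markIndexingIncomplete(): Promise<void> {
		this.complete = false
	}
	async markIndexingComplete(): Promise<void> {
		this.complete = true
	}
}
```

Treating an "incomplete" marker as "no usable data" is what forces a full rescan after an interrupted run, while a complete (or legacy, non-empty) index only gets an incremental pass.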

3. Intelligent Cache Preservation (orchestrator.ts:299-315)

  • Tracks an indexingStarted flag, set only after a successful Qdrant connection, to distinguish connection failures from mid-scan failures
  • Preserves cache if Qdrant connection failed (enables future incremental recovery)
  • Clears cache only when indexing started but failed mid-way (prevents cache-Qdrant inconsistency)
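A minimal sketch of the error-path gating, assuming the connection step and the scan can be modelled as two async callbacks. `startIndexingSketch`, `CleanupHooks`, and `Outcome` are hypothetical names; `clearCollection()` and the cache clear are mentioned in the PR, but the signatures used here are assumptions.

```typescript
// Hypothetical cleanup hooks; clearCollection() and the cache clear are named in the PR, these signatures are assumed.
interface CleanupHooks {
	clearCollection(): Promise<void>
	clearCache(): Promise<void>
}

type Outcome = "indexed" | "connection-failed" | "scan-failed"

async function startIndexingSketch(
	connect: () => Promise<void>,
	scan: () => Promise<void>,
	hooks: CleanupHooks,
): Promise<Outcome> {
	let indexingStarted = false
	try {
		await connect() // e.g. initialize() against Qdrant
		indexingStarted = true // set only after the connection succeeds
		await scan()
		return "indexed"
	} catch {
		if (!indexingStarted) {
			// Qdrant was unreachable: nothing was written, so keep the cache for a later incremental scan.
			return "connection-failed"
		}
		// The scan failed part-way: Qdrant and the cache may disagree, so reset both.
		await hooks.clearCollection()
		await hooks.clearCache()
		return "scan-failed"
	}
}
```

The flag is the only state needed to tell the two failure modes apart, because a connection failure always happens before it is set.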

4. Startup-Consistent Button Behavior (webviewMessageHandler.ts:2663-2686)

  • "Start Indexing" button now mimics extension startup flow
  • Always calls initialize() first to check Qdrant availability and existing collections
  • Only triggers startIndexing() when in "Standby" or "Error" states
  • Prevents redundant re-indexing when already indexed
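The button flow above can be sketched as a small handler. `IndexManagerLike` and `onStartIndexingClicked` are hypothetical stand-ins for the real manager and webview handler, and the state names other than "Standby" and "Error" are assumptions.

```typescript
// Assumed state names; the PR text only mentions "Standby" and "Error".
type IndexingState = "Standby" | "Indexing" | "Indexed" | "Error"

// Hypothetical manager shape standing in for the real code-index manager.
interface IndexManagerLike {
	readonly state: IndexingState
	initialize(): Promise<void>
	startIndexing(): Promise<void>
}

// Mirrors the startup flow: initialize first, then start indexing only from Standby or Error.
async function onStartIndexingClicked(manager: IndexManagerLike): Promise<boolean> {
	await manager.initialize() // re-checks Qdrant availability and existing collections
	if (manager.state === "Standby" || manager.state === "Error") {
		await manager.startIndexing()
		return true
	}
	return false // already indexing or indexed: skip a redundant re-index
}
```

Re-running `initialize()` first is what lets a click after a Qdrant outage discover the existing collection instead of treating the workspace as unindexed.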

Test Procedure

Manual testing steps:

  1. Set up code indexing with a local Qdrant Docker container
  2. Ensure that the Qdrant Docker container is not running
  3. Open a workspace in Roo that has already been fully indexed
  4. Verify that the codebase indexing UI shows the error "Error - Failed during initial scan: fetch failed"
  5. Start the Qdrant Docker container
  6. Click the "Start Indexing" button
  7. Verify that the extension recognizes the existing index and does not rebuild it
  8. Check the output logs for the message "[CodeIndexOrchestrator] Collection already has indexed data. Skipping full scan and starting file watcher."
  9. Make a file change
  10. Verify that the file watcher picks up and indexes the change

Improvements

Performance:

  • Eliminates unnecessary full workspace scans when index data already exists
  • Incremental scans process only new or changed files, reducing startup time after Qdrant recovers
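A rough sketch of the cache comparison behind incremental scans, assuming the cache maps each file path to a SHA-256 hash of its contents. `HashCache`, `sha256`, `diffAgainstCache`, and the handling of deleted files are illustrative, not taken from the PR.

```typescript
import { createHash } from "crypto"

// Assumed cache shape: workspace-relative path -> SHA-256 of the file contents at last index time.
type HashCache = Map<string, string>

function sha256(content: string): string {
	return createHash("sha256").update(content, "utf8").digest("hex")
}

// Returns only the work an incremental scan needs to do.
function diffAgainstCache(
	files: Map<string, string>, // path -> current contents
	cache: HashCache,
): { toIndex: string[]; toDelete: string[] } {
	const toIndex: string[] = []
	for (const [path, content] of files) {
		if (cache.get(path) !== sha256(content)) toIndex.push(path) // new or modified since last scan
	}
	const toDelete = [...cache.keys()].filter((path) => !files.has(path)) // removed since last scan
	return { toIndex, toDelete }
}
```

Unchanged files hash to the cached value and are skipped, which is why preserving the cache across a Qdrant outage keeps recovery cheap.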

Data Integrity:

  • Explicit metadata markers prevent partial indexes from being treated as complete
  • Cache preservation enables accurate incremental recovery after connection failures

Resilience:

  • Graceful degradation when Qdrant is temporarily unavailable
  • Recovery reuses the existing Qdrant index and local cache instead of rebuilding from scratch

User Experience:

  • "Start Indexing" button becomes a true recovery mechanism rather than a full rebuild trigger
  • Reduced wait times when recovering from temporary Qdrant outages

These changes make Roo Code's codebase indexing significantly more robust for users with containerized Qdrant setups, where container restarts are common during development workflows.

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes (if applicable).
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Documentation Updates

  • No documentation updates are required.

Get in Touch

Discord: @ocean.smith


Important

Optimize 'Start Indexing' by checking for existing Qdrant index data before full scan, reducing unnecessary rebuilds.

  • Behavior:
    • startIndexing() in orchestrator.ts now checks for existing indexed data using hasIndexedData() before a full scan.
    • If data exists, skips full scan and starts file watcher.
    • webviewMessageHandler.ts updated to initialize and start indexing only in "Standby" or "Error" states.
  • Interfaces:
    • Added hasIndexedData() to IVectorStore and implemented in QdrantVectorStore.
  • Logging:
    • Logs message in orchestrator.ts when skipping full scan due to existing data.

This description was created by Ellipsis for 253e843.

- Added hasIndexedData() method to check if collection has points
- Modified startIndexing() to skip full scan if data exists
- Updated webview handler to mimic extension startup behavior
- Extension now recognizes active Qdrant container and reuses index

Fixes an issue where clicking 'Start Indexing' always rebuilt the index when Roo initialized while the Qdrant container was not running.
@heyseth heyseth requested review from cte, jr and mrubens as code owners October 10, 2025 00:05
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels Oct 10, 2025
Contributor

@roomote roomote bot left a comment


LGTM! The implementation correctly addresses the issue where clicking 'Start Indexing' would rebuild an existing index when the Qdrant container was initially inactive. The changes properly check for existing indexed data and skip the full scan when appropriate, matching the extension startup behavior.

@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Oct 10, 2025
- Added markIndexingComplete() to store completion metadata in Qdrant
- Updated hasIndexedData() to check for completion marker, not just points_count
- Completion marker uses deterministic UUID generated via uuidv5
- Fixes issue where interrupted indexing was incorrectly reported as complete

Addresses an edge case where closing the workspace mid-index left partial data that was then treated as a complete index on reopening.
@heyseth

This comment was marked as outdated.

…al recovery

- Track indexing state with `indexingStarted` flag set after successful Qdrant connection
- Preserve cache when Qdrant unavailable at startup (enables incremental scan on recovery)
- Clear cache only if indexing started but failed mid-process (prevents cache-Qdrant mismatch)
- Run incremental scan on resume instead of skipping scan entirely
- Add `markIndexingComplete()` to store completion metadata preventing partial indexes from appearing complete

Fixes edge cases:
1. Qdrant inactive on startup → cache preserved for incremental recovery
2. Files added while workspace closed → incremental scan catches new files
- Add markIndexingIncomplete() to set state at start of indexing
- Call markIndexingIncomplete() at start of both full and incremental scans
- Call markIndexingComplete() after successful full AND incremental scans
- Update hasIndexedData() with backward compatibility fallback for old indexes

Indexing lifecycle now properly tracked:
- incomplete → complete on successful scan
- incomplete state persists if scan interrupted
- Old indexes without marker fall back to points_count > 0 check

Fixes:
- Incremental scans now properly mark completion (was missing before)
- Interrupted scans correctly identified as incomplete
- Backward compatibility ensures existing indexes work without rebuild
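The lifecycle and fallback rule above reduce to a small decision function. This is a hedged sketch: `decideHasIndexedData` and `MetadataPayload` are hypothetical names, and the real `hasIndexedData()` performs the Qdrant lookups (collection info and the marker point) whose results are passed in here as plain arguments.

```typescript
// Assumed shape of the metadata point's payload, when a marker exists.
interface MetadataPayload {
	indexing_complete?: boolean
}

// pointsCount: the collection's points_count as reported by Qdrant.
// marker: the marker point's payload, or undefined for indexes created before markers existed.
function decideHasIndexedData(pointsCount: number, marker: MetadataPayload | undefined): boolean {
	if (marker && typeof marker.indexing_complete === "boolean") {
		return marker.indexing_complete // explicit marker wins, so interrupted scans read as incomplete
	}
	return pointsCount > 0 // pre-marker indexes: fall back to "any points at all"
}
```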
@heyseth heyseth changed the title Fix: 'Start Indexing' button now reuses existing Qdrant index Fix: Enhanced Codebase Index Recovery and Reuse ('Start Indexing' button now reuses existing Qdrant index) Oct 11, 2025
@heyseth heyseth changed the title Fix: Enhanced Codebase Index Recovery and Reuse ('Start Indexing' button now reuses existing Qdrant index) Fix: Enhanced codebase index recovery and reuse ('Start Indexing' button now reuses existing Qdrant index) Oct 11, 2025
@heyseth
Contributor Author

heyseth commented Oct 11, 2025

I have updated the PR writeup to reflect the increased scope of this fix.

@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Oct 28, 2025
@hannesrudolph hannesrudolph added PR - Needs Preliminary Review and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Oct 29, 2025
…gate orchestrator cleanup behind indexingStarted

- Merge must_not: [{ key: "type", match: { value: "metadata" } }] into QdrantVectorStore.search() filter ([QdrantVectorStore.search()](src/services/code-index/vector-store/qdrant-client.ts:383))
- Normalize constants import path in [qdrant-client.ts](src/services/code-index/vector-store/qdrant-client.ts:8)
- Only invoke clearCollection()/cache clear after initialize() succeeds in [CodeIndexOrchestrator.startIndexing()](src/services/code-index/orchestrator.ts:291)

test: update qdrant-client search expectations and add orchestrator error-path gating test
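In Qdrant's filter model, `must_not` conditions exclude matching points, so appending the metadata condition hides the marker point from search results without disturbing the caller's own conditions. A minimal sketch, with simplified, assumed filter types rather than the official client's typings; `withMetadataExcluded` is a hypothetical helper name.

```typescript
// Simplified Qdrant-style filter types (assumption; the official client's typings are richer).
interface FieldCondition {
	key: string
	match: { value: string | number | boolean }
}
interface Filter {
	must?: FieldCondition[]
	should?: FieldCondition[]
	must_not?: FieldCondition[]
}

const METADATA_EXCLUSION: FieldCondition = { key: "type", match: { value: "metadata" } }

// Merges (rather than replaces) any caller-supplied must_not conditions.
function withMetadataExcluded(filter?: Filter): Filter {
	return { ...filter, must_not: [...(filter?.must_not ?? []), METADATA_EXCLUSION] }
}
```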
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Oct 30, 2025
@roomote
Contributor

roomote bot commented Oct 30, 2025

Code Review Complete

I've completed a thorough review of this PR. The implementation successfully addresses the issue where clicking "Start Indexing" would rebuild an existing Qdrant index when the container was temporarily unavailable.

Review Summary

No issues found. The implementation is well-designed with:

Metadata-based completion tracking - Properly tracks indexing state with backward compatibility for existing indexes

Smart scan selection - Correctly chooses between incremental and full scans based on existing data

Intelligent cache preservation - Distinguishes between connection failures (preserves cache) and mid-scan failures (clears cache)

Comprehensive test coverage - Includes tests for error path gating and cleanup logic

Proper error handling - All error paths are handled correctly with appropriate cleanup

The PR is ready for merge.



@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Oct 30, 2025
@mrubens mrubens merged commit f9d6fe7 into RooCodeInc:main Oct 30, 2025
15 checks passed
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Oct 30, 2025
@github-project-automation github-project-automation bot moved this from PR [Needs Prelim Review] to Done in Roo Code Roadmap Oct 30, 2025

Labels

bug Something isn't working lgtm This PR has been approved by a maintainer PR - Needs Preliminary Review size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[BUG] Codebase Indexing is running fully every day

4 participants