Phase 1c: Cross-Encoder Reranking #47

Open: brian-lai wants to merge 15 commits into main from para/phase1-implementation-phase1c
@brian-lai
Owner

Summary

Implements cross-encoder reranking as the second stage of a two-stage retrieval pipeline. The target is a 10-15% MRR improvement, which has not been benchmarked yet (see Success Criteria).

Phase: Phase 1c - Cross-Encoder Reranking
Plan: context/plans/2026-02-03-phase1c-cross-encoder-reranking.md

Implementation

Core Components

  1. Reranker Infrastructure (internal/reranker/)

    • Reranker interface with Rerank(query, candidates, topK) method
    • ScoredResult type for scored documents
    • Factory function NewReranker(provider string)
    • Error handling for unavailable rerankers
  2. Qwen3-Reranker Integration (internal/reranker/qwen3.go)

    • Full implementation using Ollama /api/generate endpoint
    • Parallel batch scoring with goroutines
    • Document truncation to 500 chars for performance
    • Score parsing with fallback to 0.5
    • 5s timeout per candidate, 30s HTTP timeout
  3. Hybrid Search Integration (internal/search/hybrid/hybrid.go)

    • Added Rerank and RerankTopK fields to Config
    • SetReranker() method for dependency injection
    • Reranking pipeline: retrieve → fuse → rerank → return top-K
    • Graceful fallback if reranking fails

Features

  • ✅ Optional reranking (disabled by default)
  • ✅ Parallel goroutine scoring for performance
  • ✅ Graceful fallback on errors
  • ✅ MCP tool support (hybrid_search_v2 with rerank parameter)
  • ✅ Environment variable configuration
  • ✅ YAML configuration support
  • ✅ Comprehensive documentation

Testing

  • ✅ Unit tests for score parsing (9 test cases)
  • ✅ Unit tests for score clamping (7 test cases)
  • ✅ Unit tests for result sorting (4 test cases)
  • ✅ All tests passing

Documentation

  • ✅ Comprehensive reranking guide (docs/reranking.md)
  • ✅ Updated README.md with hybrid_search_v2 documentation
  • ✅ Configuration examples (environment variables and YAML)
  • ✅ Troubleshooting section
  • ✅ Performance metrics and latency breakdown

Success Criteria

| Criterion | Status | Notes |
| --- | --- | --- |
| MRR improves by >10% | ⏸️ Pending | Requires manual benchmarking with Ollama |
| Latency <200ms end-to-end | ⏸️ Pending | Requires manual benchmarking |
| Reranking optional (flag-controlled) | ✅ Complete | rerank parameter in MCP tool |
| Graceful fallback if unavailable | ✅ Complete | Error handling with fallback to original results |

Manual Validation Required

Before merging, please validate:

  1. Install Qwen3-Reranker:

    ollama pull sam860/qwen3-reranker
  2. Enable reranking:

    export CODETECT_RERANK_ENABLED=true
    export CODETECT_RERANK_MODEL=sam860/qwen3-reranker
  3. Test with hybrid_search_v2:

    {
      "query": "authentication middleware",
      "limit": 20,
      "rerank": true
    }
  4. Verify:

    • Reranking completes in <200ms
    • Results are reordered by relevance
    • MRR improves (compare against baseline)
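For the MRR comparison, a small helper along these lines may be enough. It is a sketch under stated assumptions: the query set, document IDs, and relevance labels below are invented for illustration and are not benchmark data from this PR.

```go
package main

import "fmt"

// reciprocalRank returns 1/rank of the first relevant ID in ranked,
// or 0 if no relevant ID appears.
func reciprocalRank(ranked []string, relevant map[string]bool) float64 {
	for i, id := range ranked {
		if relevant[id] {
			return 1.0 / float64(i+1)
		}
	}
	return 0
}

// meanReciprocalRank averages the reciprocal rank over all queries.
// rankings[i] and relevant[i] must describe the same query.
func meanReciprocalRank(rankings [][]string, relevant []map[string]bool) float64 {
	if len(rankings) == 0 || len(rankings) != len(relevant) {
		return 0
	}
	sum := 0.0
	for i, ranked := range rankings {
		sum += reciprocalRank(ranked, relevant[i])
	}
	return sum / float64(len(rankings))
}

func main() {
	// Illustrative labels only: one relevant document per query.
	relevant := []map[string]bool{{"b": true}, {"f": true}}
	baseline := [][]string{{"a", "b", "c"}, {"d", "e", "f"}}
	reranked := [][]string{{"b", "a", "c"}, {"d", "f", "e"}}

	base := meanReciprocalRank(baseline, relevant)
	rer := meanReciprocalRank(reranked, relevant)
	fmt.Printf("baseline MRR=%.3f reranked MRR=%.3f relative change=%.1f%%\n",
		base, rer, (rer-base)/base*100)
}
```

Running the same labelled queries with `rerank` set to false and then true gives the two rankings to compare against the >10% criterion.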

Next Steps

After merging Phase 1c:

  • Phase 1d: .codetectignore Support
  • Phase 1e: HTTP API

References

  • Master Plan: context/plans/2026-02-02-phase1-implementation-roadmap.md
  • Phase 1c Plan: context/plans/2026-02-03-phase1c-cross-encoder-reranking.md
  • Reranking Research: context/data/2026-02-03-cross-encoder-reranking-research.md

…o Phase 2

Decision rationale:
- Simplicity first: native Go/Ollama integration
- bge-m3 provides good quality for current workload
- Focus Phase 1 on shipping features (reranking, HTTP API, .codetectignore)
- Dual-model adds complexity without clear user pain
- Can evaluate in Phase 2 if needed

Impact: Phase 1b (Dual-Model) removed from scope
New timeline: 5-7 weeks (down from 8-12 weeks)

Key findings:
- Qwen3-Reranker models available in Ollama (0.6B, 4B, 8B)
- Expected 10-15% MRR improvement (industry standard)
- Native Go integration possible (workaround for no /rerank API)
- Prototype needed to validate >5% improvement

Recommendation: Use Qwen3-Reranker-0.6B for speed
Fallback: MS MARCO MiniLM via Python microservice

Deliverable: context/data/2026-02-03-cross-encoder-reranking-research.md

Key features:
- 10 REST endpoints covering all MCP tools + utilities
- Dual auth strategy: local (no auth) + cloud (API keys)
- OpenAPI 3.0 spec for client generation
- Integration examples (Python, TypeScript, VS Code)

Architecture:
- Chi router for HTTP layer
- Wraps existing MCP server (no duplication)
- Docker + K8s deployment manifests

Deliverable: context/data/2026-02-03-http-api-design.md

Key features:
- .gitignore-compatible syntax (wildcards, negation, comments)
- Independent of .gitignore (exclude tracked, include ignored)
- Hierarchical loading (project + global)
- Common use cases documented (generated code, vendor, fixtures)

Implementation:
- Use github.com/sabhiram/go-gitignore library
- Apply during file scanning + embedding
- CLI flags: --ignore-file, --no-ignore

Deliverable: context/data/2026-02-03-codetectignore-spec.md

Phase 1a Complete - All research deliverables achieved:
✅ Model selection: Keep bge-m3 (defer dual-model to Phase 2)
✅ Reranking research: Qwen3-Reranker + 10-15% improvement expected
✅ HTTP API design: 10 REST endpoints + OpenAPI spec
✅ .codetectignore spec: gitignore-compatible with 5 use cases

Impact on Phase 1 scope:
- Removed Phase 1b (Dual-Model) - deferred to Phase 2
- New sequence: Phase 1c (Reranking) → 1d (.codetectignore) → 1e (HTTP API)
- Timeline: 5-7 weeks (down from 8-12 weeks)

Success criteria met:
✅ All technical unknowns resolved
✅ Implementation paths clear
✅ Specifications ready for execution
✅ No blockers for next phases

Changes based on Phase 1a research outcomes:
- Reduced from 4 features to 3 features
- Removed Phase 1b (Dual-Model) - deferred to Phase 2
- Updated timeline: 5-7 weeks (down from 8-12 weeks)
- Marked Phase 1a as COMPLETE (2026-02-03)
- Updated dependencies: 1a → 1c → 1d → 1e
- Removed dual-model technical risks

Key decision: Keep bge-m3 for Phase 1, focus on features
- Cross-encoder reranking (Phase 1c)
- .codetectignore support (Phase 1d)
- HTTP API (Phase 1e)

Dual-model embedding deferred to Phase 2 for future evaluation

- Define Reranker interface with Rerank(query, candidates, topK)
- Create ScoredResult type for scored documents
- Implement NewReranker factory supporting qwen3 provider
- Add error handling for unknown/disabled providers
- Create Qwen3Reranker struct with HTTP client for Ollama
- Implement parallel batch scoring with goroutines
- Design relevance scoring prompt (0.0-1.0 scale)
- Parse float scores from model responses with fallback
- Add 5s timeout per candidate scoring
- Truncate documents to 500 chars for speed
- Handle scoring errors gracefully (default to 0.0)
- Add Rerank and RerankTopK fields to Config
- Add SetReranker method to Searcher
- Implement rerankResults method with cross-encoder scoring
- Apply reranking after RRF fusion but before return
- Graceful fallback to original results if reranking fails
- Map reranked scores back to Result structs
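Since reranking runs on the output of RRF fusion, a short sketch of reciprocal rank fusion may help place the rerank step. This is the textbook formulation, where each document scores the sum of 1/(k + rank) over the lists it appears in. It is not taken from this repository: the `Fused` type, the `rrfFuse` name, and the constant k = 60 (a commonly used default) are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Fused is a document ID with its fused score (hypothetical type).
type Fused struct {
	ID    string
	Score float64
}

// rrfFuse merges ranked ID lists with reciprocal rank fusion.
// Ranks are 1-based; ties keep first-seen order.
func rrfFuse(lists [][]string, k float64) []Fused {
	scores := map[string]float64{}
	var order []string // first-seen order, for deterministic ties
	for _, list := range lists {
		for rank, id := range list {
			if _, seen := scores[id]; !seen {
				order = append(order, id)
			}
			scores[id] += 1.0 / (k + float64(rank+1))
		}
	}
	out := make([]Fused, 0, len(order))
	for _, id := range order {
		out = append(out, Fused{ID: id, Score: scores[id]})
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	keyword := []string{"a", "b", "c"}
	semantic := []string{"b", "c", "d"}
	for _, f := range rrfFuse([][]string{keyword, semantic}, 60) {
		fmt.Printf("%s %.5f\n", f.ID, f.Score)
	}
}
```

The fused list is what the reranker receives as candidates, and it is also the list returned unchanged when reranking fails.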
- Test score parsing with various formats
- Test score clamping for out-of-range values
- Test ScoredResult sorting (descending by score)
- Test edge cases: empty, single element, same scores
- All tests passing
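The cases named above (clamping, descending sort, empty input, single element, equal scores) can be exercised with helpers like these. The helper names and the `ScoredResult` fields are assumptions for illustration, and the actual test file may be organised differently.

```go
package main

import (
	"fmt"
	"sort"
)

// ScoredResult pairs a candidate index with its score (assumed fields).
type ScoredResult struct {
	Index int
	Score float64
}

// clampScore forces a score into the [0, 1] range.
func clampScore(s float64) float64 {
	if s < 0 {
		return 0
	}
	if s > 1 {
		return 1
	}
	return s
}

// sortByScoreDesc orders results from highest to lowest score.
// The stable sort keeps equal scores in their original order.
func sortByScoreDesc(results []ScoredResult) {
	sort.SliceStable(results, func(i, j int) bool {
		return results[i].Score > results[j].Score
	})
}

func main() {
	cases := []struct {
		name string
		in   []ScoredResult
	}{
		{"empty", nil},
		{"single", []ScoredResult{{0, 0.4}}},
		{"same scores", []ScoredResult{{0, 0.5}, {1, 0.5}}},
		{"mixed", []ScoredResult{{0, 0.2}, {1, 0.9}, {2, 0.5}}},
	}
	for _, c := range cases {
		sortByScoreDesc(c.in)
		fmt.Println(c.name, c.in)
	}
	fmt.Println(clampScore(1.7), clampScore(-0.3), clampScore(0.42))
}
```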
- Create docs/reranking.md with full guide
- Cover configuration, architecture, performance, troubleshooting
- Update README.md with hybrid_search_v2 tool documentation
- Add reranking quick start section
- Document environment variables and YAML config
- Include latency breakdown and quality metrics

- Document all implementation steps
- List technical highlights and decisions
- Track commits and file changes
- Note manual validation checklist
- Capture lessons learned