
search: precompute intent once, reuse across all semantic_search calls #76

Open
donsummerwind wants to merge 1 commit into sopaco:main from donsummerwind:main

Conversation

@donsummerwind

Summary

Precompute the LLM intent analysis once per search and reuse it across all semantic_search calls. This removes the intent-analysis bottleneck of 5 serial LLM calls, reducing LLM calls from 5 to 1 per search.

Changes

  • SearchOptions gains precomputed_intent: Option<Arc<EnhancedQueryIntent>>
  • search_handler calls analyze_intent() once, before layered_semantic_search runs
  • The layered search and the 4 semantic_search calls all reuse the same intent (5 LLM calls → 1); see the sketch after this list
  • Result: roughly 3-5x search speedup, depending on whether the model is warm or cold
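
For reference, here is a minimal sketch of the new flow. It uses the names from this PR (SearchOptions, precomputed_intent, EnhancedQueryIntent, analyze_intent, semantic_search, search_handler), but the struct contents, signatures, and synchronous style are simplified assumptions; the real implementations in cortex-mem-core and cortex-mem-service are async and more involved.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-ins for the real types in
// cortex-mem-core/src/types.rs; the fields are illustrative only.
#[derive(Debug)]
struct EnhancedQueryIntent {
    keywords: Vec<String>,
}

struct SearchOptions {
    // New field introduced by this PR: an intent computed once by the caller.
    precomputed_intent: Option<Arc<EnhancedQueryIntent>>,
}

// Stand-in for the LLM-backed intent analysis (one LLM call per invocation).
// The real function is async; kept synchronous here to stay self-contained.
fn analyze_intent(query: &str) -> EnhancedQueryIntent {
    EnhancedQueryIntent {
        keywords: query.split_whitespace().map(String::from).collect(),
    }
}

// semantic_search reuses the shared intent when present instead of
// re-running analyze_intent, so only one LLM call happens per search.
fn semantic_search(query: &str, opts: &SearchOptions) -> Vec<String> {
    let intent = match &opts.precomputed_intent {
        Some(intent) => Arc::clone(intent),       // reuse, no extra LLM call
        None => Arc::new(analyze_intent(query)),  // fallback: previous behavior
    };
    intent.keywords.clone() // placeholder for the actual vector lookup
}

fn search_handler(query: &str) -> Vec<String> {
    // Analyze intent once, before the layered search fans out.
    let intent = Arc::new(analyze_intent(query));
    let opts = SearchOptions {
        precomputed_intent: Some(Arc::clone(&intent)),
    };

    // The layered search issues several semantic_search calls;
    // every one of them sees the same precomputed intent.
    (0..4).flat_map(|_| semantic_search(query, &opts)).collect()
}

fn main() {
    println!("{:?}", search_handler("rust vector search"));
}
```

Wrapping the intent in an Arc lets every downstream call share one immutable analysis result without cloning it, which is presumably why the field is Option<Arc<EnhancedQueryIntent>> rather than Option<EnhancedQueryIntent>.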

Performance

| Before | After |
| --- | --- |
| 5× LLM calls (~16 s) | 1× LLM call (~3 s warm; ~8 s cold with a 7B model) |

Files changed

  • cortex-mem-core/src/search/vector_engine.rs
  • cortex-mem-core/src/types.rs
  • cortex-mem-core/src/vector_store/qdrant.rs
  • cortex-mem-service/src/handlers/filesystem.rs
  • cortex-mem-service/src/handlers/search.rs
  • cortex-mem-service/src/main.rs

