Add fast raw memory search path by cnguyen14 · Pull Request #189 · XortexAI/XMem

cnguyen14 · 2026-05-21T03:08:09Z

Closes #163

Summary

- expands /v1/memory/search into a low-latency raw search path across profile, temporal, summary, snippet, and code domains
- keeps answer synthesis optional via answer=true
- adds cached profile catalogs and cached raw retrieval plans
- returns per-mode latency timings plus rolling p50/p95/p99 stats
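
For readers unfamiliar with rolling percentile stats, here is a minimal sketch of how p50/p95/p99 can be derived from a bounded window of samples. It is not the PR's implementation: the class name `RollingLatency`, the window size, and the nearest-rank percentile method are all assumptions chosen for illustration.

```python
from collections import deque
from typing import Deque, Dict


class RollingLatency:
    """Hypothetical helper: keeps the most recent `window` samples for one mode."""

    def __init__(self, window: int = 1000) -> None:
        # deque(maxlen=...) silently drops the oldest sample once full.
        self._samples: Deque[float] = deque(maxlen=window)

    def record(self, elapsed_ms: float) -> None:
        self._samples.append(elapsed_ms)

    def percentile(self, p: int) -> float:
        """Nearest-rank percentile for an integer p in 1..100."""
        ordered = sorted(self._samples)
        if not ordered:
            raise ValueError("no samples recorded")
        # Integer ceiling of p * n / 100, so no float rounding is involved.
        rank = max(1, (p * len(ordered) + 99) // 100)
        return ordered[rank - 1]

    def stats(self) -> Dict[str, float]:
        return {f"p{p}": self.percentile(p) for p in (50, 95, 99)}
```

Sorting on every call is fine for a small window; a larger window might justify a different structure.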

Tests

pytest tests/api/test_dependencies_and_routes.py tests/unit/test_schemas.py -q
python3 -m ruff check src/api/schemas.py src/api/routes/memory.py src/pipelines/retrieval.py tests/api/test_dependencies_and_routes.py

gemini-code-assist

Code Review

This pull request enhances the memory search functionality by adding support for 'snippet' and 'code' domains, implementing optional synthesized answer generation, and introducing latency tracking with percentile statistics. It also adds caching mechanisms for retrieval plans and profile catalogs to improve efficiency. Review feedback highlights a thread-safety concern when iterating over the latency samples dictionary, suggests parallelizing I/O-bound search tasks using asyncio.gather to reduce total latency, and recommends adding an eviction policy to the profile catalog cache to prevent potential memory leaks.

gemini-code-assist · 2026-05-21T03:09:34Z

+
+def _latency_stats() -> Dict[str, Dict[str, float]]:
+    stats: Dict[str, Dict[str, float]] = {}
+    for mode, samples in _latency_samples.items():


Iterating over _latency_samples.items() directly is not thread-safe because _latency_samples is a defaultdict. If a concurrent request records latency for a new mode (causing a new key to be added) while this loop is running, it will raise a RuntimeError: dictionary changed size during iteration.

Consider wrapping the items in a list() to create a snapshot of the keys before iteration.

Suggested change

-    for mode, samples in _latency_samples.items():
+    for mode, samples in list(_latency_samples.items()):
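
The failure mode described in this comment can be reproduced in isolation. The sketch below uses hypothetical helper names (`make_samples`, `iterate_unsafely`, `iterate_with_snapshot`) and simulates the concurrent writer by inserting a key inside the loop body; it only illustrates the dictionary behavior and says nothing about how the PR's handlers actually interleave.

```python
from collections import defaultdict
from typing import DefaultDict, List


def make_samples() -> DefaultDict[str, List[float]]:
    # Hypothetical stand-in for the module-level _latency_samples store.
    samples: DefaultDict[str, List[float]] = defaultdict(list)
    samples["profile"].append(1.2)
    samples["temporal"].append(3.4)
    return samples


def iterate_unsafely(samples: DefaultDict[str, List[float]]) -> str:
    """Iterates the live view while a new key appears; returns the error text."""
    try:
        for mode, _ in samples.items():
            # Simulates another request recording latency for a new mode.
            samples["new-" + mode].append(0.5)
    except RuntimeError as exc:
        return str(exc)
    return "no error"


def iterate_with_snapshot(samples: DefaultDict[str, List[float]]) -> List[str]:
    """Iterates a list() snapshot, so inserts during the loop are harmless."""
    seen: List[str] = []
    for mode, _ in list(samples.items()):
        samples["new-" + mode].append(0.5)
        seen.append(mode)
    return seen
```

Note that the snapshot only protects the iteration itself; the per-mode sample lists are still shared objects.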

gemini-code-assist · 2026-05-21T03:09:34Z

+        if "profile" in plan:
+            results, elapsed = await _timed("profile", _search_profile, pipeline, user_id)
+            latency_ms["profile"] = elapsed
+            all_results.extend(results)
+        if "temporal" in plan:
+            results, elapsed = await _timed("temporal", _search_temporal, pipeline, req.query, user_id, req.top_k)
+            latency_ms["temporal"] = elapsed
+            all_results.extend(results)
+        if "summary" in plan:
+            results, elapsed = await _timed("summary", _search_summary, pipeline, req.query, user_id, req.top_k)
+            latency_ms["summary"] = elapsed
+            all_results.extend(results)
+        if "snippet" in plan:
+            results, elapsed = await _timed("snippet", _search_snippet, pipeline, req.query, user_id, req.top_k)
+            latency_ms["snippet"] = elapsed
+            all_results.extend(results)
+        if "code" in plan:
+            results, elapsed = await _timed("code", _search_code, pipeline, req.query, user_id, req.top_k)
+            latency_ms["code"] = elapsed
+            all_results.extend(results)


The search domains are currently queried sequentially, which negates the performance benefits of having a "fast" raw search path. Since these operations are I/O bound, they should be executed in parallel using asyncio.gather to minimize total latency.

Suggested change

-        if "profile" in plan:
-            results, elapsed = await _timed("profile", _search_profile, pipeline, user_id)
-            latency_ms["profile"] = elapsed
-            all_results.extend(results)
-        if "temporal" in plan:
-            results, elapsed = await _timed("temporal", _search_temporal, pipeline, req.query, user_id, req.top_k)
-            latency_ms["temporal"] = elapsed
-            all_results.extend(results)
-        if "summary" in plan:
-            results, elapsed = await _timed("summary", _search_summary, pipeline, req.query, user_id, req.top_k)
-            latency_ms["summary"] = elapsed
-            all_results.extend(results)
-        if "snippet" in plan:
-            results, elapsed = await _timed("snippet", _search_snippet, pipeline, req.query, user_id, req.top_k)
-            latency_ms["snippet"] = elapsed
-            all_results.extend(results)
-        if "code" in plan:
-            results, elapsed = await _timed("code", _search_code, pipeline, req.query, user_id, req.top_k)
-            latency_ms["code"] = elapsed
-            all_results.extend(results)
+        search_tasks = []
+        if "profile" in plan:
+            search_tasks.append(_timed("profile", _search_profile, pipeline, user_id))
+        if "temporal" in plan:
+            search_tasks.append(_timed("temporal", _search_temporal, pipeline, req.query, user_id, req.top_k))
+        if "summary" in plan:
+            search_tasks.append(_timed("summary", _search_summary, pipeline, req.query, user_id, req.top_k))
+        if "snippet" in plan:
+            search_tasks.append(_timed("snippet", _search_snippet, pipeline, req.query, user_id, req.top_k))
+        if "code" in plan:
+            search_tasks.append(_timed("code", _search_code, pipeline, req.query, user_id, req.top_k))
+        if search_tasks:
+            task_results = await asyncio.gather(*search_tasks)
+            for (results, elapsed), mode in zip(task_results, plan):
+                latency_ms[mode] = elapsed
+                all_results.extend(results)
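
One caveat on the suggestion above: zip(task_results, plan) pairs results with modes purely by position, so it is only correct if plan iterates over exactly the scheduled modes in the same order as the if-chain. A minimal sketch that records the mode names alongside the tasks avoids that dependency. Everything here other than asyncio.gather is a hypothetical stand-in: `fake_search`, `SEARCH_DELAYS`, `run_plan`, and this simplified `_timed` are not the PR's code.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Iterable, List, Tuple

# Hypothetical per-mode delays standing in for real I/O-bound searches.
SEARCH_DELAYS: Dict[str, float] = {
    "profile": 0.01, "temporal": 0.01, "summary": 0.01,
    "snippet": 0.01, "code": 0.01,
}


async def fake_search(label: str, delay: float) -> List[str]:
    await asyncio.sleep(delay)
    return [f"{label}-hit"]


async def _timed(
    mode: str, fn: Callable[..., Awaitable[List[str]]], *args: Any
) -> Tuple[List[str], float]:
    # Simplified stand-in: the real helper presumably also records `mode`.
    start = time.perf_counter()
    results = await fn(*args)
    return results, (time.perf_counter() - start) * 1000.0


async def run_plan(plan: Iterable[str]) -> Tuple[List[str], Dict[str, float]]:
    wanted = set(plan)
    # Fix the order once and reuse it for both scheduling and unpacking.
    modes = [mode for mode in SEARCH_DELAYS if mode in wanted]
    tasks = [_timed(m, fake_search, m, SEARCH_DELAYS[m]) for m in modes]
    # gather() returns results in the order the awaitables were passed.
    task_results = await asyncio.gather(*tasks)
    all_results: List[str] = []
    latency_ms: Dict[str, float] = {}
    for mode, (results, elapsed) in zip(modes, task_results):
        latency_ms[mode] = elapsed
        all_results.extend(results)
    return all_results, latency_ms
```

For example, asyncio.run(run_plan(["code", "profile"])) schedules both searches concurrently and still attributes each latency to the right mode, regardless of the order in which the plan lists them.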

gemini-code-assist · 2026-05-21T03:09:34Z


        self.embed_fn = embed_fn
        self._snippet_stores: Dict[str, BaseVectorStore] = {}
+        self._profile_catalog_cache: Dict[str, tuple[float, List[Dict[str, str]], list]] = {}


The _profile_catalog_cache uses user_id as a key but lacks an eviction policy. In a production environment with a large number of unique users, this dictionary will grow indefinitely, leading to a memory leak.

Consider using an LRU cache or implementing a simple size-based eviction mechanism to bound memory usage.
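
As a rough illustration of the size-based eviction mentioned above, here is a minimal LRU sketch built on collections.OrderedDict. The class name `BoundedLRU` and the `max_entries` parameter are hypothetical; the PR's cache also stores a float in each entry (possibly a timestamp), and any expiry logic tied to it is not modeled here.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class BoundedLRU:
    """Hypothetical bounded cache: evicts the least recently used entry."""

    def __init__(self, max_entries: int) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        # Mark the entry as most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        # Drop oldest entries until the cache is back within its bound.
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)

    def __len__(self) -> int:
        return len(self._data)
```

With a bound in place, memory use is capped by max_entries times the size of one catalog entry, at the cost of recomputing catalogs for users who fall out of the cache.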

Add fast raw memory search path

2863756

cnguyen14 requested review from ishaanxgupta and ved015 as code owners May 21, 2026 03:08

github-actions Bot added tests api pipelines labels May 21, 2026

gemini-code-assist Bot reviewed May 21, 2026

cnguyen14 closed this May 21, 2026

This was referenced May 21, 2026

Add fast raw memory search path #194

Open

Add low-latency raw search path separate from agentic answer synthesis #163

Open

cnguyen14 wants to merge 1 commit into XortexAI:main from cnguyen14:codex/fast-search-path-163
