
fix(search): wire search_skills to SkillRanker embedding cache #81

Open

fabioscarsi wants to merge 2 commits into HKUDS:main from fabioscarsi:feat/wire-skill-ranker-cache-to-search

Conversation


@fabioscarsi fabioscarsi commented Apr 18, 2026

Problem

hybrid_search_skills in openspace/cloud/search.py generates candidate embeddings via generate_embedding on every invocation, and SkillSearchEngine._bm25_phase instantiates a fresh SkillRanker per call, reloading the pickle cache each time. As a result, the persistent embedding cache in openspace/skill_engine/skill_ranker.py is never used by the MCP search_skills tool path.

On a 28-skill local registry with text-embedding-3-small via OpenRouter, this produces 8-14 seconds of latency per query.

Additionally, SkillRanker._embedding_cache is keyed by skill_id alone, so any edit to a SKILL.md body or description would leave stale embeddings in place until a manual invalidate_cache call. This behaviour was masked while the cache was only used by select_skills_with_llm, but is exposed by wiring it into search_skills.

Fix

Commit 1 — wire the cache (fc95348). Route both paths through a shared SkillRanker singleton, reusing the persistent pickle at .openspace/skill_embedding_cache/skill_embeddings_v1.pkl across invocations. Candidates without a stable skill_id are skipped to avoid cache key collisions.

Commit 2 — content-addressed cache key (3708359). Change the cache key from skill_id to "{skill_id}:{sha256(embedding_text)[:16]}". Any change to the text that feeds the embedding (name, description, body, truncated to SKILL_EMBEDDING_MAX_CHARS) flips the key and forces a fresh embedding. Both get_or_compute_embedding and _embedding_rank use this pattern; invalidate_cache(skill_id) was updated to remove every entry with that skill_id prefix plus any legacy-format entry.
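The key scheme described above can be sketched like this; `SKILL_EMBEDDING_MAX_CHARS` is given an illustrative value here (the real constant lives in the repo), and `embedding_cache_key` is an illustrative helper name:

```python
import hashlib

SKILL_EMBEDDING_MAX_CHARS = 8000  # illustrative; the repo defines the real limit

def embedding_cache_key(skill_id: str, embedding_text: str) -> str:
    """Content-addressed cache key: any change to the embedded text flips the key."""
    text = embedding_text[:SKILL_EMBEDDING_MAX_CHARS]
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    return f"{skill_id}:{digest}"
```

Because the hash covers exactly the truncated text that feeds the embedding call, an edit to a SKILL.md name, description, or body produces a cache miss with no explicit invalidation step.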

Backward compatibility: existing pickle files keyed by skill_id alone are migrated in place on first lookup (no API call). The old key is dropped after migration, so no double bookkeeping persists.
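A minimal sketch of that in-place migration, assuming the cache is a plain dict mapping keys to embedding vectors (the function name is illustrative):

```python
def migrate_legacy_entry(cache: dict, skill_id: str, new_key: str) -> None:
    """Move a legacy skill_id-keyed embedding under its content-addressed key.

    No API call: the cached vector is reused as-is, and the legacy key is
    dropped so no double bookkeeping persists.
    """
    if skill_id in cache and new_key not in cache:
        cache[new_key] = cache.pop(skill_id)
```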

Bounded cache growth: when a new embedding is computed for a skill, older entries carrying the same "{skill_id}:" prefix are pruned in the same write. Net result: at most one cached embedding per skill_id in steady state.
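The prune-on-write behaviour can be sketched as follows, over the same dict-shaped cache (`store_embedding` is an illustrative name, not the PR's actual method):

```python
def store_embedding(cache: dict, skill_id: str, new_key: str, vector: list) -> None:
    """Write a fresh embedding and prune older entries for the same skill.

    Keys follow the '{skill_id}:{hash}' scheme, so in steady state at most
    one cached embedding per skill_id survives each write.
    """
    prefix = f"{skill_id}:"
    for key in [k for k in cache if k.startswith(prefix) and k != new_key]:
        del cache[key]
    cache.pop(skill_id, None)  # drop any legacy-format entry too
    cache[new_key] = vector
```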

Measured impact

On 28 local skills, text-embedding-3-small via OpenRouter:

|  | before | after |
| --- | --- | --- |
| query latency (warm) | 8-14 s | ~300 ms |
| top-1 match identity |  | preserved on all test queries |
| score drift |  | <0.001 |
| cache staleness after SKILL.md edit | persists until manual invalidation | automatically invalidated on next lookup |

Per-query latency (warm) on 6 test queries after both commits: 259, 292, 308, 333, 298, 272 ms (mean 294 ms).

Out of scope

  • Cloud-only paths (server-ranked _search_rank): unchanged. Cloud candidates carry _embedding from the server and are skipped by the new code path.
  • Thread-safety of SkillRanker._save_cache: pre-existing; asyncio single-thread usage makes races unlikely in practice.

No new dependency; SkillRanker already ships in the repo and is exercised by select_skills_with_llm. The only stdlib addition is hashlib.

Both paths in search_skills/hybrid_search_skills now go through a
shared SkillRanker singleton:

- SkillSearchEngine._bm25_phase: previously instantiated a fresh
  SkillRanker per call, reloading the pickle cache each time.
- hybrid_search_skills candidate loop: previously generated
  embeddings via generate_embedding on every query, ignoring the
  persistent cache entirely.

The persistent pickle at
.openspace/skill_embedding_cache/skill_embeddings_v1.pkl is
reused across invocations and survives process restarts.
Candidates without a stable skill_id are skipped to avoid cache
key collisions.

On a 28-skill local registry with text-embedding-3-small via
OpenRouter, query latency drops from 8-14s to ~300ms after
warm-up. Top-1 match identity is preserved on all test queries
(score drift <0.001).

Cloud candidates that already carry _embedding from the
server-side search endpoint are skipped and unchanged.

The embedding cache was keyed by skill_id alone, so any edit to a
SKILL.md body or description produced stale embeddings that
get_or_compute_embedding kept serving until a manual invalidate_cache
call or a file deletion. Previously this was mostly invisible because
select_skills_with_llm was the only caller exercising the cache; after
the preceding commit wires search_skills through the same path the
staleness becomes observable on every MCP query.

Use "{skill_id}:{sha256(embedding_text)[:16]}" as the cache key, so
any change to the text produced by _build_embedding_text (name +
description + body, truncated to SKILL_EMBEDDING_MAX_CHARS) causes
an automatic cache miss and a fresh embedding. Both
get_or_compute_embedding and _embedding_rank are updated.

Bounded growth: on each successful new compute, older entries with
the same "{skill_id}:" prefix are pruned in the same write. Net
result: at most one cached embedding per skill_id at any time, aside
from transient migration state.

Backward compatibility: existing pickle files keyed by skill_id alone
are migrated in place on first lookup (no API call needed); the old
key is dropped after migration.

invalidate_cache(skill_id) now removes every content-addressed entry
and any legacy entry for that skill_id, so historical versions do
not leak across evolutions.
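A sketch of that invalidation over a dict-shaped cache (`invalidate_cache_entries` is an illustrative free function standing in for the actual method):

```python
def invalidate_cache_entries(cache: dict, skill_id: str) -> int:
    """Remove every content-addressed entry and any legacy entry for skill_id.

    Returns the number of entries removed.
    """
    prefix = f"{skill_id}:"
    doomed = [k for k in cache if k == skill_id or k.startswith(prefix)]
    for key in doomed:
        del cache[key]
    return len(doomed)
```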

Functional benchmark on a 28-skill local registry with
text-embedding-3-small via OpenRouter: top-1 match identity preserved
on all test queries, score drift below 0.001, warm latency
~260-400ms/query (unchanged from the previous commit).
