fix(search): wire search_skills to SkillRanker embedding cache#81
Open
fabioscarsi wants to merge 2 commits intoHKUDS:mainfrom
Open
fix(search): wire search_skills to SkillRanker embedding cache#81fabioscarsi wants to merge 2 commits intoHKUDS:mainfrom
fabioscarsi wants to merge 2 commits intoHKUDS:mainfrom
Conversation
Both paths in search_skills/hybrid_search_skills now go through a shared SkillRanker singleton: - SkillSearchEngine._bm25_phase: previously instantiated a fresh SkillRanker per call, reloading the pickle cache each time. - hybrid_search_skills candidate loop: previously generated embeddings via generate_embedding on every query, ignoring the persistent cache entirely. The persistent pickle at .openspace/skill_embedding_cache/skill_embeddings_v1.pkl is reused across invocations and survives process restarts. Candidates without a stable skill_id are skipped to avoid cache key collisions. On a 28-skill local registry with text-embedding-3-small via OpenRouter, query latency drops from 8-14s to ~300ms after warm-up. Top-1 match identity is preserved on all test queries (score drift <0.001). Cloud candidates that already carry _embedding from the server-side search endpoint are skipped and unchanged.
The embedding cache was keyed by skill_id alone, so any edit to a
SKILL.md body or description produced stale embeddings that
get_or_compute_embedding kept serving until a manual invalidate_cache
call or a file deletion. Previously this was mostly invisible because
select_skills_with_llm was the only caller exercising the cache; after
the preceding commit wires search_skills through the same path the
staleness becomes observable on every MCP query.
Use "{skill_id}:{sha256(embedding_text)[:16]}" as the cache key, so
any change to the text produced by _build_embedding_text (name +
description + body, truncated to SKILL_EMBEDDING_MAX_CHARS) causes
an automatic cache miss and a fresh embedding. Both
get_or_compute_embedding and _embedding_rank are updated.
Bounded growth: on each successful new compute, older entries with
the same "{skill_id}:" prefix are pruned in the same write. Net
result: at most one cached embedding per skill_id at any time, aside
from transient migration state.
Backward compatibility: existing pickle files keyed by skill_id alone
are migrated in place on first lookup (no API call needed); the old
key is dropped after migration.
invalidate_cache(skill_id) now removes every content-addressed entry
and any legacy entry for that skill_id, so historical versions do
not leak across evolutions.
Functional benchmark on a 28-skill local registry with
text-embedding-3-small via OpenRouter: top-1 match identity preserved
on all test queries, score drift below 0.001, warm latency
~260-400ms/query (unchanged from the previous commit).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
hybrid_search_skillsinopenspace/cloud/search.pygenerates candidate embeddings viagenerate_embeddingon every invocation, andSkillSearchEngine._bm25_phaseinstantiates a freshSkillRankerper call that reloads the pickle cache each time. The persistent embedding cache inopenspace/skill_engine/skill_ranker.pyis unused by the MCPsearch_skillstool path.On a 28-skill local registry with
text-embedding-3-smallvia OpenRouter, this produces 8-14 seconds of latency per query.Additionally,
SkillRanker._embedding_cacheis keyed byskill_idalone, so any edit to aSKILL.mdbody or description would leave stale embeddings in place until a manualinvalidate_cachecall. This behaviour was masked while the cache was only used byselect_skills_with_llm, but is exposed by wiring it intosearch_skills.Fix
Commit 1 — wire the cache (
fc95348). Route both paths through a sharedSkillRankersingleton, reusing the persistent pickle at.openspace/skill_embedding_cache/skill_embeddings_v1.pklacross invocations. Candidates without a stableskill_idare skipped to avoid cache key collisions.Commit 2 — content-addressed cache key (
3708359). Change the cache key fromskill_idto"{skill_id}:{sha256(embedding_text)[:16]}". Any change to the text that feeds the embedding (name, description, body, truncated toSKILL_EMBEDDING_MAX_CHARS) flips the key and forces a fresh embedding. Bothget_or_compute_embeddingand_embedding_rankuse this pattern;invalidate_cache(skill_id)was updated to remove every entry with thatskill_idprefix plus any legacy-format entry.Backward compatibility: existing pickle files keyed by
skill_idalone are migrated in place on first lookup (no API call). The old key is dropped after migration, so no double bookkeeping persists.Bounded cache growth: when a new embedding is computed for a skill, older entries carrying the same
"{skill_id}:"prefix are pruned in the same write. Net result: at most one cached embedding perskill_idin steady state.Measured impact
On 28 local skills,
text-embedding-3-smallvia OpenRouter:Per-query latency (warm) on 6 test queries after both commits: 259, 292, 308, 333, 298, 272 ms (mean 294 ms).
Out of scope
_search_rank): unchanged. Cloud candidates carry_embeddingfrom the server and are skipped by the new code path.SkillRanker._save_cache: pre-existing;asynciosingle-thread usage makes races unlikely in practice.No new dependency;
SkillRankeralready ships in the repo and is exercised byselect_skills_with_llm. The only stdlib addition ishashlib.