Problem
On a large codebase (~200K nodes), every search_graph call using name_pattern= takes 1.5–2.5s regardless of how specific the pattern is. An exact name search is as slow as a broad regex:
search_graph(name_pattern="specificFunctionName") → 1.5s, 70% CPU
search_graph(label="Method", name_pattern=".Controller.") → 2.0s, 88% CPU
Scales linearly with node count — confirmed across four projects ranging from 1K to 200K+ nodes. The query= (BM25) path is unaffected and fast.
### Root cause — three compounding bugs
sqlite_iregexp recompiles the regex on every row (store.c:431)
static void sqlite_iregexp(sqlite3_context *ctx, int argc, sqlite3_value **argv) {
cbm_regex_t re;
int rc = cbm_regcomp(&re, pattern, CBM_REG_EXTENDED | CBM_REG_NOSUB | CBM_REG_ICASE);
// ...
rc = cbm_regexec(&re, text, 0, NULL, 0);
cbm_regfree(&re); // ← compiled and freed for every row
}
For a Method-scoped search across a large project, the pattern is compiled and freed once per node. The standard SQLite fix is sqlite3_get_auxdata/sqlite3_set_auxdata to cache the compiled regex for the lifetime of the statement.
- The count query runs a full second scan (
store.c:2321)
snprintf(count_sql, sizeof(count_sql), "SELECT COUNT(*) FROM (%s)", sql);
The count query executes before the data query with identical WHERE conditions, so iregexp() fires twice for every row — doubling the already-expensive work.
cbm_extract_like_hints is implemented but never called (store.c:2018, store.h:559)
There is already a function that extracts literal segments from a regex pattern to build LIKE pre-filters:
int cbm_extract_like_hints(const char *pattern, char **out, int max_out);
For .*Controller.* it yields ["Controller"], enabling:
WHERE n.name LIKE '%Controller%' AND iregexp(?1, n.name)
The idx_nodes_name (project, name) index can satisfy the LIKE clause and cut the regex scan to only matching rows. But cbm_extract_like_hints is never called from search_where_basic or anywhere in the search path — it is dead code.
Suggested fixes
Fix 1 — cache the compiled regex in sqlite_regexp/sqlite_iregexp using sqlite3_set_auxdata:
cbm_regex_t *re = (cbm_regex_t *)sqlite3_get_auxdata(ctx, 0);
if (!re) {
re = malloc(sizeof(cbm_regex_t));
cbm_regcomp(re, pattern, CBM_REG_EXTENDED | CBM_REG_NOSUB | CBM_REG_ICASE);
sqlite3_set_auxdata(ctx, 0, re, (void(*)(void*))regex_free_cb);
}
Fix 2 — wire cbm_extract_like_hints into search_where_basic to prepend LIKE conditions before the regex clause. The function is already correct; it just needs to be called.
Fix 3 — merge the count and data query into a single pass (e.g. COUNT(*) OVER () window function) to eliminate the double scan.
Note on query= mode
query= (BM25/FTS5) already takes a completely separate fast path (handle_search_graph:1318) and is unaffected. This bug is specific to name_pattern= and the cbm_store_search code path.
### Environment
- Version: 0.6.0
- OS: Linux (Ubuntu, x86_64)
- 216,648 nodes, 661,477 edges, 509MB database (PHP monolith)
Problem
On a large codebase (~200K nodes), every
search_graphcall usingname_pattern=takes 1.5–2.5s regardless of how specific the pattern is. An exact name search is as slow as a broad regex:search_graph(name_pattern="specificFunctionName") → 1.5s, 70% CPU
search_graph(label="Method", name_pattern=".Controller.") → 2.0s, 88% CPU
Scales linearly with node count — confirmed across four projects ranging from 1K to 200K+ nodes. The
query=(BM25) path is unaffected and fast.### Root cause — three compounding bugs
sqlite_iregexprecompiles the regex on every row (store.c:431)For a Method-scoped search across a large project, the pattern is compiled and freed once per node. The standard SQLite fix is
sqlite3_get_auxdata/sqlite3_set_auxdatato cache the compiled regex for the lifetime of the statement.store.c:2321)The count query executes before the data query with identical WHERE conditions, so
iregexp()fires twice for every row — doubling the already-expensive work.cbm_extract_like_hintsis implemented but never called (store.c:2018, store.h:559)There is already a function that extracts literal segments from a regex pattern to build LIKE pre-filters:
For
.*Controller.*it yields["Controller"], enabling:WHERE n.name LIKE '%Controller%' AND iregexp(?1, n.name)The
idx_nodes_name (project, name)index can satisfy the LIKE clause and cut the regex scan to only matching rows. Butcbm_extract_like_hintsis never called fromsearch_where_basicor anywhere in the search path — it is dead code.Suggested fixes
Fix 1 — cache the compiled regex in
sqlite_regexp/sqlite_iregexpusingsqlite3_set_auxdata:Fix 2 — wire
cbm_extract_like_hintsintosearch_where_basicto prepend LIKE conditions before the regex clause. The function is already correct; it just needs to be called.Fix 3 — merge the count and data query into a single pass (e.g.
COUNT(*) OVER ()window function) to eliminate the double scan.Note on
query=modequery=(BM25/FTS5) already takes a completely separate fast path (handle_search_graph:1318) and is unaffected. This bug is specific toname_pattern=and thecbm_store_searchcode path.### Environment