Code review fixes: SQL params, symlinks, FTS scaling, typing, tests #13

Open
aausch wants to merge 15 commits into glommer:main from aausch:aausch/fixes

@aausch aausch commented Mar 25, 2026

Addresses issues found in a Claude-powered code review of the codebase.

Fixes

  • Parameterize SQL LIMIT in getStaleEmbeddings (was interpolated via template literal)
  • Replace hardcoded absolute path in .mcp.json with just codemogger mcp
  • Skip symlinks in file walker (directory symlinks could escape the intended tree; file symlinks could index unexpected files)
  • Distinguish a missing .gitignore (ENOENT, silent) from an unreadable one (other errors, now reported)
  • Surface embedding batch failures as warnings in IndexResult.errors instead of crashing

Refactors

  • Replace all as any[] / as any DB casts with typed row interfaces (CodebaseRow, StaleEmbeddingRow, etc.) so column renames are caught at compile time
  • Incremental FTS updates: replace DROP/CREATE full rebuild with ensureFtsTable + per-file populateFtsForFiles + optimizeFts. The existing ON DELETE CASCADE on fts.chunk_id handles cleanup automatically when chunks are replaced
  • Fix FTS search scaling: replace N codebase existence checks with a single sqlite_master GLOB 'fts_*' query, and JOIN chunk data inline instead of per-row lookups

Dependencies

  • Upgrade @tursodatabase/database from pre-release 0.5.0-pre.14 to stable 0.5.1
  • Remove deprecated boolean@3.2.0 transitive dep by overriding global-agent to ^4.1.3 (which dropped it)
  • Note: tree-sitter grammar version alignment was attempted but reverted, because ^0.25.0 packages have not been published for most grammars; the mismatch is documented as a known issue

Docs / chore

  • Document MAX_CHUNK_LINES, BATCH, and pre-release DB dependency
  • Add README note on the @tursodatabase/database pre-release status

Tests

  • test/walker.test.ts: 7 new tests covering file discovery, size limit, .gitignore, symlink skipping (dir + file), unreadable .gitignore, and language filter
  • test/index.test.ts: 3 new tests covering chunk count, embed failure capture, and re-index deduplication

Test plan

  • bun test — all 22 tests pass
  • bun run build — CLI builds cleanly
  • Manual: codemogger index . on a real repo, verify FTS search still works
  • Manual: verify .mcp.json works after installing the package globally

🤖 Generated with Claude Code

aausch and others added 15 commits March 24, 2026 23:37
Replace template literal interpolation with a prepared statement ?
parameter, consistent with all other queries in the file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
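
The commit above can be illustrated with a minimal sketch. Only `getStaleEmbeddings` and the `?` placeholder come from this PR; the `PreparedQuery` shape, the `buildStaleEmbeddingsQuery` helper, and the table and column names are hypothetical and exist only for illustration.

```typescript
// Hypothetical shape: constant SQL text plus the values bound to its placeholders.
interface PreparedQuery {
  sql: string;
  params: number[];
}

// The SQL text never changes; the limit travels as a bound parameter
// instead of being spliced in with a template literal such as `LIMIT ${limit}`.
function buildStaleEmbeddingsQuery(limit: number): PreparedQuery {
  if (!Number.isInteger(limit) || limit < 0) {
    throw new RangeError(`limit must be a non-negative integer, got ${limit}`);
  }
  return {
    // Table and column names here are assumptions, not the project's schema.
    sql: "SELECT id, snippet FROM chunks WHERE embedding IS NULL LIMIT ?",
    params: [limit],
  };
}
```

Keeping the SQL text constant also lets the driver reuse one prepared statement across calls with different limits.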
The previous args pointed to the original developer's local machine path.
Replace with just 'codemogger mcp' so the config works for anyone who
has the package installed globally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Grammars were scattered across ^0.23.x, ^0.24.x, and ^0.25.x against
web-tree-sitter@0.26.5. Mismatched WASM ABI versions can cause silent
parse failures or wrong ASTs. Align everything to ^0.25.0.

Run `bun install` to apply.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Directory symlinks could recurse outside the intended tree (and into
cycles). File symlinks could index files the user expects to be excluded.
Skip both explicitly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
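
A rough sketch of the symlink-skipping behaviour described above, using Node's `fs.readdirSync` with `withFileTypes`. The `walkFiles` name and the recursion structure are assumptions; the project's actual walker also applies size limits, .gitignore rules, and language filters that are left out here.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Recursively collect regular files under `root`, skipping every symlink.
function walkFiles(root: string): string[] {
  const found: string[] = [];
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    // A Dirent describes the link itself rather than its target, so this
    // single check covers both file symlinks and directory symlinks.
    if (entry.isSymbolicLink()) continue;
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) {
      found.push(...walkFiles(full));
    } else if (entry.isFile()) {
      found.push(full);
    }
  }
  return found;
}
```

Because symlinked directories are never entered, the walk cannot leave `root` or loop through a cycle of links.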
Previously an error from the embedder would propagate up and abort the
entire index operation, leaving no record of what failed. Wrap each
batch in try/catch: record the failure in IndexResult.errors and print
a warning to stderr so the user sees a partial index rather than a
crash.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
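
The per-batch try/catch described above might look roughly like this. The `Embedder` signature, the `EmbedOutcome` shape, and the `embedInBatches` name are hypothetical; the PR only states that failures land in `IndexResult.errors` and that a warning goes to stderr.

```typescript
// Hypothetical embedder signature: one vector per input text.
type Embedder = (texts: string[]) => Promise<number[][]>;

interface EmbedOutcome {
  vectors: Map<number, number[]>; // keyed by index into the input array
  errors: string[]; // one entry per failed batch
}

async function embedInBatches(
  texts: string[],
  embed: Embedder,
  batchSize: number,
): Promise<EmbedOutcome> {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const outcome: EmbedOutcome = { vectors: new Map(), errors: [] };
  for (let start = 0; start < texts.length; start += batchSize) {
    const batch = texts.slice(start, start + batchSize);
    try {
      const vectors = await embed(batch);
      vectors.forEach((vector, i) => outcome.vectors.set(start + i, vector));
    } catch (err) {
      // Record the failure and keep going, so one bad batch yields a
      // partial index instead of aborting the whole run.
      const message = err instanceof Error ? err.message : String(err);
      const warning = `embedding batch at offset ${start} failed: ${message}`;
      outcome.errors.push(warning);
      console.warn(warning);
    }
  }
  return outcome;
}
```

Chunks from a failed batch simply have no vector, so a later run can pick them up again as stale embeddings.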
An empty catch was treating permission errors the same as 'no
.gitignore exists'. Now only ENOENT is silently ignored; any other
error (EACCES, EISDIR, etc.) is added to the scan error list.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
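
As an illustration of the ENOENT distinction above, here is a small sketch. The `readGitignore` name, its signature, and the simplified line parsing are assumptions; only the rule "ENOENT is silent, everything else is reported" comes from the commit.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Read <dir>/.gitignore and return its patterns. A missing file is normal
// and stays silent; any other failure (EACCES, EISDIR, ...) is appended to
// `errors` so the caller can report it.
function readGitignore(dir: string, errors: string[]): string[] {
  const file = path.join(dir, ".gitignore");
  try {
    return fs
      .readFileSync(file, "utf8")
      .split(/\r?\n/)
      .map((line) => line.trim()) // simplified: real gitignore parsing has more rules
      .filter((line) => line !== "" && !line.startsWith("#"));
  } catch (err) {
    const code = (err as { code?: string }).code;
    if (code !== "ENOENT") {
      errors.push(`${file}: ${code ?? String(err)}`);
    }
    return [];
  }
}
```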
Define private row types (CodebaseRow, StaleEmbeddingRow, etc.) for
each query result and replace all 'as any[]' / 'as any' casts. Column
renames are now caught at compile time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
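
To make the compile-time benefit concrete, a minimal sketch follows. `StaleEmbeddingRow` is named in this PR, but its fields here, along with `fetchRows` and `staleSnippets`, are hypothetical stand-ins rather than the project's real schema or driver calls.

```typescript
// Hypothetical row shape; the real column names may differ.
interface StaleEmbeddingRow {
  id: number;
  snippet: string;
}

// Stand-in for a driver call that returns untyped rows.
function fetchRows(): unknown[] {
  return [{ id: 1, snippet: "function a() {}" }];
}

// With `fetchRows() as any[]`, a misspelled column such as `row.snipet`
// would compile and only fail at runtime. Naming the row type turns a
// misspelled or renamed column into a compile error at every use site.
function staleSnippets(): string[] {
  const rows = fetchRows() as StaleEmbeddingRow[];
  return rows.map((row) => row.snippet);
}
```

The cast is still unchecked at runtime; the gain is that the interface becomes the single place to update when a column changes.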
Previously every index run dropped and recreated the entire FTS table.
Now:
- ensureFtsTable() creates the table+index once (idempotent)
- populateFtsForFiles() inserts FTS entries per file after chunk upsert
  (the ON DELETE CASCADE on fts.chunk_id handles cleanup automatically
  when old chunks are deleted by batchUpsertAllFileChunks)
- optimizeFts() runs OPTIMIZE INDEX once at the end

Re-indexing a large codebase now only rewrites the changed files'
FTS entries instead of rebuilding everything from scratch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously ftsSearch did:
- SELECT id FROM codebases (1 query)
- N * SELECT from sqlite_master to check if each FTS table exists
- M individual SELECT from chunks per FTS result row

Now:
- One SELECT from sqlite_master GLOB 'fts_*' discovers all tables
- JOIN chunks inline in the FTS query eliminates per-row lookups

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
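
The single-discovery-query idea above can be sketched as follows. The exact SQL text and the `fts_<codebaseId>` naming scheme are assumptions inferred from the `GLOB 'fts_*'` pattern; `codebaseIdsWithFts` is a hypothetical helper, not a function from the codebase.

```typescript
// One catalog query lists every FTS table, replacing one existence
// check per codebase. (Assumed SQL; the PR only names the GLOB pattern.)
const FTS_TABLES_SQL = "SELECT name FROM sqlite_master WHERE name GLOB 'fts_*'";

// Given the names returned by that query, keep those that match the
// assumed fts_<numeric codebase id> scheme and return the ids.
function codebaseIdsWithFts(tableNames: string[]): number[] {
  const ids: number[] = [];
  for (const name of tableNames) {
    const match = /^fts_(\d+)$/.exec(name);
    if (match) ids.push(Number(match[1]));
  }
  return ids;
}
```

The query count for discovery then stays at one regardless of how many codebases are indexed.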
Add a note in README explaining why the pre-release version is used
and flagging it for upgrade once a stable 0.5.x is released.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add inline comments explaining the rationale behind the two undocumented
numeric constants. The RRF k=60 constant in rank.ts was already
documented in its JSDoc comment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
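
For readers unfamiliar with the RRF constant mentioned above, here is a sketch of the standard reciprocal rank fusion formula, score(d) = Σ 1 / (k + rank(d)) with ranks starting at 1. Whether rank.ts follows exactly this form is an assumption; `rrfFuse` and `FusedHit` are hypothetical names.

```typescript
interface FusedHit {
  id: string;
  score: number;
}

// Fuse several ranked lists of ids into one list ordered by RRF score.
function rrfFuse(rankings: string[][], k = 60): FusedHit[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}
```

A larger k flattens the difference between adjacent ranks, so agreement across lists counts for more than a top position in a single list.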
0.25.x versions have not been published for: tree-sitter-c, c-sharp,
cpp, java, php, ruby, rust, scala, or typescript. Revert those to the
latest available (0.23.x or 0.24.x).

The version mismatch with web-tree-sitter@0.26.5 remains a known issue.
It can be addressed when upstream grammar packages publish 0.25.x
compatible WASM builds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
walker.test.ts covers:
- TypeScript file discovery
- 1MB size limit
- .gitignore pattern filtering
- Symlink skipping (directory and file)
- Unreadable .gitignore reported as error
- Language filter

index.test.ts covers:
- Basic chunk-count assertion
- Embedding batch failures captured in errors (not a crash)
- Re-indexing unchanged files skips them (hash check)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Moves from the pre-release 0.5.0-pre.14 to the first stable 0.5.x
release.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
boolean was pulled in via:
  @huggingface/transformers -> onnxruntime-node -> global-agent@3 -> boolean

global-agent@4.x dropped boolean (and roarr) entirely. Added an
overrides entry to force global-agent to ^4.1.3 since onnxruntime-node
pins ^3.0.0.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>