feat: replace Ollama with node-llama-cpp for embeddings by dandaka · Pull Request #19 · dandaka/traul

dandaka · 2026-03-18T22:30:00Z

Summary

Add node-llama-cpp as primary embedding backend (Qwen3-Embedding-0.6B, ~639MB auto-download on first use)
Route embed(), embedQuery(), embedBatch() through in-process llama.ts with automatic Ollama HTTP fallback
Add embedQuery() for asymmetric query embedding with Qwen3 instruction prefix
Update traul search to use embedQuery() for better semantic search quality
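The query/document asymmetry can be illustrated with the formatting helpers this PR names (isQwenEmbeddingModel, formatQuery, formatDoc). This is a minimal sketch under stated assumptions, not traul's code. The `Instruct: ...\nQuery:...` shape follows Qwen3-Embedding's published usage. The default task wording, the model-name regex, and a pass-through formatDoc are all assumptions.

```typescript
// Assumed default task description (hypothetical wording).
const DEFAULT_TASK =
  "Given a web search query, retrieve relevant passages that answer the query";

// Assumption: detect Qwen3 embedding models by a case-insensitive name match.
export function isQwenEmbeddingModel(modelName: string): boolean {
  return /qwen3[-_]?embedding/i.test(modelName);
}

// Queries get the instruction prefix, but only for Qwen3 embedding models.
export function formatQuery(
  query: string,
  modelName: string,
  task: string = DEFAULT_TASK,
): string {
  if (!isQwenEmbeddingModel(modelName)) return query;
  return `Instruct: ${task}\nQuery:${query}`;
}

// Assumption: documents are embedded unchanged, so the asymmetry
// lives entirely on the query side.
export function formatDoc(doc: string, _modelName: string): string {
  return doc;
}
```

Because only the query side is rewritten, stored document vectors stay valid if the task wording changes later. Only the query embeddings change.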

Architecture

Three layers: llama.ts (node-llama-cpp singleton, model-specific formatting) → embeddings.ts (public API, routing + Ollama fallback) → callers (search.ts, embed.ts)
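The middle layer's "llama first, Ollama on error" routing can be sketched as follows. The backend interface and createRouter are hypothetical names, and the per-call try/catch is an assumed shape rather than the actual embeddings.ts code.

```typescript
type Vector = number[];

// Hypothetical common interface for the in-process and HTTP backends.
interface EmbedBackend {
  name: string;
  embedDoc(text: string): Promise<Vector>;
  embedQuery(text: string): Promise<Vector>;
}

// Public API layer: callers (search.ts, embed.ts) only see embed/embedQuery.
export function createRouter(primary: EmbedBackend, fallback: EmbedBackend) {
  // Try the primary backend; if it throws, retry the same call on the fallback.
  async function route(call: (b: EmbedBackend) => Promise<Vector>): Promise<Vector> {
    try {
      return await call(primary);
    } catch {
      return call(fallback);
    }
  }
  return {
    embed: (text: string) => route((b) => b.embedDoc(text)),
    embedQuery: (text: string) => route((b) => b.embedQuery(text)),
  };
}
```

One point to keep in mind with any such fallback: the two backends run different models, so vectors produced by one are not directly comparable with vectors produced by the other.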

Key changes

src/lib/llama.ts — new: singleton wrapper, lazy model loading, idle unload (5min), Qwen3 formatting
src/lib/embeddings.ts — rewritten: llama-first routing, Ollama fallback with separate model name
src/commands/search.ts — uses embedQuery() instead of embed()
skill.md — updated embedding backend docs
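The "singleton wrapper, lazy model loading, idle unload (5min)" pattern in llama.ts can be sketched generically. IdleSingleton and its methods are hypothetical names. The only assumption about the loaded object is that it has a `dispose()` method, which node-llama-cpp models and contexts provide.

```typescript
// Anything the loader returns that can be released, e.g. a model.
interface Disposable {
  dispose(): Promise<void> | void;
}

export class IdleSingleton<T extends Disposable> {
  private pending: Promise<T> | null = null;
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private readonly load: () => Promise<T>,
    private readonly idleMs: number = 5 * 60 * 1000, // 5 minutes, as in the PR
  ) {}

  // Share one in-flight or finished load; restart the idle countdown on each use.
  acquire(): Promise<T> {
    if (!this.pending) {
      const p: Promise<T> = this.load().catch((err) => {
        if (this.pending === p) this.pending = null; // let a later call retry
        throw err;
      });
      this.pending = p;
    }
    this.resetIdleTimer();
    return this.pending;
  }

  get loaded(): boolean {
    return this.pending !== null;
  }

  // Release the instance now; the next acquire() loads it again.
  async unload(): Promise<void> {
    if (this.timer) clearTimeout(this.timer);
    this.timer = null;
    const pending = this.pending;
    this.pending = null;
    const value = pending ? await pending.catch(() => null) : null;
    if (value) await value.dispose();
  }

  private resetIdleTimer(): void {
    if (this.timer) clearTimeout(this.timer);
    const timer = setTimeout(() => {
      this.unload().catch(() => {}); // this sketch ignores errors on idle unload
    }, this.idleMs);
    // Don't let a pending idle timer keep a short-lived CLI process alive.
    (timer as { unref?: () => void }).unref?.();
    this.timer = timer;
  }
}
```

Caching the promise rather than the resolved model means concurrent first calls share a single ~639MB load instead of each starting their own.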

Test plan

192 tests pass (0 failures)
llama.ts formatting helpers (6 tests)
llama.ts wrapper with mocked node-llama-cpp (8 tests)
embeddings.ts llama routing + Ollama fallback (8 tests)
No mock.module cross-file conflicts
Manual: traul search "test" with model download on first run

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


Implements a lazy-loaded singleton LlamaCpp wrapper with embedDoc, embedQuery, embedDocBatch, and idle-timeout disposal. Fixes the retry logic in embedSingle so that it truncates and retries only when the text exceeds the retry limit, preventing false successes on mocked failures in tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
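The corrected retry rule can be sketched as a small function. The embedSingle name comes from the commit. The injected `embed` callback and measuring the limit in characters are assumptions.

```typescript
// Retry with truncated input only when truncation can actually change the input.
export async function embedSingle(
  text: string,
  embed: (input: string) => Promise<number[]>,
  retryLimit: number, // assumed to be a character count in this sketch
): Promise<number[]> {
  try {
    return await embed(text);
  } catch (err) {
    // A text that already fits would be "retried" unchanged, which could
    // mask the real failure, so surface the original error instead.
    if (text.length <= retryLimit) throw err;
    return embed(text.slice(0, retryLimit));
  }
}
```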

- embeddings.ts now routes embed/embedQuery/embedBatch through llama.ts by default
- Ollama HTTP remains as an automatic fallback when llama throws
- Added embedQuery() export for asymmetric embedding (query vs doc)
- Added _resetFallbackForTesting() export for test isolation
- Ollama fallback uses the OLLAMA_MODEL constant (snowflake-arctic-embed2), not the HF URI
- EMBED_MODEL export now reflects llama.LLAMA_EMBED_MODEL
- Tests rewritten to mock llama.ts instead of fetch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
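The point that the fallback sends an Ollama model tag rather than the Hugging Face URI can be illustrated with a request builder. Only the model name snowflake-arctic-embed2 comes from the PR. The function names, the default localhost:11434 address, and the use of Ollama's `/api/embed` endpoint (which takes `{ model, input }` and returns `{ embeddings }`) are assumptions about how traul calls Ollama.

```typescript
export const OLLAMA_MODEL = "snowflake-arctic-embed2";

interface OllamaRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: build the fetch arguments for an Ollama embed call.
export function buildOllamaEmbedRequest(
  input: string[],
  baseUrl: string = "http://localhost:11434",
): OllamaRequest {
  return {
    url: `${baseUrl}/api/embed`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // Always send the Ollama tag, never the llama.cpp / HF model URI.
      body: JSON.stringify({ model: OLLAMA_MODEL, input }),
    },
  };
}

// Hypothetical fallback call; assumes /api/embed responds with { embeddings: number[][] }.
export async function ollamaEmbed(input: string[]): Promise<number[][]> {
  const { url, init } = buildOllamaEmbedRequest(input);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Ollama embed failed: HTTP ${res.status}`);
  const json = (await res.json()) as { embeddings: number[][] };
  return json.embeddings;
}
```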


Migrations unconditionally wrote chunker_version, embed_model, and embed_dims on every startup. When multiple traul processes ran concurrently (e.g. parallel searches), they contended for the write lock, causing SQLITE_BUSY crashes. Migrations now write these values only when they differ from what is stored. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
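The fix comes down to a compare-before-write rule. The sketch below uses a hypothetical MetaStore interface in place of the traul_meta table, so it can run without SQLite. In traul, the same check would presumably sit on top of getMeta/setMeta.

```typescript
// Hypothetical key/value view of the traul_meta table.
interface MetaStore {
  get(key: string): string | null;
  set(key: string, value: string): void;
}

// Returns true only if a write happened. When values are unchanged, a startup
// performs no meta write, so it never needs the write lock for these keys.
export function setMetaIfChanged(store: MetaStore, key: string, value: string): boolean {
  if (store.get(key) === value) return false;
  store.set(key, value);
  return true;
}
```

Two processes can still both see a changed value and both write it. That only happens right after an upgrade, not on every startup.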


The node-llama-cpp test mock was missing the LlamaLogLevel export, causing CI to fail with "Export named 'formatDoc' not found" after the LlamaLogLevel import was added to llama.ts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dandaka and others added 20 commits March 18, 2026 21:58

feat: add traul_meta table for version tracking

696cf70

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: export CHUNKER_VERSION constant

0c97078

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add getMeta/setMeta for version tracking

4d35356

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add resetSyncCursors and resetChunks methods

aa84ee6

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add auto-migration for chunker/embed version changes

d660d54

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
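The version check behind auto-migration can be sketched as a pure planning function. The meta keys (chunker_version, embed_model, embed_dims) come from this PR. The interfaces, planMigration, and the reset policy are assumptions: a chunker change resets chunks and embeddings, and a model or dimension change resets embeddings only.

```typescript
// Values read back from traul_meta (null when a key was never written).
interface StoredMeta {
  chunker_version: string | null;
  embed_model: string | null;
  embed_dims: string | null;
}

// Values compiled into the running build.
interface CurrentVersions {
  chunkerVersion: number;
  embedModel: string;
  embedDims: number;
}

type MigrationStep = "chunks" | "embed";

export function planMigration(stored: StoredMeta, current: CurrentVersions): MigrationStep[] {
  const plan = new Set<MigrationStep>();
  if (stored.chunker_version !== String(current.chunkerVersion)) {
    // New chunk boundaries also invalidate every vector built from old chunks.
    plan.add("chunks");
    plan.add("embed");
  }
  if (stored.embed_model !== current.embedModel || stored.embed_dims !== String(current.embedDims)) {
    // Vectors from another model or dimension can't be mixed with new ones.
    plan.add("embed");
  }
  return Array.from(plan);
}
```

Under this policy, bumping CHUNKER_VERSION (as a later commit here does) is enough to trigger rechunking and re-embedding. On a fresh database, the missing keys also trigger resets, which are harmless on empty tables.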

feat: run auto-migration on startup

b5ec39c

feat: add traul reset command for manual data layer resets

abf14d0

Adds `traul reset <layer>` (sync, chunks, embed, all) with optional --source filter for sync layer. Deprecates the old `reset-embed` command. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
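The command's argument rules can be sketched as a small validator. The layer names and the sync-only --source filter come from the commit message. parseResetArgs and its error messages are hypothetical, and rejecting --source for `all` is an assumption.

```typescript
const LAYERS = ["sync", "chunks", "embed", "all"] as const;
type ResetLayer = (typeof LAYERS)[number];

// Hypothetical parser for: traul reset <layer> [--source <name>]
export function parseResetArgs(argv: string[]): { layer: ResetLayer; source?: string } {
  const [layer, ...rest] = argv;
  if (!LAYERS.includes(layer as ResetLayer)) {
    throw new Error(`Unknown layer "${layer}". Expected one of: ${LAYERS.join(", ")}`);
  }
  let source: string | undefined;
  for (let i = 0; i < rest.length; i++) {
    if (rest[i] === "--source") {
      source = rest[++i];
      if (!source) throw new Error("--source requires a value");
    } else {
      throw new Error(`Unexpected argument: ${rest[i]}`);
    }
  }
  // --source filters sync cursors only; other layers are reset as a whole.
  if (source !== undefined && layer !== "sync") {
    throw new Error("--source is only supported with the sync layer");
  }
  return { layer: layer as ResetLayer, source };
}
```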

docs: document traul reset command and auto-migration

4271e9f

chore: bump version to 0.2.0

5593df4

chore: add node-llama-cpp dependency

f2ca8ea

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add llama.ts formatting helpers with tests

fbe2cfb

Implements isQwenEmbeddingModel, formatQuery, and formatDoc pure functions with full test coverage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: use embedQuery() for search queries

b38fdef

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: resolve mock.module conflicts between test files

22c064b

Split llama wrapper tests into llama-wrapper.test.ts to avoid Bun's global mock.module conflicts with embeddings.test.ts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: document node-llama-cpp embedding backend

30eebcb

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: bump CHUNKER_VERSION to 2 to trigger rechunking

ed31a55

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: suppress noisy node-llama-cpp token type warning

8d88a88

Set logLevel to "error" when initializing llama to hide the "control-looking token was not control-type" warning that printed on every hybrid search query. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dandaka merged commit f84bbc0 into main Mar 19, 2026

dandaka merged 20 commits into main from feat/node-llama-cpp-embeddings
