
feat: replace Ollama with node-llama-cpp for embeddings #19

Merged
dandaka merged 20 commits into main from feat/node-llama-cpp-embeddings
Mar 19, 2026

dandaka commented Mar 18, 2026

Summary

  • Add node-llama-cpp as primary embedding backend (Qwen3-Embedding-0.6B, ~639MB auto-download on first use)
  • Route embed(), embedQuery(), embedBatch() through in-process llama.ts with automatic Ollama HTTP fallback
  • Add embedQuery() for asymmetric query embedding with Qwen3 instruction prefix
  • Update traul search to use embedQuery() for better semantic search quality

Architecture

Three layers: llama.ts (node-llama-cpp singleton, model-specific formatting) → embeddings.ts (public API, routing + Ollama fallback) → callers (search.ts, embed.ts)

Key changes

  • src/lib/llama.ts — new: singleton wrapper, lazy model loading, idle unload (5min), Qwen3 formatting
  • src/lib/embeddings.ts — rewritten: llama-first routing, Ollama fallback with separate model name
  • src/commands/search.ts — uses embedQuery() instead of embed()
  • skill.md — updated embedding backend docs
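The lazy loading and 5-minute idle unload listed for src/lib/llama.ts can be pictured roughly as below. This is a minimal sketch and not the code in the PR: `IdleSingleton`, `Disposable`, `get()` and `unload()` are hypothetical names, and the loader is injected so the example runs without node-llama-cpp installed.

```typescript
// Hypothetical sketch of a lazily loaded singleton that is disposed after a
// period of inactivity. Names and structure are assumptions, not the PR's code.
type Disposable = { dispose(): void | Promise<void> };

class IdleSingleton<T extends Disposable> {
  private readonly load: () => Promise<T>;
  private readonly idleMs: number;
  private instance: Promise<T> | null = null;
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(load: () => Promise<T>, idleMs: number = 5 * 60 * 1000) {
    this.load = load;
    this.idleMs = idleMs;
  }

  // Returns the shared instance, loading it on first use.
  get(): Promise<T> {
    const current = this.instance ?? this.startLoad();
    this.resetIdleTimer();
    return current;
  }

  private startLoad(): Promise<T> {
    const loading = this.load();
    this.instance = loading;
    // If loading fails, forget the promise so the next get() retries.
    loading.catch(() => {
      if (this.instance === loading) this.instance = null;
    });
    return loading;
  }

  // Every use pushes the unload deadline back by idleMs.
  private resetIdleTimer(): void {
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = setTimeout(() => {
      void this.unload();
    }, this.idleMs);
  }

  // Disposes the instance (if any) and cancels the pending idle timer.
  async unload(): Promise<void> {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    const current = this.instance;
    this.instance = null;
    if (current === null) return;
    const value = await current.catch(() => null);
    if (value !== null) await value.dispose();
  }
}
```

Concurrent callers share one in-flight load because the promise, not the resolved value, is cached. A pending timer keeps a Node or Bun process alive, so a short-lived CLI command would likely need to call `unload()` before exiting.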

Test plan

  • 192 tests pass (0 failures)
  • llama.ts formatting helpers (6 tests)
  • llama.ts wrapper with mocked node-llama-cpp (8 tests)
  • embeddings.ts llama routing + Ollama fallback (8 tests)
  • No mock.module cross-file conflicts
  • Manual: traul search "test" with model download on first run

🤖 Generated with Claude Code

dandaka and others added 20 commits March 18, 2026 21:58
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds `traul reset <layer>` (sync, chunks, embed, all) with optional
--source filter for sync layer. Deprecates the old `reset-embed` command.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements isQwenEmbeddingModel, formatQuery, and formatDoc pure
functions with full test coverage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
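For readers unfamiliar with asymmetric embedding, the three pure helpers named in the commit above might look something like this. The signatures, the model-name regex, and the default task string are assumptions made for illustration. The `Instruct: ...\nQuery:...` layout follows the prompt format described in the Qwen3-Embedding model card as best I recall it, and the PR may format things differently.

```typescript
// Hypothetical sketch of the formatting helpers; signatures are assumed.
// Assumed default task description, not taken from the PR.
const DEFAULT_TASK =
  "Given a web search query, retrieve relevant passages that answer the query";

// True when the model identifier looks like a Qwen embedding model.
function isQwenEmbeddingModel(model: string): boolean {
  return /qwen.*embed/i.test(model);
}

// Queries get an instruction prefix for Qwen embedding models only.
function formatQuery(
  query: string,
  model: string,
  task: string = DEFAULT_TASK,
): string {
  if (!isQwenEmbeddingModel(model)) return query;
  return `Instruct: ${task}\nQuery:${query}`;
}

// Documents are embedded as-is; only the query side carries the instruction.
function formatDoc(text: string, _model: string): string {
  return text;
}
```

The asymmetry is the point: documents are indexed once without a prefix, while each search query is wrapped in an instruction, which is why `traul search` calls `embedQuery()` rather than `embed()`.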
Implements lazy-loaded singleton LlamaCpp wrapper with embedDoc,
embedQuery, embedDocBatch, and idle-timeout disposal. Fixes retry
logic in embedSingle to only truncate-retry when text exceeds the
retry limit, preventing false success on mock failures in tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
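The retry fix described in the commit above can be sketched as follows. This is an illustration under assumptions: the real `embedSingle` signature is not shown in the PR, `RETRY_MAX_CHARS` and its value are hypothetical, and the embedding call is injected so the snippet runs standalone.

```typescript
// Hypothetical sketch: retry with truncated text only when truncation would
// actually change the input. The limit below is an assumed value.
const RETRY_MAX_CHARS = 2000;

type EmbedFn = (text: string) => Promise<number[]>;

async function embedSingle(
  text: string,
  embedFn: EmbedFn,
  retryMaxChars: number = RETRY_MAX_CHARS,
): Promise<number[]> {
  try {
    return await embedFn(text);
  } catch (err) {
    // Text already within the limit: retrying would resend the same input,
    // so surface the original error instead.
    if (text.length <= retryMaxChars) throw err;
    return embedFn(text.slice(0, retryMaxChars));
  }
}
```

Without the length check, a short input that fails once would be retried unchanged, and a backend that fails only on the first call would then report success.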
- embeddings.ts now routes embed/embedQuery/embedBatch through llama.ts by default
- Ollama HTTP remains as automatic fallback when llama throws
- Added embedQuery() export for asymmetric embedding (query vs doc)
- Added _resetFallbackForTesting() export for test isolation
- Ollama fallback uses OLLAMA_MODEL constant (snowflake-arctic-embed2), not the HF URI
- EMBED_MODEL export now reflects llama.LLAMA_EMBED_MODEL
- Tests rewritten to mock llama.ts instead of fetch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
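The llama-first routing with Ollama fallback listed above could be structured along these lines. This is a sketch, not the contents of embeddings.ts: `createEmbedRouter` is a hypothetical name, both backends are injected as plain functions, and the "sticky" behaviour (staying on the fallback after the first failure) is an assumption inferred from the existence of `_resetFallbackForTesting()`.

```typescript
// Hypothetical sketch of primary-then-fallback routing for embeddings.
type EmbedBackend = (text: string) => Promise<number[]>;

function createEmbedRouter(primary: EmbedBackend, fallback: EmbedBackend) {
  // Assumption: once the primary backend throws, later calls skip it.
  let useFallback = false;

  return {
    async embed(text: string): Promise<number[]> {
      if (!useFallback) {
        try {
          return await primary(text);
        } catch {
          useFallback = true;
        }
      }
      return fallback(text);
    },
    // Test-only hook so each test starts on the primary backend again.
    resetFallbackForTesting(): void {
      useFallback = false;
    },
  };
}
```

In the PR's terms, `primary` would correspond to the in-process llama.ts call and `fallback` to the Ollama HTTP request made with the `OLLAMA_MODEL` name.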
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split llama wrapper tests into llama-wrapper.test.ts to avoid
Bun's global mock.module conflicts with embeddings.test.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Migrations unconditionally wrote chunker_version, embed_model, and
embed_dims on every startup. When multiple traul processes ran
concurrently (e.g. parallel searches), they contended for the write
lock causing SQLITE_BUSY crashes. Now only writes when values differ.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
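The "only write when values differ" fix described above amounts to a compare-before-write guard. The sketch below illustrates the idea against a hypothetical `MetaStore` interface. The PR's actual migration code and database access layer are not shown in this conversation, so `MetaStore`, `setMetaIfChanged` and `syncMeta` are all assumed names.

```typescript
// Hypothetical sketch: skip the write when the stored value already matches.
interface MetaStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// Returns true only when a write was actually performed.
function setMetaIfChanged(
  store: MetaStore,
  key: string,
  value: string,
): boolean {
  if (store.get(key) === value) return false; // read-only path, no write
  store.set(key, value);
  return true;
}

// Applies a set of desired values and reports how many writes were needed.
function syncMeta(store: MetaStore, desired: Record<string, string>): number {
  let writes = 0;
  for (const [key, value] of Object.entries(desired)) {
    if (setMetaIfChanged(store, key, value)) writes += 1;
  }
  return writes;
}
```

On a normal startup where chunker_version, embed_model, and embed_dims are unchanged, every call takes the read-only path, so concurrent processes never ask SQLite for the write lock.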
Set logLevel to "error" when initializing llama to hide the
"control-looking token was not control-type" warning that printed
on every hybrid search query.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The mock module was missing the LlamaLogLevel export, causing CI to
fail with "Export named 'formatDoc' not found" after we added the
LlamaLogLevel import to llama.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dandaka dandaka merged commit f84bbc0 into main Mar 19, 2026