Give your AI agents persistent memory. Index your markdown notes, meeting transcripts, and documentation into a hybrid search engine. Search with keywords or natural language. Everything runs locally — your data never leaves your machine.
kbx combines SQLite FTS5 full-text search with LanceDB vector search using Qwen3 embeddings — all on-device, with Apple Silicon acceleration via MLX.
You can read more about kbx's progress in the CHANGELOG.
```bash
# Install
pip install kbx                    # core CLI + FTS5 search
pip install "kbx[search]"          # + vector search (Qwen3 embeddings)
pip install "kbx[search,mlx]"      # + Apple Silicon acceleration

# Set up a knowledge base
kbx init                           # create kbx.toml in the current directory

# Index your markdown files
kbx index run                      # index everything under memory/
kbx index run --no-embed           # text-only index (fast, no model needed)

# Search
kbx search "quarterly planning"         # hybrid search (FTS5 + vector)
kbx search "quarterly planning" --fast  # keyword-only (~instant, no model needed)
kbx search "MFA rollout" --json         # structured output for scripts

# Browse
kbx view "memory/notes/decisions.md"    # read a document
kbx view "#a1b2c3"                      # by content-hash prefix
kbx list --type notes --from 2026-01-01
```

kbx is built for agentic workflows. The `--json` output format, structured error responses, and built-in agent playbook make it a natural fit for AI assistants.
```bash
# Orient: get a compressed overview of all entities (~2K tokens)
kbx context

# Search with structured output
kbx search "authentication" --fast --json --limit 5

# Look up a person
kbx person find "Alice" --json

# Timeline of everything mentioning a project
kbx person timeline "Cloud Migration" --from 2026-01-01 --json

# Take notes that persist across sessions
kbx memory add "Decision: use Postgres" --tags decision,infra --pin
kbx memory add "Promoted to Staff" --entity "Bob"

# Pin important docs to the context window
kbx pin "memory/notes/priorities.md"
```

When you run `kbx --help`, it prints an agent playbook alongside the standard CLI help — a complete reference for AI agents to self-orient and use the knowledge base effectively.
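An agent or script can consume the `--json` output directly and filter on scores. A minimal sketch; the payload shape below is an assumption for illustration, not the documented schema:

```python
# Hypothetical shape of `kbx search --json` output; the real schema may differ.
sample = {
    "results": [
        {"path": "memory/notes/mfa.md", "score": 0.91, "snippet": "MFA rollout plan"},
        {"path": "memory/meetings/standup.md", "score": 0.42, "snippet": "weekly sync"},
    ]
}

def strong_hits(payload, threshold=0.8):
    """Keep only high-confidence paths, dropping low-scoring noise."""
    return [r["path"] for r in payload["results"] if r["score"] >= threshold]

print(strong_hits(sample))  # ['memory/notes/mfa.md']
```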
kbx exposes an MCP server for tighter integration with Claude Desktop, Claude Code, Cursor, and other MCP-compatible tools.
Tools exposed:
- `kb_search` — Hybrid or FTS-only search with date/tag filters
- `kb_person_find` — Entity lookup by name, alias, or partial match
- `kb_person_timeline` — Chronological document list for an entity
- `kb_view` — Retrieve a document by path, glob, or `#hash`
- `kb_context` — Compressed entity index for session orientation
- `kb_memory_add` — Create notes or record facts about entities
- `kb_pin` / `kb_unpin` — Pin documents to the context window
- `kb_usage` — Index status and usage instructions
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
```json
{
  "mcpServers": {
    "kbx": {
      "command": "/Users/YOU/.local/bin/kbx",
      "args": ["mcp"]
    }
  }
}
```

Note: Claude Desktop does not inherit your shell PATH. Use the full path to `kbx` — find it with `which kbx` (typically `~/.local/bin/kbx` when installed via `uv tool install`).
Claude Code (.claude/settings.local.json):
```json
{
  "mcpServers": {
    "kbx": {
      "command": "kbx",
      "args": ["mcp"],
      "type": "stdio"
    }
  }
}
```

See MCP plugin docs for the full tool parameter reference.
Use kbx as a library in your own applications:
```python
from kb import KnowledgeBase

with KnowledgeBase(thread_safe=True) as kb:
    # Search
    results = kb.search("cloud migration")

    # Entities
    people = kb.list_entities(entity_type="person")
    alice = kb.get_entity("Alice")
    timeline = kb.get_entity_timeline("Alice")

    # Context
    ctx = kb.context()

    # Index
    kb.index()
```

The `KnowledgeBase` class manages the full lifecycle — DB connections, embedder, auto-reindexing of stale files. All methods return Pydantic models.
See architecture docs for the full API surface.
Write-through principle: Markdown files are the source of truth. All data writes go to flat files first; the database is a derived index rebuilt from those files. The DB is disposable — delete it and re-index.
```
Markdown files (source of truth)
                          │
                          ▼
┌─────────────────────────────────────────────────────┐
│                   Source Adapters                   │
│  meetings.py — walk memory/meetings/YYYY/MM/DD/     │
│  memory.py — walk memory/people/, projects/, ...    │
└────────────────────────┬────────────────────────────┘
                         │ ParsedDocument
                         ▼
┌─────────────────────────────────────────────────────┐
│                       Indexer                       │
│        chunk → embed → store → link entities        │
└──────────┬──────────────────────────┬───────────────┘
           │                          │
           ▼                          ▼
  ┌──────────────────┐ ┌─────────────────────────────┐
  │      SQLite      │ │           LanceDB           │
  │  docs, chunks,   │ │  Qwen3-Embedding-0.6B       │
  │  FTS5, entities, │ │  1024-dim vectors           │
  │  facts, mentions │ │  float32, instruction-aware │
  └──────────────────┘ └─────────────────────────────┘
           │                          │
           └────────────┬─────────────┘
                        ▼
┌─────────────────────────────────────────────────────┐
│                    Hybrid Search                    │
│ FTS5 (BM25) + Vector → RRF Fusion → Recency Weight  │
└─────────────────────────────────────────────────────┘
```
kbx supports two search modes:
| Mode | Flag | Speed | Method |
|---|---|---|---|
| Fast | `--fast` | ~instant | FTS5 keyword search only |
| Hybrid | (default) | ~2s | FTS5 + vector search + RRF fusion |
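The FTS5 side of fast mode can be tried directly with Python's bundled `sqlite3` module. The table layout below is illustrative, not kbx's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(path, body)")
conn.executemany("INSERT INTO chunks VALUES (?, ?)", [
    ("memory/notes/planning.md", "quarterly planning for the cloud migration"),
    ("memory/notes/standup.md", "weekly standup notes and action items"),
])

# bm25() returns smaller (more negative) values for better matches,
# so ascending order puts the best hit first.
rows = conn.execute(
    "SELECT path FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks)",
    ("quarterly planning",),
).fetchall()
print(rows)  # [('memory/notes/planning.md',)]
```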
Hybrid search uses Reciprocal Rank Fusion (RRF) to combine keyword and semantic results, with a 90-day half-life recency weight. A strong-signal fast path skips vector search entirely when FTS5 produces a high-confidence match.
Score interpretation: 0.8+ strong | 0.5–0.8 worth reading | <0.5 noise
See search docs for the full pipeline, score normalisation, and fusion strategy.
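The fusion step above can be sketched in a few lines. This is a simplified illustration of RRF and the half-life weight, not kbx's exact implementation; the constant `k=60` is the value commonly used in the RRF literature, assumed here:

```python
def rrf_fuse(fts_ranking, vector_ranking, k=60):
    """Reciprocal Rank Fusion: each list contributes 1/(k + rank) per doc."""
    scores = {}
    for ranking in (fts_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Docs appearing high in both lists accumulate the largest scores.
    return sorted(scores, key=scores.get, reverse=True)

def recency_weight(age_days, half_life_days=90):
    """Exponential decay: a 90-day-old document scores half as much."""
    return 0.5 ** (age_days / half_life_days)

print(rrf_fuse(["a", "b", "c"], ["b", "d"]))  # 'b' ranks first
print(recency_weight(180))  # 0.25
```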
kbx automatically links people, projects, teams, and glossary terms to your documents:
```bash
kbx person find "Alice" --json       # profile + linked documents
kbx person timeline "Alice"          # chronological mentions
kbx person create "Bob" --role "SRE Lead" --team "Platform"
kbx project find "Cloud Migration"   # project profile + linked docs
kbx entity stale --days 30           # entities not mentioned recently
```

Entities are seeded from `memory/people/*.md` and `memory/projects/*.md` files, then linked to documents via five-tier matching: YAML tags → title participants → title substrings → source IDs → content name matching.
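The five tiers form an ordered cascade: the first tier that matches wins. A toy sketch with illustrative field names, not kbx's actual data model:

```python
def link_entity(entity, doc):
    """Return the name of the first tier linking `entity` to `doc`, or None."""
    name = entity["name"].lower()
    tiers = [
        ("yaml_tags", entity["name"] in doc.get("tags", [])),
        ("title_participants", entity["name"] in doc.get("participants", [])),
        ("title_substring", name in doc.get("title", "").lower()),
        ("source_ids", entity.get("source_id") in doc.get("source_ids", [])),
        ("content_name", name in doc.get("content", "").lower()),
    ]
    for tier, matched in tiers:
        if matched:
            return tier
    return None

doc = {"title": "1:1 with Alice", "content": "Discussed the MFA rollout."}
print(link_entity({"name": "Alice"}, doc))  # title_substring
```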
See entity docs for the full linking pipeline.
Pull meeting transcripts from external sources:
```bash
# Granola API sync
kbx sync granola --since 2026-01-01

# Notion AI Meeting Notes sync
kbx sync notion --since 2026-01-01

# Granola zip export ingest
kbx ingest export.zip

# View and edit synced meeting notes
kbx granola view <calendar-uid>
kbx granola edit <calendar-uid> --append "Action: follow up with Alice"
```

Sync is incremental — only new or updated meetings are fetched. Attendees are automatically matched to existing entities. See Granola plugin docs for configuration.
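Incremental sync boils down to comparing remote timestamps against a local index. A hypothetical sketch of the idea, not the plugin's actual logic:

```python
def needs_fetch(remote_meetings, local_index):
    """Return meetings that are new, or updated since they were last synced.

    `local_index` maps meeting ID -> the updated_at value seen at last sync.
    """
    return [
        m for m in remote_meetings
        if m["id"] not in local_index or m["updated_at"] > local_index[m["id"]]
    ]

remote = [
    {"id": "m1", "updated_at": "2026-01-05"},  # never seen: fetch
    {"id": "m2", "updated_at": "2026-01-02"},  # unchanged: skip
]
local = {"m2": "2026-01-02"}
print([m["id"] for m in needs_fetch(remote, local)])  # ['m1']
```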
kbx looks for configuration in this order:
1. `$KBX_CONFIG` environment variable
2. `./kbx.toml` in the current directory (walking up from CWD)
3. `~/.config/kbx/config.toml`
Run `kbx init` to generate a starter config.
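The lookup order above can be pictured as a small resolver. A hedged illustration of the documented precedence, not kbx's actual code:

```python
import os
from pathlib import Path

def find_config(cwd=None):
    """Resolve the config path: $KBX_CONFIG, then kbx.toml walking up, then user config."""
    env = os.environ.get("KBX_CONFIG")
    if env:
        return Path(env)
    start = Path(cwd or Path.cwd()).resolve()
    # Walk from the starting directory up to the filesystem root.
    for directory in [start, *start.parents]:
        candidate = directory / "kbx.toml"
        if candidate.is_file():
            return candidate
    fallback = Path.home() / ".config" / "kbx" / "config.toml"
    return fallback if fallback.is_file() else None
```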
| Extra | What it adds |
|---|---|
| `search` | LanceDB + sentence-transformers + NumPy for vector search |
| `mlx` | MLX backend for faster embeddings on Apple Silicon |
| `mcp` | MCP server for AI tool integration |
| `all` | Everything above plus test and dev dependencies |
Install with: `pip install "kbx[search,mlx,mcp]"`
Requires Python 3.10+.
The index is stored in the data directory (configurable via `kbx.toml` or `$KB_DATA_DIR`):

```
kbx-data/
├── metadata.db   # SQLite — documents, chunks, FTS5, entities, facts
└── vectors/      # LanceDB — Qwen3 embedding vectors (1024-dim)
```

The database is a derived index. Delete it and run `kbx index run` to rebuild from your markdown files.
```bash
git clone https://github.com/tenfourty/kbx.git
cd kbx
uv sync --all-extras
uv run pre-commit install

uv run pytest -x -q --cov   # 1361 tests, 90%+ coverage
uv run mypy src/            # strict mode
```

Quick CI check locally:

```bash
make ci    # mirror the exact GitHub CI pipeline
make fix   # auto-fix lint + format issues
```

See CONTRIBUTING.md for guidelines and testing docs for the test strategy.
| Doc | What it covers |
|---|---|
| Architecture | System design, data flow, module dependencies, Python API |
| Search | FTS5 + vector + RRF fusion pipeline, score normalisation |
| Entities | Entity seeding, five-tier linking, disambiguation |
| Indexing | Walk → chunk → embed → store pipeline |
| Chunking | Markdown-aware chunking strategy |
| CLI Reference | All commands and options |
| Output Formatting | JSON, table, CSV, JSONL, jq, field selection |
| Context Layer | Compressed entity index for AI agents |
| Testing | Test strategy, fixtures, markers |
| MCP Plugin | MCP server tools and resources |
| MLX Plugin | Apple Silicon embedding acceleration |
| Granola Plugin | Meeting transcript sync (view, edit, push) |
| Notion Plugin | Notion AI Meeting Notes sync |
| Integration | Ingest, migrations, search quality |
Apache-2.0