(C) 2026, Rijn Buve
This repository contains a solid implementation of Andrej Karpathy's idea for an LLM-maintained, Wiki-style knowledge base. This implementation is meant for work-related notes, structured as an Obsidian vault and assisted by the semantic search database QMD.
The implementation supports Anthropic Claude and JetBrains Junie (both via their CLIs) for ingesting notes into the knowledge base.
The primary goal is efficient decision intelligence: understanding why decisions were taken, on what basis, by whom, and when. Secondary goals include mapping how technologies and systems relate, who is involved in what, and how competitors compare. 'Efficient' is deliberate: the mechanism needs to be token- (and environmentally) efficient.
Division of labor:

- The user curates source files in `raw/`.
- The LLM does all writing, cross-referencing, and bookkeeping in `wiki/`.
- Obsidian is the UI for entering/accessing notes and asking questions (e.g. through Claudian).
# 1. Clone this repo
git clone <repo-url> ~/my-knowledge-base
# 2. Create raw/ and wiki/ directories (these are not stored in git)
cd ~/my-knowledge-base
mkdir -p raw/{notes,clips,emails,transcripts,scans,slack} wiki
# 3. Install QMD (the semantic search engine)
npm install -g bun
npm install -g @tobilu/qmd
# 4. Register all subdirectories as QMD collections and build the index
./scripts/qmd-full-reindex.sh
# 5. Install the QMD skill for Claude/Junie
qmd skill install --global --yes
# 6. Register QMD as a Claude Code MCP server (add to ~/.claude/claude_desktop_config.json)
# Or just ask Claude: "read this README.md and install QMD as an MCP server"
# 7. Open this directory as an Obsidian vault: File → Open Folder as Vault

After setup, put your notes in raw/ and tell Claude: "ingest new raw notes".
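A quick way to confirm the layout after setup: the sketch below is a hypothetical helper (`check_layout` is not one of this repo's scripts) that checks each directory created in step 2.

```shell
# Sketch (not a repo script): verify the raw/ and wiki/ layout from step 2.
check_layout() {   # $1 = vault root directory
  for d in raw/notes raw/clips raw/emails raw/transcripts raw/scans raw/slack wiki; do
    if [ -d "$1/$d" ]; then echo "ok: $d"; else echo "missing: $d"; fi
  done
}
check_layout "$HOME/my-knowledge-base"
```

If any line reports `missing:`, re-run the `mkdir -p` command from step 2.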
cd ~/my-knowledge-base && git pull
./scripts/qmd-full-reindex.sh   # re-register and update any new subdirectories

Required:
- Claude Code (CLI) — or JetBrains Junie
- Node.js / npm — for installing bun and qmd
- QMD — local semantic search engine (`npm install -g @tobilu/qmd`)
- Obsidian — vault UI (free, Mac/Windows/Linux)
- git
Optional:
- pdftotext — faster/cheaper PDF extraction (`brew install poppler`); LLM vision is the fallback
- Obsidian Web Clipper — one-click web article saving to `raw/clips/`
- Claudian — run Claude from within Obsidian (ask Claude to install it safely)
- Amphetamine (Mac App Store) — prevents Mac sleep during long overnight ingests
Register QMD as an MCP server in `~/.claude/claude_desktop_config.json` (or ask Claude to do it):
{
"mcpServers": {
"qmd": {
"command": "qmd",
"args": ["mcp"]
}
}
}

The Slack integration is managed via your claude.ai organization. Authorize it yourself at claude.ai → Settings → Connectors. Once authorized, the Slack tools are available automatically in all Claude sessions — no local configuration needed.
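After hand-editing `~/.claude/claude_desktop_config.json`, it helps to confirm the file is still valid JSON. The sketch below is an illustration only; it uses `python3 -m json.tool` purely as a JSON validator, and `check_config` is a hypothetical helper, not part of this repo.

```shell
# Sketch: validate the Claude config file as JSON (python3 as a JSON checker).
check_config() {   # $1 = path to claude_desktop_config.json
  if python3 -m json.tool "$1" >/dev/null 2>&1; then
    echo "config OK"
  else
    echo "config invalid or missing"
  fi
}
check_config "$HOME/.claude/claude_desktop_config.json"
```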
- Create and collect notes:
  - User produces raw notes and stores them in the `raw/notes` directory.
  - User uses the Obsidian Web Clipper to store notes in `raw/clips`.
  - User stores `.vtt` meeting transcripts in `raw/transcripts`.
  - User drags `.eml` emails to `raw/emails`.
  - User stores handwritten notes or scanned pages (PDF, JPG) in `raw/scans`.
  - User fetches Slack channels and DMs by asking "fetch slack" — messages are written to `raw/slack/`.
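The file-to-directory mapping above can be sketched as a small helper. This is an illustration only (`route_raw` is hypothetical, not a repo script); the directory names are taken from the list above.

```shell
# Illustration: map a raw source file to its raw/ subdirectory by extension.
route_raw() {
  case "$1" in
    *.vtt)       echo "raw/transcripts" ;;  # meeting transcripts
    *.eml)       echo "raw/emails" ;;       # email threads
    *.pdf|*.jpg) echo "raw/scans" ;;        # handwritten notes, scanned pages
    *)           echo "raw/notes" ;;        # plain notes and everything else
  esac
}
route_raw standup.vtt   # → raw/transcripts
```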
- Ingest notes:
  - User asks to "ingest new raw notes", "ingest Confluence page `<URL>`", or runs `wiki-ingest-loop.sh`.
  - LLM converts non-Markdown inputs: `.vtt` transcripts → `raw/transcripts/converted/`, `.eml` emails → `raw/emails/converted/`, `.pdf`/`.jpg` scans → `raw/scans/converted/`.
  - LLM partitions files into batches and processes them (large ingests use parallel LLM sessions 2–5; single batches are handled in one session).
  - After all batches are done, user says "finalize ingest" to merge session logs, rebuild `_index.md` files, and run post-processing (QMD re-index + health check).
- Query wiki:
  - User asks a high-level question.
  - LLM queries the semantic database (with the `qmd` skill) for relevant page links (fast/token-efficient).
  - LLM processes the suggested pages and produces an answer for the user.
  - LLM stores valuable conversations in `wiki/conversations/` to extend the knowledge base.
Fetching relevant pages from the semantic database before analyzing and reasoning about documents makes this knowledge base implementation significantly faster and more token-efficient than one that relies on Markdown files alone.
These skill commands and natural-language triggers are available:
| Command / phrase | Description |
|---|---|
| "ingest new notes" | Start a new ingest of raw notes (Session 1 — coordinator flow) |
| "fetch slack" | Fetch Slack threads and DMs into raw/slack/, then run wiki-ingest-loop.sh to ingest |
| "ingest next batch" | Continue ingesting the next batch (Sessions 2–N flow) |
| "finalize ingest" | Finalize the ingest: merge logs, rebuild indexes, run post-processing |
| "health check" or "lint" | Check for orphaned pages, broken links, contradictions |
| "add missing [topic]" | Create a new Wiki page for a missing concept, person, system, etc. |
| "clear ingest batches" | Remove incomplete batch files to restart a failed ingest |
| ask any question | Query the knowledge base (default behavior) |
The "ingest next batch" and "finalize ingest" commands are only needed when importing large amounts of notes. The LLM will notify you during "ingest new notes" if it sees that batched importing is required.
You can use the script "scripts/wiki-ingest-loop.sh" to start ingesting new notes. The advantage of this script is that it ingests new notes in batches, and waits if your 5-hour usage limit has been reached. It first executes "ingest new notes", followed by as many "ingest next batch" prompts as necessary (up to a specified maximum). Use "--help" for usage information.
You start it for a specific agent (Claude CLI or Junie CLI), like this:

scripts/wiki-ingest-loop.sh [--agent claude|junie]

Use wiki-ingest-loop.sh --help for more options.
After each ingest, the system can automatically run a health check on the knowledge base. It uses the LLM to check for missing topics, inconsistencies, etc. (this takes time and tokens).

You can also run the basic health check (which does not use the LLM) manually, by executing:
scripts/wiki-lint-check.py
This opens an interactive TUI to deal with:
- Broken links: these can be removed, flagged, or simply replaced with plain text.
- Orphaned pages: these can be deleted, or kept (marked with `orphan: false`).
- Stub pages (that were identified by the LLM but never filled in): these can be deleted, or kept (no longer marked as `stub: true`).
Using this interactive mode, you can keep your knowledge base 100% free of false-positive alerts, making it easy to see whether the knowledge base is still sound. Use --batch-mode to suppress the TUI and get text/JSON output only.
Provide personal info on who you are, what you do, and what your focus is, in config/personal_info.md:
# Personal Info
My name is ...
I am ...
# My Main Focus
- Strategic decision making on technology choices.
- ...

If the file is missing, or it contains no info topics, default topics will be used.
Add a # Slack section to config/personal_info.md to configure which channels and DMs to fetch:
| Channel / DM            | Days | Mode                      |
|-------------------------|------|---------------------------|
| #architecture-decisions | 14   | signal                    |
| #team-platform          | all  |                           |
| @Alice van Dijk         | 7    | software design decisions |
- `#channel-name` — a public or private Slack channel
- `@Person Name` — a direct message thread with that person
- Days — how many calendar days back to fetch conversation updates (default: 7)
- Mode — `signal` filters out noise (absences, bot messages, bare acks); `all` includes everything; any other text is treated as a topic filter (only threads directly about that topic are included)
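The three Mode interpretations can be summarized in a tiny sketch. This is an illustration only: `describe_mode` is hypothetical, the real filtering happens inside the fetch-slack skill, and treating an empty Mode as "all" is an assumption (the behavior for a blank Mode cell is not specified here).

```shell
# Illustration: how the Mode column is interpreted (per the list above).
describe_mode() {
  case "$1" in
    signal) echo "drop absences, bot messages, bare acks" ;;
    all|"") echo "include everything" ;;   # empty Mode treated as 'all' (assumption)
    *)      echo "keep only threads about: $1" ;;
  esac
}
describe_mode signal
```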
You can run Claude from within Obsidian using the Claudian plugin. Install it by asking Claude:
Claude, I want you to install the following Obsidian plugin from Github. First, I want you to review
the plugin and make sure it is safe to install. And if it is safe, install it.
This is the repo: https://github.com/YishenTu/claudian
To re-create the entire Wiki, remove the `wiki/` directory, `/clear` the LLM conversation, and ask it to ingest new raw notes. Note that for large amounts of notes this may be expensive and take a long time.
Note: The wiki/log.jsonl file tracks which notes have already been ingested. If you share the wiki/ directory across machines, any client can run incremental ingestions without re-processing everything.
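Since `wiki/log.jsonl` is plain JSON Lines, it can be inspected with ordinary shell tools. The sketch below is an illustration only: `log_summary` is a hypothetical helper, and because the entry schema is defined by the ingest skill (not documented here), it only counts entries and shows the most recent one.

```shell
# Illustration: inspect the append-only ingest log (one JSON object per line).
log_summary() {   # $1 = path to log.jsonl
  if [ -f "$1" ]; then
    printf 'entries: %s\n' "$(wc -l < "$1")"
    printf 'latest:  %s\n' "$(tail -n 1 "$1")"
  else
    echo "no log yet"
  fi
}
log_summary wiki/log.jsonl
```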
The database is automatically checked for errors after ingesting new notes. To check manually:
# Basic check (no LLM, fast):
./scripts/wiki-lint-check.py --batch-mode --format text
# Interactive TUI (deal with broken links, orphans, stubs):
./scripts/wiki-lint-check.py

<root>/
├── .import/ ← in-progress batch import state (gitignored)
├── config/ ← config file for Obsidian web clipper
├── scripts/ ← helper scripts for CLAUDE.md
├── raw/
│ ├── clips/ ← web articles and saved pages (web clipper)
│ ├── confluence/ ← pages fetched from Atlassian Confluence (fetch cache)
│ ├── emails/ ← email threads (.eml)
│ │ └── converted/ ← LLM generated: emails converted to Markdown
│ ├── scans/ ← handwritten pages, whiteboards
│ │ └── converted/ ← LLM generated: scans converted to Markdown
│ ├── notes/ ← notes, 1:1s, and people-specific files
│ ├── slack/ ← Slack channel and DM threads (fetched by "fetch slack")
│ └── transcripts/ ← meeting and conversation transcripts (.vtt)
│ └── converted/ ← LLM generated: transcripts converted to Markdown
├── wiki/
│ ├── index.md ← top-level navigation to section indexes
│ ├── log.jsonl ← append-only ingest log (JSON Lines)
│ ├── concepts/ ← mental models and domain concepts
│ │ └── _index.md ← alphabetical index of concept pages
│ ├── competition/ ← competitor profiles
│ ├── conversations/ ← interesting and valuable conversations (query results)
│ ├── decisions/ ← decision records
│ ├── people/ ← people and team pages
│ ├── problems/ ← living problem tracking pages
│ ├── projects/ ← living project tracking pages
│ └── systems/ ← living system reference pages
├── CLAUDE.md ← schema and workflow instructions for Claude Code
└── README.md ← this file
The directories `raw/` and `wiki/` are not stored in Git. Create them manually before first use.
| Type | Purpose |
|---|---|
| competition | Competing companies, products, and approaches |
| concepts | Technologies, standards, mental models, domain vocabulary |
| conversations | Valuable results of earlier queries/conversations |
| decisions | Why decisions were taken, on what basis, by whom, and when |
| people | Colleagues, contacts, external stakeholders, teams |
| problems | Active and past problems |
| projects | Active and past initiatives |
| systems | Systems, products, platforms, and services |
- `raw/` is immutable — LLM never writes there (except `raw/confluence/` as a fetch cache).
- `wiki/` is LLM-owned — LLM writes, the user reads.
- The relevant `wiki/<type>/_index.md` files are rebuilt and `wiki/log.jsonl` is updated on every finalized ingest.
- Hand-curated content in Wiki pages is never deleted or overwritten.
| Script | Purpose |
|---|---|
| `wiki-ingest-loop.sh` | Main ingestion pipeline: converts raw files (VTT, EML), creates batches if needed, and runs ingestion sessions in a loop until all notes are processed. The normal way to ingest new notes. |
| `wiki-lint-check.py` | Scans wiki Markdown files for broken internal and external links. Outputs structured JSON for AI consumption. Run periodically to keep the wiki healthy. |
| Script | Purpose |
|---|---|
| `wiki-remove-all-generated-files.sh` | Deletes all LLM-generated wiki files and batch state, resetting the wiki to a clean slate. Use when you want to re-ingest everything from scratch. |
| `wiki-remove-large-attachments.py` | Interactive TUI for browsing and removing large Obsidian attachments. Navigate with ↑↓, press d/D to move files to `.trash/`. Useful for reclaiming disk space. |
| `qmd-full-reindex.sh` | Resets and fully re-indexes the QMD database. |
| Script | Purpose |
|---|---|
| `system/wiki-create-import-batches.sh` | Partitions un-ingested notes into batch files for parallel import sessions. Called automatically by `wiki-ingest-loop.sh` and the wiki-ingest skill. |
| `system/wiki-create-index-pages.py` | Rebuilds `_index.md` files for each wiki section. Called by the wiki-finalize-ingest skill after a completed ingest run. |
| `system/convert-eml-to-md.py` | Converts `.eml` email files to Markdown with YAML frontmatter. Called by `wiki-ingest-loop.sh` before ingestion. |
| `system/convert-vtt-to-md.py` | Converts `.vtt` transcript files to readable Markdown with YAML frontmatter. Called by `wiki-ingest-loop.sh` before ingestion. |
| `system/copy-claude-skills-to-other-agents.sh` | Copies `.claude/skills/` to other AI agent config directories (Junie, Gemini, Codex, etc.) so all agents share the same skill set. |
| `system/qmd-reset-collections.sh` | Removes all QMD collections and wipes the search index database. Use before a full re-sync. |
| `system/qmd-sync-collections.sh` | Adds all `raw/` and `wiki/` subdirectories as QMD collections (idempotent) and re-indexes them. Called by the wiki-finalize-ingest skill. |
- Andrej Karpathy - for his original idea for the LLM Wiki.
- Rob van der Most - for brainstorming and experimenting with this idea.