Knowledge Base Wiki

(C) 2026, Rijn Buve

This repository contains a solid implementation of Andrej Karpathy's idea for an LLM-maintained knowledge base, built as a Wiki. The implementation is meant for work-related notes, structured as an Obsidian vault and assisted by the semantic database QMD.

The implementation supports Anthropic Claude and JetBrains Junie (both as CLI agents) for ingesting notes into the knowledge base.

Purpose

The primary goal is efficient decision intelligence: understanding why decisions were taken, on what basis, by whom, and when. Secondary goals include mapping how technologies and systems relate, who is involved in what, and how competitors compare. 'Efficient' matters because the mechanism needs to be token (and environmentally) efficient.

Division of labor:

  • The user curates source files in raw/.
  • The LLM does all writing, cross-referencing, and bookkeeping in wiki/.
  • Obsidian is the UI for entering/accessing notes and asking questions (e.g. through Claudian).

Quick Start

# 1. Clone this repo
git clone <repo-url> ~/my-knowledge-base

# 2. Create raw/ and wiki/ directories (these are not stored in git)
cd ~/my-knowledge-base
mkdir -p raw/{notes,clips,emails,transcripts,scans,slack} wiki

# 3. Install QMD (the semantic search engine)
npm install -g bun
npm install -g @tobilu/qmd

# 4. Register all subdirectories as QMD collections and build the index
./scripts/qmd-full-reindex.sh

# 5. Install the QMD skill for Claude/Junie
qmd skill install --global --yes

# 6. Register QMD as a Claude Code MCP server (add to ~/.claude/claude_desktop_config.json)
#    Or just ask Claude: "read this README.md and install QMD as an MCP server"

# 7. Open this directory as an Obsidian vault: File → Open Folder as Vault

After setup, put your notes in raw/ and tell Claude: "ingest new raw notes".

Update

cd ~/my-knowledge-base && git pull
./scripts/qmd-full-reindex.sh   # re-register and update any new subdirectories

Prerequisites

Required:

  • Claude Code (CLI) — or JetBrains Junie
  • Node.js / npm — for installing bun and qmd
  • QMD — local semantic search engine (npm install -g @tobilu/qmd)
  • Obsidian — vault UI (free, Mac/Windows/Linux)
  • git

Optional:

  • pdftotext — faster/cheaper PDF extraction (brew install poppler); LLM vision is the fallback
  • Obsidian Web Clipper — one-click web article saving to raw/clips/
  • Claudian — run Claude from within Obsidian (ask Claude to install it safely)
  • Amphetamine (Mac App Store) — prevents Mac sleep during long overnight ingests

MCP Server Setup

Register QMD as an MCP server in ~/.claude/claude_desktop_config.json (or ask Claude to do it):

{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}

The Slack integration is managed via your claude.ai organization. Authorize it yourself at claude.ai → Settings → Connectors. Once authorized, the Slack tools are available automatically in all Claude sessions — no local configuration needed.


In a nutshell

  • Create and collect notes:

    • User produces raw notes and stores them in the raw/notes directory.
    • User uses the Obsidian Web Clipper to store notes in raw/clips.
    • User stores .vtt meeting transcripts in raw/transcripts.
    • User drags .eml emails to raw/emails.
    • User stores handwritten notes or scanned pages (PDF, JPG) in raw/scans.
    • User fetches Slack channels and DMs by asking "fetch slack" — messages are written to raw/slack/.
  • Ingest notes:

    • User asks to "ingest new raw notes", "ingest Confluence page <URL>" or runs wiki-ingest-loop.sh.
    • LLM converts non-Markdown inputs: .vtt transcripts → raw/transcripts/converted/, .eml emails → raw/emails/converted/, .pdf/.jpg scans → raw/scans/converted/.
    • LLM partitions files into batches and processes them (large ingests use parallel LLM sessions 2–5; single batches are handled in one session).
    • After all batches are done, user says "finalize ingest" to merge session logs, rebuild _index.md files, and run post-processing (QMD re-index + health check).
  • Query wiki:

    • User asks a high-level question.
    • LLM queries semantic database (with the qmd skill) for relevant page links (fast/token-efficient).
    • LLM processes suggested pages and produces answer to user.
    • LLM stores valuable conversations in wiki/conversations/ to extend the knowledge base.
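The batch-partitioning step in the ingest flow above can be sketched as follows. This is a hypothetical, standalone illustration of splitting un-ingested files into fixed-size batches for parallel sessions; the real logic lives in system/wiki-create-import-batches.sh and may use different sizes and selection criteria.

```python
from pathlib import Path

def partition_into_batches(files, batch_size=10):
    """Split a list of file paths into fixed-size batches.

    Hypothetical sketch: the actual partitioning is done by
    system/wiki-create-import-batches.sh.
    """
    files = sorted(files)
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]

# Example: batch everything under raw/notes/ (directory name taken from this README)
notes = [str(p) for p in Path("raw/notes").glob("*.md")]
batches = partition_into_batches(notes, batch_size=10)
```

Each resulting batch can then be handed to its own LLM session ("Sessions 2–N") and processed independently.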

Querying the semantic database to fetch relevant pages before analyzing and reasoning about documents makes this implementation of a knowledge base significantly faster and more token-efficient than one that uses Markdown files only.

Commands and skills

These skill commands and natural-language triggers are available:

| Command / phrase | Description |
|---|---|
| "ingest new notes" | Start a new ingest of raw notes (Session 1 — coordinator flow) |
| "fetch slack" | Fetch Slack threads and DMs into raw/slack/, then run wiki-ingest-loop.sh to ingest |
| "ingest next batch" | Continue ingesting the next batch (Sessions 2–N flow) |
| "finalize ingest" | Finalize the ingest: merge logs, rebuild indexes, run post-processing |
| "health check" or "lint" | Check for orphaned pages, broken links, contradictions |
| "add missing [topic]" | Create a new Wiki page for a missing concept, person, system, etc. |
| "clear ingest batches" | Remove incomplete batch files to restart a failed ingest |
| ask any question | Query the knowledge base (default behavior) |

The ingest next batch and finalize ingest commands are only needed when importing large amounts of notes. The LLM will notify you during an "ingest new notes" run when it sees that batched importing is required.

Pro-tip 1: use wiki-ingest-loop.sh to ingest multiple files

You can use the script scripts/wiki-ingest-loop.sh to start ingesting new notes. The advantage of this script is that it ingests new notes in batches and waits if your 5-hour usage limit has been reached. It first executes "ingest new notes", followed by as many "ingest next batch" prompts as necessary (up to a specified maximum). Run it with --help for usage.

You start it for a specific agent (Claude CLI or Junie CLI), like this:

scripts/wiki-ingest-loop.sh [--agent claude|junie]

Use wiki-ingest-loop.sh --help for more options.

Pro-tip 2: use wiki-lint-check.py to health-check your knowledge base

After each ingestion, the system can automatically run a health check on the knowledge base: an LLM-assisted pass that checks for missing topics, inconsistencies, etc. (this takes time and tokens).

You can also run the basic health-check (which does not use the LLM) manually, by simply executing:

scripts/wiki-lint-check.py

This opens an interactive TUI to deal with:

  • Broken links: these can be removed, flagged or simply replaced with plain text.
  • Orphaned pages: these can be deleted, or kept (marked with orphan: false).
  • Stub pages (that were identified by the LLM but never filled in): these can be deleted, or kept (no longer marked as stub: true).

Using this interactive mode, you can keep your knowledge base 100% free of false-positive alerts, making it easy to see whether the knowledge base is still sound. Use --batch-mode to suppress the TUI and get text/JSON output only.
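As a rough sketch of what the broken-internal-link part of the check does — assuming Obsidian [[wikilink]] syntax where a page is matched by file stem; the real implementation is scripts/wiki-lint-check.py and differs in detail:

```python
import re
from pathlib import Path

# Capture the link target before any '|' alias or '#' heading anchor
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def broken_links(wiki_root="wiki"):
    """Return (source_file, target) pairs for [[links]] with no matching page.

    Simplified sketch: matches targets against file stems only, ignoring
    folders, aliases, and heading anchors.
    """
    root = Path(wiki_root)
    pages = {p.stem for p in root.rglob("*.md")}
    broken = []
    for page in root.rglob("*.md"):
        for target in WIKILINK.findall(page.read_text(encoding="utf-8")):
            if target.strip() not in pages:
                broken.append((str(page), target.strip()))
    return broken
```

The interactive TUI then offers the remove/flag/replace-with-plain-text choices described above for each hit.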

Configuration

Personalizing your setup

Provide personal info on who you are, what you do, and what your focus is, in config/personal_info.md:

# Personal Info
My name is ...
I am ...

# My Main Focus
- Strategic decision making on technology choices.
- ...

If the file is missing, or contains no topics, default topics are used.

Configuring Slack sources

Add a # Slack section to config/personal_info.md to configure which channels and DMs to fetch:

| Channel / DM | Days | Mode |
|---|---|---|
| #architecture-decisions | 14 | signal |
| #team-platform | all | |
| @Alice van Dijk | 7 | software design decisions |

  • #channel-name — a public or private Slack channel
  • @Person Name — a direct message thread with that person
  • Days — how many calendar days back to fetch conversation updates (default: 7)
  • Mode — signal filters out noise (absences, bot messages, bare acks); all includes everything; any other text is treated as a topic filter (only threads directly about that topic are included)
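Parsing such a table row into (channel, days, mode) triples can be sketched as below. This is a hypothetical illustration assuming a well-formed Markdown pipe table; the real "fetch slack" skill reads config/personal_info.md itself and may tolerate other formats. Treating an empty Days cell as the default 7 follows this README; treating an empty Mode cell as "all" is an assumption.

```python
def parse_slack_table(markdown: str):
    """Parse a Markdown pipe table of Slack sources into (name, days, mode) tuples."""
    entries = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the header row and the |---|---|---| separator row
        if cells[0] in ("Channel / DM", "") or set(cells[0]) <= set("-: "):
            continue
        name = cells[0]
        days = cells[1] if len(cells) > 1 and cells[1] else "7"    # default from this README
        mode = cells[2] if len(cells) > 2 and cells[2] else "all"  # assumption: empty mode = all
        entries.append((name, days, mode))
    return entries
```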

Running Claude within Obsidian

You can run Claude from within Obsidian using the Claudian plugin. Install it by asking Claude:

Claude, I want you to install the following Obsidian plugin from Github. First, I want you to review
the plugin and make sure it is safe to install. And if it is safe, install it.
This is the repo: https://github.com/YishenTu/claudian

Re-creating the Wiki from Scratch

To re-create the entire Wiki, remove the wiki/ directory, /clear the LLM conversation and ask it to ingest new raw notes. Note that for large amounts of notes this may be expensive and take a long time.

Note: The wiki/log.jsonl file tracks which notes have already been ingested. If you share the wiki/ directory across machines, any client can run incremental ingestions without re-processing everything.
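How an append-only log enables incremental ingestion can be sketched as follows — assuming, hypothetically, that each JSON line records the ingested file under a "file" key (check wiki/log.jsonl for the actual schema):

```python
import json
from pathlib import Path

def uningested(raw_dir="raw/notes", log_path="wiki/log.jsonl"):
    """List raw files that do not yet appear in the ingest log.

    Hypothetical sketch: assumes each log line is a JSON object with a
    "file" key; the real schema in wiki/log.jsonl may differ.
    """
    seen = set()
    log = Path(log_path)
    if log.exists():
        for line in log.read_text(encoding="utf-8").splitlines():
            if line.strip():
                seen.add(json.loads(line).get("file"))
    return sorted(str(p) for p in Path(raw_dir).glob("*.md") if str(p) not in seen)
```

Because the log is append-only JSON Lines, any machine sharing wiki/ can compute this difference locally and ingest only what is new.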

Checking Your Database

The database is automatically checked for errors after ingesting new notes. To check manually:

# Basic check (no LLM, fast):
./scripts/wiki-lint-check.py --batch-mode --format text

# Interactive TUI (deal with broken links, orphans, stubs):
./scripts/wiki-lint-check.py

Directory structure (condensed)

<root>/
├── .import/             ← in-progress batch import state (gitignored)
├── config/              ← config file for Obsidian web clipper
├── scripts/             ← helper scripts for CLAUDE.md
├── raw/
│   ├── clips/           ← web articles and saved pages (web clipper)
│   ├── confluence/      ← pages fetched from Atlassian Confluence (fetch cache)
│   ├── emails/          ← email threads (.eml)
│   │   └── converted/   ← LLM generated: emails converted to Markdown
│   ├── scans/           ← handwritten pages, whiteboards
│   │   └── converted/   ← LLM generated: scans converted to Markdown
│   ├── notes/           ← notes, 1:1s, and people-specific files
│   ├── slack/           ← Slack channel and DM threads (fetched by "fetch slack")
│   └── transcripts/     ← meeting and conversation transcripts (.vtt)
│       └── converted/   ← LLM generated: transcripts converted to Markdown
├── wiki/
│   ├── index.md         ← top-level navigation to section indexes
│   ├── log.jsonl        ← append-only ingest log (JSON Lines)
│   ├── concepts/        ← mental models and domain concepts
│   │   └── _index.md    ← alphabetical index of concept pages
│   ├── competition/     ← competitor profiles
│   ├── conversations/   ← interesting and valuable conversations (query results)
│   ├── decisions/       ← decision records
│   ├── people/          ← people and team pages
│   ├── problems/        ← living problem tracking pages
│   ├── projects/        ← living project tracking pages
│   └── systems/         ← living system reference pages
├── CLAUDE.md            ← schema and workflow instructions for Claude Code
└── README.md            ← this file

The directories raw and wiki are not stored in Git. Create them manually before first use.

Wiki topic types

| Type | Purpose |
|---|---|
| competition | Competing companies, products, and approaches |
| concepts | Technologies, standards, mental models, domain vocabulary |
| conversations | Valuable results of earlier queries/conversations |
| decisions | Why decisions were taken, on what basis, by whom, and when |
| people | Colleagues, contacts, external stakeholders, teams |
| problems | Active and past problems |
| projects | Active and past initiatives |
| systems | Systems, products, platforms, and services |

Key rules

  • raw/ is immutable — LLM never writes there (except raw/confluence/ as a fetch cache).
  • wiki/ is LLM-owned — LLM writes, the user reads.
  • The relevant wiki/<type>/_index.md files are rebuilt and wiki/log.jsonl is updated on every finalized ingest.
  • Hand-curated content in Wiki pages is never deleted or overwritten.

Scripts

Regular use

| Script | Purpose |
|---|---|
| wiki-ingest-loop.sh | Main ingestion pipeline: converts raw files (VTT, EML), creates batches if needed, and runs ingestion sessions in a loop until all notes are processed. The normal way to ingest new notes. |
| wiki-lint-check.py | Scans wiki Markdown files for broken internal and external links. Outputs structured JSON for AI consumption. Run periodically to keep the wiki healthy. |

Occasional use

| Script | Purpose |
|---|---|
| wiki-remove-all-generated-files.sh | Deletes all LLM-generated wiki files and batch state, resetting the wiki to a clean slate. Use when you want to re-ingest everything from scratch. |
| wiki-remove-large-attachments.py | Interactive TUI for browsing and removing large Obsidian attachments. Navigate with ↑↓, press d/D to move files to .trash/. Useful for reclaiming disk space. |
| qmd-full-reindex.sh | Reset and fully re-index the QMD database. |

For use by skills (not normally run directly)

| Script | Purpose |
|---|---|
| system/wiki-create-import-batches.sh | Partitions un-ingested notes into batch files for parallel import sessions. Called automatically by wiki-ingest-loop.sh and the wiki-ingest skill. |
| system/wiki-create-index-pages.py | Rebuilds _index.md files for each wiki section. Called by the wiki-finalize-ingest skill after a completed ingest run. |
| system/convert-eml-to-md.py | Converts .eml email files to Markdown with YAML frontmatter. Called by wiki-ingest-loop.sh before ingestion. |
| system/convert-vtt-to-md.py | Converts .vtt transcript files to readable Markdown with YAML frontmatter. Called by wiki-ingest-loop.sh before ingestion. |
| system/copy-claude-skills-to-other-agents.sh | Copies .claude/skills/ to other AI agent config directories (Junie, Gemini, Codex, etc.) so all agents share the same skill set. |
| system/qmd-reset-collections.sh | Removes all QMD collections and wipes the search index database. Use before a full re-sync. |
| system/qmd-sync-collections.sh | Adds all raw/ and wiki/ subdirectories as QMD collections (idempotent) and re-indexes them. Called by the wiki-finalize-ingest skill. |
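The transcript conversion can be sketched as turning WebVTT cues into speaker-prefixed Markdown lines. This is a simplified sketch: system/convert-vtt-to-md.py also emits YAML frontmatter and handles more of the WebVTT format (e.g. timestamps without an hours field).

```python
import re

# Matches hh:mm:ss.mmm cue timing lines (simplified: hours field assumed present)
CUE_TIME = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3} --> ")
VOICE = re.compile(r"<v ([^>]+)>")  # WebVTT voice tag: <v Speaker Name>

def vtt_to_markdown(vtt_text: str) -> str:
    """Convert a WebVTT transcript to simple '**Speaker**: text' Markdown lines."""
    lines = []
    for line in vtt_text.splitlines():
        line = line.strip()
        if not line or line == "WEBVTT" or CUE_TIME.match(line) or line.isdigit():
            continue  # drop the header, cue numbers, and timestamp lines
        m = VOICE.search(line)
        if m:
            text = re.sub(r"</?v[^>]*>", "", line).strip()
            lines.append(f"**{m.group(1)}**: {text}")
        else:
            lines.append(line)
    return "\n".join(lines)
```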

Recognition

  • Andrej Karpathy - for his original idea for the LLM Wiki.
  • Rob van der Most - for brainstorming and experimenting with this idea.
