38 changes: 32 additions & 6 deletions CLAUDE.md
@@ -1,6 +1,14 @@
# Mnemonic — Development Guide

Mnemonic is a local-first, air-gapped semantic memory system built in Go. It uses 8 cognitive agents + orchestrator + reactor, SQLite with FTS5 + vector search, and LLMs (LM Studio locally or cloud APIs like Gemini) for semantic understanding.
## Your Role

You are a world-class AI/ML researcher and systems engineer working on one of the most ambitious projects in local AI: building a daemon that has its own brain. Not a wrapper around an API. Not a RAG pipeline. A system with genuine, bespoke intelligence that runs on consumer hardware, air-gapped, with sub-second response times.

This is bleeding-edge work. We're training custom models with a novel architecture (Felix-LM hub-and-spoke), pioneering spoke-adapter techniques, and pushing the boundaries of what a 2B-parameter model can do when it's purpose-built for one job. The research matters. The engineering matters. Be bold, be rigorous, and don't settle for "good enough" when "breakthrough" is within reach.

## What Mnemonic Is

Mnemonic is a local-first, air-gapped semantic memory system built in Go. It uses cognitive agents, SQLite with FTS5 + vector search, and bespoke embedded LLMs (Felix-LM spoke architecture) for semantic understanding. The daemon runs as a systemd service and provides memory to AI coding agents via MCP.
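
For a rough illustration of what "FTS5 + vector search" means in practice, here is a hedged Python sketch of a two-stage lookup: lexical recall from an FTS5 index, then a cosine-similarity re-rank against stored embeddings. It is not Mnemonic's Go implementation; the schema, table names, and JSON-encoded embedding column are assumptions.

```python
# Hedged sketch of the "FTS5 + vector search" idea, not Mnemonic's actual Go code.
# Schema, table names, and the JSON-encoded embedding column are all assumptions.
import json
import sqlite3
import numpy as np

def hybrid_search(db_path: str, query: str, query_vec: np.ndarray, k: int = 10):
    con = sqlite3.connect(db_path)
    # Stage 1: lexical recall from a hypothetical FTS5 table (memories_fts).
    rows = con.execute(
        "SELECT m.id, m.content, m.embedding FROM memories m "
        "WHERE m.id IN (SELECT rowid FROM memories_fts WHERE memories_fts MATCH ?)",
        (query,),
    ).fetchall()
    con.close()

    # Stage 2: semantic re-rank by cosine similarity against stored embeddings.
    def cosine(row):
        vec = np.asarray(json.loads(row[2]), dtype=float)
        denom = np.linalg.norm(vec) * np.linalg.norm(query_vec) + 1e-9
        return float(vec @ query_vec) / denom

    return sorted(rows, key=cosine, reverse=True)[:k]
```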

## Build & Test

@@ -89,7 +97,13 @@ scripts/ Utility scripts
| Linux x86_64 | Supported — `serve`, `install`, `start`, `stop`, `uninstall` all work via systemd |
| Windows x86_64 | Supported — `serve`, `install`, `start`, `stop`, `uninstall` work via Windows Services |

## Training (Mnemonic-LM)
## Training (Felix-LM / Mnemonic-LM)

Felix-LM is a hub-and-spoke architecture for language models. The "central post" is a frozen pretrained base model (currently Gemma 4 E2B, previously Qwen 3.5 2B). "Spokes" are lightweight low-rank adapters (~27M params, <1% overhead) injected at each decoder layer via forward hooks. The spokes are the only trainable parameters — the base model is frozen.

The architecture supports hot-swappable task-specific spoke sets: encoding spokes, synthesis spokes, retrieval spokes, all sharing the same frozen post. This is the Felix-LM vision: one backbone, many specialized tools.
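
For orientation, here is a minimal PyTorch sketch of the spoke pattern: a low-rank residual adapter attached to each decoder layer of a frozen base model via forward hooks. Everything here is illustrative (class names, rank, layer path); the real implementation lives in `~/Projects/felixlm/felix_lm/v3/spokes.py`.

```python
# Illustrative sketch only; names, rank, and the layer path are assumptions,
# not the actual felix_lm implementation.
import torch
import torch.nn as nn

class SpokeLayer(nn.Module):
    """Low-rank residual adapter applied to a decoder layer's hidden states."""
    def __init__(self, hidden_size: int, rank: int = 16):
        super().__init__()
        self.down = nn.Linear(hidden_size, rank, bias=False)
        self.up = nn.Linear(rank, hidden_size, bias=False)
        nn.init.zeros_(self.up.weight)  # start as a no-op so the frozen model is unchanged

    def forward(self, hidden):
        return hidden + self.up(self.down(hidden))

def attach_spokes(base_model, hidden_size: int, rank: int = 16):
    """Freeze the base model and register one spoke per decoder layer via forward hooks."""
    for p in base_model.parameters():
        p.requires_grad = False  # the frozen "central post"

    spokes = nn.ModuleList()
    for layer in base_model.model.layers:  # layer path is an assumption; varies by model
        spoke = SpokeLayer(hidden_size, rank)
        spokes.append(spoke)

        def hook(module, args, output, spoke=spoke):
            # HF-style decoder layers usually return a tuple; adapt the hidden states only.
            hidden = output[0] if isinstance(output, tuple) else output
            adapted = spoke(hidden)
            return (adapted,) + output[1:] if isinstance(output, tuple) else adapted

        layer.register_forward_hook(hook)
    return spokes  # only these adapter parameters go to the optimizer
```

Zero-initializing the up-projection makes each spoke start as an identity, so training begins from the frozen model's behavior and only drifts where the task data demands it.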

**Current state:** Encoding spokes achieve 100% novel schema compliance on Qwen 3.5 2B. Gemma 4 E2B training is in progress. See `training/docs/experiment_registry.md` for the full experiment history (EXP-1 through EXP-19).

Training scripts live in `training/scripts/` and require the **Felix-LM venv**:

@@ -99,10 +113,22 @@ source ~/Projects/felixlm/.venv/bin/activate

Key scripts:

- `train_mnemonic_lm.py` — Main training script (imports Felix-LM v3 from `~/Projects/felixlm`)
- `run_sweep.sh` — Run HP sweep configs sequentially with auto-logging to TSV
- `bisect_lr.sh` — Binary search for optimal LR using short probes + full confirmation
- `validate.py` — Quality gate pipeline for fine-tuning data
- `train_qwen_spokes.py` — Main training script (supports `--model-type qwen|gemma`)
- `qwen_spoke_adapter.py` — Qwen 3.5 2B spoke adapter + shared SpokeLayer class
- `gemma_spoke_adapter.py` — Gemma 4 E2B spoke adapter
- `eval_qwen_encoding.py` — Novel input evaluation (needs Gemma 4 support)
- `batch_encode.py` — Gemini Batch API pipeline for scalable training data generation
- `enrich_and_generate.py` — Async Gemini data enrichment + synthetic generation
- `extract_prenuke_data.py` — Extract training data from pre-nuke DB backup
- `merge_training_data.py` — Merge, dedup, and split training datasets

Key data (an example record is sketched after this list):

- `training/data/finetune_gemma4_v5/` — Current Gemma 4 training data (9,945 train / 1,105 eval, encoding-only)
- `training/data/finetune_qwen_v5_encoding_only/` — Qwen training data (11,436 train / 1,270 eval)
- `training/data/finetune_qwen_v2/` — Original clean dataset (4,566 train / 507 eval)
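
As a rough guide to what a single example looks like, the record shape below mirrors what `batch_encode.py` writes (raw input plus the ten-field encoded object). The field values are invented, and the exact on-disk layout of the `finetune_*` splits may differ after merging and splitting.

```python
# Illustrative record only: values are made up; the merged finetune_* datasets
# may store these fields in a slightly different layout.
example_record = {
    "raw_input": "Fixed flaky TestReplay by pinning the clock in the reactor harness.",
    "encoded": {
        "gist": "Pinned reactor clock to fix flaky TestReplay",
        "summary": "TestReplay failed intermittently because the reactor used wall-clock time.",
        "content": "Replaced time.Now() with an injected clock in the test harness.",
        "narrative": "Part of the ongoing effort to make the reactor test suite deterministic.",
        "concepts": ["testing", "flaky test", "reactor"],
        "structured_concepts": {
            "topics": [{"label": "testing", "path": "engineering/testing"}],
            "entities": [],
            "actions": [],
            "causality": [],
        },
        "significance": "notable",
        "emotional_tone": "analytical",
        "outcome": "Test now passes deterministically",
        "salience": 0.4,
    },
    "source": "swebench_example",
    "task_type": "encoding",
}
```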

The Felix-LM design paper is at `~/Projects/felixlm/docs/felix_lm_design.tex`. The spoke implementation originated in `~/Projects/felixlm/felix_lm/v3/spokes.py` and `~/Projects/nanochat/nanochat/gpt.py`.

All experiments must be pre-registered in `training/docs/experiment_registry.md` before running. See `.claude/rules/scientific-method.md` and `.claude/rules/experiment-logging.md`.

344 changes: 331 additions & 13 deletions training/docs/experiment_registry.md

Large diffs are not rendered by default.

444 changes: 444 additions & 0 deletions training/docs/hallucination_stress_test.json

Large diffs are not rendered by default.

269 changes: 269 additions & 0 deletions training/scripts/batch_encode.py
@@ -0,0 +1,269 @@
#!/usr/bin/env python3
"""Batch-encode raw inputs via Gemini Batch API (50% cheaper, no rate limits).

1. Reads raw inputs from a JSONL file
2. Creates a batch JSONL file with encoding requests
3. Uploads to Gemini File API
4. Creates a batch job
5. Polls for completion
6. Downloads and parses results

Usage:
# Create and submit batch job
python batch_encode.py submit --input training/data/swebench_raw_inputs.jsonl

# Check status of a running job
python batch_encode.py status --job batches/YOUR_JOB_ID

# Download results from completed job
python batch_encode.py download --job batches/YOUR_JOB_ID --output training/data/swebench_encoded.jsonl
"""

import argparse
import json
import os
import sys
import time
from pathlib import Path

ENCODING_SYSTEM_PROMPT = (
    "You are a memory encoding agent for Mnemonic, a semantic memory system. "
    "You receive raw events (text observations from a developer's work) and output structured JSON.\n\n"
    "Your output MUST be a single JSON object with exactly these 10 fields:\n"
    "- gist: One-line summary, under 80 characters\n"
    "- summary: 2-3 sentence summary of the key information\n"
    "- content: Preserved detail — the important facts, decisions, and context\n"
    "- narrative: A paragraph providing broader context and significance\n"
    "- concepts: Array of 3-8 keyword strings (lowercase, no phrases longer than 3 words)\n"
    "- structured_concepts: Object with 4 arrays:\n"
    "  - topics: [{label, path}] — what domains this touches\n"
    "  - entities: [{name, type, context}] — people, tools, systems mentioned\n"
    "  - actions: [{verb, object, details}] — what was done\n"
    "  - causality: [{relation, description}] — cause/effect relationships\n"
    "- significance: One of \"critical\", \"important\", \"notable\", \"routine\", \"trivial\"\n"
    "- emotional_tone: One of \"positive\", \"negative\", \"neutral\", \"frustrated\", \"excited\", \"analytical\", \"reflective\"\n"
    "- outcome: Brief description of the result or status\n"
    "- salience: Float 0.0-1.0 (how important is this to remember long-term)\n\n"
    "Output ONLY the JSON object. No markdown fences, no explanation, no preamble."
)

API_KEY = os.environ.get("LLM_API_KEY", "")
MODEL = "gemini-3-flash-preview"


def create_batch_file(input_path: str, batch_path: str) -> int:
    """Create JSONL batch request file from raw inputs."""
    count = 0
    with open(batch_path, "w") as out:
        for line in open(input_path):
            ex = json.loads(line)
            raw = ex["raw_input"][:3000]

            request = {
                "key": f"req-{count}",
                "request": {
                    "contents": [{"parts": [{"text": raw}]}],
                    "system_instruction": {"parts": [{"text": ENCODING_SYSTEM_PROMPT}]},
                    "generation_config": {
                        "temperature": 0.7,
                        "max_output_tokens": 2048,
                    },
                },
            }
            out.write(json.dumps(request) + "\n")
            count += 1

    print(f"Created batch file: {batch_path} ({count} requests)")
    return count


def submit_batch(batch_path: str) -> str:
    """Upload file and create batch job."""
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=API_KEY)

    print(f"Uploading {batch_path}...")
    uploaded = client.files.upload(
        file=batch_path,
        config=types.UploadFileConfig(
            display_name=Path(batch_path).stem,
            mime_type="jsonl",
        ),
    )
    print(f"Uploaded: {uploaded.name}")

    print(f"Creating batch job (model={MODEL})...")
    job = client.batches.create(
        model=MODEL,
        src=uploaded.name,
        config={"display_name": f"mnemonic-encode-{Path(batch_path).stem}"},
    )
    print(f"Job created: {job.name}")
    print(f"State: {job.state.name}")
    return job.name


def check_status(job_name: str):
    """Check batch job status."""
    from google import genai

    client = genai.Client(api_key=API_KEY)
    job = client.batches.get(name=job_name)
    print(f"Job: {job.name}")
    print(f"State: {job.state.name}")
    if hasattr(job, "dest") and job.dest:
        print(f"Result file: {job.dest.file_name}")
    return job


def download_results(job_name: str, output_path: str, raw_input_path: str):
    """Download batch results and merge with raw inputs."""
    from google import genai

    client = genai.Client(api_key=API_KEY)
    job = client.batches.get(name=job_name)

    if job.state.name != "JOB_STATE_SUCCEEDED":
        print(f"Job not complete: {job.state.name}")
        return

    print(f"Downloading results from {job.dest.file_name}...")
    content = client.files.download(file=job.dest.file_name)
    result_lines = content.decode("utf-8").strip().split("\n")
    print(f"Got {len(result_lines)} result lines")

    # Load raw inputs for merging
    raw_inputs = {}
    for i, line in enumerate(open(raw_input_path)):
        ex = json.loads(line)
        raw_inputs[f"req-{i}"] = ex

    # Parse results
    REQUIRED = {"gist", "summary", "content", "narrative", "concepts",
                "structured_concepts", "significance", "emotional_tone",
                "outcome", "salience"}

    success = 0
    fail = 0
    results = []

    for line in result_lines:
        try:
            result = json.loads(line)
        except json.JSONDecodeError:
            fail += 1
            continue

        key = result.get("key", "")
        response = result.get("response", {})

        # Extract text from response
        try:
            text = response["candidates"][0]["content"]["parts"][0]["text"]
        except (KeyError, IndexError):
            fail += 1
            continue

        # Parse JSON from response
        text = text.strip()
        if text.startswith("```"):
            lines = text.split("\n")
            lines = [l for l in lines if not l.strip().startswith("```")]
            text = "\n".join(lines).strip()

        try:
            encoded = json.loads(text)
        except json.JSONDecodeError:
            # Try to find JSON in text
            start = text.find("{")
            end = text.rfind("}") + 1
            if start >= 0 and end > start:
                try:
                    encoded = json.loads(text[start:end])
                except json.JSONDecodeError:
                    fail += 1
                    continue
            else:
                fail += 1
                continue

        if not REQUIRED.issubset(encoded.keys()):
            fail += 1
            continue

        raw = raw_inputs.get(key, {})
        results.append({
            "raw_input": raw.get("raw_input", ""),
            "encoded": encoded,
            "source": f"swebench_{raw.get('repo', 'unknown')}",
            "task_type": "encoding",
        })
        success += 1

    with open(output_path, "w") as f:
        for r in results:
            f.write(json.dumps(r) + "\n")

    print(f"Results: {success} success, {fail} fail ({success/(success+fail)*100:.1f}% success rate)")
    print(f"Written to: {output_path}")


def main():
    parser = argparse.ArgumentParser()
    sub = parser.add_subparsers(dest="command", required=True)

    submit_p = sub.add_parser("submit")
    submit_p.add_argument("--input", required=True, help="Raw inputs JSONL")

    status_p = sub.add_parser("status")
    status_p.add_argument("--job", required=True, help="Batch job name")

    download_p = sub.add_parser("download")
    download_p.add_argument("--job", required=True, help="Batch job name")
    download_p.add_argument("--output", required=True, help="Output JSONL")
    download_p.add_argument("--raw-input", required=True, help="Original raw input JSONL (for merging)")

    poll_p = sub.add_parser("poll")
    poll_p.add_argument("--job", required=True, help="Batch job name")
    poll_p.add_argument("--output", required=True, help="Output JSONL")
    poll_p.add_argument("--raw-input", required=True, help="Original raw input JSONL")
    poll_p.add_argument("--interval", type=int, default=60, help="Poll interval seconds")

    args = parser.parse_args()

    if not API_KEY:
        print("ERROR: LLM_API_KEY not set")
        sys.exit(1)

    if args.command == "submit":
        batch_path = args.input.replace(".jsonl", "_batch.jsonl")
        create_batch_file(args.input, batch_path)
        job_name = submit_batch(batch_path)
        print(f"\nJob submitted: {job_name}")
        print(f"Check status: python {sys.argv[0]} status --job {job_name}")
        print(f"Poll & download: python {sys.argv[0]} poll --job {job_name} --output OUTPUT.jsonl --raw-input {args.input}")

    elif args.command == "status":
        check_status(args.job)

    elif args.command == "download":
        download_results(args.job, args.output, args.raw_input)

    elif args.command == "poll":
        completed = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED", "JOB_STATE_EXPIRED"}
        while True:
            job = check_status(args.job)
            if job.state.name in completed:
                break
            print(f" Waiting {args.interval}s...")
            time.sleep(args.interval)
        if job.state.name == "JOB_STATE_SUCCEEDED":
            download_results(args.job, args.output, args.raw_input)
        else:
            print(f"Job ended with state: {job.state.name}")


if __name__ == "__main__":
    main()