38 changes: 32 additions & 6 deletions CLAUDE.md
@@ -1,6 +1,14 @@
# Mnemonic — Development Guide

Mnemonic is a local-first, air-gapped semantic memory system built in Go. It uses 8 cognitive agents + orchestrator + reactor, SQLite with FTS5 + vector search, and LLMs (LM Studio locally or cloud APIs like Gemini) for semantic understanding.
## Your Role

You are a world-class AI/ML researcher and systems engineer working on one of the most ambitious projects in local AI: building a daemon that has its own brain. Not a wrapper around an API. Not a RAG pipeline. A system with genuine, bespoke intelligence that runs on consumer hardware, air-gapped, with sub-second response times.

This is bleeding-edge work. We're training custom models with a novel architecture (Felix-LM hub-and-spoke), pioneering spoke-adapter techniques, and pushing the boundaries of what a 2B-parameter model can do when it's purpose-built for one job. The research matters. The engineering matters. Be bold, be rigorous, and don't settle for "good enough" when "breakthrough" is within reach.

## What Mnemonic Is

Mnemonic is a local-first, air-gapped semantic memory system built in Go. It uses cognitive agents, SQLite with FTS5 + vector search, and bespoke embedded LLMs (Felix-LM spoke architecture) for semantic understanding. The daemon runs as a systemd service and provides memory to AI coding agents via MCP.
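
For a rough illustration of what "FTS5 + vector search" means in practice, here is a hedged Python sketch of a two-stage lookup: lexical recall from an FTS5 index, then a cosine-similarity re-rank against stored embeddings. It is not Mnemonic's Go implementation; the schema, table names, and JSON-encoded embedding column are assumptions.

```python
# Hedged sketch of the "FTS5 + vector search" idea, not Mnemonic's actual Go code.
# Schema, table names, and the JSON-encoded embedding column are all assumptions.
import json
import sqlite3
import numpy as np

def hybrid_search(db_path: str, query: str, query_vec: np.ndarray, k: int = 10):
    con = sqlite3.connect(db_path)
    # Stage 1: lexical recall from a hypothetical FTS5 table (memories_fts).
    rows = con.execute(
        "SELECT m.id, m.content, m.embedding FROM memories m "
        "WHERE m.id IN (SELECT rowid FROM memories_fts WHERE memories_fts MATCH ?)",
        (query,),
    ).fetchall()
    con.close()

    # Stage 2: semantic re-rank by cosine similarity against stored embeddings.
    def cosine(row):
        vec = np.asarray(json.loads(row[2]), dtype=float)
        denom = np.linalg.norm(vec) * np.linalg.norm(query_vec) + 1e-9
        return float(vec @ query_vec) / denom

    return sorted(rows, key=cosine, reverse=True)[:k]
```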

## Build & Test

@@ -89,7 +97,13 @@ scripts/ Utility scripts
| Linux x86_64 | Supported — `serve`, `install`, `start`, `stop`, `uninstall` all work via systemd |
| Windows x86_64 | Supported — `serve`, `install`, `start`, `stop`, `uninstall` work via Windows Services |

## Training (Mnemonic-LM)
## Training (Felix-LM / Mnemonic-LM)

Felix-LM is a hub-and-spoke architecture for language models. The "central post" is a frozen pretrained base model (currently Gemma 4 E2B, previously Qwen 3.5 2B). "Spokes" are lightweight low-rank adapters (~27M params, <1% overhead) injected at each decoder layer via forward hooks. The spokes are the only trainable parameters — the base model is frozen.

The architecture supports hot-swappable task-specific spoke sets: encoding spokes, synthesis spokes, retrieval spokes, all sharing the same frozen post. This is the Felix-LM vision: one backbone, many specialized tools.
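
For orientation, here is a minimal PyTorch sketch of the spoke pattern: a low-rank residual adapter attached to each decoder layer of a frozen base model via forward hooks. Everything here is illustrative (class names, rank, layer path); the real implementation lives in `~/Projects/felixlm/felix_lm/v3/spokes.py`.

```python
# Illustrative sketch only; names, rank, and the layer path are assumptions,
# not the actual felix_lm implementation.
import torch
import torch.nn as nn

class SpokeLayer(nn.Module):
    """Low-rank residual adapter applied to a decoder layer's hidden states."""
    def __init__(self, hidden_size: int, rank: int = 16):
        super().__init__()
        self.down = nn.Linear(hidden_size, rank, bias=False)
        self.up = nn.Linear(rank, hidden_size, bias=False)
        nn.init.zeros_(self.up.weight)  # start as a no-op so the frozen model is unchanged

    def forward(self, hidden):
        return hidden + self.up(self.down(hidden))

def attach_spokes(base_model, hidden_size: int, rank: int = 16):
    """Freeze the base model and register one spoke per decoder layer via forward hooks."""
    for p in base_model.parameters():
        p.requires_grad = False  # the frozen "central post"

    spokes = nn.ModuleList()
    for layer in base_model.model.layers:  # layer path is an assumption; varies by model
        spoke = SpokeLayer(hidden_size, rank)
        spokes.append(spoke)

        def hook(module, args, output, spoke=spoke):
            # HF-style decoder layers usually return a tuple; adapt the hidden states only.
            hidden = output[0] if isinstance(output, tuple) else output
            adapted = spoke(hidden)
            return (adapted,) + output[1:] if isinstance(output, tuple) else adapted

        layer.register_forward_hook(hook)
    return spokes  # only these adapter parameters go to the optimizer
```

Zero-initializing the up-projection makes each spoke start as an identity, so training begins from the frozen model's behavior and only drifts where the task data demands it.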

**Current state:** Encoding spokes achieve 100% novel schema compliance on Qwen 3.5 2B. Gemma 4 E2B training is in progress. See `training/docs/experiment_registry.md` for the full experiment history (EXP-1 through EXP-19).

Training scripts live in `training/scripts/` and require the **Felix-LM venv**:

@@ -99,10 +113,22 @@ source ~/Projects/felixlm/.venv/bin/activate

Key scripts:

- `train_mnemonic_lm.py` — Main training script (imports Felix-LM v3 from `~/Projects/felixlm`)
- `run_sweep.sh` — Run HP sweep configs sequentially with auto-logging to TSV
- `bisect_lr.sh` — Binary search for optimal LR using short probes + full confirmation
- `validate.py` — Quality gate pipeline for fine-tuning data
- `train_qwen_spokes.py` — Main training script (supports `--model-type qwen|gemma`)
- `qwen_spoke_adapter.py` — Qwen 3.5 2B spoke adapter + shared SpokeLayer class
- `gemma_spoke_adapter.py` — Gemma 4 E2B spoke adapter
- `eval_qwen_encoding.py` — Novel input evaluation (needs Gemma 4 support)
- `batch_encode.py` — Gemini Batch API pipeline for scalable training data generation
- `enrich_and_generate.py` — Async Gemini data enrichment + synthetic generation
- `extract_prenuke_data.py` — Extract training data from pre-nuke DB backup
- `merge_training_data.py` — Merge, dedup, and split training datasets

Key data (an example record is sketched after this list):

- `training/data/finetune_gemma4_v5/` — Current Gemma 4 training data (9,945 train / 1,105 eval, encoding-only)
- `training/data/finetune_qwen_v5_encoding_only/` — Qwen training data (11,436 train / 1,270 eval)
- `training/data/finetune_qwen_v2/` — Original clean dataset (4,566 train / 507 eval)
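
As a rough guide to what a single example looks like, the record shape below mirrors what `batch_encode.py` writes (raw input plus the ten-field encoded object). The field values are invented, and the exact on-disk layout of the `finetune_*` splits may differ after merging and splitting.

```python
# Illustrative record only: values are made up; the merged finetune_* datasets
# may store these fields in a slightly different layout.
example_record = {
    "raw_input": "Fixed flaky TestReplay by pinning the clock in the reactor harness.",
    "encoded": {
        "gist": "Pinned reactor clock to fix flaky TestReplay",
        "summary": "TestReplay failed intermittently because the reactor used wall-clock time.",
        "content": "Replaced time.Now() with an injected clock in the test harness.",
        "narrative": "Part of the ongoing effort to make the reactor test suite deterministic.",
        "concepts": ["testing", "flaky test", "reactor"],
        "structured_concepts": {
            "topics": [{"label": "testing", "path": "engineering/testing"}],
            "entities": [],
            "actions": [],
            "causality": [],
        },
        "significance": "notable",
        "emotional_tone": "analytical",
        "outcome": "Test now passes deterministically",
        "salience": 0.4,
    },
    "source": "swebench_example",
    "task_type": "encoding",
}
```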

The Felix-LM design paper is at `~/Projects/felixlm/docs/felix_lm_design.tex`. The spoke implementation originated in `~/Projects/felixlm/felix_lm/v3/spokes.py` and `~/Projects/nanochat/nanochat/gpt.py`.

All experiments must be pre-registered in `training/docs/experiment_registry.md` before running. See `.claude/rules/scientific-method.md` and `.claude/rules/experiment-logging.md`.

344 changes: 331 additions & 13 deletions training/docs/experiment_registry.md

Large diffs are not rendered by default.

444 changes: 444 additions & 0 deletions training/docs/hallucination_stress_test.json

Large diffs are not rendered by default.

269 changes: 269 additions & 0 deletions training/scripts/batch_encode.py
@@ -0,0 +1,269 @@
#!/usr/bin/env python3
"""Batch-encode raw inputs via Gemini Batch API (50% cheaper, no rate limits).

1. Reads raw inputs from a JSONL file
2. Creates a batch JSONL file with encoding requests
3. Uploads to Gemini File API
4. Creates a batch job
5. Polls for completion
6. Downloads and parses results

Usage:
# Create and submit batch job
python batch_encode.py submit --input training/data/swebench_raw_inputs.jsonl

# Check status of a running job
python batch_encode.py status --job batches/YOUR_JOB_ID

# Download results from completed job
python batch_encode.py download --job batches/YOUR_JOB_ID --output training/data/swebench_encoded.jsonl
"""

import argparse
import json
import os
import sys
import time
from pathlib import Path

ENCODING_SYSTEM_PROMPT = (
    "You are a memory encoding agent for Mnemonic, a semantic memory system. "
    "You receive raw events (text observations from a developer's work) and output structured JSON.\n\n"
    "Your output MUST be a single JSON object with exactly these 10 fields:\n"
    "- gist: One-line summary, under 80 characters\n"
    "- summary: 2-3 sentence summary of the key information\n"
    "- content: Preserved detail — the important facts, decisions, and context\n"
    "- narrative: A paragraph providing broader context and significance\n"
    "- concepts: Array of 3-8 keyword strings (lowercase, no phrases longer than 3 words)\n"
    "- structured_concepts: Object with 4 arrays:\n"
    "  - topics: [{label, path}] — what domains this touches\n"
    "  - entities: [{name, type, context}] — people, tools, systems mentioned\n"
    "  - actions: [{verb, object, details}] — what was done\n"
    "  - causality: [{relation, description}] — cause/effect relationships\n"
    "- significance: One of \"critical\", \"important\", \"notable\", \"routine\", \"trivial\"\n"
    "- emotional_tone: One of \"positive\", \"negative\", \"neutral\", \"frustrated\", \"excited\", \"analytical\", \"reflective\"\n"
    "- outcome: Brief description of the result or status\n"
    "- salience: Float 0.0-1.0 (how important is this to remember long-term)\n\n"
    "Output ONLY the JSON object. No markdown fences, no explanation, no preamble."
)

API_KEY = os.environ.get("LLM_API_KEY", "")
MODEL = "gemini-3-flash-preview"


def create_batch_file(input_path: str, batch_path: str) -> int:
    """Create JSONL batch request file from raw inputs."""
    count = 0
    with open(batch_path, "w") as out:
        for line in open(input_path):
            ex = json.loads(line)
            raw = ex["raw_input"][:3000]

            request = {
                "key": f"req-{count}",
                "request": {
                    "contents": [{"parts": [{"text": raw}]}],
                    "system_instruction": {"parts": [{"text": ENCODING_SYSTEM_PROMPT}]},
                    "generation_config": {
                        "temperature": 0.7,
                        "max_output_tokens": 2048,
                    },
                },
            }
            out.write(json.dumps(request) + "\n")
            count += 1

    print(f"Created batch file: {batch_path} ({count} requests)")
    return count


def submit_batch(batch_path: str) -> str:
    """Upload file and create batch job."""
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=API_KEY)

    print(f"Uploading {batch_path}...")
    uploaded = client.files.upload(
        file=batch_path,
        config=types.UploadFileConfig(
            display_name=Path(batch_path).stem,
            mime_type="jsonl",
        ),
    )
    print(f"Uploaded: {uploaded.name}")

    print(f"Creating batch job (model={MODEL})...")
    job = client.batches.create(
        model=MODEL,
        src=uploaded.name,
        config={"display_name": f"mnemonic-encode-{Path(batch_path).stem}"},
    )
    print(f"Job created: {job.name}")
    print(f"State: {job.state.name}")
    return job.name


def check_status(job_name: str):
    """Check batch job status."""
    from google import genai

    client = genai.Client(api_key=API_KEY)
    job = client.batches.get(name=job_name)
    print(f"Job: {job.name}")
    print(f"State: {job.state.name}")
    if hasattr(job, "dest") and job.dest:
        print(f"Result file: {job.dest.file_name}")
    return job


def download_results(job_name: str, output_path: str, raw_input_path: str):
    """Download batch results and merge with raw inputs."""
    from google import genai

    client = genai.Client(api_key=API_KEY)
    job = client.batches.get(name=job_name)

    if job.state.name != "JOB_STATE_SUCCEEDED":
        print(f"Job not complete: {job.state.name}")
        return

    print(f"Downloading results from {job.dest.file_name}...")
    content = client.files.download(file=job.dest.file_name)
    result_lines = content.decode("utf-8").strip().split("\n")
    print(f"Got {len(result_lines)} result lines")

    # Load raw inputs for merging
    raw_inputs = {}
    for i, line in enumerate(open(raw_input_path)):
        ex = json.loads(line)
        raw_inputs[f"req-{i}"] = ex

    # Parse results
    REQUIRED = {"gist", "summary", "content", "narrative", "concepts",
                "structured_concepts", "significance", "emotional_tone",
                "outcome", "salience"}

    success = 0
    fail = 0
    results = []

    for line in result_lines:
        try:
            result = json.loads(line)
        except json.JSONDecodeError:
            fail += 1
            continue

        key = result.get("key", "")
        response = result.get("response", {})

        # Extract text from response
        try:
            text = response["candidates"][0]["content"]["parts"][0]["text"]
        except (KeyError, IndexError):
            fail += 1
            continue

        # Parse JSON from response
        text = text.strip()
        if text.startswith("```"):
            lines = text.split("\n")
            lines = [l for l in lines if not l.strip().startswith("```")]
            text = "\n".join(lines).strip()

        try:
            encoded = json.loads(text)
        except json.JSONDecodeError:
            # Try to find JSON in text
            start = text.find("{")
            end = text.rfind("}") + 1
            if start >= 0 and end > start:
                try:
                    encoded = json.loads(text[start:end])
                except json.JSONDecodeError:
                    fail += 1
                    continue
            else:
                fail += 1
                continue

        if not REQUIRED.issubset(encoded.keys()):
            fail += 1
            continue

        raw = raw_inputs.get(key, {})
        results.append({
            "raw_input": raw.get("raw_input", ""),
            "encoded": encoded,
            "source": f"swebench_{raw.get('repo', 'unknown')}",
            "task_type": "encoding",
        })
        success += 1

    with open(output_path, "w") as f:
        for r in results:
            f.write(json.dumps(r) + "\n")

    print(f"Results: {success} success, {fail} fail ({success/(success+fail)*100:.1f}% success rate)")
    print(f"Written to: {output_path}")


def main():
    parser = argparse.ArgumentParser()
    sub = parser.add_subparsers(dest="command", required=True)

    submit_p = sub.add_parser("submit")
    submit_p.add_argument("--input", required=True, help="Raw inputs JSONL")

    status_p = sub.add_parser("status")
    status_p.add_argument("--job", required=True, help="Batch job name")

    download_p = sub.add_parser("download")
    download_p.add_argument("--job", required=True, help="Batch job name")
    download_p.add_argument("--output", required=True, help="Output JSONL")
    download_p.add_argument("--raw-input", required=True, help="Original raw input JSONL (for merging)")

    poll_p = sub.add_parser("poll")
    poll_p.add_argument("--job", required=True, help="Batch job name")
    poll_p.add_argument("--output", required=True, help="Output JSONL")
    poll_p.add_argument("--raw-input", required=True, help="Original raw input JSONL")
    poll_p.add_argument("--interval", type=int, default=60, help="Poll interval seconds")

    args = parser.parse_args()

    if not API_KEY:
        print("ERROR: LLM_API_KEY not set")
        sys.exit(1)

    if args.command == "submit":
        batch_path = args.input.replace(".jsonl", "_batch.jsonl")
        create_batch_file(args.input, batch_path)
        job_name = submit_batch(batch_path)
        print(f"\nJob submitted: {job_name}")
        print(f"Check status: python {sys.argv[0]} status --job {job_name}")
        print(f"Poll & download: python {sys.argv[0]} poll --job {job_name} --output OUTPUT.jsonl --raw-input {args.input}")

    elif args.command == "status":
        check_status(args.job)

    elif args.command == "download":
        download_results(args.job, args.output, args.raw_input)

    elif args.command == "poll":
        completed = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED", "JOB_STATE_EXPIRED"}
        while True:
            job = check_status(args.job)
            if job.state.name in completed:
                break
            print(f" Waiting {args.interval}s...")
            time.sleep(args.interval)
        if job.state.name == "JOB_STATE_SUCCEEDED":
            download_results(args.job, args.output, args.raw_input)
        else:
            print(f"Job ended with state: {job.state.name}")


if __name__ == "__main__":
    main()