Merged
23 commits
- 6d75333 feat: EXP-20 data quality pipeline, targeted data generation, MI300X … (CalebisGross, Apr 4, 2026)
- 0c1c5d1 fix: stress test --checkpoint arg, batch_encode model upgrade, misc f… (CalebisGross, Apr 4, 2026)
- 3ebecc1 feat: 210 mnemonic-specific scenarios, bespoke generator, fix encodin… (CalebisGross, Apr 4, 2026)
- 27a400b fix: sparse templates with proper gist mapping, dedup to 51 unique (CalebisGross, Apr 4, 2026)
- b1bfd96 feat: distribution balance data gen, fix batch_encode source preserva… (CalebisGross, Apr 4, 2026)
- 79ed030 feat: procedural generator + 96 handwritten mnemonic scenarios v2 (CalebisGross, Apr 4, 2026)
- 304d884 feat: v6 smoke test 7/7 stress, add advisory board rule (CalebisGross, Apr 4, 2026)
- 040c596 feat: update EXP-20 config, pre-register EXP-21 (bottleneck rotation) (CalebisGross, Apr 4, 2026)
- f51db44 feat: spoke routing infrastructure, llama.cpp inference, TurboQuant r… (CalebisGross, Apr 6, 2026)
- e9fbfaa feat: fix spoke GGUF export, gist merge bug, bump token limits to 4096 (CalebisGross, Apr 6, 2026)
- ead434e chore: gitignore lifecycle-test artifacts (CalebisGross, Apr 6, 2026)
- f8ccf51 feat: TurboQuant prompt cache compression, EXP-22 registration (CalebisGross, Apr 6, 2026)
- 042a1e3 fix: gist merge FK violation, ambiguous column in FTS concept search (CalebisGross, Apr 6, 2026)
- 834e9ff docs: split EXP-20 into 20a (Qwen local) and 20b (Gemma MI300X) (CalebisGross, Apr 6, 2026)
- dc42349 feat: MI300X Gemma 4 E2B training infrastructure, wandb logging (CalebisGross, Apr 7, 2026)
- cd9e6c7 fix: stress test Gemma support, batched generation, JSON parser (CalebisGross, Apr 7, 2026)
- f96dbbb feat: add Gemma 4 E2B spoke GGUF export script (CalebisGross, Apr 7, 2026)
- ba8e66d feat: RotorQ RQ4 quantizer + benchmark scripts (CalebisGross, Apr 7, 2026)
- 0ca58bf fix: handoff recall, type-filtered search, consolidation exclusions (CalebisGross, Apr 7, 2026)
- dc6dabe feat: add conciseness guidance for structured_concepts encoding (CalebisGross, Apr 7, 2026)
- b603dbc feat: RQ4 GPU inference, RQ3 experiment, spoke fusion, fused GGUF export (CalebisGross, Apr 7, 2026)
- bcf040e test: RQ4 lifecycle test config and quality test script (CalebisGross, Apr 7, 2026)
- 5b59868 chore: go fmt trailing whitespace (CalebisGross, Apr 7, 2026)
14 changes: 14 additions & 0 deletions .claude/rules/advisory-board.md
@@ -0,0 +1,14 @@
# Advisory Board Review

When making significant decisions — architecture choices, experiment design, "should we do X or Y" moments, or planning multi-step work — consult the Advisory Board framework at `~/.claude/projects/-home-hubcaps-Projects-mem/memory/persona_advisory_board.md`.

Run the decision through the 19 lenses. You don't need to list all 19 every time — pick the 3-4 most relevant voices for the specific decision and present the tensions. Caleb is the tiebreaker.

Triggers:
- "what should we do"
- "which approach"
- "let's plan"
- "let's brainstorm"
- Choosing between two architectures, datasets, or approaches
- Deciding whether to ship or keep iterating
- Any decision that commits GPU time or DO droplet credits
1 change: 1 addition & 0 deletions .gitignore
@@ -63,3 +63,4 @@ models/
# llama.cpp build artifacts
third_party/llama.cpp/build/
*.o
lifecycle-test
52 changes: 23 additions & 29 deletions CLAUDE.md
@@ -31,7 +31,7 @@ cmd/benchmark/ End-to-end benchmark
cmd/benchmark-quality/ Memory quality IR benchmark
cmd/lifecycle-test/ Full lifecycle simulation (install → 3 months)
internal/
agent/ 8 cognitive agents + orchestrator + reactor + forum
agent/ 8 cognitive agents + orchestrator + reactor + forum + utilities
perception/ Watch filesystem/terminal/clipboard, heuristic filter
encoding/ LLM compression, concept extraction, association linking
episoding/ Temporal episode clustering
@@ -43,11 +43,13 @@ internal/
orchestrator/ Autonomous scheduler, health monitoring
reactor/ Event-driven rule engine
forum/ Agent personality system for forum communication
agentutil/ Shared agent utilities
api/ REST API server + routes
web/ Embedded dashboard (forum-style, modular ES modules + CSS)
mcp/ MCP server (24 tools for Claude Code)
store/ Store interface + SQLite implementation
llm/ LLM provider interface + implementations (LM Studio, Gemini/cloud API)
composite.go CompositeProvider: routes completions → spoke, embeddings → main provider
llamacpp/ Optional embedded llama.cpp backend (CGo, build-tagged)
ingest/ Project ingestion engine
watcher/ Filesystem (FSEvents/fsnotify), terminal, clipboard
@@ -62,14 +64,20 @@ internal/
sdk/ Python agent SDK (self-evolving assistant)
agent/evolution/ Agent evolution data (created at runtime, gitignored)
agent/evolution/examples/ Example evolution data for reference
models/ GGUF model files (gitignored)
qwen3.5-2b/ HuggingFace Qwen 3.5 2B weights
qwen35-2b-f16.gguf Base Qwen 3.5 2B in GGUF format
qwen35-2b-spokes-f16.gguf Qwen 3.5 2B + trained encoding spokes
training/ Mnemonic-LM training infrastructure
scripts/ Training, sweep, bisection, data download scripts
scripts/ Training, evaluation, data generation, GGUF export
configs/ Data mix config (pretrain_mix.yaml)
docs/ Experiment registry, analysis docs
data/ Tokenized pretraining shards (gitignored)
data/ Training datasets (gitignored)
sweep_results.tsv HP sweep results log
probe_results.tsv Short probe results from LR bisection
third_party/ llama.cpp submodule (for embedded LLM builds)
third_party/ llama.cpp submodule (custom fork with Felix-LM spoke support)
checkpoints/ Training checkpoints by experiment (gitignored)
tests/ End-to-end tests
migrations/ SQLite schema migrations
scripts/ Utility scripts
```
@@ -81,6 +89,7 @@ scripts/ Utility scripts
- **Error handling:** Wrap errors with context: `fmt.Errorf("encoding memory %s: %w", id, err)`
- **Platform-specific code:** Use Go build tags (`//go:build darwin`, `//go:build !darwin`). See `internal/watcher/filesystem/` for examples.
- **Config:** All tunables live in `config.yaml`. Add new fields to `internal/config/config.go` struct.
- **Spoke routing:** When a spoke provider is configured (`LLM.Spoke` in config), specific agent tasks route to the spoke model via `CompositeProvider` (completions → spoke, embeddings → main provider). Configure task routing in `config.yaml`'s `LLM.Spoke.Tasks` list. Health-checked at startup in `cmd/mnemonic/serve.go`.

## Adding Things

@@ -93,44 +102,29 @@ scripts/ Utility scripts

| Platform | Status |
|----------|--------|
| macOS ARM | Full support (primary dev platform) |
| Linux x86_64 | Supported — `serve`, `install`, `start`, `stop`, `uninstall` all work via systemd |
| macOS ARM | Full support |
| Linux x86_64 | Full support (primary dev platform) — systemd service, RX 7800 XT + ROCm for training/inference |
| Windows x86_64 | Supported — `serve`, `install`, `start`, `stop`, `uninstall` work via Windows Services |

## Training (Felix-LM / Mnemonic-LM)

Felix-LM is a hub-and-spoke architecture for language models. The "central post" is a frozen pretrained base model (currently Gemma 4 E2B, previously Qwen 3.5 2B). "Spokes" are lightweight low-rank adapters (~27M params, <1% overhead) injected at each decoder layer via forward hooks. The spokes are the only trainable parameters — the base model is frozen.
Felix-LM is a hub-and-spoke architecture for language models. The "central post" is a frozen pretrained base model. "Spokes" are lightweight low-rank adapters (~25M params, <1% overhead) injected at each decoder layer. The spokes are the only trainable parameters — the base model is frozen.

The architecture supports hot-swappable task-specific spoke sets: encoding spokes, synthesis spokes, retrieval spokes, all sharing the same frozen post. This is the Felix-LM vision: one backbone, many specialized tools.

**Current state:** Encoding spokes achieve 100% novel schema compliance on Qwen 3.5 2B. Gemma 4 E2B training is in progress. See `training/docs/experiment_registry.md` for the full experiment history (EXP-1 through EXP-19).
**Current state:** Qwen 3.5 2B is the production encoding model (100% schema, 7/7 stress test). Deployed via custom llama.cpp fork at 95 tok/s on RX 7800 XT. Gemma 4 E2B explored but slower locally. See `training/docs/experiment_registry.md` for EXP-1 through EXP-21.

Training scripts live in `training/scripts/` and require the **Felix-LM venv**:
### Inference

```bash
source ~/Projects/felixlm/.venv/bin/activate
```

Key scripts:

- `train_qwen_spokes.py` — Main training script (supports `--model-type qwen|gemma`)
- `qwen_spoke_adapter.py` — Qwen 3.5 2B spoke adapter + shared SpokeLayer class
- `gemma_spoke_adapter.py` — Gemma 4 E2B spoke adapter
- `eval_qwen_encoding.py` — Novel input evaluation (needs Gemma 4 support)
- `batch_encode.py` — Gemini Batch API pipeline for scalable training data generation
- `enrich_and_generate.py` — Async Gemini data enrichment + synthetic generation
- `extract_prenuke_data.py` — Extract training data from pre-nuke DB backup
- `merge_training_data.py` — Merge, dedup, and split training datasets
Custom llama.cpp fork (`third_party/llama.cpp/`) with Felix-LM spoke support in `src/models/qwen35.cpp`. Spoke GGUF at `models/qwen35-2b-spokes-f16.gguf`. Build with `-DGGML_HIP=ON`. Export via `training/scripts/export_qwen35_spokes.py`.

Key data:
### Training

- `training/data/finetune_gemma4_v5/` — Current Gemma 4 training data (9,945 train / 1,105 eval, encoding-only)
- `training/data/finetune_qwen_v5_encoding_only/` — Qwen training data (11,436 train / 1,270 eval)
- `training/data/finetune_qwen_v2/` — Original clean dataset (4,566 train / 507 eval)
Scripts in `training/scripts/`, require `source ~/Projects/felixlm/.venv/bin/activate`. Core: `train_qwen_spokes.py`, `qwen_spoke_adapter.py`, `export_qwen35_spokes.py`. Data gen: `batch_encode.py`, `validate.py`. Eval: `eval_qwen_encoding.py`, `stress_test_hallucination.py`, `compare_models.py`. Research: `turboquant.py` (KV cache compression).

The Felix-LM design paper is at `~/Projects/felixlm/docs/felix_lm_design.tex`. The spoke implementation originated in `~/Projects/felixlm/felix_lm/v3/spokes.py` and `~/Projects/nanochat/nanochat/gpt.py`.
Current dataset: `training/data/finetune_qwen_v6/` (4,255 train / 472 eval). Design paper: `~/Projects/felixlm/docs/felix_lm_design.tex`.

All experiments must be pre-registered in `training/docs/experiment_registry.md` before running. See `.claude/rules/scientific-method.md` and `.claude/rules/experiment-logging.md`.
All experiments must be pre-registered in `training/docs/experiment_registry.md`. See `.claude/rules/scientific-method.md` and `.claude/rules/experiment-logging.md`.

## Known Issues

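The spoke routing described in the CLAUDE.md changes above is driven by `config.yaml`. A hypothetical fragment is sketched below; the field names (`enabled`, `endpoint`, `timeout_sec`, `max_concurrent`, `tasks`) are inferred from the `LLM.Spoke` struct fields used in the `serve.go` diff and may not match the actual YAML keys:

```yaml
llm:
  spoke:
    enabled: true
    endpoint: "http://localhost:8000"   # local spoke server
    model: "qwen35-2b-spokes"
    timeout_sec: 120                    # serve.go falls back to 120s when <= 0
    max_concurrent: 1                   # serve.go falls back to 1 when <= 0
    tasks:
      - encoding                        # agent tasks whose completions route to the spoke
```

Embeddings for these tasks still go to the main provider, per the CompositeProvider routing.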
6 changes: 4 additions & 2 deletions cmd/benchmark-quality/main.go
@@ -8,6 +8,7 @@ import (
"math"
"os"
"path/filepath"
"strings"
"time"

"github.com/appsprout-dev/mnemonic/internal/agent/abstraction"
@@ -100,8 +101,9 @@ func main() {
fmt.Fprintf(os.Stderr, "Error loading config: %v\n", cfgErr)
os.Exit(1)
}
if cfg.LLM.APIKey == "" {
fmt.Fprintln(os.Stderr, "Error: LLM_API_KEY environment variable is required for --llm mode")
isLocal := strings.Contains(cfg.LLM.Endpoint, "localhost") || strings.Contains(cfg.LLM.Endpoint, "127.0.0.1")
if cfg.LLM.APIKey == "" && !isLocal {
fmt.Fprintln(os.Stderr, "Error: LLM_API_KEY environment variable is required for --llm mode (not required for localhost)")
os.Exit(1)
}
provider = llm.NewLMStudioProvider(
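The `isLocal` check in this diff uses substring matching, which would also match a host like `notlocalhost.example.com`. A more robust sketch (a hypothetical helper, not part of the PR) parses the endpoint and inspects the host:

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// isLocalEndpoint reports whether an LLM endpoint points at the local
// machine. Unlike a substring check, it parses the URL, so hosts such as
// "notlocalhost.example.com" are not misclassified as local.
func isLocalEndpoint(endpoint string) bool {
	u, err := url.Parse(endpoint)
	if err != nil {
		return false
	}
	host := u.Hostname() // strips the port and brackets around IPv6 hosts
	if strings.EqualFold(host, "localhost") {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback() // covers 127.0.0.1 and ::1
}

func main() {
	fmt.Println(isLocalEndpoint("http://localhost:1234/v1"))    // true
	fmt.Println(isLocalEndpoint("http://127.0.0.1:8080"))       // true
	fmt.Println(isLocalEndpoint("https://api.example.com/v1"))  // false
}
```

The substring version in the PR is fine for the common LM Studio setup; the parsed version only matters if endpoints with unusual hostnames are expected.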
16 changes: 12 additions & 4 deletions cmd/lifecycle-test/main.go
@@ -26,6 +26,7 @@ func main() {
skipFlag string
checkpointDir string
fromCheckpoint string
months int
)

flag.BoolVar(&verbose, "verbose", false, "verbose output")
@@ -36,8 +37,14 @@
flag.StringVar(&skipFlag, "skip", "", "comma-separated phases to skip")
flag.StringVar(&checkpointDir, "checkpoint", "", "save DB snapshot after each phase to this directory")
flag.StringVar(&fromCheckpoint, "from-checkpoint", "", "load DB from checkpoint file instead of creating fresh")
flag.IntVar(&months, "months", 3, "number of months to simulate in the growth phase (1-12)")
flag.Parse()

if months < 1 || months > 12 {
fmt.Fprintf(os.Stderr, "Error: --months must be between 1 and 12\n")
os.Exit(1)
}

logLevel := slog.LevelError
if verbose {
logLevel = slog.LevelDebug
@@ -53,8 +60,9 @@
fmt.Fprintf(os.Stderr, "Error loading config: %v\n", err)
os.Exit(1)
}
if cfg.LLM.APIKey == "" {
fmt.Fprintln(os.Stderr, "Error: LLM_API_KEY environment variable is required for --llm mode")
isLocal := strings.Contains(cfg.LLM.Endpoint, "localhost") || strings.Contains(cfg.LLM.Endpoint, "127.0.0.1")
if cfg.LLM.APIKey == "" && !isLocal {
fmt.Fprintln(os.Stderr, "Error: LLM_API_KEY environment variable is required for --llm mode (not required for localhost)")
os.Exit(1)
}
provider = llm.NewLMStudioProvider(
@@ -91,14 +99,14 @@ func main() {
&PhaseDaily{},
&PhaseConsolidation{},
&PhaseDreaming{},
&PhaseGrowth{},
&PhaseGrowth{Months: months},
&PhaseLongterm{},
}

// Header.
fmt.Println()
fmt.Println(" Mnemonic Lifecycle Simulation")
fmt.Printf(" Version: %s | LLM: %s | Phases: %d\n", Version, llmLabel, len(allPhases))
fmt.Printf(" Version: %s | LLM: %s | Phases: %d | Months: %d\n", Version, llmLabel, len(allPhases), months)
fmt.Println()

ctx := context.Background()
16 changes: 12 additions & 4 deletions cmd/lifecycle-test/phase_growth.go
@@ -9,8 +9,11 @@ import (
"github.com/appsprout-dev/mnemonic/internal/agent/retrieval"
)

// PhaseGrowth scales the system to 700-1000 memories over simulated months 1-3.
type PhaseGrowth struct{}
// PhaseGrowth scales the system over simulated months, generating ~200 memories per month.
// Months defaults to 3 if unset.
type PhaseGrowth struct {
Months int
}

func (p *PhaseGrowth) Name() string { return "growth" }

@@ -23,8 +26,13 @@ func (p *PhaseGrowth) Run(ctx context.Context, h *Harness, verbose bool) (*Phase
rng := rand.New(rand.NewSource(99))
totalAdded := 0

// Simulate months 1-3: generate ~200 memories per month in weekly batches.
for month := 1; month <= 3; month++ {
months := p.Months
if months <= 0 {
months = 3
}

// Simulate months: generate ~200 memories per month in weekly batches.
for month := 1; month <= months; month++ {
for week := 0; week < 4; week++ {
h.Clock.Advance(7 * 24 * time.Hour)

47 changes: 46 additions & 1 deletion cmd/mnemonic/serve.go
@@ -246,8 +246,53 @@ func serveCommand(configPath string) {
if cfg.LLM.Provider == "embedded" && cfg.LLM.Embedded.ChatModelFile != "" {
modelLabel = cfg.LLM.Embedded.ChatModelFile
}

// Set up spoke provider if configured. When enabled, specific agent tasks
// (e.g. "encoding") use the local spoke model for completions while the
// main provider handles embeddings.
var spokeProvider llm.Provider
spokeTasks := make(map[string]bool)
if cfg.LLM.Spoke.Enabled {
timeout := time.Duration(cfg.LLM.Spoke.TimeoutSec) * time.Second
if timeout <= 0 {
timeout = 120 * time.Second
}
maxConc := cfg.LLM.Spoke.MaxConcurrent
if maxConc <= 0 {
maxConc = 1
}
spokeProvider = llm.NewLMStudioProvider(
cfg.LLM.Spoke.Endpoint,
cfg.LLM.Spoke.Model,
"", // spoke server doesn't need a separate embedding model name
"", // no API key for local spoke
timeout,
maxConc,
)
spokeCtx, spokeCancel := context.WithTimeout(context.Background(), 10*time.Second)
if err := spokeProvider.Health(spokeCtx); err != nil {
log.Error("spoke provider unavailable", "endpoint", cfg.LLM.Spoke.Endpoint, "error", err)
fmt.Fprintf(os.Stderr, "\n%s✘ ERROR: Spoke provider is not reachable at %s%s\n", colorRed, cfg.LLM.Spoke.Endpoint, colorReset)
fmt.Fprintf(os.Stderr, " Start the spoke server: python serve_spokes.py --spokes <checkpoint>\n\n")
spokeCancel()
return
}
spokeCancel()
for _, task := range cfg.LLM.Spoke.Tasks {
spokeTasks[task] = true
}
log.Info("spoke provider ready", "endpoint", cfg.LLM.Spoke.Endpoint, "model", cfg.LLM.Spoke.Model, "tasks", cfg.LLM.Spoke.Tasks)
}

wrap := func(caller string) llm.Provider {
var p llm.Provider = llm.NewInstrumentedProvider(llmProvider, memStore, caller, modelLabel)
var base llm.Provider
if spokeProvider != nil && spokeTasks[caller] {
// Route completions to spoke, embeddings to main provider
base = llm.NewCompositeProvider(spokeProvider, llmProvider)
} else {
base = llmProvider
}
var p llm.Provider = llm.NewInstrumentedProvider(base, memStore, caller, modelLabel)
if cfg.Training.CaptureEnabled && cfg.Training.CaptureDir != "" {
p = llm.NewTrainingCaptureProvider(p, caller, cfg.Training.CaptureDir)
}
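The CompositeProvider wiring in the `serve.go` diff can be sketched minimally. The `Provider` interface below is an assumption for illustration (the real `internal/llm` interface is larger); the routing idea matches the diff: completions go to the spoke model, embeddings to the main provider.

```go
package main

import "fmt"

// Provider is a simplified stand-in for the internal llm.Provider interface.
type Provider interface {
	Complete(prompt string) (string, error)
	Embed(text string) ([]float32, error)
}

// CompositeProvider routes completions to the spoke model and embeddings
// to the main provider, mirroring llm.NewCompositeProvider in the diff.
type CompositeProvider struct {
	completions Provider // spoke model
	embeddings  Provider // main provider
}

func NewCompositeProvider(completions, embeddings Provider) *CompositeProvider {
	return &CompositeProvider{completions: completions, embeddings: embeddings}
}

func (c *CompositeProvider) Complete(prompt string) (string, error) {
	return c.completions.Complete(prompt)
}

func (c *CompositeProvider) Embed(text string) ([]float32, error) {
	return c.embeddings.Embed(text)
}

// stub is a demo provider that tags its output with its name.
type stub struct{ name string }

func (s stub) Complete(string) (string, error) { return s.name + ":completion", nil }
func (s stub) Embed(string) ([]float32, error) { return []float32{float32(len(s.name))}, nil }

func main() {
	p := NewCompositeProvider(stub{"spoke"}, stub{"main"})
	out, _ := p.Complete("encode this")
	emb, _ := p.Embed("encode this")
	fmt.Println(out, emb) // completion came from the spoke, embedding from main
}
```

In `serve.go` this composite is then wrapped by InstrumentedProvider per caller, so spoke-routed tasks are instrumented the same way as main-provider tasks.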
20 changes: 18 additions & 2 deletions internal/agent/consolidation/agent.go
@@ -103,7 +103,6 @@ func DefaultConfig() ConsolidationConfig {
}
}


// ConsolidationAgent performs periodic memory consolidation — the "sleeping brain."
// Each cycle: decay salience → transition states → prune associations → merge clusters → delete expired.
type ConsolidationAgent struct {
@@ -403,6 +402,13 @@ func (ca *ConsolidationAgent) decaySalience(ctx context.Context) (decayed, proce
for _, mem := range allMemories {
processed++

// Skip handoff memories — their value is temporal, not usage-validated.
// They are already exempt from lossy merging (mergeClusters) and should
// maintain their initial salience so newest-first ordering works reliably.
if mem.Type == "handoff" {
continue
}

// Calculate recency factor: recently accessed memories decay slower
hoursSinceAccess := time.Since(mem.LastAccessed).Hours()
if mem.LastAccessed.IsZero() {
@@ -532,6 +538,16 @@ func (ca *ConsolidationAgent) mergeClusters(ctx context.Context) (int, error) {
return 0, err
}

// Exclude handoff memories — they contain unique per-session details
// that must not be merged into a lossy gist.
filtered := memories[:0]
for _, m := range memories {
if m.Type != "handoff" {
filtered = append(filtered, m)
}
}
memories = filtered

if len(memories) < ca.config.MinClusterSize {
return 0, nil // Not enough memories to form clusters
}
@@ -721,7 +737,7 @@ Respond with ONLY a JSON object:
now := time.Now()
return store.Memory{
ID: uuid.New().String(),
RawID: cluster[0].RawID, // reference first source
RawID: "", // gist has no raw source (cluster sources tracked via gist_of)
Timestamp: now,
Content: gistContent,
Summary: gistSummary,
6 changes: 3 additions & 3 deletions internal/agent/encoding/agent.go
@@ -113,7 +113,7 @@ func DefaultConfig() EncodingConfig {
MaxSimilarSearchResults: 5,
EmbeddingModel: "default",
CompletionModel: "default",
CompletionMaxTokens: 1024,
CompletionMaxTokens: 4096,
CompletionTemperature: 0.3,
MaxConcurrentEncodings: 1,
EnableLLMClassification: false,
@@ -1233,7 +1233,7 @@ Fill in every JSON field based on the actual file content below:
- content: A compressed description of what the file contains and how it works.
- narrative: The file's role in the project architecture and why it matters.
- concepts: 3-5 keywords describing the file's domain. PREFER exact terms from the vocabulary list below; only use new terms if no vocabulary term fits.
- structured_concepts: Extract topics, entities, actions, and causal relationships from the file.
- structured_concepts: Extract topics, entities, actions, and causal relationships. Keep each array to 3-5 items max. Use short strings, not sentences.
- significance: One of routine, notable, important, or critical.
- emotional_tone: neutral.
- outcome: success.
@@ -1249,7 +1249,7 @@ Fill in every JSON field based on the actual event content below:
- content: The key details someone would need to understand this event later.
- narrative: The story of what happened including context and meaning.
- concepts: 3-5 keywords about the event. PREFER exact terms from the vocabulary list below; only use new terms if no vocabulary term fits.
- structured_concepts: Extract topics, entities, actions, and causal relationships from the event.
- structured_concepts: Extract topics, entities, actions, and causal relationships. Keep each array to 3-5 items max. Use short strings, not sentences.
- significance: One of routine, notable, important, or critical.
- emotional_tone: One of neutral, satisfying, frustrating, exciting, or concerning.
- outcome: One of success, failure, ongoing, or unknown.
4 changes: 2 additions & 2 deletions internal/agent/encoding/agent_test.go
@@ -245,8 +245,8 @@ func TestDefaultConfig(t *testing.T) {
if cfg.MaxSimilarSearchResults != 5 {
t.Errorf("expected max similar 5, got %d", cfg.MaxSimilarSearchResults)
}
if cfg.CompletionMaxTokens != 1024 {
t.Errorf("expected max tokens 1024, got %d", cfg.CompletionMaxTokens)
if cfg.CompletionMaxTokens != 4096 {
t.Errorf("expected max tokens 4096, got %d", cfg.CompletionMaxTokens)
}
if cfg.CompletionTemperature != 0.3 {
t.Errorf("expected temperature 0.3, got %v", cfg.CompletionTemperature)