Merged
23 commits
- 6d75333 feat: EXP-20 data quality pipeline, targeted data generation, MI300X … (CalebisGross, Apr 4, 2026)
- 0c1c5d1 fix: stress test --checkpoint arg, batch_encode model upgrade, misc f… (CalebisGross, Apr 4, 2026)
- 3ebecc1 feat: 210 mnemonic-specific scenarios, bespoke generator, fix encodin… (CalebisGross, Apr 4, 2026)
- 27a400b fix: sparse templates with proper gist mapping, dedup to 51 unique (CalebisGross, Apr 4, 2026)
- b1bfd96 feat: distribution balance data gen, fix batch_encode source preserva… (CalebisGross, Apr 4, 2026)
- 79ed030 feat: procedural generator + 96 handwritten mnemonic scenarios v2 (CalebisGross, Apr 4, 2026)
- 304d884 feat: v6 smoke test 7/7 stress, add advisory board rule (CalebisGross, Apr 4, 2026)
- 040c596 feat: update EXP-20 config, pre-register EXP-21 (bottleneck rotation) (CalebisGross, Apr 4, 2026)
- f51db44 feat: spoke routing infrastructure, llama.cpp inference, TurboQuant r… (CalebisGross, Apr 6, 2026)
- e9fbfaa feat: fix spoke GGUF export, gist merge bug, bump token limits to 4096 (CalebisGross, Apr 6, 2026)
- ead434e chore: gitignore lifecycle-test artifacts (CalebisGross, Apr 6, 2026)
- f8ccf51 feat: TurboQuant prompt cache compression, EXP-22 registration (CalebisGross, Apr 6, 2026)
- 042a1e3 fix: gist merge FK violation, ambiguous column in FTS concept search (CalebisGross, Apr 6, 2026)
- 834e9ff docs: split EXP-20 into 20a (Qwen local) and 20b (Gemma MI300X) (CalebisGross, Apr 6, 2026)
- dc42349 feat: MI300X Gemma 4 E2B training infrastructure, wandb logging (CalebisGross, Apr 7, 2026)
- cd9e6c7 fix: stress test Gemma support, batched generation, JSON parser (CalebisGross, Apr 7, 2026)
- f96dbbb feat: add Gemma 4 E2B spoke GGUF export script (CalebisGross, Apr 7, 2026)
- ba8e66d feat: RotorQ RQ4 quantizer + benchmark scripts (CalebisGross, Apr 7, 2026)
- 0ca58bf fix: handoff recall, type-filtered search, consolidation exclusions (CalebisGross, Apr 7, 2026)
- dc6dabe feat: add conciseness guidance for structured_concepts encoding (CalebisGross, Apr 7, 2026)
- b603dbc feat: RQ4 GPU inference, RQ3 experiment, spoke fusion, fused GGUF export (CalebisGross, Apr 7, 2026)
- bcf040e test: RQ4 lifecycle test config and quality test script (CalebisGross, Apr 7, 2026)
- 5b59868 chore: go fmt trailing whitespace (CalebisGross, Apr 7, 2026)
14 changes: 14 additions & 0 deletions .claude/rules/advisory-board.md
@@ -0,0 +1,14 @@
# Advisory Board Review

When making significant decisions — architecture choices, experiment design, "should we do X or Y" moments, or planning multi-step work — consult the Advisory Board framework at `~/.claude/projects/-home-hubcaps-Projects-mem/memory/persona_advisory_board.md`.

Run the decision through the 19 lenses. You don't need to list all 19 every time — pick the 3-4 most relevant voices for the specific decision and present the tensions. Caleb is the tiebreaker.

Triggers:
- "what should we do"
- "which approach"
- "let's plan"
- "let's brainstorm"
- Choosing between two architectures, datasets, or approaches
- Deciding whether to ship or keep iterating
- Any decision that commits GPU time or DO droplet credits
1 change: 1 addition & 0 deletions .gitignore
@@ -63,3 +63,4 @@ models/
# llama.cpp build artifacts
third_party/llama.cpp/build/
*.o
lifecycle-test
52 changes: 23 additions & 29 deletions CLAUDE.md
@@ -31,7 +31,7 @@ cmd/benchmark/ End-to-end benchmark
cmd/benchmark-quality/ Memory quality IR benchmark
cmd/lifecycle-test/ Full lifecycle simulation (install → 3 months)
internal/
agent/ 8 cognitive agents + orchestrator + reactor + forum
agent/ 8 cognitive agents + orchestrator + reactor + forum + utilities
perception/ Watch filesystem/terminal/clipboard, heuristic filter
encoding/ LLM compression, concept extraction, association linking
episoding/ Temporal episode clustering
@@ -43,11 +43,13 @@ internal/
orchestrator/ Autonomous scheduler, health monitoring
reactor/ Event-driven rule engine
forum/ Agent personality system for forum communication
agentutil/ Shared agent utilities
api/ REST API server + routes
web/ Embedded dashboard (forum-style, modular ES modules + CSS)
mcp/ MCP server (24 tools for Claude Code)
store/ Store interface + SQLite implementation
llm/ LLM provider interface + implementations (LM Studio, Gemini/cloud API)
composite.go CompositeProvider: routes completions → spoke, embeddings → main provider
llamacpp/ Optional embedded llama.cpp backend (CGo, build-tagged)
ingest/ Project ingestion engine
watcher/ Filesystem (FSEvents/fsnotify), terminal, clipboard
@@ -62,14 +64,20 @@ internal/
sdk/ Python agent SDK (self-evolving assistant)
agent/evolution/ Agent evolution data (created at runtime, gitignored)
agent/evolution/examples/ Example evolution data for reference
models/ GGUF model files (gitignored)
qwen3.5-2b/ HuggingFace Qwen 3.5 2B weights
qwen35-2b-f16.gguf Base Qwen 3.5 2B in GGUF format
qwen35-2b-spokes-f16.gguf Qwen 3.5 2B + trained encoding spokes
training/ Mnemonic-LM training infrastructure
scripts/ Training, sweep, bisection, data download scripts
scripts/ Training, evaluation, data generation, GGUF export
configs/ Data mix config (pretrain_mix.yaml)
docs/ Experiment registry, analysis docs
data/ Tokenized pretraining shards (gitignored)
data/ Training datasets (gitignored)
sweep_results.tsv HP sweep results log
probe_results.tsv Short probe results from LR bisection
third_party/ llama.cpp submodule (for embedded LLM builds)
third_party/ llama.cpp submodule (custom fork with Felix-LM spoke support)
checkpoints/ Training checkpoints by experiment (gitignored)
tests/ End-to-end tests
migrations/ SQLite schema migrations
scripts/ Utility scripts
```
@@ -81,6 +89,7 @@ scripts/ Utility scripts
- **Error handling:** Wrap errors with context: `fmt.Errorf("encoding memory %s: %w", id, err)`
- **Platform-specific code:** Use Go build tags (`//go:build darwin`, `//go:build !darwin`). See `internal/watcher/filesystem/` for examples.
- **Config:** All tunables live in `config.yaml`. Add new fields to `internal/config/config.go` struct.
- **Spoke routing:** When a spoke provider is configured (`LLM.Spoke` in config), specific agent tasks route to the spoke model via `CompositeProvider` (completions → spoke, embeddings → main provider). Configure task routing in `config.yaml`'s `LLM.Spoke.Tasks` list. Health-checked at startup in `cmd/mnemonic/serve.go`.

## Adding Things

@@ -93,44 +102,29 @@ scripts/ Utility scripts

| Platform | Status |
|----------|--------|
| macOS ARM | Full support (primary dev platform) |
| Linux x86_64 | Supported — `serve`, `install`, `start`, `stop`, `uninstall` all work via systemd |
| macOS ARM | Full support |
| Linux x86_64 | Full support (primary dev platform) — systemd service, RX 7800 XT + ROCm for training/inference |
| Windows x86_64 | Supported — `serve`, `install`, `start`, `stop`, `uninstall` work via Windows Services |

## Training (Felix-LM / Mnemonic-LM)

Felix-LM is a hub-and-spoke architecture for language models. The "central post" is a frozen pretrained base model (currently Gemma 4 E2B, previously Qwen 3.5 2B). "Spokes" are lightweight low-rank adapters (~27M params, <1% overhead) injected at each decoder layer via forward hooks. The spokes are the only trainable parameters — the base model is frozen.
Felix-LM is a hub-and-spoke architecture for language models. The "central post" is a frozen pretrained base model. "Spokes" are lightweight low-rank adapters (~25M params, <1% overhead) injected at each decoder layer. The spokes are the only trainable parameters — the base model is frozen.

The architecture supports hot-swappable task-specific spoke sets: encoding spokes, synthesis spokes, retrieval spokes, all sharing the same frozen post. This is the Felix-LM vision: one backbone, many specialized tools.

**Current state:** Encoding spokes achieve 100% novel schema compliance on Qwen 3.5 2B. Gemma 4 E2B training is in progress. See `training/docs/experiment_registry.md` for the full experiment history (EXP-1 through EXP-19).
**Current state:** Qwen 3.5 2B is the production encoding model (100% schema, 7/7 stress test). Deployed via custom llama.cpp fork at 95 tok/s on RX 7800 XT. Gemma 4 E2B explored but slower locally. See `training/docs/experiment_registry.md` for EXP-1 through EXP-21.

Training scripts live in `training/scripts/` and require the **Felix-LM venv**:
### Inference

```bash
source ~/Projects/felixlm/.venv/bin/activate
```

Key scripts:

- `train_qwen_spokes.py` — Main training script (supports `--model-type qwen|gemma`)
- `qwen_spoke_adapter.py` — Qwen 3.5 2B spoke adapter + shared SpokeLayer class
- `gemma_spoke_adapter.py` — Gemma 4 E2B spoke adapter
- `eval_qwen_encoding.py` — Novel input evaluation (needs Gemma 4 support)
- `batch_encode.py` — Gemini Batch API pipeline for scalable training data generation
- `enrich_and_generate.py` — Async Gemini data enrichment + synthetic generation
- `extract_prenuke_data.py` — Extract training data from pre-nuke DB backup
- `merge_training_data.py` — Merge, dedup, and split training datasets
Custom llama.cpp fork (`third_party/llama.cpp/`) with Felix-LM spoke support in `src/models/qwen35.cpp`. Spoke GGUF at `models/qwen35-2b-spokes-f16.gguf`. Build with `-DGGML_HIP=ON`. Export via `training/scripts/export_qwen35_spokes.py`.

Key data:
### Training

- `training/data/finetune_gemma4_v5/` — Current Gemma 4 training data (9,945 train / 1,105 eval, encoding-only)
- `training/data/finetune_qwen_v5_encoding_only/` — Qwen training data (11,436 train / 1,270 eval)
- `training/data/finetune_qwen_v2/` — Original clean dataset (4,566 train / 507 eval)
Scripts in `training/scripts/`, require `source ~/Projects/felixlm/.venv/bin/activate`. Core: `train_qwen_spokes.py`, `qwen_spoke_adapter.py`, `export_qwen35_spokes.py`. Data gen: `batch_encode.py`, `validate.py`. Eval: `eval_qwen_encoding.py`, `stress_test_hallucination.py`, `compare_models.py`. Research: `turboquant.py` (KV cache compression).

The Felix-LM design paper is at `~/Projects/felixlm/docs/felix_lm_design.tex`. The spoke implementation originated in `~/Projects/felixlm/felix_lm/v3/spokes.py` and `~/Projects/nanochat/nanochat/gpt.py`.
Current dataset: `training/data/finetune_qwen_v6/` (4,255 train / 472 eval). Design paper: `~/Projects/felixlm/docs/felix_lm_design.tex`.

All experiments must be pre-registered in `training/docs/experiment_registry.md` before running. See `.claude/rules/scientific-method.md` and `.claude/rules/experiment-logging.md`.
All experiments must be pre-registered in `training/docs/experiment_registry.md`. See `.claude/rules/scientific-method.md` and `.claude/rules/experiment-logging.md`.

## Known Issues

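The spoke routing described in the CLAUDE.md changes above is driven by `config.yaml`. A hypothetical fragment is sketched below; the field names (`enabled`, `endpoint`, `timeout_sec`, `max_concurrent`, `tasks`) are inferred from the `LLM.Spoke` struct fields used in the `serve.go` diff and may not match the actual YAML keys:

```yaml
llm:
  spoke:
    enabled: true
    endpoint: "http://localhost:8000"   # local spoke server
    model: "qwen35-2b-spokes"
    timeout_sec: 120                    # serve.go falls back to 120s when <= 0
    max_concurrent: 1                   # serve.go falls back to 1 when <= 0
    tasks:
      - encoding                        # agent tasks whose completions route to the spoke
```

Embeddings for these tasks still go to the main provider, per the CompositeProvider routing.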
6 changes: 4 additions & 2 deletions cmd/benchmark-quality/main.go
@@ -8,6 +8,7 @@ import (
"math"
"os"
"path/filepath"
"strings"
"time"

"github.com/appsprout-dev/mnemonic/internal/agent/abstraction"
@@ -100,8 +101,9 @@ func main() {
fmt.Fprintf(os.Stderr, "Error loading config: %v\n", cfgErr)
os.Exit(1)
}
if cfg.LLM.APIKey == "" {
fmt.Fprintln(os.Stderr, "Error: LLM_API_KEY environment variable is required for --llm mode")
isLocal := strings.Contains(cfg.LLM.Endpoint, "localhost") || strings.Contains(cfg.LLM.Endpoint, "127.0.0.1")
if cfg.LLM.APIKey == "" && !isLocal {
fmt.Fprintln(os.Stderr, "Error: LLM_API_KEY environment variable is required for --llm mode (not required for localhost)")
os.Exit(1)
}
provider = llm.NewLMStudioProvider(
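The `isLocal` check in this diff uses substring matching, which would also match a host like `notlocalhost.example.com`. A more robust sketch (a hypothetical helper, not part of the PR) parses the endpoint and inspects the host:

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// isLocalEndpoint reports whether an LLM endpoint points at the local
// machine. Unlike a substring check, it parses the URL, so hosts such as
// "notlocalhost.example.com" are not misclassified as local.
func isLocalEndpoint(endpoint string) bool {
	u, err := url.Parse(endpoint)
	if err != nil {
		return false
	}
	host := u.Hostname() // strips the port and brackets around IPv6 hosts
	if strings.EqualFold(host, "localhost") {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback() // covers 127.0.0.1 and ::1
}

func main() {
	fmt.Println(isLocalEndpoint("http://localhost:1234/v1"))    // true
	fmt.Println(isLocalEndpoint("http://127.0.0.1:8080"))       // true
	fmt.Println(isLocalEndpoint("https://api.example.com/v1"))  // false
}
```

The substring version in the PR is fine for the common LM Studio setup; the parsed version only matters if endpoints with unusual hostnames are expected.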
16 changes: 12 additions & 4 deletions cmd/lifecycle-test/main.go
@@ -26,6 +26,7 @@ func main() {
skipFlag string
checkpointDir string
fromCheckpoint string
months int
)

flag.BoolVar(&verbose, "verbose", false, "verbose output")
@@ -36,8 +37,14 @@
flag.StringVar(&skipFlag, "skip", "", "comma-separated phases to skip")
flag.StringVar(&checkpointDir, "checkpoint", "", "save DB snapshot after each phase to this directory")
flag.StringVar(&fromCheckpoint, "from-checkpoint", "", "load DB from checkpoint file instead of creating fresh")
flag.IntVar(&months, "months", 3, "number of months to simulate in the growth phase (1-12)")
flag.Parse()

if months < 1 || months > 12 {
fmt.Fprintf(os.Stderr, "Error: --months must be between 1 and 12\n")
os.Exit(1)
}

logLevel := slog.LevelError
if verbose {
logLevel = slog.LevelDebug
@@ -53,8 +60,9 @@
fmt.Fprintf(os.Stderr, "Error loading config: %v\n", err)
os.Exit(1)
}
if cfg.LLM.APIKey == "" {
fmt.Fprintln(os.Stderr, "Error: LLM_API_KEY environment variable is required for --llm mode")
isLocal := strings.Contains(cfg.LLM.Endpoint, "localhost") || strings.Contains(cfg.LLM.Endpoint, "127.0.0.1")
if cfg.LLM.APIKey == "" && !isLocal {
fmt.Fprintln(os.Stderr, "Error: LLM_API_KEY environment variable is required for --llm mode (not required for localhost)")
os.Exit(1)
}
provider = llm.NewLMStudioProvider(
@@ -91,14 +99,14 @@ func main() {
&PhaseDaily{},
&PhaseConsolidation{},
&PhaseDreaming{},
&PhaseGrowth{},
&PhaseGrowth{Months: months},
&PhaseLongterm{},
}

// Header.
fmt.Println()
fmt.Println(" Mnemonic Lifecycle Simulation")
fmt.Printf(" Version: %s | LLM: %s | Phases: %d\n", Version, llmLabel, len(allPhases))
fmt.Printf(" Version: %s | LLM: %s | Phases: %d | Months: %d\n", Version, llmLabel, len(allPhases), months)
fmt.Println()

ctx := context.Background()
16 changes: 12 additions & 4 deletions cmd/lifecycle-test/phase_growth.go
@@ -9,8 +9,11 @@ import (
"github.com/appsprout-dev/mnemonic/internal/agent/retrieval"
)

// PhaseGrowth scales the system to 700-1000 memories over simulated months 1-3.
type PhaseGrowth struct{}
// PhaseGrowth scales the system over simulated months, generating ~200 memories per month.
// Months defaults to 3 if unset.
type PhaseGrowth struct {
Months int
}

func (p *PhaseGrowth) Name() string { return "growth" }

@@ -23,8 +26,13 @@ func (p *PhaseGrowth) Run(ctx context.Context, h *Harness, verbose bool) (*Phase
rng := rand.New(rand.NewSource(99))
totalAdded := 0

// Simulate months 1-3: generate ~200 memories per month in weekly batches.
for month := 1; month <= 3; month++ {
months := p.Months
if months <= 0 {
months = 3
}

// Simulate months: generate ~200 memories per month in weekly batches.
for month := 1; month <= months; month++ {
for week := 0; week < 4; week++ {
h.Clock.Advance(7 * 24 * time.Hour)

47 changes: 46 additions & 1 deletion cmd/mnemonic/serve.go
@@ -246,8 +246,53 @@ func serveCommand(configPath string) {
if cfg.LLM.Provider == "embedded" && cfg.LLM.Embedded.ChatModelFile != "" {
modelLabel = cfg.LLM.Embedded.ChatModelFile
}

// Set up spoke provider if configured. When enabled, specific agent tasks
// (e.g. "encoding") use the local spoke model for completions while the
// main provider handles embeddings.
var spokeProvider llm.Provider
spokeTasks := make(map[string]bool)
if cfg.LLM.Spoke.Enabled {
timeout := time.Duration(cfg.LLM.Spoke.TimeoutSec) * time.Second
if timeout <= 0 {
timeout = 120 * time.Second
}
maxConc := cfg.LLM.Spoke.MaxConcurrent
if maxConc <= 0 {
maxConc = 1
}
spokeProvider = llm.NewLMStudioProvider(
cfg.LLM.Spoke.Endpoint,
cfg.LLM.Spoke.Model,
"", // spoke server doesn't need a separate embedding model name
"", // no API key for local spoke
timeout,
maxConc,
)
spokeCtx, spokeCancel := context.WithTimeout(context.Background(), 10*time.Second)
if err := spokeProvider.Health(spokeCtx); err != nil {
log.Error("spoke provider unavailable", "endpoint", cfg.LLM.Spoke.Endpoint, "error", err)
fmt.Fprintf(os.Stderr, "\n%s✘ ERROR: Spoke provider is not reachable at %s%s\n", colorRed, cfg.LLM.Spoke.Endpoint, colorReset)
fmt.Fprintf(os.Stderr, " Start the spoke server: python serve_spokes.py --spokes <checkpoint>\n\n")
spokeCancel()
return
}
spokeCancel()
for _, task := range cfg.LLM.Spoke.Tasks {
spokeTasks[task] = true
}
log.Info("spoke provider ready", "endpoint", cfg.LLM.Spoke.Endpoint, "model", cfg.LLM.Spoke.Model, "tasks", cfg.LLM.Spoke.Tasks)
}

wrap := func(caller string) llm.Provider {
var p llm.Provider = llm.NewInstrumentedProvider(llmProvider, memStore, caller, modelLabel)
var base llm.Provider
if spokeProvider != nil && spokeTasks[caller] {
// Route completions to spoke, embeddings to main provider
base = llm.NewCompositeProvider(spokeProvider, llmProvider)
} else {
base = llmProvider
}
var p llm.Provider = llm.NewInstrumentedProvider(base, memStore, caller, modelLabel)
if cfg.Training.CaptureEnabled && cfg.Training.CaptureDir != "" {
p = llm.NewTrainingCaptureProvider(p, caller, cfg.Training.CaptureDir)
}
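The CompositeProvider wiring in the `serve.go` diff can be sketched minimally. The `Provider` interface below is an assumption for illustration (the real `internal/llm` interface is larger); the routing idea matches the diff: completions go to the spoke model, embeddings to the main provider.

```go
package main

import "fmt"

// Provider is a simplified stand-in for the internal llm.Provider interface.
type Provider interface {
	Complete(prompt string) (string, error)
	Embed(text string) ([]float32, error)
}

// CompositeProvider routes completions to the spoke model and embeddings
// to the main provider, mirroring llm.NewCompositeProvider in the diff.
type CompositeProvider struct {
	completions Provider // spoke model
	embeddings  Provider // main provider
}

func NewCompositeProvider(completions, embeddings Provider) *CompositeProvider {
	return &CompositeProvider{completions: completions, embeddings: embeddings}
}

func (c *CompositeProvider) Complete(prompt string) (string, error) {
	return c.completions.Complete(prompt)
}

func (c *CompositeProvider) Embed(text string) ([]float32, error) {
	return c.embeddings.Embed(text)
}

// stub is a demo provider that tags its output with its name.
type stub struct{ name string }

func (s stub) Complete(string) (string, error) { return s.name + ":completion", nil }
func (s stub) Embed(string) ([]float32, error) { return []float32{float32(len(s.name))}, nil }

func main() {
	p := NewCompositeProvider(stub{"spoke"}, stub{"main"})
	out, _ := p.Complete("encode this")
	emb, _ := p.Embed("encode this")
	fmt.Println(out, emb) // completion came from the spoke, embedding from main
}
```

In `serve.go` this composite is then wrapped by InstrumentedProvider per caller, so spoke-routed tasks are instrumented the same way as main-provider tasks.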
20 changes: 18 additions & 2 deletions internal/agent/consolidation/agent.go
@@ -103,7 +103,6 @@ func DefaultConfig() ConsolidationConfig {
}
}


// ConsolidationAgent performs periodic memory consolidation — the "sleeping brain."
// Each cycle: decay salience → transition states → prune associations → merge clusters → delete expired.
type ConsolidationAgent struct {
@@ -403,6 +402,13 @@ func (ca *ConsolidationAgent) decaySalience(ctx context.Context) (decayed, proce
for _, mem := range allMemories {
processed++

// Skip handoff memories — their value is temporal, not usage-validated.
// They are already exempt from lossy merging (mergeClusters) and should
// maintain their initial salience so newest-first ordering works reliably.
if mem.Type == "handoff" {
continue
}

// Calculate recency factor: recently accessed memories decay slower
hoursSinceAccess := time.Since(mem.LastAccessed).Hours()
if mem.LastAccessed.IsZero() {
@@ -532,6 +538,16 @@ func (ca *ConsolidationAgent) mergeClusters(ctx context.Context) (int, error) {
return 0, err
}

// Exclude handoff memories — they contain unique per-session details
// that must not be merged into a lossy gist.
filtered := memories[:0]
for _, m := range memories {
if m.Type != "handoff" {
filtered = append(filtered, m)
}
}
memories = filtered

if len(memories) < ca.config.MinClusterSize {
return 0, nil // Not enough memories to form clusters
}
@@ -721,7 +737,7 @@ Respond with ONLY a JSON object:
now := time.Now()
return store.Memory{
ID: uuid.New().String(),
RawID: cluster[0].RawID, // reference first source
RawID: "", // gist has no raw source (cluster sources tracked via gist_of)
Timestamp: now,
Content: gistContent,
Summary: gistSummary,
6 changes: 3 additions & 3 deletions internal/agent/encoding/agent.go
@@ -113,7 +113,7 @@ func DefaultConfig() EncodingConfig {
MaxSimilarSearchResults: 5,
EmbeddingModel: "default",
CompletionModel: "default",
CompletionMaxTokens: 1024,
CompletionMaxTokens: 4096,
CompletionTemperature: 0.3,
MaxConcurrentEncodings: 1,
EnableLLMClassification: false,
@@ -1233,7 +1233,7 @@ Fill in every JSON field based on the actual file content below:
- content: A compressed description of what the file contains and how it works.
- narrative: The file's role in the project architecture and why it matters.
- concepts: 3-5 keywords describing the file's domain. PREFER exact terms from the vocabulary list below; only use new terms if no vocabulary term fits.
- structured_concepts: Extract topics, entities, actions, and causal relationships from the file.
- structured_concepts: Extract topics, entities, actions, and causal relationships. Keep each array to 3-5 items max. Use short strings, not sentences.
- significance: One of routine, notable, important, or critical.
- emotional_tone: neutral.
- outcome: success.
@@ -1249,7 +1249,7 @@ Fill in every JSON field based on the actual event content below:
- content: The key details someone would need to understand this event later.
- narrative: The story of what happened including context and meaning.
- concepts: 3-5 keywords about the event. PREFER exact terms from the vocabulary list below; only use new terms if no vocabulary term fits.
- structured_concepts: Extract topics, entities, actions, and causal relationships from the event.
- structured_concepts: Extract topics, entities, actions, and causal relationships. Keep each array to 3-5 items max. Use short strings, not sentences.
- significance: One of routine, notable, important, or critical.
- emotional_tone: One of neutral, satisfying, frustrating, exciting, or concerning.
- outcome: One of success, failure, ongoing, or unknown.
4 changes: 2 additions & 2 deletions internal/agent/encoding/agent_test.go
@@ -245,8 +245,8 @@ func TestDefaultConfig(t *testing.T) {
if cfg.MaxSimilarSearchResults != 5 {
t.Errorf("expected max similar 5, got %d", cfg.MaxSimilarSearchResults)
}
if cfg.CompletionMaxTokens != 1024 {
t.Errorf("expected max tokens 1024, got %d", cfg.CompletionMaxTokens)
if cfg.CompletionMaxTokens != 4096 {
t.Errorf("expected max tokens 4096, got %d", cfg.CompletionMaxTokens)
}
if cfg.CompletionTemperature != 0.3 {
t.Errorf("expected temperature 0.3, got %v", cfg.CompletionTemperature)