feat: continuous learning Phases B+C — curriculum generation & training trigger (#391) by CalebisGross · Pull Request #406 · AppSprout-dev/mnemonic

CalebisGross · 2026-04-14T03:06:30Z

Summary

Phase B — Curriculum Generation: During dreaming, re-encodes needs_improvement memories via the teacher model (Gemini API), producing corrected outputs that become training pairs for the local spoke model. Gated by config, minimum entry threshold, and cooldown timer.
Phase C — Training Trigger & Orchestration: When enough untrained experience accumulates (default: 50 entries), assembles JSONL training batches, runs spoke fine-tuning via Python subprocess, evaluates against quality gates (EPR >= 0.90, FR <= 0.05, SC >= 0.95), and deploys new spokes atomically with rollback. Includes train_model MCP tool (Move evolution examples out of runtime directory #25) for manual trigger.
Training data assembly: JSONL writer that combines gold and corrected encoding pairs with 70/30 experience/replay split
26 new tests across curriculum, training data, and trigger logic
Context overhead reduction (~3,350 tokens saved per session startup)

Test plan

make build passes
make test passes (0 failures across full suite)
10 curriculum generation tests (disabled, cooldown, teacher errors, EPR rejection, multi-entry, context cancel)
8 training data assembly tests (gold-only, corrective-only, mixed, empty buffer, manifest, default max, missing raw)
8 training trigger tests (disabled, auto-trigger off, insufficient data, assembles+records, window parsing, eval output parsing, quality gate pass/fail)
MCP tools list test updated (24 → 25 tools)
End-to-end training cycle with real Python env (requires GPU + training venv)

🤖 Generated with Claude Code

Adds curriculum generation to the dreaming agent (Phase 4.75). When enabled, the dreaming cycle re-encodes needs_improvement memories via the teacher model (API provider), producing corrected outputs that become training pairs for the local spoke model. New infrastructure: - CLCurriculumConfig with enable flag, cooldown, and batch limits - Migration 017: corrected_output columns on experience_buffer + curriculum_runs tracking table - 5 new ContinuousLearningStore methods (ListNeedsImprovement, UpdateExperienceCorrectedOutput, curriculum run CRUD) - Export BuildCompressionPrompt for cross-package prompt reuse - 5 store tests covering correction lifecycle, dedup, and limits The pipeline: reclassify experience buffer → fetch worst entries → rebuild identical encoding prompt → call teacher model → validate response (JSON + required fields + EPR > 0.7) → store correction. Gated by config flag, minimum entry threshold, and cooldown timer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

AssembleTrainingBatch exports gold and corrected encoding pairs as JSONL for spoke fine-tuning. Splits 70/30: 70% from experience buffer (gold + corrective pairs), 30% reserved for replay mixing by the Python training script. Each example includes the full encoding prompt and target output for direct tokenization. Writes batch_{id}.jsonl + batch_{id}_manifest.json with provenance. Called by Phase C (automated training trigger) or via MCP tool. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge scientific-method.md, experiment-logging.md, and peer-review-standard.md into a single research-standards.md. Remove MCP tools table from CLAUDE.md (redundant with MCP server schemas). Deduplicate conventions and platform sections that were repeated across rules files and CLAUDE.md. Tighten git-safety.md and code-quality.md to remove rules already enforced by Claude Code system prompt and git hooks. Saves ~3,350 tokens per session startup (~13.4KB). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…391) Adds the automated spoke training pipeline (Phase C) to the continuous learning system. When enough experience data accumulates in the buffer, the daemon can assemble training batches, run spoke fine-tuning via Python subprocess, evaluate against quality gates, and deploy new spokes. New infrastructure: - TrainingRun type and 5 new ContinuousLearningStore methods (WriteTrainingRun, UpdateTrainingRun, GetLastTrainingRunTime, CountUntrainedExperience, MarkExperienceUsedInTraining) - Migration 018: training_runs table for audit trail - Training orchestrator in dreaming agent (Phase 4.85 in dream cycle) with subprocess execution, quality gate (EPR >= 0.90, FR <= 0.05, SC >= 0.95), and atomic deployment with rollback - train_model MCP tool (#25) for manual training trigger - 26 new tests across curriculum, training data, and trigger logic The pipeline: check untrained count >= threshold → assemble JSONL batch → run train_spokes.py subprocess → evaluate via eval_encoding.py → deploy via deploy_model.sh if quality passes → record result. Gated by config flags, training window, and minimum data threshold. Also includes minor fixes from other agents: episoding debug logging, embedded LLM grammar improvements. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Changes TrainingExample from prompt/output to raw_input/encoded/task_type format matching prepare_gemma_finetune_data.py input. Adds a tokenization step (prepareTrainingData) before training, and updates runSpokeTraining args to match current train_spokes.py CLI for Gemma. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Specifies the refactor from inline subprocess training to systemd- orchestrated training. Daemon writes a request file; systemd path unit triggers a separate service that stops the daemon, trains, and restarts. Eliminates VRAM contention that was crashing the system. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The daemon was running Python training subprocesses while holding VRAM for the embedded llama.cpp model, causing OOM crashes. Refactored to match the spec's hybrid orchestration design: - RunTrainingCycle now assembles data and writes pending.json, no subprocess calls. Returns "training_requested" status. - New continuous_train.sh stops the daemon (freeing VRAM), runs tokenization/training/eval/deploy, always restarts the daemon. - New systemd units: mnemonic-train.path watches for pending.json, mnemonic-train.service runs the training script with 30min timeout. - Daemon picks up result.json on startup to close the feedback loop. - MCP train_model tool now returns async (request_id, status). - Duplicate request prevention (skips if pending.json already exists). Tested: unit tests (9/9), full suite (0 failures), manual systemd path trigger with fake request — daemon stopped, script ran, failed correctly on missing batch, daemon restarted, result picked up. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Pipeline is built, tested, and safe (systemd-orchestrated). No reason to keep it gated behind a flag. Auto-trigger runs during 02:00-06:00 training window, curriculum generation runs during dreaming hours. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

CalebisGross and others added 8 commits April 13, 2026 18:11

CalebisGross merged commit 28860cf into main Apr 14, 2026

CalebisGross deleted the feat/continuous-learning-phase-b branch April 14, 2026 04:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: continuous learning Phases B+C — curriculum generation & training trigger (#391)#406

feat: continuous learning Phases B+C — curriculum generation & training trigger (#391)#406
CalebisGross merged 8 commits intomainfrom
feat/continuous-learning-phase-b

CalebisGross commented Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

CalebisGross commented Apr 14, 2026

Summary

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant