feat: continuous learning Phases B+C — curriculum generation & training trigger (#391) #406
Merged
CalebisGross merged 8 commits into main on Apr 14, 2026
Conversation
Adds curriculum generation to the dreaming agent (Phase 4.75). When enabled, the dreaming cycle re-encodes needs_improvement memories via the teacher model (API provider), producing corrected outputs that become training pairs for the local spoke model.

New infrastructure:
- CLCurriculumConfig with enable flag, cooldown, and batch limits
- Migration 017: corrected_output columns on experience_buffer + curriculum_runs tracking table
- 5 new ContinuousLearningStore methods (ListNeedsImprovement, UpdateExperienceCorrectedOutput, curriculum run CRUD)
- Export BuildCompressionPrompt for cross-package prompt reuse
- 5 store tests covering correction lifecycle, dedup, and limits

The pipeline: reclassify experience buffer → fetch worst entries → rebuild identical encoding prompt → call teacher model → validate response (JSON + required fields + EPR > 0.7) → store correction. Gated by config flag, minimum entry threshold, and cooldown timer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
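A minimal sketch of that correction loop, under assumed types. Only the names ListNeedsImprovement, UpdateExperienceCorrectedOutput, and BuildCompressionPrompt come from this PR; the interfaces, the Experience shape, and where the EPR value comes from are illustrative assumptions.

```go
// Sketch of the Phase 4.75 correction loop; signatures and types are assumed.
package curriculum

import (
	"context"
	"encoding/json"
)

// Experience is a hypothetical view of one experience_buffer row.
type Experience struct {
	ID       int64
	RawInput string
}

// Store names two of the methods added in this PR; signatures are assumed.
type Store interface {
	ListNeedsImprovement(ctx context.Context, limit int) ([]Experience, error)
	UpdateExperienceCorrectedOutput(ctx context.Context, id int64, corrected string) error
}

// Teacher is a hypothetical wrapper around the API-provider model.
type Teacher interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

// BuildCompressionPrompt stands in for the exported prompt builder.
func BuildCompressionPrompt(raw string) string { return "Encode the following memory:\n" + raw }

// RunCurriculum re-encodes the worst entries and stores validated corrections.
func RunCurriculum(ctx context.Context, s Store, t Teacher, limit int) error {
	entries, err := s.ListNeedsImprovement(ctx, limit)
	if err != nil {
		return err
	}
	for _, e := range entries {
		out, err := t.Complete(ctx, BuildCompressionPrompt(e.RawInput))
		if err != nil {
			continue // skip on teacher failure; retry next cycle
		}
		var enc struct {
			Summary string  `json:"summary"`
			EPR     float64 `json:"epr"`
		}
		// Validation: response must be JSON with required fields and EPR > 0.7.
		if json.Unmarshal([]byte(out), &enc) != nil || enc.Summary == "" || enc.EPR <= 0.7 {
			continue
		}
		if err := s.UpdateExperienceCorrectedOutput(ctx, e.ID, out); err != nil {
			return err
		}
	}
	return nil
}
```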
AssembleTrainingBatch exports gold and corrected encoding pairs as
JSONL for spoke fine-tuning. Splits 70/30: 70% from experience buffer
(gold + corrective pairs), 30% reserved for replay mixing by the
Python training script. Each example includes the full encoding prompt
and target output for direct tokenization.
Writes batch_{id}.jsonl + batch_{id}_manifest.json with provenance.
Called by Phase C (automated training trigger) or via MCP tool.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
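Hypothetical shapes for one JSONL training example and the batch manifest, inferred from the commit message; the field names are assumptions, not copied from the repository.

```go
// Assumed layouts for batch_{id}.jsonl entries and the provenance manifest.
package trainingdata

import "time"

// Example matches the description "full encoding prompt and target output".
type Example struct {
	Prompt string `json:"prompt"` // full encoding prompt, ready for direct tokenization
	Output string `json:"output"` // gold or teacher-corrected target encoding
	Source string `json:"source"` // assumed provenance tag: "gold" or "corrected"
}

// Manifest sketches the provenance file written next to batch_{id}.jsonl.
type Manifest struct {
	BatchID       string    `json:"batch_id"`
	CreatedAt     time.Time `json:"created_at"`
	ExampleCount  int       `json:"example_count"`
	BufferShare   float64   `json:"buffer_share"`   // 0.70: gold + corrected pairs
	ReplayReserve float64   `json:"replay_reserve"` // 0.30: mixed in by the Python trainer
}
```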
Merge scientific-method.md, experiment-logging.md, and peer-review-standard.md into a single research-standards.md. Remove MCP tools table from CLAUDE.md (redundant with MCP server schemas). Deduplicate conventions and platform sections that were repeated across rules files and CLAUDE.md. Tighten git-safety.md and code-quality.md to remove rules already enforced by Claude Code system prompt and git hooks. Saves ~3,350 tokens per session startup (~13.4KB). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…391) Adds the automated spoke training pipeline (Phase C) to the continuous learning system. When enough experience data accumulates in the buffer, the daemon can assemble training batches, run spoke fine-tuning via Python subprocess, evaluate against quality gates, and deploy new spokes.

New infrastructure:
- TrainingRun type and 5 new ContinuousLearningStore methods (WriteTrainingRun, UpdateTrainingRun, GetLastTrainingRunTime, CountUntrainedExperience, MarkExperienceUsedInTraining)
- Migration 018: training_runs table for audit trail
- Training orchestrator in dreaming agent (Phase 4.85 in dream cycle) with subprocess execution, quality gate (EPR >= 0.90, FR <= 0.05, SC >= 0.95), and atomic deployment with rollback
- train_model MCP tool (#25) for manual training trigger
- 26 new tests across curriculum, training data, and trigger logic

The pipeline: check untrained count >= threshold → assemble JSONL batch → run train_spokes.py subprocess → evaluate via eval_encoding.py → deploy via deploy_model.sh if quality passes → record result. Gated by config flags, training window, and minimum data threshold.

Also includes minor fixes from other agents: episoding debug logging, embedded LLM grammar improvements.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
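A sketch of the trigger check and quality gate described above. The thresholds are the ones stated in the commit; the interface, config parameters, and function names are assumptions for illustration.

```go
// Assumed shapes for the Phase C trigger check and deployment quality gate.
package training

import (
	"context"
	"time"
)

// Counter names two store methods added in this PR; signatures are assumed.
type Counter interface {
	CountUntrainedExperience(ctx context.Context) (int, error)
	GetLastTrainingRunTime(ctx context.Context) (time.Time, error)
}

// ShouldTrain gates the automated trigger on data volume and time since the last run.
func ShouldTrain(ctx context.Context, s Counter, minEntries int, cooldown time.Duration) (bool, error) {
	n, err := s.CountUntrainedExperience(ctx)
	if err != nil || n < minEntries {
		return false, err
	}
	last, err := s.GetLastTrainingRunTime(ctx)
	if err != nil {
		return false, err
	}
	return time.Since(last) >= cooldown, nil
}

// PassesQualityGate applies the deployment thresholds, using the metric
// abbreviations from the commit message as reported by eval_encoding.py.
func PassesQualityGate(epr, fr, sc float64) bool {
	return epr >= 0.90 && fr <= 0.05 && sc >= 0.95
}
```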
Changes TrainingExample from the prompt/output format to raw_input/encoded/task_type, matching the input expected by prepare_gemma_finetune_data.py. Adds a tokenization step (prepareTrainingData) before training and updates the runSpokeTraining args to match the current train_spokes.py CLI for Gemma. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
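The reworked example layout, as described above. The JSON field names come from the commit message; the Go struct itself and its comments are an illustrative assumption.

```go
// Assumed Go-side view of the raw_input/encoded/task_type training example.
package training

type TrainingExample struct {
	RawInput string `json:"raw_input"` // original, un-encoded memory text
	Encoded  string `json:"encoded"`   // target encoding (gold or teacher-corrected)
	TaskType string `json:"task_type"` // task label consumed by prepare_gemma_finetune_data.py
}
```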
Specifies the refactor from inline subprocess training to systemd-orchestrated training. Daemon writes a request file; systemd path unit triggers a separate service that stops the daemon, trains, and restarts. Eliminates VRAM contention that was crashing the system. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The daemon was running Python training subprocesses while holding VRAM for the embedded llama.cpp model, causing OOM crashes. Refactored to match the spec's hybrid orchestration design:
- RunTrainingCycle now assembles data and writes pending.json, no subprocess calls. Returns "training_requested" status.
- New continuous_train.sh stops the daemon (freeing VRAM), runs tokenization/training/eval/deploy, always restarts the daemon.
- New systemd units: mnemonic-train.path watches for pending.json, mnemonic-train.service runs the training script with 30min timeout.
- Daemon picks up result.json on startup to close the feedback loop.
- MCP train_model tool now returns async (request_id, status).
- Duplicate request prevention (skips if pending.json already exists).

Tested: unit tests (9/9), full suite (0 failures), manual systemd path trigger with a fake request: daemon stopped, script ran, failed correctly on missing batch, daemon restarted, result picked up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
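A simplified view of the daemon side of that hand-off, under assumed file locations and request shape: the daemon only writes pending.json and returns a status string; the systemd path unit takes over from there.

```go
// Sketch of the async training request; paths, fields, and the
// "already_pending" status are assumptions for illustration.
package training

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// RequestTraining writes the training request unless one is already pending.
func RequestTraining(requestDir, batchID string) (string, error) {
	pending := filepath.Join(requestDir, "pending.json")
	if _, err := os.Stat(pending); err == nil {
		return "already_pending", nil // duplicate request prevention
	}
	req, err := json.Marshal(map[string]string{"batch_id": batchID})
	if err != nil {
		return "", err
	}
	if err := os.WriteFile(pending, req, 0o644); err != nil {
		return "", err
	}
	// From here the systemd path unit stops the daemon, trains, and restarts it;
	// the daemon reads result.json on its next startup to close the feedback loop.
	return "training_requested", nil
}
```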
The pipeline is built, tested, and safe (systemd-orchestrated), so there is no reason to keep it gated behind a flag. The auto-trigger runs during the 02:00-06:00 training window; curriculum generation runs during dreaming hours. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
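For clarity, a tiny helper showing how such a window gate could look; the function is hypothetical and treats 06:00 as exclusive, which is an assumption.

```go
// Hypothetical gate for the 02:00-06:00 local training window.
package training

import "time"

// InTrainingWindow reports whether the auto-trigger may run right now.
func InTrainingWindow(now time.Time) bool {
	h := now.Hour()
	return h >= 2 && h < 6
}
```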
Summary
- train_model MCP tool (Move evolution examples out of runtime directory #25) for manual trigger.

Test plan
- make build passes
- make test passes (0 failures across full suite)

🤖 Generated with Claude Code