Skip to content

feat: continuous learning Phases B+C — curriculum generation & training trigger (#391)#406

Merged
CalebisGross merged 8 commits intomainfrom
feat/continuous-learning-phase-b
Apr 14, 2026
Merged

feat: continuous learning Phases B+C — curriculum generation & training trigger (#391)#406
CalebisGross merged 8 commits intomainfrom
feat/continuous-learning-phase-b

Conversation

@CalebisGross
Copy link
Copy Markdown
Collaborator

Summary

  • Phase B — Curriculum Generation: During dreaming, re-encodes needs_improvement memories via the teacher model (Gemini API), producing corrected outputs that become training pairs for the local spoke model. Gated by config, minimum entry threshold, and cooldown timer.
  • Phase C — Training Trigger & Orchestration: When enough untrained experience accumulates (default: 50 entries), assembles JSONL training batches, runs spoke fine-tuning via Python subprocess, evaluates against quality gates (EPR >= 0.90, FR <= 0.05, SC >= 0.95), and deploys new spokes atomically with rollback. Includes train_model MCP tool (Move evolution examples out of runtime directory #25) for manual trigger.
  • Training data assembly: JSONL writer that combines gold and corrected encoding pairs with 70/30 experience/replay split
  • 26 new tests across curriculum, training data, and trigger logic
  • Context overhead reduction (~3,350 tokens saved per session startup)

Test plan

  • make build passes
  • make test passes (0 failures across full suite)
  • 10 curriculum generation tests (disabled, cooldown, teacher errors, EPR rejection, multi-entry, context cancel)
  • 8 training data assembly tests (gold-only, corrective-only, mixed, empty buffer, manifest, default max, missing raw)
  • 8 training trigger tests (disabled, auto-trigger off, insufficient data, assembles+records, window parsing, eval output parsing, quality gate pass/fail)
  • MCP tools list test updated (24 → 25 tools)
  • End-to-end training cycle with real Python env (requires GPU + training venv)

🤖 Generated with Claude Code

CalebisGross and others added 8 commits April 13, 2026 18:11
Adds curriculum generation to the dreaming agent (Phase 4.75). When
enabled, the dreaming cycle re-encodes needs_improvement memories via
the teacher model (API provider), producing corrected outputs that
become training pairs for the local spoke model.

New infrastructure:
- CLCurriculumConfig with enable flag, cooldown, and batch limits
- Migration 017: corrected_output columns on experience_buffer +
  curriculum_runs tracking table
- 5 new ContinuousLearningStore methods (ListNeedsImprovement,
  UpdateExperienceCorrectedOutput, curriculum run CRUD)
- Export BuildCompressionPrompt for cross-package prompt reuse
- 5 store tests covering correction lifecycle, dedup, and limits

The pipeline: reclassify experience buffer → fetch worst entries →
rebuild identical encoding prompt → call teacher model → validate
response (JSON + required fields + EPR > 0.7) → store correction.
Gated by config flag, minimum entry threshold, and cooldown timer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AssembleTrainingBatch exports gold and corrected encoding pairs as
JSONL for spoke fine-tuning. Splits 70/30: 70% from experience buffer
(gold + corrective pairs), 30% reserved for replay mixing by the
Python training script. Each example includes the full encoding prompt
and target output for direct tokenization.

Writes batch_{id}.jsonl + batch_{id}_manifest.json with provenance.
Called by Phase C (automated training trigger) or via MCP tool.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merge scientific-method.md, experiment-logging.md, and
peer-review-standard.md into a single research-standards.md.
Remove MCP tools table from CLAUDE.md (redundant with MCP server
schemas). Deduplicate conventions and platform sections that were
repeated across rules files and CLAUDE.md. Tighten git-safety.md
and code-quality.md to remove rules already enforced by Claude Code
system prompt and git hooks.

Saves ~3,350 tokens per session startup (~13.4KB).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…391)

Adds the automated spoke training pipeline (Phase C) to the continuous
learning system. When enough experience data accumulates in the buffer,
the daemon can assemble training batches, run spoke fine-tuning via
Python subprocess, evaluate against quality gates, and deploy new spokes.

New infrastructure:
- TrainingRun type and 5 new ContinuousLearningStore methods
  (WriteTrainingRun, UpdateTrainingRun, GetLastTrainingRunTime,
  CountUntrainedExperience, MarkExperienceUsedInTraining)
- Migration 018: training_runs table for audit trail
- Training orchestrator in dreaming agent (Phase 4.85 in dream cycle)
  with subprocess execution, quality gate (EPR >= 0.90, FR <= 0.05,
  SC >= 0.95), and atomic deployment with rollback
- train_model MCP tool (#25) for manual training trigger
- 26 new tests across curriculum, training data, and trigger logic

The pipeline: check untrained count >= threshold → assemble JSONL batch
→ run train_spokes.py subprocess → evaluate via eval_encoding.py →
deploy via deploy_model.sh if quality passes → record result. Gated by
config flags, training window, and minimum data threshold.

Also includes minor fixes from other agents: episoding debug logging,
embedded LLM grammar improvements.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Changes TrainingExample from prompt/output to raw_input/encoded/task_type
format matching prepare_gemma_finetune_data.py input. Adds a tokenization
step (prepareTrainingData) before training, and updates runSpokeTraining
args to match current train_spokes.py CLI for Gemma.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Specifies the refactor from inline subprocess training to systemd-
orchestrated training. Daemon writes a request file; systemd path unit
triggers a separate service that stops the daemon, trains, and restarts.
Eliminates VRAM contention that was crashing the system.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The daemon was running Python training subprocesses while holding VRAM
for the embedded llama.cpp model, causing OOM crashes. Refactored to
match the spec's hybrid orchestration design:

- RunTrainingCycle now assembles data and writes pending.json, no
  subprocess calls. Returns "training_requested" status.
- New continuous_train.sh stops the daemon (freeing VRAM), runs
  tokenization/training/eval/deploy, always restarts the daemon.
- New systemd units: mnemonic-train.path watches for pending.json,
  mnemonic-train.service runs the training script with 30min timeout.
- Daemon picks up result.json on startup to close the feedback loop.
- MCP train_model tool now returns async (request_id, status).
- Duplicate request prevention (skips if pending.json already exists).

Tested: unit tests (9/9), full suite (0 failures), manual systemd
path trigger with fake request — daemon stopped, script ran, failed
correctly on missing batch, daemon restarted, result picked up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pipeline is built, tested, and safe (systemd-orchestrated). No reason
to keep it gated behind a flag. Auto-trigger runs during 02:00-06:00
training window, curriculum generation runs during dreaming hours.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@CalebisGross CalebisGross merged commit 28860cf into main Apr 14, 2026
@CalebisGross CalebisGross deleted the feat/continuous-learning-phase-b branch April 14, 2026 04:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant