Vision
The encoding model should improve from operational experience. Every encoding it produces, every recall that succeeds or fails, every feedback signal — all of this is training data that currently gets discarded. The model that encodes memory #10,000 should be measurably better than the one that encoded memory #1.
Full design document: docs/DESIGN_continuous_learning.md
Architecture
Four tiers that build on each other:
Tier 1 — Experience Collection (always running, zero cost)
- Verification gate scores every encoding (EPR, FR, TE) at write time
- Feedback from recall sessions links back to the encodings
- Experience buffer accumulates gold + bad encoding pairs
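The Tier 1 flow can be sketched as a scored record plus a buffer that partitions entries into gold and bad. This is a hypothetical illustration, not the real schema (which arrives with the Phase 1 DB migration); the `gold_threshold` value and the `EncodingExperience`/`ExperienceBuffer` names are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of Tier 1. EPR, FR, TE are the verification-gate
# scores recorded at write time; field names here are illustrative only.
@dataclass
class EncodingExperience:
    memory_id: int
    epr: float
    fr: float
    te: float
    recall_outcomes: list = field(default_factory=list)  # linked back later from recall sessions

class ExperienceBuffer:
    """Accumulates scored encodings and splits them into gold and bad sets."""
    def __init__(self, gold_threshold: float = 0.8):  # threshold is an assumption
        self.gold_threshold = gold_threshold
        self.entries: list[EncodingExperience] = []

    def record(self, exp: EncodingExperience) -> None:
        self.entries.append(exp)

    def gold(self) -> list[EncodingExperience]:
        # "gold" only when every gate score clears the bar
        return [e for e in self.entries if min(e.epr, e.fr, e.te) >= self.gold_threshold]

    def bad(self) -> list[EncodingExperience]:
        return [e for e in self.entries if min(e.epr, e.fr, e.te) < self.gold_threshold]

buf = ExperienceBuffer()
buf.record(EncodingExperience(1, epr=0.95, fr=0.90, te=0.88))  # gold
buf.record(EncodingExperience(2, epr=0.40, fr=0.70, te=0.60))  # bad
```

Because collection is just appending scores already computed by the gate, this tier stays always-on at effectively zero cost.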
Tier 2 — Curriculum Generation (nightly, during dreaming)
- Bad encodings get re-encoded by Gemini to produce corrected versions
- Creates (input, bad_local, good_gemini) tuples — DPO training data
- Hard example mining focuses on the model's weakest areas
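The nightly re-encoding step above reduces to a small transform: for each bad encoding, ask Gemini for a corrected version and emit a preference tuple. A minimal sketch, with `reencode_with_gemini` as a stand-in for the real API call made by the dreaming agent:

```python
# Hypothetical sketch of Tier 2 curriculum generation.
def reencode_with_gemini(raw_input: str) -> str:
    # Placeholder for the actual Gemini API round-trip.
    return f"corrected({raw_input})"

def build_dpo_pairs(bad_encodings):
    """Turn failed local encodings into (input, rejected, chosen) DPO tuples."""
    pairs = []
    for raw_input, bad_local in bad_encodings:
        pairs.append({
            "input": raw_input,
            "rejected": bad_local,                       # bad_local
            "chosen": reencode_with_gemini(raw_input),   # good_gemini
        })
    return pairs

pairs = build_dpo_pairs([("note about the auth bug", "<malformed encoding>")])
```

Hard example mining would simply bias which `bad_encodings` are fed in, weighting toward the model's weakest areas.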
Tier 3 — Model Update (triggered when ≥50 new training pairs)
- Short spoke fine-tuning (SFT, later DPO) on new experience
- 30% replay from base dataset prevents catastrophic forgetting
- Quality gate: must pass EXP-25 probes before deployment
- Atomic spoke deployment with rollback
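The trigger condition and replay mix above can be expressed directly. A sketch under the stated design numbers (≥50 new pairs, 30% replay); the function name and the interpretation of "30%" as the replay share of the final training set are assumptions:

```python
import random

MIN_NEW_PAIRS = 50      # design rule: train only once >=50 new pairs exist
REPLAY_FRACTION = 0.30  # 30% of the final mix is replay from the base dataset

def build_training_set(new_pairs, base_dataset, rng=random):
    """Mix fresh experience with base-dataset replay to limit catastrophic forgetting."""
    if len(new_pairs) < MIN_NEW_PAIRS:
        return None  # trigger condition not met; skip this training cycle
    # Solve replay/(new + replay) = REPLAY_FRACTION for the replay count.
    n_replay = round(len(new_pairs) * REPLAY_FRACTION / (1 - REPLAY_FRACTION))
    replay = rng.sample(base_dataset, min(n_replay, len(base_dataset)))
    return new_pairs + replay
```

After training on this set, the new spoke would still have to clear the EXP-25 probe gate before the atomic deployment step swaps it in.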
Tier 4 — Self-Assessment (metacognition-driven)
- Rolling quality window detects encoding drift
- Domain shift detection flags new patterns
- Autonomous training trigger when quality degrades
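A rolling quality window like the one described can be a deque of recent gate scores compared against a baseline mean. This is a minimal sketch; the window size, drop threshold, and baseline scheme are all assumptions:

```python
from collections import deque

class QualityMonitor:
    """Rolling window over recent verification scores; flags encoding drift."""
    def __init__(self, window: int = 100, drop_threshold: float = 0.05):
        self.window = deque(maxlen=window)
        self.drop_threshold = drop_threshold
        self.baseline = None  # frozen from the first full window

    def observe(self, score: float) -> bool:
        """Record a score; return True when a training cycle should trigger."""
        self.window.append(score)
        if len(self.window) < self.window.maxlen:
            return False  # not enough data yet
        mean = sum(self.window) / len(self.window)
        if self.baseline is None:
            self.baseline = mean
            return False
        return self.baseline - mean > self.drop_threshold
```

Domain shift detection would sit alongside this, watching input characteristics rather than output scores, so new kinds of content can trigger training before quality visibly drops.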
What Makes This Special
The feedback signal is downstream and delayed — an encoding is "good" not because it looks right at write time, but because it leads to useful recalls days or weeks later. The model learns what makes memories findable and useful, not just structurally correct.
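The delayed credit assignment this implies can be shown in a few lines: the label for an encoding comes from later recall events, never from a write-time check. A hypothetical sketch (the `label_encoding` helper and the log fields are illustrative):

```python
# Delayed reward: an encoding is 'good' only once a later recall
# actually found it useful, not when it passes write-time checks.
def label_encoding(encoding_id, recall_log):
    hits = [r for r in recall_log if r["encoding_id"] == encoding_id and r["useful"]]
    return "good" if hits else "pending"

# A useful recall 12 days after the write finally labels encoding 7 as good.
recall_log = [{"encoding_id": 7, "useful": True, "days_after_write": 12}]
```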
Key Dependencies
- Verification gate (designed in EXP-29, not yet implemented)
- EXP-25 probe evaluation framework (exists)
- Spoke training pipeline (exists)
- Dreaming agent as training orchestrator (exists, needs extension)
- Gemini API for corrective re-encoding (available)
Implementation Roadmap
| Phase | Work | Estimate |
|---|---|---|
| 1. Experience Collection | DB migration, verification gate, experience buffer | 2-3 days |
| 2. Recall-Encoding Linkage | recall_feedback table, feedback processing | 1-2 days |
| 3. Curriculum Generation | Dreaming phase extension, Gemini re-encoding | 2-3 days |
| 4. Automated Training | Spoke training integration, quality gate, deployment | 3-5 days |
| 5. Metacognition Integration | Quality tracking, domain shift, training triggers | 2-3 days |
| 6. DPO (stretch) | Preference optimization from corrective pairs | 2-3 days |
Total: ~2-3 weeks
Success Criteria
- 1 month: ≥500 experience buffer entries, ≥2 training cycles, EPR +5pp
- 3 months: ≥10 training cycles, handles new domains without quality drop
- 6 months: model adapted to user's patterns, exceeds Gemini on user's actual inputs