Continuous learning: encoding model that improves from operational experience #391

@CalebisGross

Description

@CalebisGross

Vision

The encoding model should improve from operational experience. Every encoding it produces, every recall that succeeds or fails, every feedback signal — all of this is training data that currently gets discarded. The model that encodes memory #10,000 should be measurably better than the one that encoded memory #1.

Full design document: docs/DESIGN_continuous_learning.md

Architecture

Four tiers that build on each other:

Tier 1 — Experience Collection (always running, zero cost)

  • Verification gate scores every encoding (EPR, FR, TE) at write time
  • Feedback from recall sessions links back to the encodings
  • Experience buffer accumulates gold + bad encoding pairs
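As a minimal sketch, a Tier 1 experience record and buffer might look like the following. The field names, the `0.7` gate threshold, and the buffer API are assumptions for illustration; only the EPR/FR/TE score names come from this issue.

```python
from dataclasses import dataclass, field
import time

@dataclass
class EncodingExperience:
    """One scored encoding event (schema is illustrative, not the actual DB migration)."""
    memory_id: int
    input_text: str
    encoding: str
    epr: float  # verification-gate scores recorded at write time
    fr: float
    te: float
    created_at: float = field(default_factory=time.time)

class ExperienceBuffer:
    """Accumulates scored encodings; entries below the gate are Tier 2 candidates."""
    def __init__(self, gate_threshold: float = 0.7):  # threshold is an assumption
        self.entries: list[EncodingExperience] = []
        self.gate_threshold = gate_threshold

    def add(self, exp: EncodingExperience) -> None:
        self.entries.append(exp)

    def bad_encodings(self) -> list[EncodingExperience]:
        # "bad" = any verification score below the gate; these get re-encoded nightly
        return [e for e in self.entries
                if min(e.epr, e.fr, e.te) < self.gate_threshold]
```

Because collection is append-only and scoring already happens at write time, this tier adds no inference cost of its own.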

Tier 2 — Curriculum Generation (nightly, during dreaming)

  • Bad encodings get re-encoded by Gemini to produce corrected versions
  • Creates (input, bad_local, good_gemini) tuples — DPO training data
  • Hard example mining focuses on the model's weakest areas
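The tuple construction above can be sketched as follows, with `reencode_fn` standing in for the Gemini corrective re-encoding call (its signature and the pair format are assumptions; DPO trainers commonly expect prompt/chosen/rejected fields):

```python
def build_dpo_pairs(bad_examples, reencode_fn):
    """Turn Tier 1 failures into DPO preference pairs.

    bad_examples: iterable of (input_text, bad_local_encoding) tuples
    reencode_fn:  placeholder for the Gemini corrective re-encoding call
    """
    return [
        {"prompt": text,               # original input
         "rejected": bad,              # the local model's failed encoding
         "chosen": reencode_fn(text)}  # the corrected re-encoding
        for text, bad in bad_examples
    ]
```

Hard example mining would then amount to sorting or filtering `bad_examples` by how far below the gate their verification scores fell before calling this.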

Tier 3 — Model Update (triggered when ≥50 new training pairs)

  • Short spoke fine-tuning (SFT, later DPO) on new experience
  • 30% replay from base dataset prevents catastrophic forgetting
  • Quality gate: must pass EXP-25 probes before deployment
  • Atomic spoke deployment with rollback
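The trigger-and-replay logic can be sketched like this. The ≥50-pair trigger and the 30% replay share come from this issue; the helper name and the interpretation "30% of the final batch is replay" are assumptions.

```python
import random

MIN_NEW_PAIRS = 50      # training trigger threshold from this issue
REPLAY_FRACTION = 0.30  # share of the final batch drawn from the base dataset

def build_training_batch(new_pairs, base_dataset, rng=random):
    """Mix fresh experience with base-dataset replay to limit catastrophic forgetting."""
    if len(new_pairs) < MIN_NEW_PAIRS:
        return None  # not enough new experience yet; skip this training cycle
    # choose n_replay so that replay is ~30% of the combined batch
    n_replay = int(len(new_pairs) * REPLAY_FRACTION / (1 - REPLAY_FRACTION))
    replay = rng.sample(base_dataset, min(n_replay, len(base_dataset)))
    batch = list(new_pairs) + replay
    rng.shuffle(batch)
    return batch
```

After training, the candidate spoke would only be swapped in if it passes the EXP-25 probes; the atomic deploy-with-rollback step is outside this sketch.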

Tier 4 — Self-Assessment (metacognition-driven)

  • Rolling quality window detects encoding drift
  • Domain shift detection flags new patterns
  • Autonomous training trigger when quality degrades
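A rolling quality window of the kind described above might look like this sketch; the window size, baseline, and margin values are assumptions, not tuned numbers.

```python
from collections import deque

class QualityWindow:
    """Rolling window over recent encoding scores; flags drift when the
    recent mean falls a fixed margin below a reference baseline."""
    def __init__(self, size: int = 100, baseline: float = 0.85, margin: float = 0.05):
        self.scores = deque(maxlen=size)
        self.baseline = baseline
        self.margin = margin

    def observe(self, score: float) -> bool:
        """Record a verification score; return True if retraining should trigger."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # wait until the window is full before judging drift
        mean = sum(self.scores) / len(self.scores)
        return mean < self.baseline - self.margin
```

Domain shift detection would sit alongside this, firing on novel input patterns rather than on score degradation.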

What Makes This Special

The feedback signal is downstream and delayed — an encoding is "good" not because it looks right at write time, but because it leads to useful recalls days or weeks later. The model learns what makes memories findable and useful, not just structurally correct.
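One way to picture the delayed signal: recall outcomes arriving days or weeks later are joined back, by memory id, to the encoding that produced them. The dict-based store and field names here are purely illustrative stand-ins for the recall-feedback linkage.

```python
def apply_recall_feedback(experiences, memory_id, recall_was_useful):
    """Attach a delayed recall outcome to the encoding that produced it.

    experiences: dict mapping memory_id -> experience record (illustrative schema)
    Returns the updated recall-utility signal, or None if the memory is untracked.
    """
    exp = experiences.get(memory_id)
    if exp is None:
        return None  # recall of a memory with no tracked encoding
    exp.setdefault("recalls", []).append(recall_was_useful)
    # downstream quality signal: fraction of recalls that proved useful
    exp["recall_utility"] = sum(exp["recalls"]) / len(exp["recalls"])
    return exp["recall_utility"]
```

Encodings with high write-time scores but low recall utility are exactly the cases where the model looked right but wasn't findable.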

Key Dependencies

  • Verification gate (designed in EXP-29, not yet implemented)
  • EXP-25 probe evaluation framework (exists)
  • Spoke training pipeline (exists)
  • Dreaming agent as training orchestrator (exists, needs extension)
  • Gemini API for corrective re-encoding (available)

Implementation Roadmap

| Phase | Work | Estimate |
| --- | --- | --- |
| 1. Experience Collection | DB migration, verification gate, experience buffer | 2-3 days |
| 2. Recall-Encoding Linkage | recall_feedback table, feedback processing | 1-2 days |
| 3. Curriculum Generation | Dreaming phase extension, Gemini re-encoding | 2-3 days |
| 4. Automated Training | Spoke training integration, quality gate, deployment | 3-5 days |
| 5. Metacognition Integration | Quality tracking, domain shift, training triggers | 2-3 days |
| 6. DPO (stretch) | Preference optimization from corrective pairs | 2-3 days |

Total: ~2-3 weeks

Success Criteria

  • 1 month: ≥500 experience buffer entries, ≥2 training cycles, EPR +5pp
  • 3 months: ≥10 training cycles, handles new domains without quality drop
  • 6 months: model adapted to user's patterns, exceeds Gemini on user's actual inputs

Metadata


Labels: priority:high (Important, fix soon), research (ML research experiments), training (Model training, data, and evaluation)
