
fix: retrieval_feedback table stores bloated traversal data (300-600KB per row) #333

@CalebisGross

Description


Problem

The retrieval_feedback table stores full graph traversal snapshots in the traversed_assocs JSON column. Each record is 300-600KB, making 176 rows consume ~22MB of the 116MB database (19% of total DB size for 0.5% of rows).

Current behavior

When a retrieval query runs, WriteRetrievalFeedback stores:

  • traversed_assocs: Full list of every association edge visited during spreading activation (300-600KB of JSON)
  • access_snapshot: Memory access state at query time (up to 800B)
  • retrieved_memory_ids: IDs of returned memories (~400B)

This data is read back by HandleFeedback in feedback.go to apply Hebbian learning: strengthening or weakening the specific association edges that were visited, based on the quality rating.
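The adjustment step can be sketched as follows. The TraversedAssoc fields and the delta values are assumptions for illustration, not the actual feedback.go implementation:

```go
package main

import "fmt"

// TraversedAssoc is a hypothetical stand-in for the struct serialized into
// traversed_assocs; only the edge endpoints and strength matter here.
type TraversedAssoc struct {
	SourceID string
	TargetID string
	Strength float64
}

// applyFeedback nudges each traversed edge's strength up or down based on
// the quality rating (a Hebbian-style update). The deltas are illustrative.
func applyFeedback(edges []TraversedAssoc, rating string) []TraversedAssoc {
	deltas := map[string]float64{"helpful": 0.1, "partial": 0.02, "irrelevant": -0.1}
	delta := deltas[rating]
	out := make([]TraversedAssoc, len(edges))
	for i, e := range edges {
		e.Strength += delta
		if e.Strength < 0 {
			e.Strength = 0
		} else if e.Strength > 1 {
			e.Strength = 1
		}
		out[i] = e
	}
	return out
}

func main() {
	edges := []TraversedAssoc{{SourceID: "m1", TargetID: "m2", Strength: 0.5}}
	fmt.Println(applyFeedback(edges, "helpful")[0].Strength)
}
```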

Why it's a problem

  • 176 feedback records = ~22MB. At scale this will dominate DB size.
  • The traversal data is only needed until feedback is applied. After HandleFeedback adjusts the association strengths, the raw traversal is never read again.
  • Most records (128/176) have empty feedback -- they stored the traversal but were never rated, so the data was written for nothing.

Proposed fix

Three options (not mutually exclusive):

Option A: Prune after feedback is applied

  • After HandleFeedback processes a record, null out traversed_assocs and access_snapshot
  • Keeps the query text and rating for analytics but drops the bulk
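As a sketch, Option A reduces to a single statement once HandleFeedback has applied the adjustments. traversed_assocs and access_snapshot are the columns named in this issue; query_id is an assumed key column:

```sql
-- Option A (sketch): drop the bulk once feedback has been applied.
-- Column names other than traversed_assocs/access_snapshot are assumptions.
UPDATE retrieval_feedback
   SET traversed_assocs = NULL,
       access_snapshot  = NULL
 WHERE query_id = ?;
```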

Option B: Only store traversals for rated queries

  • Don't write the feedback record during retrieval
  • Instead, hold traversal data in memory (keyed by query_id) with a TTL (e.g., 30 min)
  • Only persist to DB when HandleFeedback is called and needs to apply adjustments
  • Most queries are never rated, so most traversals never hit disk

Option C: Store only edge IDs, not full objects

  • traversed_assocs currently stores full TraversedAssoc structs
  • Could store just (source_id, target_id) pairs -- that's all HandleFeedback needs
  • Would reduce each record from 300-600KB to a few KB

Data from pre-nuke DB (2026-03-21)

Total feedback records: 176
Rated (helpful/partial/irrelevant): 48
Unrated (empty): 128
Min traversed_assocs size: unknown (need to re-measure)
Max traversed_assocs size: ~634KB
Total retrieval_feedback size: ~22MB of 116MB DB

Backup at ~/.mnemonic/memory.db.backup-2026-03-21
