
feat(memory): add per-model embedding tables #2010

Merged: senamakel merged 1 commit into tinyhumansai:main from honor2030:feat/embedding-storage-model-signatures on May 18, 2026

Conversation

honor2030 (Contributor) commented May 17, 2026

Summary

  • Add per-model embedding storage tables for memory-tree chunks and summaries.
  • Add per-model embedding storage tables for unified memory events and conversation segments.
  • Add scoped read/write helpers and round-trip tests so embeddings from different provider/model signatures can coexist without touching the legacy single-embedding columns.

This is intended as a small Stage 1 slice for #1574: schema + helper support only. Existing query paths still use the legacy columns until a follow-up PR wires dual-write/query-time model-signature filtering.
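The coexistence described above can be modelled with a small in-memory sketch. It is not the PR's SQLite-backed code: `ChunkRow`, `Store`, and the signature strings are hypothetical names chosen for illustration, and the real signature format is not shown in this conversation. The sketch only shows the intended contract, that embeddings are addressed by an (id, model signature) pair and that writing them leaves the legacy single-embedding field alone.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one stored chunk row.
#[derive(Default)]
struct ChunkRow {
    /// Legacy single-embedding column; existing query paths still read this.
    legacy_embedding: Option<Vec<f32>>,
}

/// Hypothetical store: rows plus a side table keyed by (chunk_id, model_signature).
#[derive(Default)]
struct Store {
    rows: HashMap<String, ChunkRow>,
    scoped: HashMap<(String, String), Vec<f32>>,
}

impl Store {
    /// Insert or overwrite the embedding for one (chunk, signature) pair.
    /// The legacy column is deliberately not touched.
    fn set_for_signature(&mut self, chunk_id: &str, signature: &str, embedding: &[f32]) {
        self.scoped
            .insert((chunk_id.to_owned(), signature.to_owned()), embedding.to_vec());
    }

    /// Fetch the embedding written under exactly this signature, if any.
    fn get_for_signature(&self, chunk_id: &str, signature: &str) -> Option<&[f32]> {
        self.scoped
            .get(&(chunk_id.to_owned(), signature.to_owned()))
            .map(Vec::as_slice)
    }

    /// Read the legacy single-embedding column.
    fn get_legacy(&self, chunk_id: &str) -> Option<&[f32]> {
        self.rows.get(chunk_id)?.legacy_embedding.as_deref()
    }
}
```

Two signatures can hold vectors of different lengths for the same chunk, and a lookup under a signature that was never written returns `None` rather than falling back to another model's vector.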

Verification

  • rustup run 1.93.0 cargo fmt --all
  • RUSTC=/Users/lee/.rustup/toolchains/1.93.0-aarch64-apple-darwin/bin/rustc GGML_NATIVE=OFF rustup run 1.93.0 cargo test --manifest-path Cargo.toml embeddings_are_scoped_by_model_signature --lib -- --nocapture
    • 4 passed, 0 failed
  • RUSTC=/Users/lee/.rustup/toolchains/1.93.0-aarch64-apple-darwin/bin/rustc GGML_NATIVE=OFF CARGO_TARGET_DIR=/Users/lee/Documents/Claude/Projects/10\ Work\ OS/projects/openhuman/target rustup run 1.93.0 cargo check --manifest-path Cargo.toml
    • finished successfully with existing warnings only

Note: an initial cargo check using the worktree-local /tmp/.../target failed with No space left on device; rerunning with the already-used local OpenHuman target dir completed successfully.

Fixes part of #1574.

Summary by CodeRabbit

  • New Features

    • Added support for storing and retrieving embeddings across multiple models for events, segments, and memory chunks. Each embedding is now independently managed per model signature while maintaining backward compatibility with existing storage.
  • Tests

    • Added validation tests confirming embeddings are properly isolated and scoped by their respective model signatures.


@honor2030 honor2030 requested a review from a team May 17, 2026 15:16
coderabbitai Bot (Contributor) commented May 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 84123049-f647-4d5e-b5b6-d3dd6d5d81d6

📥 Commits

Reviewing files that changed from the base of the PR and between f9de38d and d278e2d.

📒 Files selected for processing (8)
  • src/openhuman/memory/store/unified/events.rs
  • src/openhuman/memory/store/unified/events_tests.rs
  • src/openhuman/memory/store/unified/segments.rs
  • src/openhuman/memory/store/unified/segments_tests.rs
  • src/openhuman/memory/tree/store.rs
  • src/openhuman/memory/tree/store_tests.rs
  • src/openhuman/memory/tree/tree_source/store.rs
  • src/openhuman/memory/tree/tree_source/store_tests.rs

📝 Walkthrough


This PR implements per-provider/model embedding storage across the memory system by introducing signature-keyed embedding tables in the unified event/segment stores and tree-based chunk/summary stores. Each storage type gains a new embedding table, upsert/fetch APIs, and tests validating isolation by model signature.

Changes

Per-provider/model embedding storage

| Layer / File(s) | Summary |
| --- | --- |
| Unified event embeddings: src/openhuman/memory/store/unified/events.rs, src/openhuman/memory/store/unified/events_tests.rs | New event_embeddings table keyed by (event_id, model_signature); event_embedding_upsert and event_embedding_get APIs persist and retrieve embeddings with blob validation; tests confirm signature-scoped isolation and None for missing signatures. |
| Unified segment embeddings: src/openhuman/memory/store/unified/segments.rs, src/openhuman/memory/store/unified/segments_tests.rs | New segment_embeddings table keyed by (segment_id, model_signature); segment_embedding_upsert and segment_embedding_get APIs with dimension inference and validation; tests confirm signature-scoped storage and independence from legacy embedding field. |
| Tree chunk embeddings: src/openhuman/memory/tree/store.rs, src/openhuman/memory/tree/store_tests.rs | New mem_tree_chunk_embeddings table keyed by (chunk_id, model_signature); set_chunk_embedding_for_signature and get_chunk_embedding_for_signature APIs with little-endian encoding/decoding; tests verify signature-scoped retrieval and generic accessor returns None. |
| Tree summary embeddings: src/openhuman/memory/tree/tree_source/store.rs, src/openhuman/memory/tree/tree_source/store_tests.rs | New mem_tree_summary_embeddings table; set_summary_embedding_for_signature and get_summary_embedding_for_signature APIs with blob packing/decoding and dimension validation; tests confirm signature scoping and non-signature accessor returns None. |
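The little-endian blob packing and dimension validation mentioned in the summaries above might look roughly like the following. This is a sketch under assumptions, not the code from this PR: the function names `pack_embedding` and `unpack_embedding` are hypothetical, and it assumes embeddings are stored as consecutive 4-byte `f32` values with the expected dimension known to the caller.

```rust
/// Pack an embedding into a blob of consecutive little-endian f32 values.
fn pack_embedding(embedding: &[f32]) -> Vec<u8> {
    embedding.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Decode a blob back into f32 values, rejecting any blob whose byte length
/// does not match the expected dimension (4 bytes per value).
fn unpack_embedding(blob: &[u8], expected_dim: usize) -> Result<Vec<f32>, String> {
    if blob.len() != expected_dim * 4 {
        return Err(format!(
            "blob is {} bytes, expected {} for dimension {}",
            blob.len(),
            expected_dim * 4,
            expected_dim
        ));
    }
    Ok(blob
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect())
}
```

Checking the byte length before decoding means a truncated blob, or one written by a model with a different dimension, surfaces as an error instead of a silently shortened vector.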

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • tinyhumansai/openhuman#1901: Adds EmbeddingProvider::model_id() and signature generation that directly produce the model_signature strings used as keys in these new per-provider/model embedding tables.

Suggested reviewers

  • senamakel

Poem

🐰 Four tables hop in line today,
Each signature marks the embedding's way,
From events to summaries, chunks to all,
Per-model embeddings answer the call! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(memory): add per-model embedding tables' accurately describes the main change: adding new per-model embedding storage tables across multiple modules. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@senamakel senamakel merged commit 3b7c4cd into tinyhumansai:main May 18, 2026
25 checks passed
AusAgentSmith pushed a commit to AusAgentSmith/openhuman that referenced this pull request May 23, 2026
Co-authored-by: honor2030 <19909783+honor2030@users.noreply.github.com>