Skip to content

feat: expose embedding provider model signatures#1901

Merged
senamakel merged 1 commit into
tinyhumansai:mainfrom
honor2030:feat/memory-embedding-model-signatures
May 17, 2026
Merged

feat: expose embedding provider model signatures#1901
senamakel merged 1 commit into
tinyhumansai:mainfrom
honor2030:feat/memory-embedding-model-signatures

Conversation

@honor2030
Copy link
Copy Markdown
Contributor

@honor2030 honor2030 commented May 16, 2026

Summary

  • add EmbeddingProvider::model_id() so each embedder can expose the configured embedding model
  • add a default EmbeddingProvider::signature() derived from provider, model, and dimensions
  • cover cloud, Ollama, OpenAI, noop, factory, and store test fake providers

Why this slice

This is the first small, reviewable step toward #1574. It creates a stable embedding-space identity that later storage/migration PRs can attach to rows without changing vector storage behavior in this PR.

Non-goals

  • no schema migration
  • no per-row storage changes
  • no retrieval filtering changes yet

Test Plan

  • cargo fmt --all
  • RUSTC=/Users/lee/.rustup/toolchains/1.93.0-aarch64-apple-darwin/bin/rustc GGML_NATIVE=OFF cargo test -p openhuman --lib openhuman::embeddings -- --nocapture (109 passed)
  • RUSTC=/Users/lee/.rustup/toolchains/1.93.0-aarch64-apple-darwin/bin/rustc GGML_NATIVE=OFF cargo check -p openhuman
  • git diff --check

Refs #1574

Summary by CodeRabbit

  • New Features
    • Embedding providers now explicitly expose their configured model identifier.
    • Added signature generation to identify and verify embedding configuration (provider, model, and dimensions), enabling compatibility checks with previously generated embeddings.

Review Change Stack

@honor2030 honor2030 requested a review from a team May 16, 2026 07:49
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 16, 2026

📝 Walkthrough

Walkthrough

EmbeddingProvider trait now requires a model_id() method and provides a default signature() implementation that combines provider name, model identifier, and dimensions. All embedding providers (Cloud, Noop, Ollama, OpenAI), test fakes, and corresponding tests are updated to implement and verify this new contract.

Changes

EmbeddingProvider model_id expansion

Layer / File(s) Summary
EmbeddingProvider trait contract
src/openhuman/embeddings/provider_trait.rs
EmbeddingProvider adds required model_id() method and a stable default signature() implementation that formats provider, model identifier, and dimension count.
Production provider implementations
src/openhuman/embeddings/cloud.rs, src/openhuman/embeddings/noop.rs, src/openhuman/embeddings/ollama.rs, src/openhuman/embeddings/openai.rs
Cloud, Noop, Ollama, and OpenAI embedding providers each implement model_id() to return their configured or constant model identifier.
Test fake implementations
src/openhuman/embeddings/store_tests.rs
Test fakes FakeEmbedding and MismatchEmbedding implement model_id() returning "fake" and "mismatch" respectively.
Test assertions for model_id and signature
src/openhuman/embeddings/ollama_tests.rs, src/openhuman/embeddings/openai_tests.rs, src/openhuman/embeddings/mod.rs, src/openhuman/embeddings/cloud.rs
All accessor and factory tests are updated to assert model_id() values and verify signature() produces the expected formatted string with provider, model, and dimensions.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A model needs its name so clear,
The signature rings loud and near,
Each provider now proclaims with pride,
Its identity, dimensions wide—
From cloud to noop, the trait's complete!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 44.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: expose embedding provider model signatures' directly and accurately describes the main change—adding a model_id() method and signature() default implementation to the EmbeddingProvider trait across all embedding providers.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/openhuman/embeddings/mod.rs (1)

79-79: ⚡ Quick win

Avoid hardcoding the default model name in the signature assertion.

This assertion duplicates the model literal ("bge-m3") instead of reusing DEFAULT_OLLAMA_MODEL, which makes the test brittle when defaults change.

♻️ Proposed fix
-        assert_eq!(p.signature(), "provider=ollama;model=bge-m3;dims=768");
+        assert_eq!(
+            p.signature(),
+            format!("provider=ollama;model={DEFAULT_OLLAMA_MODEL};dims=768")
+        );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/embeddings/mod.rs` at line 79, The test currently hardcodes the
default Ollama model in the signature assertion; update the assertion to use the
constant DEFAULT_OLLAMA_MODEL instead of the literal "bge-m3" so it
automatically reflects changes to the default. Locate the assertion comparing
p.signature() and replace the hardcoded model fragment with a constructed
expected string that interpolates DEFAULT_OLLAMA_MODEL (keeping
"provider=ollama" and "dims=768" as before) so the test remains correct when
DEFAULT_OLLAMA_MODEL changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/openhuman/embeddings/mod.rs`:
- Line 79: The test currently hardcodes the default Ollama model in the
signature assertion; update the assertion to use the constant
DEFAULT_OLLAMA_MODEL instead of the literal "bge-m3" so it automatically
reflects changes to the default. Locate the assertion comparing p.signature()
and replace the hardcoded model fragment with a constructed expected string that
interpolates DEFAULT_OLLAMA_MODEL (keeping "provider=ollama" and "dims=768" as
before) so the test remains correct when DEFAULT_OLLAMA_MODEL changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0948dff6-bc4e-4130-b092-4b68bb84facd

📥 Commits

Reviewing files that changed from the base of the PR and between bd5c2ed and c5d5079.

📒 Files selected for processing (9)
  • src/openhuman/embeddings/cloud.rs
  • src/openhuman/embeddings/mod.rs
  • src/openhuman/embeddings/noop.rs
  • src/openhuman/embeddings/ollama.rs
  • src/openhuman/embeddings/ollama_tests.rs
  • src/openhuman/embeddings/openai.rs
  • src/openhuman/embeddings/openai_tests.rs
  • src/openhuman/embeddings/provider_trait.rs
  • src/openhuman/embeddings/store_tests.rs

@senamakel senamakel merged commit c87be66 into tinyhumansai:main May 17, 2026
24 of 25 checks passed
AusAgentSmith pushed a commit to AusAgentSmith/openhuman that referenced this pull request May 23, 2026
Co-authored-by: honor2030 <19909783+honor2030@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants