feat: expose embedding provider model signatures by honor2030 · Pull Request #1901 · tinyhumansai/openhuman

honor2030 · 2026-05-16T07:49:39Z

Summary

add EmbeddingProvider::model_id() so each embedder can expose the configured embedding model
add a default EmbeddingProvider::signature() derived from provider, model, and dimensions
cover cloud, Ollama, OpenAI, noop, factory, and store test fake providers

Why this slice

This is the first small, reviewable step toward #1574. It creates a stable embedding-space identity that later storage/migration PRs can attach to rows without changing vector storage behavior in this PR.

Non-goals

no schema migration
no per-row storage changes
no retrieval filtering changes yet

Test Plan

cargo fmt --all
RUSTC=/Users/lee/.rustup/toolchains/1.93.0-aarch64-apple-darwin/bin/rustc GGML_NATIVE=OFF cargo test -p openhuman --lib openhuman::embeddings -- --nocapture (109 passed)
RUSTC=/Users/lee/.rustup/toolchains/1.93.0-aarch64-apple-darwin/bin/rustc GGML_NATIVE=OFF cargo check -p openhuman
git diff --check

Refs #1574

Summary by CodeRabbit

New Features
- Embedding providers now explicitly expose their configured model identifier.
- Added signature generation to identify and verify embedding configuration (provider, model, and dimensions), enabling compatibility checks with previously generated embeddings.

coderabbitai · 2026-05-16T07:49:52Z

📝 Walkthrough

Walkthrough

EmbeddingProvider trait now requires a model_id() method and provides a default signature() implementation that combines provider name, model identifier, and dimensions. All embedding providers (Cloud, Noop, Ollama, OpenAI), test fakes, and corresponding tests are updated to implement and verify this new contract.

Changes

EmbeddingProvider model_id expansion

Layer / File(s)	Summary
EmbeddingProvider trait contract `src/openhuman/embeddings/provider_trait.rs`	EmbeddingProvider adds required `model_id()` method and a stable default `signature()` implementation that formats provider, model identifier, and dimension count.
Production provider implementations `src/openhuman/embeddings/cloud.rs`, `src/openhuman/embeddings/noop.rs`, `src/openhuman/embeddings/ollama.rs`, `src/openhuman/embeddings/openai.rs`	Cloud, Noop, Ollama, and OpenAI embedding providers each implement `model_id()` to return their configured or constant model identifier.
Test fake implementations `src/openhuman/embeddings/store_tests.rs`	Test fakes FakeEmbedding and MismatchEmbedding implement `model_id()` returning "fake" and "mismatch" respectively.
Test assertions for model_id and signature `src/openhuman/embeddings/ollama_tests.rs`, `src/openhuman/embeddings/openai_tests.rs`, `src/openhuman/embeddings/mod.rs`, `src/openhuman/embeddings/cloud.rs`	All accessor and factory tests are updated to assert `model_id()` values and verify `signature()` produces the expected formatted string with provider, model, and dimensions.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A model needs its name so clear,
The signature rings loud and near,
Each provider now proclaims with pride,
Its identity, dimensions wide—
From cloud to noop, the trait's complete!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 44.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat: expose embedding provider model signatures' directly and accurately describes the main change—adding a model_id() method and signature() default implementation to the EmbeddingProvider trait across all embedding providers.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

src/openhuman/embeddings/mod.rs (1)

79-79: ⚡ Quick win

Avoid hardcoding the default model name in the signature assertion.

This assertion duplicates the model literal ("bge-m3") instead of reusing DEFAULT_OLLAMA_MODEL, which makes the test brittle when defaults change.

♻️ Proposed fix

-        assert_eq!(p.signature(), "provider=ollama;model=bge-m3;dims=768");
+        assert_eq!(
+            p.signature(),
+            format!("provider=ollama;model={DEFAULT_OLLAMA_MODEL};dims=768")
+        );

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/embeddings/mod.rs` at line 79, The test currently hardcodes the
default Ollama model in the signature assertion; update the assertion to use the
constant DEFAULT_OLLAMA_MODEL instead of the literal "bge-m3" so it
automatically reflects changes to the default. Locate the assertion comparing
p.signature() and replace the hardcoded model fragment with a constructed
expected string that interpolates DEFAULT_OLLAMA_MODEL (keeping
"provider=ollama" and "dims=768" as before) so the test remains correct when
DEFAULT_OLLAMA_MODEL changes.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/openhuman/embeddings/mod.rs`:
- Line 79: The test currently hardcodes the default Ollama model in the
signature assertion; update the assertion to use the constant
DEFAULT_OLLAMA_MODEL instead of the literal "bge-m3" so it automatically
reflects changes to the default. Locate the assertion comparing p.signature()
and replace the hardcoded model fragment with a constructed expected string that
interpolates DEFAULT_OLLAMA_MODEL (keeping "provider=ollama" and "dims=768" as
before) so the test remains correct when DEFAULT_OLLAMA_MODEL changes.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0948dff6-bc4e-4130-b092-4b68bb84facd

📥 Commits

Reviewing files that changed from the base of the PR and between bd5c2ed and c5d5079.

📒 Files selected for processing (9)

src/openhuman/embeddings/cloud.rs
src/openhuman/embeddings/mod.rs
src/openhuman/embeddings/noop.rs
src/openhuman/embeddings/ollama.rs
src/openhuman/embeddings/ollama_tests.rs
src/openhuman/embeddings/openai.rs
src/openhuman/embeddings/openai_tests.rs
src/openhuman/embeddings/provider_trait.rs
src/openhuman/embeddings/store_tests.rs

Co-authored-by: honor2030 <19909783+honor2030@users.noreply.github.com>

feat: expose embedding provider model signatures

c5d5079

honor2030 requested a review from a team May 16, 2026 07:49

coderabbitai Bot reviewed May 16, 2026

View reviewed changes

coderabbitai Bot approved these changes May 16, 2026

View reviewed changes

senamakel approved these changes May 17, 2026

View reviewed changes

senamakel merged commit c87be66 into tinyhumansai:main May 17, 2026
24 of 25 checks passed

This was referenced May 17, 2026

feat(memory): add per-model embedding tables #2010

Merged

feat(memory): #1574 Stage 2 — per-(row,model) embedding storage live (cutover + migration + re-embed backfill + switch UX) #2153

Merged

AusAgentSmith pushed a commit to AusAgentSmith/openhuman that referenced this pull request May 23, 2026

feat: expose embedding provider model signatures (tinyhumansai#1901)

6b8c33d

Co-authored-by: honor2030 <19909783+honor2030@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: expose embedding provider model signatures#1901

feat: expose embedding provider model signatures#1901
senamakel merged 1 commit into
tinyhumansai:mainfrom
honor2030:feat/memory-embedding-model-signatures

honor2030 commented May 16, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 16, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

honor2030 commented May 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why this slice

Non-goals

Test Plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

honor2030 commented May 16, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 16, 2026 •

edited

Loading