
fix: change default embedding model to nomic-v1.5 #54

Merged
carlos-alm merged 4 commits into main from fix/default-embedding-model
Feb 23, 2026


Conversation

@carlos-alm

Summary

  • Change default embedding model from jina-code to nomic-v1.5 in src/embedder.js, src/config.js, src/cli.js, and README.md
  • jina-code is gated on HuggingFace and crashes without HF_TOKEN, making codegraph embed fail out-of-the-box
  • nomic-v1.5 is public, has the same 768 dimensions, improved quality, and an 8192-token context
  • Updated config defaults and test expectations to match
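
The gating problem in the summary can be sketched as follows. This is a minimal illustration, not the project's code: the `MODELS` map, its field names, and `assertModelAccessible` are hypothetical. Only the model names, the 768 dimensions, the 8192-token context for nomic-v1.5, and the HF_TOKEN requirement come from this PR.

```javascript
// Hypothetical registry; field names are assumptions, not taken from src/embedder.js.
const MODELS = new Map([
  ['jina-code', { dim: 768, gated: true }],
  ['nomic-v1.5', { dim: 768, gated: false, contextTokens: 8192 }],
]);

const DEFAULT_MODEL = 'nomic-v1.5'; // previously 'jina-code'

// Fail early with a readable message instead of crashing during model download.
function assertModelAccessible(name, env = process.env) {
  const model = MODELS.get(name);
  if (!model) {
    throw new Error(`Unknown embedding model: ${name}`);
  }
  if (model.gated && !env.HF_TOKEN) {
    throw new Error(
      `Model "${name}" is gated on HuggingFace; set HF_TOKEN or use "${DEFAULT_MODEL}".`
    );
  }
  return model;
}
```

With a public default, `assertModelAccessible(DEFAULT_MODEL, {})` succeeds on a machine with no token, which is the out-of-the-box behavior this PR is after.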

Test plan

  • codegraph models shows nomic-v1.5 as default
  • codegraph embed --help shows nomic-v1.5 as default
  • All 24 search/embedding tests pass
  • All 31 config unit tests pass
  • Verified nomic-v1.5 is publicly accessible on HuggingFace (no auth needed)

Change benchmark trigger from release:published to workflow_run on
Publish so the two workflows no longer race. Benchmark now starts
only after Publish completes successfully.

Tested all 22 CLI commands against the codegraph repo itself.
Found 4 bugs: structure "." filter, stale embeddings after rebuild,
default embed model requires HF auth, cross-language false positive.

jina-code requires HuggingFace authentication (gated model), causing
`codegraph embed` to crash for users without HF_TOKEN. nomic-v1.5 is
public, same 768d dimensions, and improved quality with 8192 context.
@claude

claude bot commented Feb 23, 2026

Claude finished @carlos-alm's task


Reviewing PR: Change Default Embedding Model to nomic-v1.5 ⭐⭐⭐⭐☆

Todo List:

  • Read repository guidelines (CLAUDE.md)
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity verification
  • Phase 4: Code quality review
  • Provide final rating and recommendation

Root Cause Analysis ✅

Problem Being Solved: The current default embedding model jina-code is gated on HuggingFace and requires HF_TOKEN authentication, causing codegraph embed to fail out-of-the-box for new users. This creates a poor first-time user experience.

Why This Approach: Switching to nomic-v1.5 makes sense because:

  • It's publicly accessible (no auth required)
  • Same 768-dimensional vector space as jina-code
  • Improved quality and 8192 token context
  • Maintains compatibility with existing workflows

Risk Assessment: LOW - This is a sensible default change that fixes a real usability issue without breaking existing functionality.

Configuration Change Detection ✅

Examining all configuration changes:

src/embedder.js:58 - ✅ DEFAULT_MODEL changed from 'jina-code' to 'nomic-v1.5'
src/config.js:22 - ✅ DEFAULTS.embeddings.model changed from 'jina-code' to 'nomic-v1.5'
src/cli.js:377,392 - ✅ Help text and default updated to show 'nomic-v1.5 (default)'
README.md - ✅ Documentation updated, model table shows new default
tests/unit/config.test.js:58 - ✅ Test expectation updated to match new default
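
Why the default has to agree across `src/embedder.js`, `src/config.js`, and `src/cli.js` can be seen in a small resolution sketch. `DEFAULTS.embeddings.model` is the name used in this review; `resolveEmbeddingModel` and the precedence order (CLI flag, then user config, then built-in default) are assumptions for illustration only.

```javascript
// DEFAULTS.embeddings.model mirrors the value named in this PR; the rest is illustrative.
const DEFAULTS = { embeddings: { model: 'nomic-v1.5' } };

// Assumed precedence: CLI flag, then user config file, then built-in default.
function resolveEmbeddingModel(cliModel, userConfig = {}) {
  return cliModel ?? userConfig.embeddings?.model ?? DEFAULTS.embeddings.model;
}

console.log(resolveEmbeddingModel(undefined, {})); // nomic-v1.5
console.log(resolveEmbeddingModel(undefined, { embeddings: { model: 'jina-code' } })); // jina-code
console.log(resolveEmbeddingModel('jina-code', {})); // jina-code
```

Under this assumed precedence, users who explicitly configured jina-code keep it; only those relying on the default see the change.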

No concerning relaxations detected - All changes are consistent model updates, not rule/threshold relaxations.

Test Integrity Verification ✅

  • ✅ Test expectations correctly updated: config.test.js:58 now expects nomic-v1.5 instead of jina-code
  • ✅ No weakened assertions: The test still validates the exact model value, just with the new default
  • ✅ Consistent across commits: Two commits properly separated core change from test updates
  • ✅ Clean change pattern: Model constant → test expectation → documentation - proper sequencing

Code Quality Review ✅

  • ✅ Consistent implementation: All 5 locations properly updated with same value
  • ✅ Documentation synchronization: README.md model table updated, help text matches
  • ✅ No breaking changes: Vector dimensions remain 768d, batch size mapping preserved
  • ✅ Proper model validation: getModelConfig() will catch invalid models at runtime
  • ✅ Clean commits: Logical separation of changes across focused commits
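
The runtime validation mentioned above could work roughly like this. Only the function name `getModelConfig()` and the 768d figure appear in this review; the body, the `MODELS` map, and the error message are a hypothetical sketch and may differ from the real implementation in src/embedder.js.

```javascript
// Hypothetical model table; the real one lives in src/embedder.js and may differ.
const MODELS = new Map([
  ['jina-code', { dim: 768 }],
  ['nomic-v1.5', { dim: 768 }],
]);

// Sketch of runtime validation: unknown names fail and list the valid choices.
function getModelConfig(name) {
  const config = MODELS.get(name);
  if (!config) {
    const known = [...MODELS.keys()].join(', ');
    throw new Error(`Unknown model "${name}". Available models: ${known}`);
  }
  return config;
}
```

A typo such as `getModelConfig('nomic-v15')` would then throw immediately instead of surfacing later as a download failure.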

Critical Concerns

None identified. This is a well-executed default change.

Final Assessment

## Root Cause Analysis
**Problem Being Solved**: jina-code model requires HuggingFace auth (HF_TOKEN), breaking out-of-box experience
**Why This Approach**: nomic-v1.5 is public, same 768d dimensions, improved quality/context
**Risk Assessment**: LOW - Compatible upgrade that fixes real usability issue

## Critical Concerns
None. Clean implementation with proper test updates.

## Final Recommendation
- Rating: ⭐⭐⭐⭐☆ (4/5)
- Action: APPROVE
- Reasoning: Fixes genuine user-blocking issue, maintains compatibility, well-tested, properly documented. One star deducted because this is a reactive fix rather than proactive design.

@carlos-alm carlos-alm merged commit 3917801 into main Feb 23, 2026
30 checks passed
@carlos-alm carlos-alm deleted the fix/default-embedding-model branch February 23, 2026 08:57
@greptile-apps

greptile-apps bot commented Feb 23, 2026

Greptile Summary

Changed default embedding model from jina-code to nomic-v1.5 across the codebase to fix out-of-the-box failures. The previous default (jina-code) is gated on HuggingFace and requires an HF_TOKEN environment variable, causing codegraph embed to crash for new users. The new default (nomic-v1.5) is publicly accessible, maintains the same 768-dimensional output, offers improved quality, and supports 8192 token context.

Changes include:

  • Updated DEFAULT_MODEL constant in src/embedder.js
  • Updated config defaults in src/config.js
  • Updated CLI help text and default flag value in src/cli.js
  • Updated test expectations in tests/unit/config.test.js
  • Updated README documentation table to clarify which models require auth
  • Added dogfooding report documenting the bug this PR fixes (bug #3)
  • Fixed benchmark workflow trigger to prevent cancellation during publish

All changes are consistent and properly synchronized across code, tests, and documentation. The PR directly addresses a documented bug from dogfooding and improves the out-of-the-box user experience.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk - it's a straightforward configuration change with proper test coverage
  • The change is a simple constant update propagated consistently across all relevant files (source, config, CLI, tests, docs). The new default model has identical dimensions (768d) to the old one, ensuring backward compatibility with existing embeddings. All 31 config tests and 24 search/embedding tests pass according to the test plan. The change fixes a real bug (HF auth requirement blocking new users) documented in the dogfooding report. No logic changes, no new dependencies, no API modifications.
  • No files require special attention
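
The 768d point can be made concrete with a small guard. This is a sketch under assumptions: the function name and the stored-metadata shape (`model`, `dim`) are hypothetical and not taken from the codebase. It separates two questions: whether stored vectors have the right length for the target model, and whether they were produced by the same model. Different embedding models generally produce different vector spaces, so the second check is included here as a conservative assumption.

```javascript
// Hypothetical guard; names and the stored-metadata shape are assumptions.
function checkStoredEmbeddings(stored, target) {
  return {
    // Same vector length: the storage layout does not need to change.
    sameShape: stored.dim === target.dim,
    // Different model: vectors may not be comparable, so re-embedding is the safe choice.
    needsReembed: stored.model !== target.model,
  };
}

const result = checkStoredEmbeddings(
  { model: 'jina-code', dim: 768 },
  { model: 'nomic-v1.5', dim: 768 }
);
console.log(result); // { sameShape: true, needsReembed: true }
```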

Important Files Changed

| Filename | Overview |
|---|---|
| src/embedder.js | Changed DEFAULT_MODEL from jina-code to nomic-v1.5 - straightforward constant update |
| src/config.js | Updated DEFAULTS.embeddings.model from jina-code to nomic-v1.5 to match new default |
| src/cli.js | Updated CLI help text and default argument for --model flag to reflect nomic-v1.5 as default |
| tests/unit/config.test.js | Updated test expectation to match new default embedding model nomic-v1.5 |
| .github/workflows/benchmark.yml | Changed trigger from release.published to workflow_run after Publish workflow completes - prevents workflow cancellation |

Last reviewed commit: 3a88b4c


@greptile-apps greptile-apps bot left a comment


7 files reviewed, no comments

