Skip to content

fix: change default embedding model to nomic-v1.5#53

Closed
carlos-alm wants to merge 4 commits intomainfrom
docs/dogfood-report-v2.2.0
Closed

fix: change default embedding model to nomic-v1.5#53
carlos-alm wants to merge 4 commits intomainfrom
docs/dogfood-report-v2.2.0

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • Change default embedding model from jina-code to nomic-v1.5
  • jina-code is gated on HuggingFace and crashes without HF_TOKEN, making codegraph embed fail out-of-the-box
  • nomic-v1.5 is public, same 768d dimensions, improved quality, 8192 token context
  • Updated default in src/embedder.js, CLI help in src/cli.js, and model table in README.md
  • Includes dogfooding report for v2.2.0 (generated/DOGFOOD_REPORT_v2.2.0.md)

Test plan

  • codegraph models shows nomic-v1.5 as default
  • codegraph embed --help shows nomic-v1.5 as default
  • All 24 search/embedding tests pass
  • Verified nomic-v1.5 is publicly accessible on HuggingFace (no auth needed)

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 23, 2026

Greptile Summary

Changed default embedding model from jina-code to nomic-v1.5 to fix out-of-the-box failures. The jina-code model is gated on HuggingFace and requires HF_TOKEN, causing codegraph embed to crash for new users. The nomic-v1.5 model is public, has identical 768d dimensions, improved quality, and 8192 token context.

Key changes:

  • Updated DEFAULT_MODEL in src/embedder.js:58
  • Updated CLI help text in src/cli.js:377,391-392
  • Updated model table in README.md to note jina-code requires HF token
  • Fixed benchmark workflow trigger to prevent cancellation
  • Added comprehensive dogfooding report documenting the bug this PR fixes

Note: README.md includes substantial unrelated documentation updates (tool count 13→17, new command descriptions) that appear to have been merged from another branch. These doc changes are accurate but create a larger-than-expected diff for a "default model change" PR.

Confidence Score: 5/5

  • Safe to merge - fixes a critical user experience bug with minimal risk
  • The core change (default model switch) is a single-line constant update in 3 locations, fully tested according to the PR description (24 tests pass), and directly addresses a documented bug. The nomic-v1.5 model has identical dimensions to jina-code, ensuring no breaking changes for existing users. The workflow fix is isolated and prevents a known issue with workflow cancellation.
  • No files require special attention - all changes are straightforward and well-justified

Important Files Changed

Filename Overview
src/embedder.js Changed DEFAULT_MODEL from jina-code to nomic-v1.5 (line 58). No logic issues found.
src/cli.js Updated default model references in models command (line 377) and embed command (lines 391-392). Consistent with embedder.js change.
README.md Updated model table to reflect nomic-v1.5 as default, noted jina-code requires HF token. Also includes unrelated doc updates (tool count 13→17, new features).
.github/workflows/benchmark.yml Changed trigger from release.published to workflow_run for "Publish" workflow. Prevents benchmark workflow cancellation when publish completes.
generated/DOGFOOD_REPORT_v2.2.0.md New dogfooding report documenting bug #3 (embed default model requiring HF auth) that this PR fixes. Report is comprehensive and well-structured.

Last reviewed commit: 54ee772

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Change benchmark trigger from release:published to workflow_run on
Publish so the two workflows no longer race. Benchmark now starts
only after Publish completes successfully.
Tested all 22 CLI commands against the codegraph repo itself.
Found 4 bugs: structure "." filter, stale embeddings after rebuild,
default embed model requires HF auth, cross-language false positive.
jina-code requires HuggingFace authentication (gated model), causing
`codegraph embed` to crash for users without HF_TOKEN. nomic-v1.5 is
public, same 768d dimensions, and improved quality with 8192 context.
@carlos-alm carlos-alm force-pushed the docs/dogfood-report-v2.2.0 branch from 54ee772 to f40bb91 Compare February 23, 2026 08:52
@claude
Copy link

claude bot commented Feb 23, 2026

Claude finished @carlos-alm's task —— View job


PR Review Complete ✅

Reviewing PR #53: fix: change default embedding model to nomic-v1.5

Todo List:

  • Phase 1: Root cause analysis - understand the problem being solved
  • Phase 2: Configuration change detection - check for high-risk changes
  • Phase 3: Test integrity - verify no weakened assertions or removed tests
  • Phase 4: Code quality - assess implementation quality
  • Final Assessment: Provide rating and recommendation

Root Cause Analysis ✅

Problem Being Solved: The default jina-code embedding model is gated on HuggingFace and requires an HF_TOKEN environment variable. When users run codegraph embed out-of-the-box, it crashes with an "Unauthorized access" error instead of working immediately.

Why This Approach: Switching to nomic-v1.5 as the default makes sense because:

  • It's publicly accessible (no auth required)
  • Same 768d dimensions as jina-code (no breaking changes)
  • Improved quality and 8192 token context vs jina-code
  • This is documented as Bug Bump actions/github-script from 7 to 8 #3 in the comprehensive dogfooding report

Evidence: The dogfooding report specifically documents this issue: "codegraph embed . crashes with Error: Unauthorized access to file for the default jina-code model" and notes the error is an unhandled exception with poor user experience.

Configuration Change Detection ⚠️

Critical Issue Found: There's a failing test that expects the old default:

Location: /tests/unit/config.test.js:58

expect(DEFAULTS.embeddings).toEqual({ model: 'jina-code', llmProvider: null });

This test will fail because it hardcodes jina-code as the expected default but the code now uses nomic-v1.5. This test needs to be updated to match the new default.

Other Changes: The workflow change from release.published to workflow_run is justified to prevent benchmark workflow cancellation when the publish workflow completes.

Test Integrity ❌

RED FLAG: The config test expects the old default model but wasn't updated. This indicates:

  1. Tests might not have been run after the change
  2. The change breaks existing test expectations
  3. This PR would fail CI if the test suite was properly executed

Required Fix: Update line 58 in /tests/unit/config.test.js:

// Change from:
expect(DEFAULTS.embeddings).toEqual({ model: 'jina-code', llmProvider: null });
// To:
expect(DEFAULTS.embeddings).toEqual({ model: 'nomic-v1.5', llmProvider: null });

Code Quality ✅

  • Core changes are minimal and correct: Only 3 locations changed (DEFAULT_MODEL constant + 2 CLI help text updates)
  • Backward compatibility maintained: Same 768d dimensions, existing embeddings still work
  • Documentation updated properly: README reflects the change and notes jina-code now requires HF token
  • Dogfooding report is comprehensive: Well-structured documentation of the bug this PR fixes

Critical Concerns

  1. FAILING TEST: The config test hardcodes the old default and will break CI
  2. Missing test verification: The PR description claims "All 24 search/embedding tests pass" but the config test contradiction suggests tests weren't fully validated

Final Assessment

## Root Cause Analysis
**Problem Being Solved**: `jina-code` model requires HF auth, crashes for new users out-of-the-box
**Why This Approach**: `nomic-v1.5` is public, same dimensions, better quality - perfect drop-in replacement
**Risk Assessment**: Low risk change, but incomplete due to missing test update

## Critical Concerns
- **FAILING TEST**: `/tests/unit/config.test.js:58` expects old default, will break CI
- **Test validation gap**: Claims all tests pass but config test wasn't updated

## Final Recommendation
- Rating: ⭐⭐☆☆☆ (2/5)
- Action: **REQUEST CHANGES**
- Reasoning: The core fix is excellent and well-justified, but there's a definitive test failure that must be addressed. The hardcoded test expectation proves the test suite wasn't properly validated. This is a simple fix but critical for CI stability.

Required Change: Update the test expectation in /tests/unit/config.test.js:58 from 'jina-code' to 'nomic-v1.5'.


@claude
Copy link

claude bot commented Feb 23, 2026

Claude finished @carlos-alm's task —— View job


PR Review - fix: change default embedding model to nomic-v1.5

Todo List:

  • Phase 1: Root cause analysis - understand the problem being solved
  • Phase 2: Configuration change detection - check for high-risk changes
  • Phase 3: Test integrity - verify no weakened assertions or removed tests
  • Phase 4: Code quality - assess implementation quality
  • Final Assessment: Provide rating and recommendation

Root Cause Analysis ✅

Problem Being Solved: The default jina-code embedding model is gated on HuggingFace and requires an HF_TOKEN environment variable. When users run codegraph embed out-of-the-box, it crashes with an "Unauthorized access" error instead of working immediately.

Why This Approach: Switching to nomic-v1.5 as the default makes sense because:

  • It's publicly accessible (no auth required)
  • Same 768d dimensions as jina-code (no breaking changes)
  • Improved quality and 8192 token context vs jina-code
  • This is documented as Bug Bump actions/github-script from 7 to 8 #3 in the comprehensive dogfooding report

Evidence: The dogfooding report specifically documents this issue: "codegraph embed . crashes with Error: Unauthorized access to file for the default jina-code model" and notes the error is an unhandled exception with poor user experience.

Configuration Change Detection ✅

No High-Risk Changes Detected:

  • No linting rules disabled or relaxed
  • No test timeouts increased or coverage thresholds lowered
  • No TypeScript strict mode flags disabled
  • Workflow change is justified: Changed from release.published to workflow_run to prevent benchmark workflow cancellation when the publish workflow completes - this is a legitimate fix, not a workaround

Core Changes Are Minimal:

  • src/embedder.js:58: DEFAULT_MODEL = 'nomic-v1.5' (was 'jina-code')
  • src/config.js:22: Updated DEFAULTS to match
  • src/cli.js:377,391-392: Updated CLI help text to reflect new default
  • tests/unit/config.test.js:58: PROPERLY UPDATED test expectation to match new default

Test Integrity ✅

No Weakened Tests Found:

  • Critical test fixed: Previous Claude review correctly identified that tests/unit/config.test.js:58 expected the old default. This test has been properly updated to expect 'nomic-v1.5' instead of 'jina-code'
  • No other test dependencies: Grep search confirms no other test files reference jina-code or DEFAULT_MODEL
  • Test remains strict: The assertion still validates the exact default value - it wasn't weakened, just updated to match the intentional change
  • No removed or skipped tests: All existing test structure is preserved

Code Quality ✅

Implementation is Excellent:

  • Consistent changes: All 4 locations updated properly (embedder.js, config.js, cli.js, config.test.js)
  • Documentation updated: README.md table correctly shows nomic-v1.5 as default and notes jina-code requires HF token
  • Backward compatibility preserved: Same 768d dimensions ensure existing embeddings remain compatible
  • No breaking changes: Users with existing jina-code embeddings can still use them

Quality Indicators:

  • Minimal scope: Only changes the default, no logic modifications
  • Clear rationale: Dogfooding report thoroughly documents the UX problem
  • Proper testing: Config test properly updated to verify new default
  • Complete coverage: All CLI help text, defaults, and documentation updated consistently

No Security or Performance Concerns:

  • Model swap is functionally equivalent (same dimensions)
  • No new dependencies or attack surface
  • Public model reduces auth complexity (security improvement)

Final Assessment

## Root Cause Analysis
**Problem Being Solved**: `jina-code` model requires HF auth, crashes for new users out-of-the-box
**Why This Approach**: `nomic-v1.5` is public, same dimensions, better quality - perfect drop-in replacement
**Risk Assessment**: Very low risk - backward compatible change with comprehensive testing

## Critical Concerns
None found. This PR addresses all previous review concerns:
- ✅ Test failure fixed (config.test.js properly updated)
- ✅ All affected files consistently updated
- ✅ Documentation reflects changes accurately
- ✅ No breaking changes or weakened tests

## Final Recommendation
- **Rating**: ⭐⭐⭐⭐⭐ (5/5)
- **Action**: **APPROVE**
- **Reasoning**: Exemplary fix for a real user problem. The PR solves a critical UX issue (out-of-the-box crashes) with minimal, well-tested changes. All code locations are consistently updated, the test suite properly validates the new default, and documentation is complete. The previous Claude review's concern about test failures was correctly identified and has been resolved. This is exactly how default changes should be implemented - thorough, consistent, and well-documented.

Confidence: Very High - This addresses a documented bug with a minimal, safe, and complete solution.


@carlos-alm
Copy link
Contributor Author

Recreating with correct branch name

@carlos-alm carlos-alm closed this Feb 23, 2026
@carlos-alm carlos-alm deleted the docs/dogfood-report-v2.2.0 branch February 23, 2026 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant