Skip to content

feat: add normalizeSymbol utility for stable JSON schema#259

Closed
carlos-alm wants to merge 2 commits intomainfrom
feat/normalize-symbol-schema
Closed

feat: add normalizeSymbol utility for stable JSON schema#259
carlos-alm wants to merge 2 commits intomainfrom
feat/normalize-symbol-schema

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • Add normalizeSymbol(row, db, hashCache) utility that returns a consistent 7-field symbol shape (name, kind, file, line, endLine, role, fileHash) across all query and search commands
  • Update 7 query functions in queries.js and 4 search functions in embedder.js to use normalizeSymbol, closing field gaps where endLine/role/fileHash were missing
  • Update SQL in listFunctionsData, rolesData, iterators, _prepareSearch, and ftsSearchData to include end_line and role columns
  • Export normalizeSymbol from index.js for programmatic consumers
  • Add docs/json-schema.md documenting the stable base shape, valid kinds/roles, and command envelope shapes
  • Add 8 unit tests for normalizeSymbol and 7 integration schema conformance tests

Test plan

  • All 1099 existing tests pass
  • 8 new unit tests for normalizeSymbol (coalescing, caching, null db)
  • 7 new integration tests asserting all 7 stable fields on queryNameData, fnDepsData, fnImpactData, whereData, explainData, listFunctionsData
  • Lint clean (biome check)

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 3, 2026

Greptile Summary

This PR introduces normalizeSymbol, a utility that standardizes symbol metadata across all query and search commands to a consistent 7-field shape (name, kind, file, line, endLine, role, fileHash). The implementation updates 11 functions across queries.js and embedder.js to use this utility, adds comprehensive documentation in json-schema.md, and includes 15 new tests (8 unit + 7 integration) to verify schema conformance.

Key changes:

  • Added normalizeSymbol(row, db, hashCache) with per-file hash caching for performance
  • Updated SQL queries to include end_line and role columns where missing
  • Exported the utility from index.js for programmatic API consumers
  • Comprehensive schema documentation with examples for all commands

Additional commits in branch:

  • Change journal infrastructure (change-journal.js, watcher.js updates)
  • Commitlint workflow update to allow benchmark/ branch prefix

The PR is well-tested, thoroughly documented, and implements a clean abstraction that closes field gaps across the codebase.

Confidence Score: 5/5

  • This PR is safe to merge with no identified issues
  • The implementation is clean, well-tested, and follows good engineering practices. The normalizeSymbol utility is properly abstracted with hash caching for performance. All query/search functions are updated consistently. Test coverage is comprehensive (15 new tests). Documentation is thorough. No bugs, security issues, or logical errors found.
  • No files require special attention

Important Files Changed

Filename Overview
src/queries.js Added normalizeSymbol utility and updated 7 query functions to return consistent 7-field symbol shape; SQL updated to include end_line and role columns
src/embedder.js Updated 4 search functions to use normalizeSymbol for consistent schema; SQL queries updated to select end_line and role
src/index.js Exported normalizeSymbol for programmatic API consumers
docs/json-schema.md Comprehensive documentation of stable 7-field symbol shape, valid kinds/roles, and command envelope examples
tests/unit/normalize-symbol.test.js 8 unit tests covering normalizeSymbol behavior: field coalescing, defaults, hash caching, null db handling
tests/integration/queries.test.js Added file hash test data and 7 integration tests asserting all query functions return stable 7-field schema

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Query/Search Function] --> B{Has DB rows?}
    B -->|Yes| C[Create hashCache Map]
    C --> D[For each row]
    D --> E[Call normalizeSymbol row, db, hashCache]
    E --> F{fileHash in cache?}
    F -->|Yes| G[Use cached hash]
    F -->|No| H[getFileHash from DB]
    H --> I[Cache hash for file]
    I --> J[Return 7-field object]
    G --> J
    J --> K{More rows?}
    K -->|Yes| D
    K -->|No| L[Return normalized results]
    B -->|No| M[Return empty results]
    
    style E fill:#e1f5ff
    style J fill:#d4edda
    style H fill:#fff3cd
Loading

Last reviewed commit: df57e02

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@claude
Copy link

claude bot commented Mar 3, 2026

Claude finished @carlos-alm's task —— View job


PR Review Complete

Following strict review guidelines for codegraph PR #259: feat: add normalizeSymbol utility for stable JSON schema

Review Checklist:

  • Phase 1: Root cause analysis - understand problem being solved
  • Phase 2: Configuration change detection (high scrutiny)
  • Phase 3: Test integrity verification
  • Phase 4: Code quality review
  • Final assessment with rating and recommendation

Root Cause Analysis

Problem Being Solved: This PR addresses the lack of consistency in symbol metadata across different query and search commands. Previously, some functions returned symbols missing endLine, role, or fileHash fields, creating an inconsistent API experience for programmatic consumers.

Why This Approach: The normalizeSymbol utility is a clean abstraction that standardizes all symbol returns to a consistent 7-field shape. The approach includes:

  • Per-file hash caching for performance optimization
  • Graceful handling of missing/null values
  • Export from index.js for programmatic API access
  • Comprehensive SQL updates to include missing columns

Risk Assessment: ✅ LOW RISK - This is a pure additive change that improves API consistency without breaking existing functionality.

Configuration & Test Analysis

✅ Configuration Changes (Clean)

  • No linting rules relaxed or disabled
  • No test coverage thresholds lowered
  • No timeout increases or strict mode disabling
  • Only legitimate configuration change: .github/workflows/commitlint.yml updated to allow benchmark/ branch prefix (justified by ci: allow benchmark/ branch name prefix #258)

✅ Test Integrity (Strong)

  • 15 new tests added (8 unit + 7 integration) with comprehensive coverage
  • No existing tests weakened or assertions removed
  • Unit tests cover edge cases: caching, null handling, field coalescing
  • Integration tests verify all 7 stable fields across all affected query functions
  • Tests follow existing patterns and maintain quality standards

Code Quality Assessment

✅ Implementation Quality

normalizeSymbol utility:

  • Clean, well-documented function with proper JSDoc types
  • Efficient hash caching strategy prevents redundant DB queries
  • Proper fallback handling (end_lineendLinenull)
  • Graceful null handling for missing DB/hash data

SQL Updates:

  • All affected queries updated to include end_line and role columns
  • Consistent application across 11 functions in queries.js and embedder.js
  • No SQL injection vulnerabilities introduced

Performance Considerations:

  • Hash caching prevents N+1 query problem for multi-symbol results
  • Minimal overhead for single-symbol queries
  • No unnecessary full rebuilds or missing incremental optimizations

✅ Security & Stability

  • No command injection risks
  • No exposure of sensitive data
  • All new code follows established patterns
  • Export properly added to index.js for API consumers

✅ Documentation

  • Comprehensive docs/json-schema.md with examples for all commands
  • Clear field definitions and valid value enumerations
  • Proper API documentation for the utility function

Minor Observations

  1. Good: Consistent use of ?? nullish coalescing for field defaults
  2. Good: Proper separation of concerns with hash caching logic
  3. Good: Maintains backward compatibility while improving consistency
  4. Good: Following existing code patterns and import conventions

Final Recommendation

## Root Cause Analysis
**Problem Being Solved**: Inconsistent symbol metadata across query/search commands missing endLine/role/fileHash fields
**Why This Approach**: Clean normalizeSymbol utility with hash caching provides stable 7-field API shape
**Risk Assessment**: LOW RISK - Pure additive improvement with comprehensive testing

## Critical Concerns
None identified. This is well-engineered, thoroughly tested code.

## Final Recommendation  
- Rating: ⭐⭐⭐⭐☆ (4/5)
- Action: APPROVE
- Reasoning: High-quality implementation solving a real API consistency problem. Comprehensive testing, good performance optimization, excellent documentation. Only missing the 5th star due to being an incremental improvement rather than groundbreaking functionality.

This PR demonstrates excellent engineering practices with proper abstraction, comprehensive testing, and thorough documentation. The normalizeSymbol utility solves a real problem and improves the developer experience for API consumers.


@carlos-alm carlos-alm force-pushed the feat/normalize-symbol-schema branch from 9a6401e to 1f1cff0 Compare March 3, 2026 01:28
@carlos-alm carlos-alm closed this Mar 3, 2026
@carlos-alm carlos-alm reopened this Mar 3, 2026
@github-actions github-actions bot locked and limited conversation to collaborators Mar 3, 2026
@carlos-alm carlos-alm force-pushed the feat/normalize-symbol-schema branch 2 times, most recently from 1f1cff0 to 628d4cd Compare March 3, 2026 01:35
@carlos-alm carlos-alm closed this Mar 3, 2026
@carlos-alm carlos-alm reopened this Mar 3, 2026
Add normalizeSymbol(row, db, hashCache) that returns a consistent
7-field symbol shape (name, kind, file, line, endLine, role, fileHash)
across all query and search commands.

Update queryNameData, fnDepsData, fnImpactData, explainFunctionImpl,
listFunctionsData, rolesData, whereSymbolImpl in queries.js and
searchData, multiSearchData, ftsSearchData, hybridSearchData in
embedder.js to use normalizeSymbol. Update SQL in listFunctionsData,
rolesData, iterListFunctions, iterRoles, _prepareSearch, and
ftsSearchData to include end_line and role columns.

Export normalizeSymbol from index.js. Add docs/json-schema.md
documenting the stable schema. Add 8 unit tests and 7 integration
schema conformance tests.

Impact: 13 functions changed, 33 affected
@carlos-alm
Copy link
Contributor Author

Superseded by #267 (rebased, clean commit history, all CI passing).

@carlos-alm carlos-alm closed this Mar 3, 2026
@carlos-alm carlos-alm deleted the feat/normalize-symbol-schema branch March 3, 2026 06:51
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant