Skip to content

feat: add normalizeSymbol utility for stable JSON schema#267

Merged
carlos-alm merged 1 commit intomainfrom
feat/normalize-symbol-schema-v2
Mar 3, 2026
Merged

feat: add normalizeSymbol utility for stable JSON schema#267
carlos-alm merged 1 commit intomainfrom
feat/normalize-symbol-schema-v2

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • Add normalizeSymbol(row, db, hashCache) utility that returns a consistent 7-field symbol shape (name, kind, file, line, endLine, role, fileHash) across all query and search commands
  • Update queryNameData, fnDepsData, fnImpactData, explainFunctionImpl, listFunctionsData, rolesData, whereSymbolImpl in queries.js to use normalizeSymbol
  • Update searchData, multiSearchData, ftsSearchData, hybridSearchData in embedder.js to use normalizeSymbol
  • Add docs/json-schema.md documenting the stable schema
  • Add 8 unit tests and 7 integration schema conformance tests

Split from #261 — this PR contains only the normalizeSymbol feature.

Test plan

  • npm test passes
  • All query commands return the 7-field base shape
  • fileHash cache prevents redundant DB lookups (unit test)
  • Iterator functions yield endLine and role without fileHash (by design)

Add normalizeSymbol(row, db, hashCache) that returns a consistent
7-field symbol shape (name, kind, file, line, endLine, role, fileHash)
across all query and search commands.

Update queryNameData, fnDepsData, fnImpactData, explainFunctionImpl,
listFunctionsData, rolesData, whereSymbolImpl in queries.js and
searchData, multiSearchData, ftsSearchData, hybridSearchData in
embedder.js to use normalizeSymbol. Update SQL in listFunctionsData,
rolesData, iterListFunctions, iterRoles, _prepareSearch, and
ftsSearchData to include end_line and role columns.

Export normalizeSymbol from index.js. Add docs/json-schema.md
documenting the stable schema. Add 8 unit tests and 7 integration
schema conformance tests.

Impact: 13 functions changed, 33 affected

Impact: 14 functions changed, 42 affected

Impact: 13 functions changed, 21 affected
@claude
Copy link

claude bot commented Mar 3, 2026

Claude finished @carlos-alm's task —— View job


PR Review: normalizeSymbol utility

Todo List:

  • Read and analyze all changed files
  • Understand the problem being solved (root cause analysis)
  • Check for configuration changes or relaxed rules
  • Review test integrity and coverage
  • Analyze code quality and potential issues
  • Provide final assessment with rating

Root Cause Analysis

Problem Being Solved: This PR addresses the lack of a consistent symbol schema across different query and search commands. Previously, different commands returned symbol objects with varying field structures, making it difficult for API consumers to reliably process results. The normalizeSymbol utility creates a stable 7-field base shape that all commands now use.

Why This Approach: The approach is well-justified:

  • Introduces a centralized utility function (normalizeSymbol) that transforms raw database rows into a consistent schema
  • Updates all major query functions to use this utility, ensuring consistency
  • Adds comprehensive documentation and tests
  • Uses a caching mechanism (hashCache) to optimize performance by avoiding redundant file hash lookups

Risk Assessment: Low risk. This is primarily a refactoring that standardizes existing functionality without changing core behavior.

Configuration Changes

No problematic configuration changes detected. No linting rules were relaxed, test thresholds lowered, or build configurations modified.

Test Integrity

Strong test coverage:

  • 8 comprehensive unit tests in normalize-symbol.test.js covering:
    • Full row normalization with all fields
    • Minimal row with null defaults
    • Column name priority (end_line vs endLine)
    • Hash cache optimization and reuse
    • Edge cases (missing files, null db)
  • 7 integration tests verifying schema conformance across all major query commands
  • Tests are well-structured with clear expectations and proper mocking

Code Quality Analysis

Strengths

  1. Clean Implementation (src/queries.js:2753-2774):

    • Simple, focused function with clear purpose
    • Proper JSDoc documentation
    • Handles edge cases gracefully (null db, missing fields)
    • Smart caching mechanism prevents redundant database calls
  2. Consistent Integration: All relevant functions updated uniformly:

    • queryNameData, fnDepsData, fnImpactData, explainFunctionImpl
    • listFunctionsData, rolesData, whereSymbolImpl
    • Search functions in embedder.js
  3. Excellent Documentation (docs/json-schema.md):

    • Clear schema definition with examples
    • Comprehensive command coverage
    • Proper distinction between full symbols and lightweight references
  4. Export Consistency: Function properly exported in both queries.js and index.js for API consumers

⚠️ Minor Concerns

  1. Performance Impact: Adding fileHash lookups to every normalized symbol could impact performance for large result sets. However, the hashCache parameter mitigates this well.

  2. Breaking Change Potential: While this standardizes the schema, existing API consumers might be affected if they relied on the inconsistent previous structure. The PR description doesn't mention backwards compatibility considerations.

  3. Database Dependency: The function requires a database connection to populate fileHash, which could complicate usage in some contexts. However, it gracefully handles null db.

🔍 Code Inspection

Line-by-Line Review of normalizeSymbol (src/queries.js:2753-2774):

  • ✅ Proper parameter validation and null handling
  • ✅ Efficient caching logic prevents redundant DB queries
  • ✅ Correct field priority handling (end_line over endLine)
  • ✅ Consistent return structure matches documented schema
  • ✅ No security concerns (uses parameterized queries via getFileHash)

Integration Points:

  • ✅ All function updates follow the same pattern: ...normalizeSymbol(row, db, hc)
  • ✅ Hash cache properly initialized and passed through call chains
  • ✅ No breaking changes to existing lightweight reference structures

Security & Performance

  • ✅ No security vulnerabilities introduced
  • ✅ No SQL injection risks (uses existing parameterized query functions)
  • ✅ Performance optimized with hash caching
  • ✅ Memory usage reasonable (cache is bounded by unique file count)

Final Assessment

## Root Cause Analysis
**Problem Being Solved**: Inconsistent symbol schema across query/search commands
**Why This Approach**: Centralized utility with caching for performance optimization
**Risk Assessment**: Low risk - primarily standardization refactoring

## Critical Concerns
None. This is a well-executed feature addition with comprehensive testing.

## Final Recommendation
- Rating: ⭐⭐⭐⭐☆ (4/5)
- Action: APPROVE WITH CONDITIONS
- Reasoning: High-quality implementation with excellent test coverage. The only concern is potential backwards compatibility impact for API consumers, but the benefits of schema consistency outweigh this minor risk.

Conditions for Approval:

  1. Consider adding a note in CHANGELOG about the schema standardization for API consumers
  2. Monitor performance in production to ensure hash caching is effective

This PR represents solid software engineering: it solves a real problem (schema inconsistency) with a clean, well-tested solution that includes proper documentation and performance considerations.

@carlos-alm carlos-alm merged commit 8549f32 into main Mar 3, 2026
17 checks passed
@carlos-alm carlos-alm deleted the feat/normalize-symbol-schema-v2 branch March 3, 2026 01:42
@github-actions github-actions bot locked and limited conversation to collaborators Mar 3, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant