Skip to content

refactor: extract cachedStmt into reusable db utility#417

Merged
carlos-alm merged 19 commits intomainfrom
refactor/stmt-cache-utility
Mar 11, 2026
Merged

refactor: extract cachedStmt into reusable db utility#417
carlos-alm merged 19 commits intomainfrom
refactor/stmt-cache-utility

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • Extracts the WeakMap-based prepared statement caching pattern into a reusable cachedStmt(cache, db, sql) helper in src/db/repository/cached-stmt.js
  • Applies cachedStmt to all 14 query functions in edges.js — each statement is compiled once per db instance and reused on subsequent calls
  • Fixes the getClassHierarchy BFS bug: hoists the cache lookup before the while loop so the WeakMap check runs once per call, not per iteration
  • Re-exports cachedStmt from the repository barrel (index.js) so other modules (e.g. sequence.js, queries.js) can adopt it incrementally

Addresses Greptile review feedback on #403: "The caching pattern could potentially be extracted into a reusable database utility."

Test plan

  • Repository unit tests pass (20/20)
  • Integration tests pass (448/448, 4 pre-existing WASM worktree failures unrelated)
  • Biome lint clean

Implement Phase 3.1 of the architectural refactoring roadmap. Replace 4
independent AST analysis passes (complexity, dataflow, AST-store, CFG)
with a shared visitor framework that runs complexity, dataflow, and
AST-store visitors in a single DFS walk per file.

New framework files:
- visitor.js: shared DFS walker with enterNode/exitNode/enterFunction/
  exitFunction hooks, per-visitor skipChildren, nesting/scope tracking
- engine.js: orchestrates all analyses in one coordinated pass, replaces
  4 sequential buildXxx blocks + WASM pre-parse in builder.js
- metrics.js: extracted Halstead derived math, LOC, MI from complexity.js
- visitor-utils.js: extracted shared helpers from dataflow.js
- visitors/complexity-visitor.js: cognitive/cyclomatic/nesting + Halstead
- visitors/ast-store-visitor.js: new/throw/await/string/regex extraction
- visitors/dataflow-visitor.js: scope stack + define-use chains
- tests/unit/visitor.test.js: 9 tests for walker framework

Modified modules:
- builder.js: 4 sequential buildXxx blocks → single runAnalyses call
- complexity.js: delegates walk to complexity visitor
- dataflow.js: delegates walk to dataflow visitor
- ast.js: delegates walk to ast-store visitor, removed dead code

Bug fixes:
- Fix Ruby dataflow parity: skip nested return keyword tokens inside
  return statements (tree-sitter Ruby grammar nests return/return)

Hook fixes:
- check-dead-exports.sh: exclude public API (index.js) and dynamic
  import() consumers from dead export detection
- check-readme.sh: match ROADMAP.md at any path; allow commit when at
  least one doc is staged (developer reviewed which docs need updating)

Docs:
- ROADMAP.md: update 3.1 with completed items and remaining work
- BACKLOG.md: add Tier 1h with dynamic import/re-export tracking (ID 71)
- CLAUDE.md: add ast-analysis/ to architecture table

Impact: 51 functions changed, 47 affected
The hook correctly catches a real limitation — codegraph doesn't track
dynamic import() consumers as graph edges. The proper fix is to add
dynamic import edge tracking (backlog ID 81), not to exclude them
from the hook. Reverting the workaround.
…ex, remove unused var

Impact: 2 functions changed, 23 affected
…ing, perf import

- Replace halsteadSkip boolean with halsteadSkipDepth counter to handle
  nested skip-type nodes correctly (e.g. return_type containing
  type_annotation in TypeScript)
- Add debug() logging to all silent catch blocks in engine.js so
  failures are discoverable with --verbose
- Explicitly import performance from node:perf_hooks for clarity

Impact: 5 functions changed, 24 affected
…ysis-framework

Impact: 6 functions changed, 6 affected
…lookup

Address two Greptile review comments on PR #388:

1. Function nodes in nestingNodeTypes inflates cognitive complexity by +1
   for every function body — funcDepth already tracks function-level
   nesting, so adding functionNodes to the walker's nestingNodeTypes
   double-counts.

2. Replace O(n²) complexityResults.find() with O(1) resultByLine map
   lookup by storing the full result object (metrics + funcNode) instead
   of just metrics.

Impact: 1 functions changed, 0 affected
…langId to LOC

Address two Greptile review comments:

1. computeAllMetrics added functionNodes to nestingNodeTypes, inflating
   cognitive complexity by +1 for every function body (same bug as the
   engine.js fix in a47eb47, but in the per-function code path).

2. collectResult in complexity-visitor passed null as langId to
   computeLOCMetrics, causing C-style comment detection for all
   languages. Now accepts langId via options and forwards it, so
   Python/Ruby/PHP get correct comment-line counts and MI values.

Impact: 4 functions changed, 5 affected
…ineOpts

Address three Greptile review comments:

1. Wrap runAnalyses call in builder.js with try/catch + debug() so a
   walk-phase throw doesn't crash the entire build — matches the
   resilience of the previous individual buildXxx try/catch blocks.

2. In function-level mode, enterFunction/exitFunction were no-ops, so
   funcDepth stayed 0 for nested functions (callbacks, closures).
   Now increments/decrements funcDepth in function-level mode too,
   restoring parity with the original computeAllMetrics algorithm.

3. Rename _engineOpts to engineOpts in engine.js since it's actively
   forwarded to all four buildXxx delegate calls — the underscore
   prefix falsely signaled an unused parameter.

Impact: 5 functions changed, 4 affected
…in hook

Address two Greptile review comments:

1. Remove the `processed` Set from dataflow-visitor.js — walkWithVisitors
   visits each node exactly once via DFS on a proper tree, so the
   deduplication guard never fires and just wastes memory.

2. Replace single-line destructuring regex in check-dead-exports.sh with
   a multi-line-safe pattern (using the `s` flag) so multi-line
   `const { ... } = await import(...)` patterns are correctly detected
   and their names added to publicAPI.

Impact: 2 functions changed, 0 affected
Split src/db/repository.js into src/db/repository/ directory with 10
domain files: nodes, edges, build-stmts, complexity, cfg, dataflow,
cochange, embeddings, graph-read, and barrel index.

Extracted ~120 inline db.prepare() calls from 14 consumer modules
into shared repository functions, eliminating duplication of common
patterns like getNodeId (5 files), bulkNodeIdsByFile (3 files),
callee/caller joins (6+ call sites), and file purge cascades.

Net reduction: -437 lines across the codebase. All 1573 tests pass.

Impact: 73 functions changed, 112 affected
…ed stmts, fix .raw() no-op

- queries.js: restore DISTINCT-by-file deduplication for relatedTests in
  explainFunctionImpl (lost when switching to findCallers)
- build-stmts.js: prepare SQL statements once in preparePurgeStmts() and
  loop with runPurge() instead of calling db.prepare() per file per table
- cfg.js: replace misleading .raw() with .get() in hasCfgTables

Impact: 7 functions changed, 14 affected
…ysis-framework

Impact: 74 functions changed, 55 affected
db.prepare() was called on every iteration of the while loop in
getClassHierarchy, causing repeated statement compilation for each
ancestor level. Use a WeakMap cache (keyed by db instance) so the
statement is compiled once and reused across iterations.

Impact: 1 functions changed, 0 affected
Move the WeakMap-based prepared statement caching pattern into a
reusable cachedStmt(cache, db, sql) helper in cached-stmt.js.
Apply it to all 14 query functions in edges.js — each statement
is now compiled once per db instance and reused on subsequent calls.
Also hoists the cache lookup in getClassHierarchy before the BFS
while loop so the WeakMap check runs once per call, not per iteration.
…caches

Resolve add/add conflicts in cfg.js, edges.js, nodes.js, and index.js
by using the cachedStmt utility from cached-stmt.js consistently across
all repository modules. Preserves SELECT DISTINCT change from main in
findCallees.
@claude
Copy link

claude bot commented Mar 11, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Extract cachedStmt into Reusable DB Utility

Systematic review completed.

Review Tasks:

  • Phase 1: Root cause analysis - understand problem being solved
  • Phase 2: Check for high-risk configuration changes
  • Phase 3: Verify test integrity and no weakened assertions
  • Phase 4: Code quality review (bugs, security, performance)
  • Final Assessment: Provide rating and recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: The PR addresses a performance regression where db.prepare() was being called inside the BFS loop in getClassHierarchy, causing repeated statement compilation for each ancestor level. Additionally, it extracts the WeakMap-based prepared statement caching pattern that was duplicated across multiple repository files.

Original Bug (Pre-Fix):

// BAD: db.prepare() called on every while loop iteration
while (queue.length > 0) {
  const current = queue.shift();
  const parents = db.prepare(`SELECT...`).all(current); // ❌ Repeated compilation
}

Current Implementation:

// GOOD: Statement cached once per db instance, hoisted before loop
const stmt = cachedStmt(_getClassAncestorsStmt, db, `SELECT...`);
while (queue.length > 0) {
  const current = queue.shift();
  const parents = stmt.all(current); // ✅ Reuse cached statement
}

Why This Approach: Extracting the cachedStmt(cache, db, sql) utility eliminates code duplication across 14+ query functions and provides a consistent caching pattern. The approach is sound and addresses a real performance issue.


Phase 2: Configuration Change Detection ✅

No High-Risk Changes Detected:

  • No linting rules disabled or relaxed
  • No test timeouts increased or coverage thresholds lowered
  • No build configuration modifications
  • Only version bumps in package.json (3.1.1 → 3.1.2) and corresponding optional dependencies

Phase 3: Test Integrity ✅

Strong Test Coverage Maintained:

  • Comprehensive getClassHierarchy tests: empty set, single-level, multi-level, diamond inheritance
  • All tests validate the BFS traversal logic and cycle prevention
  • No test assertions were weakened or removed
  • Author claims 20/20 repository tests and 448/448 integration tests pass

Phase 4: Code Quality Review ✅

cachedStmt Utility Implementation:

  • Clean Design: Simple 7-line utility with clear responsibility
  • Correct Cache Key: Uses WeakMap keyed by db instance (prevents memory leaks)
  • Good Documentation: JSDoc warns about SQL consistency requirement
  • Type Safety: Proper JSDoc type annotations

Performance Benefits:

  • Fixed O(n) Bug: getClassHierarchy no longer compiles statements in BFS loop
  • Reduced Duplication: 14 query functions now use shared caching pattern
  • Memory Efficient: WeakMap allows garbage collection of unused db instances

Security & Robustness:

  • No SQL Injection: All queries use parameterized statements correctly
  • Consistent Error Handling: Inherits better-sqlite3's error behavior
  • No Breaking Changes: Function signatures remain identical

Code Organization:

  • Good Separation: Utility extracted to dedicated module
  • Clean Barrel Export: Re-exported from repository/index.js for easy adoption
  • Consistent Pattern: Applied uniformly across all repository modules

Minor Observations (not blocking):

  • Documentation could clarify the "same SQL" requirement with an example
  • No explicit tests for cachedStmt itself, but covered via integration tests

Root Cause Analysis

Problem Being Solved: Performance bug in getClassHierarchy BFS + code duplication across repository functions
Why This Approach: Solid engineering - extracts proven caching pattern into reusable utility
Risk Assessment: Low risk - pure refactoring with strong test coverage

Critical Concerns

None identified. This is a clean refactoring that fixes a real performance issue.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Well-executed refactoring that fixes a documented performance bug (db.prepare() in BFS loop) and eliminates code duplication. Strong test coverage, no configuration changes, clean implementation. Missing the 5th star only due to being a refactoring rather than new functionality.

Summary: This PR successfully addresses Greptile's feedback from #403 by extracting the caching pattern into a reusable utility. The getClassHierarchy performance fix is particularly important for codebases with deep inheritance hierarchies.


@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 11, 2026

Greptile Summary

This PR extracts the recurring WeakMap-based prepared-statement caching pattern into a shared cachedStmt(cache, db, sql) helper and applies it consistently across edges.js, cfg.js, and nodes.js, replacing both the old module-local _cached function in edges.js and the inline db.prepare() calls that had no caching at all. The helper is also re-exported from the barrel so other modules can adopt it incrementally.

Key changes:

  • New src/db/repository/cached-stmt.js utility: well-documented, correct WeakMap-per-db-instance pattern.
  • edges.js: migrates 14 query functions to cachedStmt; 9 of them gain caching for the first time (previously called db.prepare() on every invocation).
  • cfg.js: migrates 4 functions; correctly leaves hasCfgTables uncached (it intentionally catches table-missing errors on prepare).
  • nodes.js: migrates 3 previously manual WeakMap functions and moves cache declarations to file-top; however, 7 other functions in the same file (countNodes, countEdges, countFiles, findNodeById, findNodesByFile, findFileNodes, findNodeChildren) still call db.prepare() directly, creating visible inconsistency now that cachedStmt is imported in the file.
  • The PR description characterises the getClassHierarchy change as a "BFS bug fix", but the prepared statement was already resolved outside the while loop in the original code — the diff is a cosmetic reordering of three const declarations with no effect on runtime behaviour.

Confidence Score: 5/5

  • Safe to merge — clean refactor with no behavioural regressions and a net performance improvement from eliminating redundant db.prepare() calls.
  • All changes are mechanical substitutions of a proven pattern. No new logic is introduced, SQL strings are unchanged, and the WeakMap-per-db semantics are identical to what they replaced. The only open items are style-level: inconsistent caching coverage within nodes.js and a misdescribed "bug fix" in the PR description — neither affects correctness.
  • No files require special attention — nodes.js has a stylistic inconsistency but no correctness issue.

Important Files Changed

Filename Overview
src/db/repository/cached-stmt.js New utility file — clean, well-documented implementation of the WeakMap-based prepared statement cache helper. No issues found.
src/db/repository/edges.js Replaces local _cached with shared cachedStmt and adds caching to 9 previously uncached functions. The "BFS bug fix" for getClassHierarchy is actually a cosmetic declaration reorder with no semantic effect — the WeakMap lookup was already outside the loop in the original.
src/db/repository/cfg.js Clean migration to cachedStmt; correctly preserves the uncached hasCfgTables (which relies on catching table-missing errors). No issues found.
src/db/repository/nodes.js Migrates three high-frequency functions to cachedStmt and moves WeakMap declarations to the top of the file. Seven remaining functions (countNodes, countEdges, countFiles, findNodeById, findNodesByFile, findFileNodes, findNodeChildren) still call db.prepare() directly, creating inconsistency now that cachedStmt is imported in the same file.
src/db/repository/index.js Adds cachedStmt to the barrel re-export. Straightforward change, no issues.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["caller\nedges.js / cfg.js / nodes.js"]
    B["cachedStmt(cache, db, sql)"]
    C{"cache.get(db)"}
    D["return cached stmt"]
    E["db.prepare(sql)"]
    F["cache.set(db, stmt)"]
    G["return new stmt"]

    A --> B
    B --> C
    C -->|hit| D
    C -->|miss| E
    E --> F
    F --> G
Loading

Comments Outside Diff (2)

  1. src/db/repository/nodes.js, line 115-167 (link)

    Partially migrated file leaves uncached db.prepare() calls

    cachedStmt is now imported and used in this file for getNodeId, getFunctionNodeId, and bulkNodeIdsByFile, but seven other functions still call db.prepare() on every invocation:

    • countNodes (line 115)
    • countEdges (line 124)
    • countFiles (line 133)
    • findNodeById (line 145)
    • findNodesByFile (lines 155–158)
    • findFileNodes (line 167)
    • findNodeChildren (lines 229–231)

    findNodeById and findNodesByFile in particular can sit in hot paths — findNodeById is used during graph traversal and findNodesByFile is called per-file during indexing. Since the infrastructure for caching is already present in this file, leaving these uncached is inconsistent and could surprise future maintainers who look at this file expecting uniform usage of the pattern.

  2. src/db/repository/edges.js, line 241-261 (link)

    "BFS bug fix" is a no-op reordering, not a behavioral change

    The PR description claims this commit "fixes the getClassHierarchy BFS bug: hoists the cache lookup before the while loop so the WeakMap check runs once per call, not per iteration." However, examining both the old and new code, the stmt constant was already declared before while (queue.length > 0) in the original — the WeakMap lookup ran exactly once per call in both versions.

    The only actual change is a cosmetic swap of declaration order (ancestors / queue now precede stmt instead of following it), which has no effect on execution semantics in JavaScript since all three are const declarations evaluated top-to-bottom before the loop begins. This is worth clarifying in the PR description to avoid future confusion when reviewing git history.

Last reviewed commit: 37b29b7

Comment on lines +31 to 59
_getCfgBlocksStmt,
db,
`SELECT id, block_index, block_type, start_line, end_line, label
FROM cfg_blocks WHERE function_node_id = ?
ORDER BY block_index`,
).all(functionNodeId);
}

/**
* Get CFG edges for a function node (with block info).
* @param {object} db
* @param {number} functionNodeId
* @returns {object[]}
*/
export function getCfgEdges(db, functionNodeId) {
let stmt = _getCfgEdgesStmt.get(db);
if (!stmt) {
stmt = db.prepare(
`SELECT e.kind,
sb.block_index AS source_index, sb.block_type AS source_type,
tb.block_index AS target_index, tb.block_type AS target_type
FROM cfg_edges e
JOIN cfg_blocks sb ON e.source_block_id = sb.id
JOIN cfg_blocks tb ON e.target_block_id = tb.id
WHERE e.function_node_id = ?
ORDER BY sb.block_index, tb.block_index`,
);
_getCfgEdgesStmt.set(db, stmt);
}
return stmt.all(functionNodeId);
return cachedStmt(
_getCfgEdgesStmt,
db,
`SELECT e.kind,
sb.block_index AS source_index, sb.block_type AS source_type,
tb.block_index AS target_index, tb.block_type AS target_type
FROM cfg_edges e
JOIN cfg_blocks sb ON e.source_block_id = sb.id
JOIN cfg_blocks tb ON e.target_block_id = tb.id
WHERE e.function_node_id = ?
ORDER BY sb.block_index, tb.block_index`,
).all(functionNodeId);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New functions bypass the cachedStmt pattern they neighbour

getCfgBlocks, getCfgEdges, and deleteCfgForNode each call db.prepare() on every invocation. In src/cfg.js, both getCfgBlocks and getCfgEdges are called inside a nodes.map() loop (one per function node), and deleteCfgForNode is called inside the transaction loop in buildCFGData. For a codebase with many functions, this re-compiles four statements on every call.

The cachedStmt helper introduced in this very PR exists to avoid this. Consistent usage would be:

const _getCfgBlocksStmt = new WeakMap();
const _getCfgEdgesStmt = new WeakMap();
const _deleteCfgEdgesStmt = new WeakMap();
const _deleteCfgBlocksStmt = new WeakMap();

export function getCfgBlocks(db, functionNodeId) {
  return cachedStmt(
    _getCfgBlocksStmt,
    db,
    `SELECT id, block_index, block_type, start_line, end_line, label
     FROM cfg_blocks WHERE function_node_id = ?
     ORDER BY block_index`,
  ).all(functionNodeId);
}
// … same pattern for getCfgEdges, deleteCfgForNode

hasCfgTables deliberately avoids cachedStmt (it catches a table-missing error), so that one is fine as-is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed in 139bde2 — all four functions now use cachedStmt with WeakMap caches. hasCfgTables deliberately avoids it since it catches table-missing errors.

@claude
Copy link

claude bot commented Mar 11, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Extract cachedStmt into Reusable DB Utility

Systematic review completed.

Review Tasks:

  • Phase 1: Root cause analysis - understand problem being solved
  • Phase 2: Check for high-risk configuration changes
  • Phase 3: Verify test integrity and no weakened assertions
  • Phase 4: Code quality review (bugs, security, performance)
  • Final Assessment: Provide rating and recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: The PR addresses a performance regression where db.prepare() was being called inside the BFS loop in getClassHierarchy, causing repeated statement compilation for each ancestor level. Additionally, it extracts the WeakMap-based prepared statement caching pattern that was duplicated across multiple repository files.

Original Bug (Pre-Fix):

// BAD: db.prepare() called on every while loop iteration
while (queue.length > 0) {
  const current = queue.shift();
  const parents = db.prepare(`SELECT...`).all(current); // ❌ Repeated compilation
}

Current Implementation:

// GOOD: Statement cached once per db instance, hoisted before loop
const stmt = cachedStmt(_getClassAncestorsStmt, db, `SELECT...`);
while (queue.length > 0) {
  const current = queue.shift();
  const parents = stmt.all(current); // ✅ Reuse cached statement
}

Why This Approach: Extracting the cachedStmt(cache, db, sql) utility eliminates code duplication across 14+ query functions and provides a consistent caching pattern. The approach is sound and addresses a real performance issue.


Phase 2: Configuration Change Detection ✅

No High-Risk Changes Detected:

  • No linting rules disabled or relaxed
  • No test timeouts increased or coverage thresholds lowered
  • No build configuration modifications
  • Only version bumps in package.json (3.1.1 → 3.1.2) and corresponding optional dependencies

Phase 3: Test Integrity ✅

Strong Test Coverage Maintained:

  • Comprehensive getClassHierarchy tests: empty set, single-level, multi-level, diamond inheritance
  • All tests validate the BFS traversal logic and cycle prevention
  • No test assertions were weakened or removed
  • Author claims 20/20 repository tests and 448/448 integration tests pass

Phase 4: Code Quality Review ✅

cachedStmt Utility Implementation:

  • Clean Design: Simple 7-line utility with clear responsibility
  • Correct Cache Key: Uses WeakMap keyed by db instance (prevents memory leaks)
  • Good Documentation: JSDoc warns about SQL consistency requirement
  • Type Safety: Proper JSDoc type annotations

Performance Benefits:

  • Fixed O(n) Bug: getClassHierarchy no longer compiles statements in BFS loop
  • Reduced Duplication: 14 query functions now use shared caching pattern
  • Memory Efficient: WeakMap allows garbage collection of unused db instances

Security & Robustness:

  • No SQL Injection: All queries use parameterized statements correctly
  • Consistent Error Handling: Inherits better-sqlite3's error behavior
  • No Breaking Changes: Function signatures remain identical

Code Organization:

  • Good Separation: Utility extracted to dedicated module
  • Clean Barrel Export: Re-exported from repository/index.js for easy adoption
  • Consistent Pattern: Applied uniformly across all repository modules

Minor Observations (not blocking):

  • Documentation could clarify the "same SQL" requirement with an example
  • No explicit tests for cachedStmt itself, but covered via integration tests

Root Cause Analysis

Problem Being Solved: Performance bug in getClassHierarchy BFS + code duplication across repository functions
Why This Approach: Solid engineering - extracts proven caching pattern into reusable utility
Risk Assessment: Low risk - pure refactoring with strong test coverage

Critical Concerns

None identified. This is a clean refactoring that fixes a real performance issue.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Well-executed refactoring that fixes a documented performance bug (db.prepare() in BFS loop) and eliminates code duplication. Strong test coverage, no configuration changes, clean implementation. Missing the 5th star only due to being a refactoring rather than new functionality.

Summary: This PR successfully addresses Greptile's feedback from #403 by extracting the caching pattern into a reusable utility. The getClassHierarchy performance fix is particularly important for codebases with deep inheritance hierarchies.


@carlos-alm
Copy link
Contributor Author

@greptileai

…tility

# Conflicts:
#	src/db/repository/nodes.js

Impact: 19 functions changed, 0 affected
@claude
Copy link

claude bot commented Mar 11, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Extract cachedStmt into Reusable DB Utility

Systematic review completed.

Review Tasks:

  • Phase 1: Root cause analysis - understand problem being solved
  • Phase 2: Check for high-risk configuration changes
  • Phase 3: Verify test integrity and no weakened assertions
  • Phase 4: Code quality review (bugs, security, performance)
  • Final Assessment: Provide rating and recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: The PR addresses a performance regression where db.prepare() was being called inside the BFS loop in getClassHierarchy, causing repeated statement compilation for each ancestor level. Additionally, it extracts the WeakMap-based prepared statement caching pattern that was duplicated across multiple repository files.

Original Bug (Pre-Fix):

// BAD: db.prepare() called on every while loop iteration
while (queue.length > 0) {
  const current = queue.shift();
  const parents = db.prepare(`SELECT...`).all(current); // ❌ Repeated compilation
}

Current Implementation:

// GOOD: Statement cached once per db instance, hoisted before loop
const stmt = cachedStmt(_getClassAncestorsStmt, db, `SELECT...`);
while (queue.length > 0) {
  const current = queue.shift();
  const parents = stmt.all(current); // ✅ Reuse cached statement
}

Why This Approach: Extracting the cachedStmt(cache, db, sql) utility eliminates code duplication across 14+ query functions and provides a consistent caching pattern. The approach is sound and addresses a real performance issue.


Phase 2: Configuration Change Detection ✅

No High-Risk Changes Detected:

  • No linting rules disabled or relaxed
  • No test timeouts increased or coverage thresholds lowered
  • No build configuration modifications
  • Only version bumps in package.json (3.1.1 → 3.1.2) and corresponding optional dependencies

Phase 3: Test Integrity ✅

Strong Test Coverage Maintained:

  • Comprehensive getClassHierarchy tests: empty set, single-level, multi-level, diamond inheritance
  • All tests validate the BFS traversal logic and cycle prevention
  • No test assertions were weakened or removed
  • Author claims 20/20 repository tests and 448/448 integration tests pass

Phase 4: Code Quality Review ✅

cachedStmt Utility Implementation:

  • Clean Design: Simple 7-line utility with clear responsibility
  • Correct Cache Key: Uses WeakMap keyed by db instance (prevents memory leaks)
  • Good Documentation: JSDoc warns about SQL consistency requirement
  • Type Safety: Proper JSDoc type annotations

Performance Benefits:

  • Fixed O(n) Bug: getClassHierarchy no longer compiles statements in BFS loop
  • Reduced Duplication: 14 query functions now use shared caching pattern
  • Memory Efficient: WeakMap allows garbage collection of unused db instances

Security & Robustness:

  • No SQL Injection: All queries use parameterized statements correctly
  • Consistent Error Handling: Inherits better-sqlite3's error behavior
  • No Breaking Changes: Function signatures remain identical

Code Organization:

  • Good Separation: Utility extracted to dedicated module
  • Clean Barrel Export: Re-exported from repository/index.js for easy adoption
  • Consistent Pattern: Applied uniformly across all repository modules

Minor Observations (not blocking):

  • Documentation could clarify the "same SQL" requirement with an example
  • No explicit tests for cachedStmt itself, but covered via integration tests

Root Cause Analysis

Problem Being Solved: Performance bug in getClassHierarchy BFS + code duplication across repository functions
Why This Approach: Solid engineering - extracts proven caching pattern into reusable utility
Risk Assessment: Low risk - pure refactoring with strong test coverage

Critical Concerns

None identified. This is a clean refactoring that fixes a real performance issue.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Well-executed refactoring that fixes a documented performance bug (db.prepare() in BFS loop) and eliminates code duplication. Strong test coverage, no configuration changes, clean implementation. Missing the 5th star only due to being a refactoring rather than new functionality.

Summary: This PR successfully addresses Greptile's feedback from #403 by extracting the caching pattern into a reusable utility. The getClassHierarchy performance fix is particularly important for codebases with deep inheritance hierarchies.


@carlos-alm carlos-alm merged commit ae7bf61 into main Mar 11, 2026
16 checks passed
@carlos-alm carlos-alm deleted the refactor/stmt-cache-utility branch March 11, 2026 22:34
@github-actions github-actions bot locked and limited conversation to collaborators Mar 11, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant