refactor: repository pattern for data access#371
Move connection/locking logic to src/db/connection.js and schema migrations to src/db/migrations.js. src/db.js becomes a barrel re-export file so all 55 existing import sites continue to work unchanged. Impact: 10 functions changed, 102 affected
Introduce NodeQuery fluent builder and standalone helpers (testFilterSQL, kindInClause, fanInJoinSQL, fanOutJoinSQL) to eliminate duplicated WHERE clause and JOIN patterns across modules. Add repository.js with centralized read methods (findNodesForTriage, listFunctionNodes, iterateFunctionNodes, findNodesWithFanIn, count*). Includes 45 new unit tests covering builder composition and repository functions. Impact: 31 functions changed, 33 affected
Replace inline SQL clause building with query-builder helpers and repository functions in 7 call sites across 3 files:
- triage.js: triageData() uses findNodesForTriage()
- queries.js: findMatchingNodes() uses findNodesWithFanIn(), listFunctionsData() uses listFunctionNodes(), iterListFunctions() uses iterateFunctionNodes(), moduleMapData() and statsData() use testFilterSQL()
- structure.js: hotspotsData() uses testFilterSQL()
Net reduction of ~100 lines of duplicated WHERE/JOIN patterns. Impact: 6 functions changed, 25 affected
Claude finished @carlos-alm's task
Review Complete
Phase 1: Root Cause Analysis ✅
Problem Being Solved: Code duplication in SQL query patterns across multiple files, specifically WHERE clause conditions (test file exclusions, kind filtering) and JOIN patterns (fan-in/fan-out edge counts). The monolithic
Why This Approach: This is a legitimate refactoring that addresses real technical debt:
Risk Assessment: Low risk. This is genuine refactoring, not symptom hiding.
Phase 2: Configuration Change Detection ✅
No problematic configuration changes detected:
Phase 3: Test Integrity ✅
New tests are comprehensive and well-structured:
Phase 4: Code Quality Review ✅
Strengths:
Critical Issues
Security Context: I examined the codebase extensively -
Other Observations:
Root Cause Analysis
Problem Being Solved: Legitimate code duplication elimination in SQL patterns. This solves a real maintainability issue rather than hiding symptoms.
Critical Concerns
SQL Injection Risk:
Final Recommendation
Required Fix: Sanitize
Greptile Summary
This PR successfully refactors the monolithic
Key accomplishments:
The refactoring is well-structured with thoughtful API hardening. Internal code uses safe parameterized queries, and the
Confidence Score: 5/5
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
subgraph "Public API (src/index.js)"
IDX[index.js]
end
subgraph "Barrel (src/db.js)"
DB[db.js re-export]
end
subgraph "src/db/ module"
CONN[connection.js<br/>openDb · closeDb · findDbPath<br/>openReadonlyOrFail]
MIG[migrations.js<br/>MIGRATIONS · initSchema<br/>getBuildMeta · setBuildMeta]
QB[query-builder.js<br/>NodeQuery · testFilterSQL<br/>fanInJoinSQL · fanOutJoinSQL<br/>kindInClause]
REPO[repository.js<br/>findNodesForTriage<br/>listFunctionNodes · iterateFunctionNodes<br/>findNodesWithFanIn<br/>countNodes · countEdges · countFiles]
end
subgraph "Constants (src/kinds.js)"
KINDS[kinds.js<br/>EVERY_SYMBOL_KIND<br/>EVERY_EDGE_KIND<br/>VALID_ROLES]
end
subgraph "Call sites"
TRIAGE[triage.js]
QUERIES[queries.js]
STRUCTURE[structure.js]
end
IDX --> DB
DB --> CONN & MIG & QB & REPO
REPO --> QB
QB --> KINDS
REPO --> KINDS
QUERIES -->|re-exports| KINDS
TRIAGE --> DB
QUERIES --> DB
STRUCTURE --> DB
Last reviewed commit: 9c9ab51
export function fanInJoinSQL(edgeKind = 'calls', alias = 'fi') {
  return `LEFT JOIN (
    SELECT target_id, COUNT(*) AS cnt FROM edges WHERE kind = '${edgeKind}' GROUP BY target_id
  ) ${alias} ON ${alias}.target_id = n.id`;
}

/**
 * Return a LEFT JOIN subquery for fan-out (outgoing edge count).
 * @param {string} [edgeKind='calls'] - Edge kind to count
 * @param {string} [alias='fo'] - Subquery alias
 */
export function fanOutJoinSQL(edgeKind = 'calls', alias = 'fo') {
  return `LEFT JOIN (
    SELECT source_id, COUNT(*) AS cnt FROM edges WHERE kind = '${edgeKind}' GROUP BY source_id
  ) ${alias} ON ${alias}.source_id = n.id`;
}
SQL injection via string-interpolated edgeKind and alias
Both edgeKind and alias are interpolated directly into the returned SQL fragment without sanitization or parameterization. edgeKind is quoted with single quotes so typical injection requires escaping the quote, but alias is injected bare (no quoting), meaning a value like fi ON fi.target_id = n.id; DROP TABLE nodes; -- would produce valid — and destructive — SQL.
These helpers are re-exported from src/db.js and src/index.js as part of the public programmatic API, so external callers can pass user-supplied strings.
export function fanInJoinSQL(edgeKind = 'calls', alias = 'fi') {
  return `LEFT JOIN (
    SELECT target_id, COUNT(*) AS cnt FROM edges WHERE kind = '${edgeKind}' GROUP BY target_id
  ) ${alias} ON ${alias}.target_id = n.id`;
}
Consider restricting inputs to an allowlist of known edge kinds (e.g., validate against CORE_EDGE_KINDS) and a safe alias pattern (e.g., /^[a-zA-Z_][a-zA-Z0-9_]*$/), or documenting clearly that callers must never pass user-controlled values:
const VALID_ALIAS = /^[a-z_][a-z0-9_]*$/i;
export function fanInJoinSQL(edgeKind = 'calls', alias = 'fi') {
  if (!VALID_ALIAS.test(alias)) throw new Error(`Invalid alias: ${alias}`);
  // edgeKind validated against CORE_EDGE_KINDS or similarly
  ...
}
The same concern applies to fanOutJoinSQL.
Fixed: added validateAlias() (regex /^[a-z_][a-z0-9_]*$/i) and validateEdgeKind() (checks against EVERY_EDGE_KIND) to both fanInJoinSQL and fanOutJoinSQL. They now throw on invalid input before any SQL interpolation.
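For readers following along, the fix described in this reply might look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's actual code: the contents of `EVERY_EDGE_KIND` are placeholder values, and the error message wording is invented for the example.

```javascript
// Assumption: placeholder edge kinds. The real list lives in the project's constants.
const EVERY_EDGE_KIND = ['calls', 'imports', 'extends'];
// Alias pattern quoted in the reply above.
const VALID_ALIAS = /^[a-z_][a-z0-9_]*$/i;

function validateAlias(alias) {
  if (typeof alias !== 'string' || !VALID_ALIAS.test(alias)) {
    throw new Error(`Invalid SQL alias: ${String(alias)}`);
  }
}

function validateEdgeKind(edgeKind) {
  if (!EVERY_EDGE_KIND.includes(edgeKind)) {
    throw new Error(`Invalid edge kind: ${String(edgeKind)}`);
  }
}

function fanInJoinSQL(edgeKind = 'calls', alias = 'fi') {
  // Validate before any interpolation so unsafe input never reaches the SQL string.
  validateEdgeKind(edgeKind);
  validateAlias(alias);
  return `LEFT JOIN (
    SELECT target_id, COUNT(*) AS cnt FROM edges WHERE kind = '${edgeKind}' GROUP BY target_id
  ) ${alias} ON ${alias}.target_id = n.id`;
}
```

Because both values come from a closed set (a known kind, an identifier-shaped alias), an allowlist is simpler here than trying to escape arbitrary input.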
export function findNodesForTriage(db, opts = {}) {
  const q = new NodeQuery()
    .select(
      `n.id, n.name, n.kind, n.file, n.line, n.end_line, n.role,
       COALESCE(fi.cnt, 0) AS fan_in,
       COALESCE(fc.cognitive, 0) AS cognitive,
       COALESCE(fc.maintainability_index, 0) AS mi,
       COALESCE(fc.cyclomatic, 0) AS cyclomatic,
       COALESCE(fc.max_nesting, 0) AS max_nesting,
       COALESCE(fcc.commit_count, 0) AS churn`,
    )
    .kinds(['function', 'method', 'class'])
    .withFanIn()
    .withComplexity()
    .withChurn()
    .excludeTests(opts.noTests)
    .fileFilter(opts.file)
    .kindFilter(opts.kind)
    .roleFilter(opts.role)
    .orderBy('n.file, n.line');

  return q.all(db);
Redundant double-kind filter silently returns zero rows for non-triage kinds
findNodesForTriage unconditionally adds .kinds(['function', 'method', 'class']) and then conditionally adds .kindFilter(opts.kind). If a caller passes opts.kind = 'interface' (a valid node kind), the generated WHERE clause becomes:
WHERE n.kind IN ('function', 'method', 'class') AND n.kind = 'interface'
This will always return zero rows, silently and with no error, even though interface nodes exist in the DB. Since triage.js surfaces this to end-users via codegraph triage --kind interface, the empty result will appear identical to "no data" rather than "unsupported kind for triage".
Consider either (a) removing .kindFilter(opts.kind) and relying only on .kinds(['function', 'method', 'class']) (triage is intentionally limited to these kinds), or (b) replacing the hardcoded .kinds() call with the user-supplied kind when opts.kind is provided, and documenting the constraint.
Fixed: when opts.kind is provided and valid (checked against EVERY_SYMBOL_KIND), it now replaces the hardcoded kinds array rather than being applied as a conflicting second filter. Invalid kinds throw an error with a clear message.
src/db/repository.js
Outdated
export function listFunctionNodes(db, opts = {}) {
  const q = new NodeQuery()
    .select('name, kind, file, line, end_line, role')
    .kinds(['function', 'method', 'class'])
    .fileFilter(opts.file)
    .nameLike(opts.pattern)
    .orderBy('file, line');

  return q.all(db);
}

/**
 * Iterator version of listFunctionNodes for memory efficiency.
 * Used by iterListFunctions in queries.js.
 *
 * @param {object} db
 * @param {object} [opts]
 * @returns {IterableIterator}
 */
export function iterateFunctionNodes(db, opts = {}) {
  const q = new NodeQuery()
    .select('name, kind, file, line, end_line, role')
    .kinds(['function', 'method', 'class'])
    .fileFilter(opts.file)
    .nameLike(opts.pattern)
    .orderBy('file, line');

  return q.iterate(db);
}
listFunctionNodes and iterateFunctionNodes share identical query-builder setup
The two functions are identical except for the terminal call (.all(db) vs .iterate(db)). This is a copy-paste duplication that will silently diverge if one is updated but not the other (e.g., adding a new filter option).
Consider extracting the shared builder into a private helper:
function _functionNodeQuery(opts = {}) {
  return new NodeQuery()
    .select('name, kind, file, line, end_line, role')
    .kinds(['function', 'method', 'class'])
    .fileFilter(opts.file)
    .nameLike(opts.pattern)
    .orderBy('file, line');
}
export function listFunctionNodes(db, opts = {}) {
  return _functionNodeQuery(opts).all(db);
}
export function iterateFunctionNodes(db, opts = {}) {
  return _functionNodeQuery(opts).iterate(db);
}
Fixed: extracted shared query setup into a private _functionNodeQuery(opts) helper. Both listFunctionNodes and iterateFunctionNodes now delegate to it, calling .all(db) or .iterate(db) respectively.
…node query Impact: 8 functions changed, 6 affected
Impact: 1 functions changed, 4 affected
Claude finished @carlos-alm's task
PR Review: Repository Pattern Refactoring
Todo Checklist:
Phase 1: Root Cause Analysis ✅
Problem Being Solved: Legitimate code duplication elimination in SQL patterns, specifically WHERE clause conditions (test file exclusions, kind filtering) and JOIN patterns (fan-in/fan-out edge counts). The monolithic
Why This Approach: Clean architecture separation with backward compatibility via barrel exports. The refactoring addresses real technical debt by:
Risk Assessment: Low risk. This is genuine refactoring, not symptom hiding. The barrel export strategy preserves all existing imports with no breaking changes.
Phase 2: Configuration Change Detection ✅
No problematic configuration changes detected:
The changes are pure code refactoring without any rule relaxation to hide underlying issues.
Phase 3: Test Integrity ✅
New tests are comprehensive and well-structured:
Phase 4: Code Quality Review ✅
Strengths:
No significant issues remaining: All previously identified concerns have been addressed.
Root Cause Analysis
Problem Being Solved: Legitimate code duplication elimination and architectural cleanup. It addresses real technical debt without hiding symptoms.
Critical Concerns
None. All previously identified SQL injection vulnerabilities have been resolved with proper input validation.
Final Recommendation
Strong approval recommended. This PR demonstrates best practices for major refactoring: maintain backward compatibility, comprehensive testing, security-conscious input validation, and clean architectural separation.
export function testFilterSQL(column = 'n.file', enabled = true) {
  if (!enabled) return '';
  return `AND ${column} NOT LIKE '%.test.%'
    AND ${column} NOT LIKE '%.spec.%'
    AND ${column} NOT LIKE '%__test__%'
    AND ${column} NOT LIKE '%__tests__%'
    AND ${column} NOT LIKE '%.stories.%'`;
}
column parameter unsanitized — same injection pattern fixed in fanInJoinSQL/fanOutJoinSQL
The previous review noted that edgeKind and alias in the join helpers were interpolated without validation, and that was fixed. The column parameter in testFilterSQL has exactly the same issue: it is interpolated directly into the returned SQL fragment with no allowlist or regex guard.
Because testFilterSQL is re-exported from src/db.js and src/index.js as part of the public programmatic API, an external caller can pass a user-controlled string. For example:
testFilterSQL("n.file' OR '1'='1 --")
// → AND n.file' OR '1'='1 -- NOT LIKE '%.test.%' ...
This effectively drops all NOT LIKE guards and can short-circuit unrelated WHERE clauses depending on placement.
Consider adding a column validator similar to validateAlias:
- export function testFilterSQL(column = 'n.file', enabled = true) {
-   if (!enabled) return '';
-   return `AND ${column} NOT LIKE '%.test.%'
-     AND ${column} NOT LIKE '%.spec.%'
-     AND ${column} NOT LIKE '%__test__%'
-     AND ${column} NOT LIKE '%__tests__%'
-     AND ${column} NOT LIKE '%.stories.%'`;
- }
+ const SAFE_COLUMN_RE = /^[a-z_][a-z0-9_]*(?:\.[a-z_][a-z0-9_]*)?$/i;
+ function validateColumn(column) {
+   if (!SAFE_COLUMN_RE.test(column)) {
+     throw new Error(`Invalid SQL column reference: ${column}`);
+   }
+ }
+ export function testFilterSQL(column = 'n.file', enabled = true) {
+   if (!enabled) return '';
+   validateColumn(column);
+   return `AND ${column} NOT LIKE '%.test.%'
+     AND ${column} NOT LIKE '%.spec.%'
+     AND ${column} NOT LIKE '%__test__%'
+     AND ${column} NOT LIKE '%__tests__%'
+     AND ${column} NOT LIKE '%.stories.%'`;
+ }
Fixed: added validateColumn() using a regex that allows dotted column refs like n.file. It's called at the top of testFilterSQL before any interpolation, throwing on invalid input. Added tests confirming malicious column names are rejected.
Fixed in f744150: added validateColumn() with regex that allows dotted column refs (e.g. n.file) but rejects anything with spaces, semicolons, or SQL keywords. Called at the top of testFilterSQL before interpolation. Tests added for malicious inputs.
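To make the accepted and rejected shapes concrete, here is a small sketch that runs a few inputs through the regex from the review suggestion above. The regex in the merged code may differ, and `isSafeColumn` and the sample inputs are hypothetical names and values chosen for illustration.

```javascript
// Regex copied from the review suggestion. The merged code may use a different one.
const SAFE_COLUMN_RE = /^[a-z_][a-z0-9_]*(?:\.[a-z_][a-z0-9_]*)?$/i;

// Hypothetical predicate form of validateColumn, convenient for experimenting.
function isSafeColumn(column) {
  return typeof column === 'string' && SAFE_COLUMN_RE.test(column);
}

const samples = [
  'n.file',                   // dotted reference
  'file',                     // bare column
  "n.file' OR '1'='1 --",     // quote-breaking attempt from the review comment
  'n.file; DROP TABLE nodes', // statement-stacking attempt
  'a.b.c',                    // more than one dot
];

for (const sample of samples) {
  console.log(JSON.stringify(sample), isSafeColumn(sample) ? 'accepted' : 'rejected');
}
```

Only the first two samples match. The pattern is purely structural (identifier, optionally one dot, identifier), so it rejects whitespace and punctuation rather than looking for particular keywords.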
Impact: 2 functions changed, 1 affected
orderBy(clause) {
  this.#orderByClause = clause;
  return this;
}
orderBy() and select() methods are unvalidated SQL injection paths.
The PR hardened standalone helpers like testFilterSQL, fanInJoinSQL, and fanOutJoinSQL with input validation. However, the orderBy() and select() methods on NodeQuery interpolate their arguments directly into the SQL string without any guard:
const sql = `SELECT ${this.#selectCols}\n FROM nodes n${joins}${where}${orderBy}${limitClause}`;
Since NodeQuery is exported from src/db.js and src/index.js as part of the public programmatic API, an external consumer can pass user-controlled strings:
new NodeQuery()
  .orderBy("n.file; DROP TABLE nodes; --")
  .all(db);
// → ORDER BY n.file; DROP TABLE nodes; --
Consider adding a validateOrderBy() guard (similar to the validateColumn pattern already in place) for orderBy() at minimum, or prominently document these methods as unsafe escape hatches in the JSDoc.
Fixed in 131cde6. Both orderBy() and select() now validate their inputs:
- orderBy() splits on commas and checks each term matches /^[a-z_][a-z0-9_]*(\.[a-z_][a-z0-9_]*)?\s*(asc|desc)?$/i
- select() uses a parenthesis-aware comma splitter (splitTopLevelCommas) and validates each token matches safe column refs, table.*, or FUNC(...) AS alias patterns
Both throw Error on invalid input. Tests added for injection attempts and valid COALESCE expressions.
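The two checks described in this reply could be sketched as below. The ORDER BY regex is the one quoted above. The bodies of `validateOrderBy` and `splitTopLevelCommas` are assumptions written for illustration, since the PR's implementations are not shown in this thread.

```javascript
// Regex quoted in the reply above: a column ref with an optional ASC/DESC suffix.
const ORDER_TERM_RE = /^[a-z_][a-z0-9_]*(\.[a-z_][a-z0-9_]*)?\s*(asc|desc)?$/i;

// Assumed shape: split on commas, then require every term to match the pattern.
function validateOrderBy(clause) {
  for (const rawTerm of String(clause).split(',')) {
    const term = rawTerm.trim();
    if (!ORDER_TERM_RE.test(term)) {
      throw new Error(`Invalid ORDER BY term: ${term}`);
    }
  }
}

// Assumed shape: split on commas that are not nested inside parentheses,
// so "COALESCE(fi.cnt, 0) AS fan_in" stays a single token.
function splitTopLevelCommas(text) {
  const parts = [];
  let depth = 0;
  let current = '';
  for (const ch of text) {
    if (ch === '(') depth++;
    else if (ch === ')') depth--;
    if (ch === ',' && depth === 0) {
      parts.push(current.trim());
      current = '';
    } else {
      current += ch;
    }
  }
  if (current.trim() !== '') parts.push(current.trim());
  return parts;
}

validateOrderBy('n.file, n.line DESC'); // passes silently
console.log(splitTopLevelCommas('n.id, COALESCE(fi.cnt, 0) AS fan_in'));
```

A plain `split(',')` is enough for ORDER BY because the allowed terms never contain parentheses, whereas SELECT lists with function calls need the depth counter.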
src/db/query-builder.js
Outdated
@@ -0,0 +1,243 @@
import { EVERY_EDGE_KIND } from '../queries.js';
Circular module dependency introduced.
query-builder.js imports EVERY_EDGE_KIND from ../queries.js, and repository.js imports EVERY_SYMBOL_KIND from ../queries.js. Both modules are re-exported from db.js, which is itself imported by queries.js:
queries.js → db.js → db/query-builder.js → queries.js (circular)
queries.js → db.js → db/repository.js → queries.js (circular)
ESM live bindings make this work today because constants are only accessed inside function bodies (not at module-init time). However:
- The dependency direction is inverted — the DB layer should not depend on the query layer.
- It is fragile: if either constant moves to module-level initialization, the circular reference will fail silently.
- It will surprise future contributors debugging module-load ordering.
The clean fix is to extract the kind/edge constants into a dedicated file (e.g., src/kinds.js) and import from there in both queries.js and the db/ modules, breaking the cycle entirely.
Fixed in 131cde6. Extracted all kind/edge constants to a new src/kinds.js module:
- query-builder.js now imports EVERY_EDGE_KIND from ../kinds.js
- repository.js now imports EVERY_SYMBOL_KIND from ../kinds.js
- queries.js re-exports all constants from ./kinds.js (no breaking change for external consumers)
This breaks the cycle: db/ modules → kinds.js ← queries.js, with no circular path remaining.
Extract kind/edge constants to src/kinds.js so db/ modules no longer import from queries.js (which itself imports from db.js), breaking the circular module dependency. Add input validation to NodeQuery.orderBy() and .select() methods to prevent SQL injection through unvalidated string interpolation. Impact: 6 functions changed, 3 affected
Claude finished @carlos-alm's task
PR Review: Repository Pattern Refactoring
Todo Checklist:
Phase 1: Root Cause Analysis ✅
Problem Being Solved: Legitimate code duplication elimination in SQL patterns, specifically WHERE clause conditions (test file exclusions, kind filtering) and JOIN patterns (fan-in/fan-out edge counts). The monolithic
Why This Approach: Clean architecture separation with backward compatibility via barrel exports. The refactoring addresses real technical debt by:
Risk Assessment: Low risk. This is genuine refactoring, not symptom hiding. The barrel export strategy preserves all existing imports with no breaking changes.
Phase 2: Configuration Change Detection ✅
No problematic configuration changes detected:
The changes are pure code refactoring without any rule relaxation to hide underlying issues.
Phase 3: Test Integrity ✅
New tests are comprehensive and well-structured:
Phase 4: Code Quality Review ✅
Strengths:
Remaining Minor Issue:
Greptile Concerns Assessment
✅ Circular module dependency: RESOLVED -
Root Cause Analysis
Problem Being Solved: Legitimate code duplication elimination and architectural cleanup. It addresses real technical debt without hiding symptoms.
Critical Concerns
None. All major SQL injection vulnerabilities and circular dependencies have been resolved. The remaining error message issue is a minor UX concern.
Final Recommendation
Strong approval recommended. This PR demonstrates best practices for major refactoring: maintain backward compatibility, comprehensive testing, security-conscious input validation, and clean architectural separation.
@greptileai review
src/db/query-builder.js
Outdated
join(sql) {
  this.#joins.push(sql);
  return this;
}
join() is an unvalidated raw SQL injection path in the public API
The previous review rounds hardened orderBy() and select() with input validation, but join(sql) still accepts completely arbitrary SQL with no guard. Since NodeQuery is re-exported from src/db.js and src/index.js as part of the public programmatic API, an external caller can construct arbitrary JOIN clauses:
new NodeQuery()
  .join("JOIN (SELECT 1) x ON 1=1 UNION SELECT * FROM sqlite_master --")
  .all(db);
The four named helpers (withFanIn, withFanOut, withComplexity, withChurn) all call this.#joins.push(hardcoded_sql) directly, so they don't need join() to be public. The raw escape hatch is only needed internally.
Consider either (a) making join() private / #join() and removing it from the public interface, or (b) adding a validateJoinSQL guard similar to validateOrderBy that allows only safe patterns (e.g., LEFT JOIN table alias ON alias.col = n.col).
Fixed in 9c9ab51. Renamed join() to _join() — it's now a conventionally-private method not part of the public API. All internal callers (withFanIn, withFanOut, withComplexity, withChurn) use _join() internally. Added a test confirming .join is undefined on NodeQuery instances.
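As a rough illustration of the rename, the sketch below shows a stripped-down class where only hardcoded SQL reaches the internal method. `NodeQuerySketch` and `joinCount()` are hypothetical names for this example and the JOIN text is simplified. It does not reproduce the real NodeQuery.

```javascript
class NodeQuerySketch {
  #joins = [];

  // Conventionally private: the underscore signals "internal use only",
  // but JavaScript does not prevent outside callers from invoking it.
  _join(sql) {
    this.#joins.push(sql);
    return this;
  }

  // Named helper: passes only a hardcoded fragment to _join().
  withFanIn() {
    return this._join(
      "LEFT JOIN (SELECT target_id, COUNT(*) AS cnt FROM edges WHERE kind = 'calls' GROUP BY target_id) fi ON fi.target_id = n.id",
    );
  }

  // Hypothetical accessor so the example can observe internal state.
  joinCount() {
    return this.#joins.length;
  }
}

const sketch = new NodeQuerySketch().withFanIn();
console.log(typeof sketch.join);  // "undefined"
console.log(typeof sketch._join); // "function"
```

An underscore prefix removes the method from the documented surface but remains callable. A true private method (`#join`), which the review comment also mentioned, would make outside calls a syntax error instead.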
export function findNodesForTriage(db, opts = {}) {
  if (opts.kind && !EVERY_SYMBOL_KIND.includes(opts.kind)) {
    throw new Error(`Invalid kind: ${opts.kind} (expected one of ${EVERY_SYMBOL_KIND.join(', ')})`);
  }

  const kindsToUse = opts.kind ? [opts.kind] : ['function', 'method', 'class'];
  const q = new NodeQuery()
    .select(
      `n.id, n.name, n.kind, n.file, n.line, n.end_line, n.role,
       COALESCE(fi.cnt, 0) AS fan_in,
       COALESCE(fc.cognitive, 0) AS cognitive,
       COALESCE(fc.maintainability_index, 0) AS mi,
       COALESCE(fc.cyclomatic, 0) AS cyclomatic,
       COALESCE(fc.max_nesting, 0) AS max_nesting,
       COALESCE(fcc.commit_count, 0) AS churn`,
    )
    .kinds(kindsToUse)
    .withFanIn()
    .withComplexity()
    .withChurn()
    .excludeTests(opts.noTests)
    .fileFilter(opts.file)
    .roleFilter(opts.role)
    .orderBy('n.file, n.line');

  return q.all(db);
opts.role silently returns zero rows for invalid values — inconsistent with opts.kind
opts.kind is validated against EVERY_SYMBOL_KIND and throws an error if invalid (line 40). opts.role is passed through to .roleFilter() with no validation: an invalid value like role: 'supervisor' produces WHERE n.role = 'supervisor' which silently returns 0 rows rather than throwing. From the caller's perspective this is indistinguishable from "no results match" vs "the filter argument was wrong".
Since VALID_ROLES is already defined in kinds.js, validation is straightforward:
if (opts.role && !VALID_ROLES.includes(opts.role)) {
  throw new Error(`Invalid role: ${opts.role} (expected one of ${VALID_ROLES.join(', ')})`);
}
This mirrors the existing opts.kind guard and keeps the two option checks consistent.
Fixed in 9c9ab51. Added opts.role validation against VALID_ROLES (from kinds.js) in findNodesForTriage, mirroring the existing opts.kind guard. Invalid roles now throw with a clear error message. Test added.
function _functionNodeQuery(opts = {}) {
  return new NodeQuery()
    .select('name, kind, file, line, end_line, role')
    .kinds(['function', 'method', 'class'])
    .fileFilter(opts.file)
    .nameLike(opts.pattern)
    .orderBy('file, line');
}
_functionNodeQuery doesn't accept noTests — inconsistency with findNodesForTriage
findNodesForTriage pushes test-file exclusion down to the SQL layer via .excludeTests(opts.noTests), so only qualifying rows are ever fetched. _functionNodeQuery has no such option; every caller (listFunctionsData, iterListFunctions, findMatchingNodes) fetches all rows including test-file nodes and then discards them in application memory.
For large codebases this means an unbounded number of test-file rows are fetched, deserialized, and thrown away on every call. Since the excludeTests builder method already exists, a one-liner fix keeps things consistent and efficient:
function _functionNodeQuery(opts = {}) {
  return new NodeQuery()
    .select('name, kind, file, line, end_line, role')
    .kinds(['function', 'method', 'class'])
    .fileFilter(opts.file)
    .nameLike(opts.pattern)
    .excludeTests(opts.noTests)
    .orderBy('file, line');
}
The callers in queries.js could then drop their post-fetch isTestFile filters.
Fixed in 9c9ab51. Added .excludeTests(opts.noTests) to _functionNodeQuery, so listFunctionNodes and iterateFunctionNodes filter test files at the SQL layer. Tests added for both functions confirming test-file exclusion works.
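To show what filtering "at the SQL layer" means in practice, the sketch below builds the query text for both settings of `noTests`. `buildFunctionNodeSQL` is a hypothetical stand-in for the builder. How fileFilter and nameLike translate to SQL (LIKE with `%` wrapping) is assumed, and only the NOT LIKE patterns are taken from testFilterSQL in the diff above.

```javascript
// Hypothetical helper: returns SQL text and bound parameters instead of running a query.
function buildFunctionNodeSQL(opts = {}) {
  const where = ["kind IN ('function', 'method', 'class')"];
  const params = [];

  if (opts.file) {
    where.push('file LIKE ?'); // assumption: substring match on the file path
    params.push(`%${opts.file}%`);
  }
  if (opts.pattern) {
    where.push('name LIKE ?'); // assumption: substring match on the symbol name
    params.push(`%${opts.pattern}%`);
  }
  if (opts.noTests) {
    // Same patterns as testFilterSQL. They are constants, so inlining them is safe.
    const testPatterns = ['%.test.%', '%.spec.%', '%__test__%', '%__tests__%', '%.stories.%'];
    for (const pattern of testPatterns) {
      where.push(`file NOT LIKE '${pattern}'`);
    }
  }

  return {
    sql: `SELECT name, kind, file, line, end_line, role FROM nodes WHERE ${where.join(' AND ')} ORDER BY file, line`,
    params,
  };
}

console.log(buildFunctionNodeSQL({ noTests: true }).sql);
console.log(buildFunctionNodeSQL({ pattern: 'parse' }));
```

With the exclusion in the WHERE clause, test-file rows are never returned by the database, so callers no longer need a post-fetch filter.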
…s in _functionNodeQuery
- Rename NodeQuery.join() to _join() to remove raw SQL injection from public API
- Add opts.role validation against VALID_ROLES in findNodesForTriage (mirrors opts.kind)
- Add excludeTests(opts.noTests) to _functionNodeQuery so listFunctionNodes/iterateFunctionNodes filter test files at the SQL layer instead of post-fetch
- Tests added for all three changes
Impact: 8 functions changed, 2 affected
Claude finished @carlos-alm's task
PR Review: Repository Pattern Refactoring
Todo Checklist:
Phase 1: Root Cause Analysis ✅
Problem Being Solved: Legitimate code duplication elimination in SQL patterns, specifically WHERE clause conditions (test file exclusions, kind filtering) and JOIN patterns (fan-in/fan-out edge counts). The monolithic
Why This Approach: Clean architecture separation with backward compatibility via barrel exports. The refactoring addresses real technical debt by:
Risk Assessment: Low risk. This is genuine refactoring, not symptom hiding. The barrel export strategy preserves all existing imports with no breaking changes.
Phase 2: Configuration Change Detection ✅
No problematic configuration changes detected:
The changes are pure code refactoring without any rule relaxation to hide underlying issues.
Phase 3: Test Integrity ✅
New tests are comprehensive and well-structured:
Phase 4: Code Quality Review ✅
Strengths:
Remaining Minor Issue:
Root Cause Analysis
Problem Being Solved: Legitimate code duplication elimination and architectural cleanup. It addresses real technical debt without hiding symptoms.
Critical Concerns
None. All previously identified SQL injection vulnerabilities and circular dependencies have been resolved. The remaining error message issue is a minor UX concern.
Final Recommendation
Strong approval recommended. This PR demonstrates best practices for major refactoring: maintain backward compatibility, comprehensive testing, security-conscious input validation, and clean architectural separation.
@greptileai review

Summary
- Split src/db.js into a src/db/ module directory (connection.js, migrations.js), with db.js kept as a barrel re-export: zero import churn across 55 existing import sites
- Add NodeQuery fluent builder and standalone helpers (testFilterSQL, kindInClause, fanInJoinSQL, fanOutJoinSQL) in src/db/query-builder.js to eliminate duplicated WHERE clause and JOIN patterns
- Add centralized read methods (findNodesForTriage, listFunctionNodes, iterateFunctionNodes, findNodesWithFanIn, countNodes/Edges/Files) in src/db/repository.js
- Migrate call sites in triage.js, queries.js (4 functions), and structure.js, for a net reduction of ~100 lines of duplicated SQL patterns
Test plan
- rolesData warning in queries.test.js)
- codegraph triage -T, codegraph list -T, codegraph map, codegraph stats -T