feat: node role classification + dead code detection #91
Conversation
buildStructure() clears all contains edges and directory nodes before rebuilding, but during incremental builds it only receives the changed files — causing all unchanged files to lose their directory containment edges and metrics. This made `codegraph structure` show "0 files" for most directories after any incremental build. Fix: before calling buildStructure, load all existing file nodes and their symbols from the DB so the complete file set is available for the structure rebuild. Closes #89 Impact: 1 function changed, 0 affected
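The shape of the fix can be sketched as follows. This is a minimal illustration, not the actual builder API: `mergeFileSets` and the `path` field are hypothetical names; the real code loads file nodes and symbols via SQL.

```javascript
// Sketch: merge freshly parsed changed files with file paths already
// stored in the DB so the structure rebuild sees every file, not just
// the incremental batch. Function and field names are illustrative.
function mergeFileSets(changedFiles, existingPaths) {
  const changedSet = new Set(changedFiles.map((f) => f.path));
  // Unchanged files are those known to the DB but absent from this batch.
  const unchangedPaths = existingPaths.filter((p) => !changedSet.has(p));
  return { changed: changedFiles, unchangedPaths };
}

// Example: an incremental build that touched only src/a.js.
const merged = mergeFileSets(
  [{ path: "src/a.js" }],
  ["src/a.js", "src/b.js", "src/c.js"],
);
// merged.unchangedPaths → ["src/b.js", "src/c.js"]
```

With the complete set in hand, buildStructure can safely clear and rebuild all containment edges without dropping unchanged files.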
Claude finished @carlos-alm's task — View job

Reviewing PR #91: Incremental Build Structure Fix

Todo List:

Root Cause Analysis

Problem Being Solved: During incremental builds, buildStructure() cleared all contains edges and directory nodes but received only the changed files, so unchanged files lost their directory containment edges and metrics.

Why This Approach: The fix loads all existing file nodes from the database during incremental builds, ensuring the complete file set is available for the structure rebuild.

Risk Assessment: ✅ LOW RISK - This is a legitimate fix addressing a real functional bug, not masking symptoms.

Critical Concerns

None found. This is a clean, targeted fix.

Code Quality Assessment

✅ Strengths
✅ Technical Correctness
✅ No Risky Changes
✅ Test Coverage
🔍 Minor Observations

Final Assessment

This is a well-executed bug fix that properly addresses the root cause of structure data loss during incremental builds. The implementation is clean, efficient, and follows the codebase's established patterns.

Final Recommendation

Recommended Actions: None. The PR is ready to merge.
Greptile Summary

This PR fixes a critical bug where incremental builds corrupted directory structure data. The issue occurred because buildStructure() cleared all contains edges and directory nodes before rebuilding but received only the changed files during incremental builds.

Key Changes

Performance Impact

The fix adds ~34ms to 1-file incremental rebuilds (+27%) due to loading 98 unchanged files' metadata from the DB. This is an acceptable trade-off for correct structure data, as the overhead is O(n) DB reads, which will be dominated by parse time at scale.

Documentation Updates

Confidence Score: 5/5

Important Files Changed

Last reviewed commit: 7b538bc
src/builder.js
Outdated
```js
`SELECT DISTINCT n2.file AS source FROM edges e
 JOIN nodes n1 ON e.source_id = n1.id
 JOIN nodes n2 ON e.target_id = n2.id
 WHERE n1.file = ? AND e.kind = 'imports'`,
```
Query returns files that the current file imports FROM (targets), but stores them as if they were import sources. The column is aliased as `source`, but `n2.file` is the target of the import edge (the file being imported), not the source.

Expected structure based on buildStructure usage (line 129 in structure.js): `imports: [{ source: './b.js', names: [] }]`

This query returns target files instead of reconstructing the import structure. It should either:

- Return the import source paths as strings (matching parser output format), or
- Use `n1.name AS source` if storing the imported symbol names
Suggested change:

```js
// before
`SELECT DISTINCT n2.file AS source FROM edges e
 JOIN nodes n1 ON e.source_id = n1.id
 JOIN nodes n2 ON e.target_id = n2.id
 WHERE n1.file = ? AND e.kind = 'imports'`,

// after
const importsByFile = db.prepare(
  `SELECT DISTINCT e.*, n1.name AS imported_name, n2.file AS imported_from
   FROM edges e
   JOIN nodes n1 ON e.target_id = n1.id
   JOIN nodes n2 ON e.target_id = n2.id
   WHERE e.source_id IN (SELECT id FROM nodes WHERE file = ?) AND e.kind = 'imports'`,
);
```
However, buildStructure only uses imports.length (line 129 in structure.js), so the current approach works for metrics calculation but doesn't preserve the actual import structure.
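Since only `imports.length` is consumed, a counted placeholder is sufficient. A sketch of that idea follows; the SQL text and names are reconstructed from this thread, not copied from the repo:

```javascript
// Approximation of a COUNT-based replacement: count distinct imported
// files, then materialize only a sparse array whose .length matches.
// SQL text and identifiers are assumptions based on the discussion.
const IMPORT_COUNT_SQL = `
  SELECT COUNT(DISTINCT n2.file) AS n FROM edges e
  JOIN nodes n1 ON e.source_id = n1.id
  JOIN nodes n2 ON e.target_id = n2.id
  WHERE n1.file = ? AND e.kind = 'imports'`;

// buildStructure() reads only imports.length, so elements stay empty.
function importsPlaceholder(importCount) {
  return new Array(importCount);
}

const imports = importsPlaceholder(4);
// imports.length → 4; no per-import rows are fetched or stored
```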
Addressed Greptile feedback in 7b538bc:

- Replaced the `importsByFile` query with an `importCountByFile` COUNT query — `buildStructure` only uses `imports.length` for the `import_count` metric (line 129 of structure.js), so fetching full rows was wasteful. Now uses `SELECT COUNT(DISTINCT n2.file)` and creates `imports: new Array(importCount)` for the correct `.length`.
- Cached line counts from the `node_metrics` table instead of reading every unchanged file from disk — eliminates ~98 synchronous `readFileSync` calls during incremental rebuilds on this repo. Falls back to a disk read if no cached value exists.
Add Phase 4b requiring all four benchmark scripts to be run during dogfooding sessions. Also update Phase 7c to require before/after benchmark comparisons when bug fixes touch benchmarked code paths.
…build Address Greptile review feedback on PR #91:

- Replace misleading importsByFile query with importCountByFile COUNT query, since buildStructure only uses imports.length for metrics
- Cache line counts from node_metrics table instead of reading every unchanged file from disk (falls back to disk if not cached)

Impact: 1 function changed, 1 affected
…eaf) Auto-classify every symbol based on fan-in/fan-out connectivity patterns using adaptive median thresholds. Roles are computed during graph build and stored in the DB (migration v5).

- classifyNodeRoles() in structure.js with median-based thresholds
- New `roles` CLI command with --role and --file filters
- New `node_roles` MCP tool (18 tools total)
- Role field surfaced in where/explain/context/stats/list-functions
- Dead code detection via `roles --role dead`
- Unit + integration tests for classification and queries
- Updated README, BACKLOG, and COMPETITIVE_ANALYSIS docs

Impact: 19 functions changed, 13 affected
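The classification described above might look roughly like this. The rules and threshold handling are a guess at the shape, not the actual structure.js logic; the role names come from the PR:

```javascript
// Sketch of fan-in/fan-out role classification with adaptive median
// thresholds. Illustrative only: the real rules may differ, and the
// real implementation also leaves file/directory nodes unclassified.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function classifyNodeRoles(nodes) {
  // Thresholds adapt to the graph: the median fan-in/fan-out of all symbols.
  const fanInMed = median(nodes.map((n) => n.fanIn));
  const fanOutMed = median(nodes.map((n) => n.fanOut));
  for (const n of nodes) {
    if (n.fanIn === 0 && !n.exported) n.role = "dead";       // unreferenced, not exported
    else if (n.fanIn === 0) n.role = "entry";                // nothing calls it internally
    else if (n.fanOut === 0) n.role = "leaf";                // calls nothing else
    else if (n.fanIn > fanInMed && n.fanOut > fanOutMed) n.role = "core";
    else if (n.fanIn > fanInMed) n.role = "utility";         // widely used helper
    else n.role = "adapter";                                 // glue between layers
  }
  return nodes;
}
```

Because the thresholds are medians rather than fixed cutoffs, the same rules yield sensible roles on both small and large graphs, and re-running classification on an unchanged graph is idempotent.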
Impact: 2 functions changed, 1 affected
…al-structure Impact: 3 functions changed, 4 affected
Claude finished @carlos-alm's task — View job

Reviewing PR #91: Node Role Classification + Dead Code Detection

Todo List:

Phase 1: Root Cause Analysis ✅

Problem Being Solved: This PR adds node role classification and dead code detection functionality to codegraph. The implementation automatically categorizes every symbol based on connectivity patterns using adaptive median thresholds.

Why This Approach: The classification system uses fan-in/fan-out patterns with adaptive median thresholds, which is a well-established approach in code analysis. The implementation is delivered as a byproduct of graph analysis rather than requiring separate tooling.

Risk Assessment: ✅ LOW RISK - This is a new feature addition that doesn't modify existing functionality. The implementation is additive and includes comprehensive testing.

Phase 2: Configuration Change Detection ✅

No Risky Changes Detected:

Phase 3: Test Integrity ✅

Strong Test Coverage:

No Test Integrity Issues Found:

Phase 4: Code Quality Assessment ✅

✅ Architecture & Implementation Quality
✅ Technical Correctness
✅ Feature Completeness
🔍 Minor Observations

Final Assessment

This is a well-architected feature addition that implements node role classification using sound software engineering principles. The implementation follows established academic patterns for code analysis, includes comprehensive testing, and integrates cleanly with the existing codebase architecture.

Key Strengths:

The feature delivers significant practical value for code analysis and maintenance workflows while maintaining the tool's performance and reliability characteristics.

Final Recommendation
Summary
- Every symbol classified as `entry`/`core`/`utility`/`adapter`/`dead`/`leaf` based on fan-in/fan-out connectivity patterns with adaptive median thresholds
- `codegraph roles --role dead -T` surfaces unreferenced non-exported symbols — delivered as a byproduct of classification
- New `roles` CLI command with `--role`, `--file`, and `--no-tests` filters
- New `node_roles` MCP tool (18 tools total)
- `where`, `explain`, `context`, `stats`, `list-functions` now show `[role]` tags
- `role` column + index added to `nodes` table

Also includes prior commits on this branch:
- fix(builder): preserve structure data during incremental builds
- perf(builder): avoid disk reads for line counts during incremental rebuild

Test plan
- `tests/unit/roles.test.js` — classification logic, median thresholds, idempotency, empty graph, file/directory nodes stay null
- `tests/integration/roles.test.js` — rolesData filters, statsData role distribution, whereData/explainData/listFunctionsData include role field
- `tests/unit/mcp.test.js` — updated with `node_roles` tool and `rolesData` mock
- `codegraph build .` succeeds with roles computed
- `codegraph roles -T` shows role distribution