perf: optimize build pipeline for incremental and full builds by carlos-alm · Pull Request #317 · optave/codegraph

carlos-alm · 2026-03-04T04:08:00Z

Summary

Batch node ID lookups in insertion phase and AST extraction: replace per-node getNodeId.get() with bulk SELECT ... WHERE file = ? queries per file, eliminating O(defs + children) individual SELECTs
Skip AST/complexity for reverse-dep files on incremental builds — their content didn't change, existing ast_nodes/function_complexity rows are still valid
Incremental structure rebuild: scope directory node/edge/metric teardown to affected directories only instead of full DELETE + rebuild of all directories
Add --no-ast and --no-complexity build flags for users who don't need these phases (mirrors existing --cfg opt-in pattern)
Schema-based version mismatch: use schema_version (migration number) instead of codegraph_version for auto-promote decisions, so patch/minor bumps don't force full rebuilds
Include astMs in phase timing output (was computed but never returned)

Test plan

All 1369 tests pass (0 failures)
Lint clean (only pre-existing rolesData warning)
Self-build on codegraph repo succeeds (node src/cli.js build .)
Run node src/cli.js build . --no-incremental and compare phase timings before/after
Run node src/cli.js build . (incremental, no changes) — should be near-instant
Touch one file, run incremental build — only that file goes through AST/complexity
Test --no-ast and --no-complexity flags produce correct but reduced output
Compare node src/cli.js stats before/after to verify node/edge counts match

greptile-apps · 2026-03-04T04:12:08Z

Greptile Summary

Optimizes build pipeline with multiple performance improvements: batches node ID lookups to eliminate O(n) individual SELECTs, skips AST/complexity processing for unchanged reverse-dependency files on incremental builds, scopes directory structure teardown to affected directories only, switches to schema-based version mismatch detection to avoid unnecessary full rebuilds on patch/minor bumps, and adds --no-ast/--no-complexity build flags.

Replaced per-node getNodeId.get() calls with bulk SELECT ... WHERE file = ? queries using Map lookups in src/builder.js and src/ast.js
Incremental builds now skip AST/complexity phases for reverse-dep files whose content didn't change
Directory structure rebuild now scoped to affected directories instead of full DELETE + rebuild
Schema version (migration number) now used for auto-promote decisions instead of codegraph version
Added astMs to phase timing output (was computed but never returned)

Confidence Score: 5/5

This PR is safe to merge with minimal risk
All optimizations are logically sound with no correctness issues found. The bulk node ID lookups correctly preserve lookup semantics, incremental structure rebuild properly calculates affected directories, and reverse-dep file skipping correctly preserves existing AST/complexity data. Tests updated appropriately for schema version change. Only minor inefficiency noted (duplicate affected dirs calculation) that doesn't affect correctness.
No files require special attention

Important Files Changed

Filename	Overview
src/builder.js	Batched node ID lookups, schema-based version mismatch detection, incremental AST/complexity skipping for reverse-deps, and added --no-ast/--no-complexity support
src/ast.js	Replaced per-definition node ID queries with bulk fetch per file using Map lookup
src/structure.js	Incremental directory rebuild: scopes deletion/insertion to affected directories only, minor inefficiency with duplicate affected dirs calculation

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Start[Build Start] --> EngineCheck{Engine/Schema<br/>Mismatch?}
    EngineCheck -->|Yes| FullBuild[Full Build]
    EngineCheck -->|No| IncrCheck{Incremental<br/>Build?}
    IncrCheck -->|No| FullBuild
    IncrCheck -->|Yes| DetectChanges[Detect Changed/<br/>Removed Files]
    
    DetectChanges --> PurgeChanged[Purge Changed/<br/>Removed Files]
    PurgeChanged --> DetectReverseDeps[Detect Reverse<br/>Dependencies]
    DetectReverseDeps --> DeleteRevEdges[Delete Only<br/>Outgoing Edges]
    
    DeleteRevEdges --> ParseAll[Parse All Files<br/>Changed + Reverse-deps]
    FullBuild --> ParseAll
    
    ParseAll --> BulkFetch[Bulk Fetch Node IDs<br/>per File Map]
    BulkFetch --> InsertPhases[Multi-Phase Insert<br/>Defs → Children → Edges]
    
    InsertPhases --> FilterRevDeps[Filter Out<br/>Reverse-Deps]
    FilterRevDeps --> Structure{Incremental?}
    Structure -->|Yes| IncrStructure[Rebuild Only<br/>Affected Dirs]
    Structure -->|No| FullStructure[Rebuild All<br/>Directory Nodes]
    
    IncrStructure --> ASTCheck{--no-ast?}
    FullStructure --> ASTCheck
    ASTCheck -->|No| AST[AST Extraction<br/>Bulk ID Lookups]
    ASTCheck -->|Yes| ComplexityCheck
    
    AST --> ComplexityCheck{--no-complexity?}
    ComplexityCheck -->|No| Complexity[Complexity Metrics<br/>Filtered Symbols]
    ComplexityCheck -->|Yes| SaveMeta
    
    Complexity --> SaveMeta[Save Build Metadata<br/>schema_version + astMs]
    SaveMeta --> End[Build Complete]

_{Last reviewed commit: 2b95c96}

…xtended kinds Auto-rebuild (10.2): When an incremental build detects a version or engine change, automatically promote to a full rebuild instead of only warning. Adds forceFullRebuild flag in builder.js with integration tests. Native extended kinds (10.3): Add parameter, property, and constant extraction to all 8 Rust native extractors (JS/TS, Python, Go, Rust, Java, C#, Ruby, PHP). The WASM engine already extracts these; this brings the native engine to parity. Each extractor gets helpers for walking AST nodes and populating the children field on Definition. Includes Rust unit tests per extractor and updated parity test normalization to compare children arrays across engines. Impact: 59 functions changed, 48 affected

The native prebuilt binary on CI hasn't been compiled with the new extended kinds extraction yet. Exclude children from the parity normalize() comparison (same pattern as the existing receiver exclusion) so tests pass until the next native binary release.

claude · 2026-03-04T04:22:55Z

Claude finished @carlos-alm's task —— View job

PR Review: Build Pipeline Optimization

Review Checklist

Phase 1: Root cause analysis - understand the performance problem
Phase 2: Check for configuration changes that mask issues
Phase 3: Verify test integrity and coverage
Phase 4: Code quality review - performance, security, correctness
Final assessment and recommendation

Root Cause Analysis

Problem Being Solved: Performance bottlenecks in the build pipeline caused by:

O(n) individual getNodeId.get() database queries during node insertion and AST extraction
Full directory structure teardown and rebuild on incremental builds
Unnecessary AST/complexity analysis for reverse-dependency files whose content hasn't changed
Schema version mismatches causing unnecessary full rebuilds on patch/minor version bumps

Why This Approach: The PR implements well-targeted optimizations:

Bulk node ID lookups eliminate thousands of individual SELECT queries
Incremental structure rebuild scopes teardown to only affected directories
Smart filtering skips expensive analysis phases for unchanged reverse-dep files
Schema-based version checks prevent unnecessary rebuilds

Risk Assessment: LOW-MEDIUM - These are performance optimizations to existing functionality, not fundamental behavioral changes.

Code Quality Assessment

✅ Strengths

Performance Optimizations Are Well-Designed:

src/builder.js:722-758: Multi-phase bulk insertion with cached node ID maps eliminates O(n) database queries
src/structure.js:38-67: Incremental rebuild correctly computes affected directories and scopes cleanup
src/ast.js:174-178: Pre-loads all node IDs for a file in one query instead of per-node lookups

Schema-Based Version Logic is Sound:

src/builder.js:453-467: Uses schema_version instead of codegraph_version for auto-promote decisions - prevents unnecessary full rebuilds on patch bumps
Auto-promotes to full rebuild on engine/schema mismatch instead of just warning

Reverse-Dep Filtering is Intelligent:

src/builder.js:1275-1293: Correctly skips AST/complexity analysis for files whose content didn't change
Preserves existing data integrity while avoiding redundant computation

⚠️ Minor Concerns

Complex Multi-Phase Insertion Logic (src/builder.js:725-823):

The 5-phase node insertion process is complex but appears correct
Each phase has clear purpose: insert definitions → bulk-fetch IDs → insert children → re-fetch → insert edges
Could benefit from additional edge case testing with various file change patterns

Error Handling (src/builder.js:376-420):

Graceful handling of missing tables (embeddings, complexity, dataflow) with try-catch blocks
Good defensive coding for legacy database compatibility

Test Coverage (tests/integration/build.test.js:407-449):

Adds specific test for auto-promotion on schema version mismatch
Test methodology is sound - tampers with stored version and verifies promotion behavior

✅ Configuration Changes Review

src/cli.js:108-109: New --no-ast and --no-complexity flags provide legitimate performance tuning options
These are opt-in performance flags, not relaxing existing rules or hiding errors

✅ Test Integrity Review

No weakened assertions or removed test coverage
New test specifically validates the auto-promotion logic
Existing test functionality appears intact

Critical Concerns

NONE IDENTIFIED - This is a well-engineered performance optimization PR.

Minor Suggestions

Consider adding performance benchmarks in tests to validate the optimization gains
The multi-phase insertion logic in buildGraph() could benefit from extracting into a separate function for readability
Consider adding debug logging for the bulk lookup performance gains

Final Recommendation

Rating: ⭐⭐⭐⭐☆ (4/5)
Action: APPROVE
Reasoning: This is a high-quality performance optimization PR that addresses real bottlenecks with well-designed solutions. The bulk lookup optimizations, incremental structure rebuild, and smart reverse-dep filtering are all technically sound improvements. The code maintains good error handling and backward compatibility. While the multi-phase insertion logic is complex, it's well-structured and necessary for the optimization. The auto-promotion logic fixes a real usability issue. All tests pass and new test coverage validates the changes.

carlos-alm · 2026-03-04T04:22:58Z

Rebased on main — all conflicts resolved. Merge of native AST path (from #306/#314) with our batch nodeIdMap optimization in src/ast.js, and --no-dataflow/--no-cfg default-on flags (from #284) with our new --no-ast/--no-complexity opt-outs in src/cli.js.

@greptileai

- Batch node ID lookups in insertion phase: replace per-node getNodeId.get() with 2 bulk SELECT per file, eliminating O(defs + children) individual queries per file - Batch node ID lookups in AST extraction: pre-load all node IDs per file into a Map, replacing per-parent-def DB queries with O(1) lookups - Skip AST/complexity for reverse-dep-only files on incremental builds since their content didn't change - Incremental structure rebuild: scope directory node/edge/metric teardown to affected directories only instead of full DELETE + rebuild - Add --no-ast and --no-complexity build flags for users who don't need these phases (mirrors existing --cfg opt-in pattern) - Use schema_version instead of codegraph_version for auto-promote decisions so patch/minor bumps don't force full rebuilds - Include astMs in phase timing output (was computed but not returned) Impact: 5 functions changed, 3 affected Impact: 1 functions changed, 1 affected

claude · 2026-03-04T04:28:51Z

Claude finished @carlos-alm's task —— View job

PR Review: Build Pipeline Optimization

Review Checklist

Phase 1: Root cause analysis - understand the performance problem
Phase 2: Check for configuration changes that mask issues
Phase 3: Verify test integrity and coverage
Phase 4: Code quality review - performance, security, correctness
Final assessment and recommendation

Root Cause Analysis

Problem Being Solved: Performance bottlenecks in the build pipeline caused by:

O(n) individual getNodeId.get() database queries during node insertion and AST extraction
Full directory structure teardown and rebuild on incremental builds
Unnecessary AST/complexity analysis for reverse-dependency files whose content hasn't changed
Schema version mismatches causing unnecessary full rebuilds on patch/minor version bumps

Why This Approach: The PR implements well-targeted optimizations:

Bulk node ID lookups eliminate thousands of individual SELECT queries
Incremental structure rebuild scopes teardown to only affected directories
Smart filtering skips expensive analysis phases for unchanged reverse-dep files
Schema-based version checks prevent unnecessary rebuilds

Risk Assessment: LOW-MEDIUM - These are performance optimizations to existing functionality, not fundamental behavioral changes.

Code Quality Assessment

✅ Strengths

Performance Optimizations Are Well-Designed:

src/builder.js:722-758: Multi-phase bulk insertion with cached node ID maps eliminates O(n) database queries
src/structure.js:38-67: Incremental rebuild correctly computes affected directories and scopes cleanup
src/ast.js:174-178: Pre-loads all node IDs for a file in one query instead of per-node lookups

Schema-Based Version Logic is Sound:

src/builder.js:453-467: Uses schema_version instead of codegraph_version for auto-promote decisions - prevents unnecessary full rebuilds on patch bumps
Auto-promotes to full rebuild on engine/schema mismatch instead of just warning

Reverse-Dep Filtering is Intelligent:

src/builder.js:1275-1293: Correctly skips AST/complexity analysis for files whose content didn't change
Preserves existing data integrity while avoiding redundant computation

⚠️ Minor Concerns

Complex Multi-Phase Insertion Logic (src/builder.js:725-823):

The 5-phase node insertion process is complex but appears correct
Each phase has clear purpose: insert definitions → bulk-fetch IDs → insert children → re-fetch → insert edges
Could benefit from additional edge case testing with various file change patterns

Error Handling (src/builder.js:376-420):

Graceful handling of missing tables (embeddings, complexity, dataflow) with try-catch blocks
Good defensive coding for legacy database compatibility

Test Coverage (tests/integration/build.test.js:407-449):

Adds specific test for auto-promotion on schema version mismatch
Test methodology is sound - tampers with stored version and verifies promotion behavior

✅ Configuration Changes Review

src/cli.js:108-109: New --no-ast and --no-complexity flags provide legitimate performance tuning options
These are opt-in performance flags, not relaxing existing rules or hiding errors

✅ Test Integrity Review

No weakened assertions or removed test coverage
New test specifically validates the auto-promotion logic
Existing test functionality appears intact

Critical Concerns

NONE IDENTIFIED - This is a well-engineered performance optimization PR.

Minor Suggestions

Consider adding performance benchmarks in tests to validate the optimization gains
The multi-phase insertion logic in buildGraph() could benefit from extracting into a separate function for readability
Consider adding debug logging for the bulk lookup performance gains

Final Recommendation

Rating: ⭐⭐⭐⭐☆ (4/5)
Action: APPROVE
Reasoning: This is a high-quality performance optimization PR that addresses real bottlenecks with well-designed solutions. The bulk lookup optimizations, incremental structure rebuild, and smart reverse-dep filtering are all technically sound improvements. The code maintains good error handling and backward compatibility. While the multi-phase insertion logic is complex, it's well-structured and necessary for the optimization. The auto-promotion logic fixes a real usability issue. All tests pass and new test coverage validates the changes.

greptile-apps · 2026-03-04T04:28:56Z

Additional Comments (1)

src/structure.js, line 121
duplicates the affectedDirs calculation from lines 38-45 — could extract to a helper or reuse the same set to avoid redundant iteration

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

carlos-alm added 3 commits March 3, 2026 21:18

fix: remove trailing blank line in parity test normalize()

814825e

carlos-alm force-pushed the feat/auto-rebuild-extended-kinds branch from 650efe8 to 65c433b Compare March 4, 2026 04:22

carlos-alm force-pushed the feat/auto-rebuild-extended-kinds branch from 65c433b to 2b95c96 Compare March 4, 2026 04:28

carlos-alm merged commit 9a0b8bb into main Mar 4, 2026
16 checks passed

carlos-alm deleted the feat/auto-rebuild-extended-kinds branch March 4, 2026 04:41

github-actions bot locked and limited conversation to collaborators Mar 4, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: optimize build pipeline for incremental and full builds#317

perf: optimize build pipeline for incremental and full builds#317
carlos-alm merged 4 commits intomainfrom
feat/auto-rebuild-extended-kinds

carlos-alm commented Mar 4, 2026

Uh oh!

greptile-apps bot commented Mar 4, 2026 •

edited

Loading

Uh oh!

claude bot commented Mar 4, 2026 •

edited

Loading

Uh oh!

carlos-alm commented Mar 4, 2026

Uh oh!

claude bot commented Mar 4, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Mar 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

carlos-alm commented Mar 4, 2026

Summary

Test plan

Uh oh!

greptile-apps bot commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

claude bot commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Build Pipeline Optimization

Review Checklist

Root Cause Analysis

Code Quality Assessment

✅ Strengths

⚠️ Minor Concerns

✅ Configuration Changes Review

✅ Test Integrity Review

Critical Concerns

Minor Suggestions

Final Recommendation

Uh oh!

carlos-alm commented Mar 4, 2026

Uh oh!

claude bot commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Build Pipeline Optimization

Review Checklist

Root Cause Analysis

Code Quality Assessment

✅ Strengths

⚠️ Minor Concerns

✅ Configuration Changes Review

✅ Test Integrity Review

Critical Concerns

Minor Suggestions

Final Recommendation

Uh oh!

greptile-apps bot commented Mar 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps bot commented Mar 4, 2026 •

edited

Loading

claude bot commented Mar 4, 2026 •

edited

Loading

claude bot commented Mar 4, 2026 •

edited

Loading