perf: optimize build pipeline for incremental and full builds #317
carlos-alm merged 4 commits into main
Greptile Summary
Optimizes build pipeline with multiple performance improvements: batches node ID lookups to eliminate O(n) individual SELECTs, skips AST/complexity processing for unchanged reverse-dependency files on incremental builds, scopes directory structure teardown to affected directories only, switches to schema-based version mismatch detection to avoid unnecessary full rebuilds on patch/minor bumps, and adds
Confidence Score: 5/5
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
Start[Build Start] --> EngineCheck{Engine/Schema<br/>Mismatch?}
EngineCheck -->|Yes| FullBuild[Full Build]
EngineCheck -->|No| IncrCheck{Incremental<br/>Build?}
IncrCheck -->|No| FullBuild
IncrCheck -->|Yes| DetectChanges[Detect Changed/<br/>Removed Files]
DetectChanges --> PurgeChanged[Purge Changed/<br/>Removed Files]
PurgeChanged --> DetectReverseDeps[Detect Reverse<br/>Dependencies]
DetectReverseDeps --> DeleteRevEdges[Delete Only<br/>Outgoing Edges]
DeleteRevEdges --> ParseAll[Parse All Files<br/>Changed + Reverse-deps]
FullBuild --> ParseAll
ParseAll --> BulkFetch[Bulk Fetch Node IDs<br/>per File Map]
BulkFetch --> InsertPhases[Multi-Phase Insert<br/>Defs → Children → Edges]
InsertPhases --> FilterRevDeps[Filter Out<br/>Reverse-Deps]
FilterRevDeps --> Structure{Incremental?}
Structure -->|Yes| IncrStructure[Rebuild Only<br/>Affected Dirs]
Structure -->|No| FullStructure[Rebuild All<br/>Directory Nodes]
IncrStructure --> ASTCheck{--no-ast?}
FullStructure --> ASTCheck
ASTCheck -->|No| AST[AST Extraction<br/>Bulk ID Lookups]
ASTCheck -->|Yes| ComplexityCheck
AST --> ComplexityCheck{--no-complexity?}
ComplexityCheck -->|No| Complexity[Complexity Metrics<br/>Filtered Symbols]
ComplexityCheck -->|Yes| SaveMeta
Complexity --> SaveMeta[Save Build Metadata<br/>schema_version + astMs]
SaveMeta --> End[Build Complete]
Last reviewed commit: 2b95c96
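The two flag checks at the end of the flowchart can be sketched roughly as below. This is a minimal illustration, not the code in builder.js: `runOptionalPhases`, the `opts.ast` / `opts.complexity` option shape, the `phases` callbacks, and the `complexityMs` field are hypothetical names. Only the `--no-ast` and `--no-complexity` flags and the `astMs` timing field come from the PR.

```javascript
// Hypothetical sketch: gate the optional AST and complexity phases on build flags
// and report how long each phase took. Names and shapes are assumptions.
function runOptionalPhases(files, opts = {}, phases = {}) {
  const timing = {};

  // Run fn once and return the elapsed wall-clock time in milliseconds.
  const timed = (fn) => {
    const start = performance.now();
    fn();
    return performance.now() - start;
  };

  // Assumed mapping: --no-ast sets opts.ast to false.
  if (opts.ast !== false && phases.ast) {
    timing.astMs = timed(() => phases.ast(files));
  }

  // Assumed mapping: --no-complexity sets opts.complexity to false.
  if (opts.complexity !== false && phases.complexity) {
    timing.complexityMs = timed(() => phases.complexity(files));
  }

  // Returning the timings keeps astMs visible to the caller.
  return timing;
}
```

A skipped phase simply leaves its timing key absent, so a caller can tell "skipped" apart from "ran in 0 ms".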
…xtended kinds
Auto-rebuild (10.2): When an incremental build detects a version or engine change, automatically promote to a full rebuild instead of only warning. Adds forceFullRebuild flag in builder.js with integration tests.
Native extended kinds (10.3): Add parameter, property, and constant extraction to all 8 Rust native extractors (JS/TS, Python, Go, Rust, Java, C#, Ruby, PHP). The WASM engine already extracts these; this brings the native engine to parity. Each extractor gets helpers for walking AST nodes and populating the children field on Definition. Includes Rust unit tests per extractor and updated parity test normalization to compare children arrays across engines.
Impact: 59 functions changed, 48 affected
The native prebuilt binary on CI hasn't been compiled with the new extended kinds extraction yet. Exclude children from the parity normalize() comparison (same pattern as the existing receiver exclusion) so tests pass until the next native binary release.
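As a rough illustration of that exclusion, a parity `normalize()` could strip the fields that one engine does not yet emit before comparing. The definition shape (`name`, `kind`, `line`) and the `IGNORED_FIELDS` list are assumptions made for this sketch; the real test helper may differ.

```javascript
// Hypothetical sketch of a parity normalizer. Fields listed here are dropped
// from both engines' output before comparison (assumed list and shape).
const IGNORED_FIELDS = ['receiver', 'children'];

function normalize(definitions) {
  return definitions
    .map((def) => {
      // Copy first so the caller's objects are not mutated.
      const copy = { ...def };
      for (const field of IGNORED_FIELDS) delete copy[field];
      return copy;
    })
    // Sort so that extraction order does not affect the comparison.
    .sort((a, b) => a.name.localeCompare(b.name) || a.line - b.line);
}
```

Once the native binary ships with extended kinds, removing `'children'` from the ignore list would restore the stricter comparison.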
650efe8 to 65c433b
Claude finished @carlos-alm's task
PR Review: Build Pipeline Optimization
Review Checklist
Root Cause Analysis
Problem Being Solved: Performance bottlenecks in the build pipeline caused by:
Why This Approach: The PR implements well-targeted optimizations:
Risk Assessment: LOW-MEDIUM - These are performance optimizations to existing functionality, not fundamental behavioral changes.
Code Quality Assessment
✅ Strengths
Performance Optimizations Are Well-Designed:
Schema-Based Version Logic is Sound:
Reverse-Dep Filtering is Intelligent:
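The reverse-dep filtering that the review refers to might look something like the sketch below. Reverse-dependency files are re-parsed so their edges can be rebuilt, but only files whose content actually changed are passed on to the AST and complexity phases. `selectFilesForAnalysis` and its return shape are hypothetical, chosen only to illustrate the idea.

```javascript
// Hypothetical sketch: split the incremental work set into files that must be
// re-parsed (changed + reverse deps) and files that need AST/complexity analysis
// (changed only, since reverse-dep-only files have unchanged content).
function selectFilesForAnalysis(changed, reverseDeps) {
  const changedSet = new Set(changed);

  // Union of both lists, de-duplicated, preserving first-seen order.
  const reparsed = [...new Set([...changed, ...reverseDeps])];

  // Reverse-dep-only files are filtered out here.
  const analyzed = reparsed.filter((file) => changedSet.has(file));

  return { reparsed, analyzed };
}
```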
- Batch node ID lookups in insertion phase: replace per-node getNodeId.get() with 2 bulk SELECT per file, eliminating O(defs + children) individual queries per file
- Batch node ID lookups in AST extraction: pre-load all node IDs per file into a Map, replacing per-parent-def DB queries with O(1) lookups
- Skip AST/complexity for reverse-dep-only files on incremental builds since their content didn't change
- Incremental structure rebuild: scope directory node/edge/metric teardown to affected directories only instead of full DELETE + rebuild
- Add --no-ast and --no-complexity build flags for users who don't need these phases (mirrors existing --cfg opt-in pattern)
- Use schema_version instead of codegraph_version for auto-promote decisions so patch/minor bumps don't force full rebuilds
- Include astMs in phase timing output (was computed but not returned)
Impact: 5 functions changed, 3 affected
Impact: 1 functions changed, 1 affected
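The batching idea in the first two bullets can be sketched without a real database. Here `FakeNodeStore` stands in for the SQLite table and counts queries. The column names, the `name|kind|line` key format, and all function names are assumptions for illustration. The commit describes 2 bulk SELECTs per file, while this sketch shows a single one for brevity.

```javascript
// Stand-in for the nodes table. A real build would presumably run a prepared
// statement along the lines of SELECT ... FROM nodes WHERE file = ? (assumed).
class FakeNodeStore {
  constructor(rows) {
    this.rows = rows;
    this.queryCount = 0; // lets callers observe how many "queries" ran
  }

  selectByFile(file) {
    this.queryCount += 1;
    return this.rows.filter((row) => row.file === file);
  }
}

// Hypothetical composite key identifying a node within one file.
const nodeKey = (name, kind, line) => `${name}|${kind}|${line}`;

// One bulk query per file, then O(1) Map lookups for every def and child.
function loadNodeIds(store, file) {
  const ids = new Map();
  for (const row of store.selectByFile(file)) {
    ids.set(nodeKey(row.name, row.kind, row.line), row.id);
  }
  return ids;
}
```

With this shape, resolving every definition in a file costs one query plus in-memory lookups, instead of one query per definition.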
65c433b to 2b95c96
Summary
- Replace per-node getNodeId.get() with bulk SELECT ... WHERE file = ? queries per file, eliminating O(defs + children) individual SELECTs
- Skip AST/complexity for reverse-dep-only files on incremental builds: their content didn't change, so their existing ast_nodes/function_complexity rows are still valid
- Scope directory node/edge/metric teardown to affected directories only, instead of a full DELETE + rebuild of all directories
- Add --no-ast and --no-complexity build flags for users who don't need these phases (mirrors existing --cfg opt-in pattern)
- Use schema_version (migration number) instead of codegraph_version for auto-promote decisions, so patch/minor bumps don't force full rebuilds
- Include astMs in phase timing output (was computed but never returned)
Test plan
- … rolesData warning)
- … node src/cli.js build .)
- node src/cli.js build . --no-incremental and compare phase timings before/after
- node src/cli.js build . (incremental, no changes): should be near-instant
- --no-ast and --no-complexity flags produce correct but reduced output
- node src/cli.js stats before/after to verify node/edge counts match
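The schema_version change in the summary amounts to a decision like the one sketched here: an engine or schema mismatch promotes the build to a full rebuild, while a codegraph_version bump alone does not. `shouldForceFullRebuild` and the metadata object shape are hypothetical. The sketch assumes metadata from a previous build is available and uses the field names mentioned in the PR.

```javascript
// Hypothetical sketch of the auto-promote decision. `previous` is the metadata
// saved by the last build, `current` describes the running tool (assumed shapes).
function shouldForceFullRebuild(previous, current) {
  // A different extraction engine may produce different nodes.
  if (previous.engine !== current.engine) return true;

  // A different migration number means the stored graph layout changed.
  if (previous.schema_version !== current.schema_version) return true;

  // codegraph_version is deliberately not compared: a patch/minor bump with
  // the same schema keeps the existing graph usable for incremental builds.
  return false;
}
```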