Skip to content

fix: extract new/throw/await/string/regex AST nodes in native engine#314

Merged
carlos-alm merged 1 commit intomainfrom
fix/native-ast-kinds
Mar 4, 2026
Merged

fix: extract new/throw/await/string/regex AST nodes in native engine#314
carlos-alm merged 1 commit intomainfrom
fix/native-ast-kinds

Conversation

@carlos-alm
Copy link
Contributor

Summary

Fixes #306 — the native Rust engine only extracted call_expression nodes, missing new/throw/await/string/regex AST kinds that the WASM engine captures via tree walking in ast.js. This caused ast --kind new (and similar) to return empty results when built with --engine native.

  • Rust: Add AstNode struct to FileSymbols, implement walk_ast_nodes() in javascript.rs mirroring walkAst() from ast.js with proper recursion control (no recurse for new/throw/await to avoid double-counting, recurse for string/regex)
  • JS: normalizeNativeSymbols() maps the new astNodes field; buildAstNodes() adds native path consuming pre-extracted nodes when _tree is absent
  • Tests: 12 Rust unit tests + 9 JS native engine tests (auto-skip if installed addon predates this change)

Test plan

  • All 1371 existing tests pass
  • WASM AST extraction unchanged (10 tests pass)
  • Native engine tests auto-skip on old addon, will activate after CI rebuild
  • Lint clean
  • CI: cargo test validates Rust unit tests
  • Manual: node src/cli.js build . --engine native && node src/cli.js ast --kind new -T shows results
  • Manual: compare node src/cli.js ast -j --engine native vs --engine wasm counts

…306)

The native Rust engine only extracted call_expression nodes, missing
new/throw/await/string/regex kinds that the WASM engine captures via
tree walking in ast.js. This caused `ast --kind new` (and similar) to
return empty results when built with --engine native.

Rust changes:
- Add AstNode struct to types.rs with napi(object) derive
- Add ast_nodes field to FileSymbols
- Add walk_ast_nodes() in javascript.rs as a separate tree walk
  mirroring walkAst() in ast.js, with proper recursion control
  (no recurse for new/throw/await, recurse for string/regex)
- Add truncate() helper for char-safe text truncation
- Add 12 Rust unit tests covering all 5 kinds + edge cases

JS changes:
- normalizeNativeSymbols() maps the new astNodes field
- buildAstNodes() adds native path: when _tree is absent but
  astNodes is populated, insert pre-extracted nodes directly
- ast-nodes.test.js adds native engine test suite (auto-skips
  if installed native addon predates this change)

Impact: 20 functions changed, 19 affected
@claude
Copy link

claude bot commented Mar 4, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Native AST Node Extraction Fix

Todo List:

  • Phase 1: Root Cause Analysis - Understand the real problem being solved
  • Phase 2: Configuration Changes - Check for high-risk configuration modifications
  • Phase 3: Test Integrity - Verify test changes are legitimate improvements
  • Phase 4: Code Quality - Review implementation details and architecture
  • Final Assessment - Provide rating and recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: The native Rust engine was only extracting call_expression nodes, missing new/throw/await/string/regex AST kinds that the WASM engine captures. This caused commands like node src/cli.js ast --kind new to return empty results when using --engine native, creating feature disparity between engines.

Why This Approach: This fixes the actual root cause rather than masking symptoms. The issue was architectural - the native engine lacked the AST walking functionality that existed in the WASM path (ast.js). The solution correctly implements the missing functionality in Rust while maintaining parity with the WASM implementation.

Risk Assessment: ✅ LOW RISK - This is a legitimate feature parity fix addressing a real functional gap.

Phase 2: Configuration Change Detection ✅

No Configuration Changes Found: ✅ No changes to:

  • Build configuration (package.json, Cargo.toml)
  • Linting rules (biome.json unchanged)
  • Test timeouts or coverage thresholds
  • TypeScript configuration
  • CI/CD workflows

This is purely a code implementation change with no rule relaxation or configuration modifications.

Phase 3: Test Integrity ✅

Test Changes Are Legitimate: ✅

  • 127 new lines in tests/parsers/ast-nodes.test.js - all additive test coverage
  • Tests verify the new native engine functionality matches WASM behavior
  • Smart compatibility check: nativeSupportsAstNodes() auto-skips tests on older native binaries
  • No weakened assertions: All tests verify expected functionality works correctly
  • 12 Rust unit tests added in javascript.rs covering edge cases
  • Comprehensive coverage: Tests all AST node kinds (new, throw, await, string, regex)

No test expectations were lowered - only new functionality verification added.

Phase 4: Code Quality ✅

Rust Implementation: ✅ High Quality

  • Unicode-safe truncation: truncate() helper uses is_char_boundary() to prevent breaking UTF-8 sequences
  • Proper recursion control: Smart logic prevents double-counting (e.g., throw new Error emits one "throw" node, not separate "throw" + "new")
  • Memory efficient: Uses Vec<AstNode> and avoids unnecessary allocations
  • Error handling: Proper Option<> types and safe string operations

JavaScript Integration: ✅ Well Architected

  • Clean dual-path logic: buildAstNodes() handles both WASM (_tree present) and native (astNodes present) paths
  • Consistent data mapping: normalizeNativeSymbols() properly maps Rust structs to JS objects
  • Backward compatibility: Auto-detection prevents breaking older native binaries

Architecture: ✅ Sound Design

  • Mirrors existing WASM implementation for consistency
  • Maintains same constants (TEXT_MAX = 200) across engines
  • Uses same parent resolution logic for both paths

Critical Concerns

None Found - This is a well-implemented feature parity fix.

Final Recommendation

## Root Cause Analysis
**Problem Being Solved**: Native Rust engine missing AST node extraction for new/throw/await/string/regex, causing feature disparity vs WASM engine
**Why This Approach**: Implements missing functionality in native engine to achieve parity, fixes architectural gap rather than symptoms  
**Risk Assessment**: LOW - Legitimate feature implementation with comprehensive testing

## Critical Concerns
None - Well-architected implementation with proper error handling and test coverage

## Final Recommendation  
- Rating: ⭐⭐⭐⭐☆ (4/5)
- Action: **APPROVE**
- Reasoning: High-quality implementation that fixes a real functional gap. Comprehensive test coverage, proper error handling, and maintains architectural consistency. The only reason it's not 5 stars is the size (454 additions) but it's justified by the scope of missing functionality.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 4, 2026

Greptile Summary

This PR fixes a feature parity gap between the native Rust engine and WASM engine by implementing AST node extraction for new, throw, await, string, and regex expressions in the native parser.

Key changes:

  • Rust implementation: Added walk_ast_nodes() function in javascript.rs that mirrors the JavaScript walkAst() logic, with proper recursion control to avoid double-counting (e.g., throw new Error emits only a throw node, not both throw and new)
  • Type system: Extended FileSymbols with AstNode struct and ast_nodes field, exposed to JavaScript via N-API as astNodes
  • JavaScript integration: Modified buildAstNodes() to support dual paths—WASM uses tree-walking, native consumes pre-extracted nodes from Rust
  • Backward compatibility: Tests auto-skip when running against old prebuilt binaries that lack the ast_nodes field via smart feature detection

Test coverage:

  • 12 new Rust unit tests validate extraction logic for all node types
  • 9 new JavaScript integration tests verify end-to-end behavior with native engine
  • All 1371 existing tests pass

The implementation is clean, well-tested, and properly handles edge cases like Unicode truncation and trivial string filtering.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The implementation mirrors proven JavaScript logic, has comprehensive test coverage (21 new tests), maintains backward compatibility, and fixes a real bug without introducing breaking changes. All existing tests pass.
  • No files require special attention

Important Files Changed

Filename Overview
crates/codegraph-core/src/extractors/javascript.rs Implemented walk_ast_nodes() to extract new/throw/await/string/regex AST nodes, mirroring JavaScript walkAst() with proper recursion control; added 12 comprehensive Rust unit tests
crates/codegraph-core/src/types.rs Added AstNode struct and ast_nodes field to FileSymbols, exposed to JS as astNodes via napi
src/ast.js Added native engine path in buildAstNodes() to consume pre-extracted symbols.astNodes from Rust when _tree is absent
tests/parsers/ast-nodes.test.js Added 9 native engine tests with auto-skip mechanism via nativeSupportsAstNodes() check for backward compatibility with old addons

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Source Code] -->|Parse| B[Tree-sitter AST]
    B -->|Native Engine| C[walk_ast_nodes in Rust]
    B -->|WASM Engine| D[walkAst in JS]
    
    C -->|Extract| E[AstNode Structs]
    E -->|Add to| F[FileSymbols.ast_nodes]
    F -->|N-API Bridge| G[normalizeNativeSymbols]
    
    D -->|Extract| H[AST Rows]
    
    G -->|symbols.astNodes| I[buildAstNodes]
    I -->|Native Path| J[Consume Pre-extracted Nodes]
    I -->|WASM Path| K[Walk Tree-sitter AST]
    
    H -->|Direct| K
    
    J -->|Insert| L[ast_nodes DB Table]
    K -->|Insert| L
    
    subgraph Rust[Rust Native Engine]
        C
        E
        F
    end
    
    subgraph JS[JavaScript Layer]
        G
        I
        D
    end
    
    subgraph AST Node Types
        M[new expressions]
        N[throw statements]
        O[await expressions]
        P[string literals]
        Q[regex patterns]
    end
Loading

Last reviewed commit: 79ec5c8

@carlos-alm carlos-alm merged commit 6101b5e into main Mar 4, 2026
20 checks passed
@carlos-alm carlos-alm deleted the fix/native-ast-kinds branch March 4, 2026 02:52
@github-actions github-actions bot locked and limited conversation to collaborators Mar 4, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: native engine only stores 'call' AST nodes, missing 5 other kinds

1 participant