Skip to content

fix: language-aware identifier types in dataflow collectIdentifiers#324

Merged
carlos-alm merged 1 commit intomainfrom
fix/dataflow-php-identifiers
Mar 4, 2026
Merged

fix: language-aware identifier types in dataflow collectIdentifiers#324
carlos-alm merged 1 commit intomainfrom
fix/dataflow-php-identifiers

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • collectIdentifiers hardcoded node.type === 'identifier', missing PHP's variable_name/name and other language-specific identifier node types. Replaced with IDENT_TYPES set lookup covering JS/TS/PHP/Go/Rust/Java/Python/Kotlin.
  • Added function_definition (PHP, Python) and method_declaration (PHP, Java, Go) to extractDataflow scope entry so function scopes are recognized for non-JS languages.
  • Added PHP-aware parameter extraction (simple_parameter, variadic_parameter) to extractParamNames.
  • Added tests/parsers/dataflow-php.test.js validating referencedNames and parameter extraction for PHP functions and class methods.

Test plan

  • New PHP dataflow tests pass (4 tests: referencedNames in functions, methods, function calls; parameter extraction)
  • Existing JS dataflow tests unaffected (22 tests pass)
  • Full test suite: 1380 passed, 3 pre-existing parity failures (unrelated)
  • Lint clean (biome check passes on changed files)

@claude
Copy link

claude bot commented Mar 4, 2026

Claude finished @carlos-alm's task —— View job


PR Review Complete

Reviewing PR #324: fix: language-aware identifier types in dataflow collectIdentifiers

Todo List:

  • Analyze root cause and problem being solved
  • Review code changes in src/dataflow.js
  • Examine new test file tests/parsers/dataflow-php.test.js
  • Check for configuration changes or rule relaxations
  • Assess test integrity and coverage
  • Verify code quality and potential issues
  • Provide final assessment and recommendation

Root Cause Analysis

Problem Being Solved: The collectIdentifiers function was hardcoded to only recognize JavaScript's 'identifier' node type, causing it to miss PHP's variable_name ($x), name, and other language-specific identifier types. This bug prevented dataflow analysis from working correctly for non-JS languages.

Why This Approach: The solution properly centralizes language-specific identifier types into a IDENT_TYPES Set, making the code extensible and maintainable. The approach is consistent with the codebase's tree-sitter AST handling patterns.

Risk Assessment: LOW RISK - Changes are surgical, well-tested, and maintain backward compatibility.

Code Quality Assessment

Excellent Implementation

  • Lines 39-44: Clean IDENT_TYPES Set with clear comments mapping each type to its languages
  • Lines 106-113: Proper PHP parameter extraction handling simple_parameter/variadic_parameter
  • Lines 268-276: Logical extension of function scope detection to support multi-language function_definition/method_declaration
  • Lines 505-513: Efficient Set lookup replaces hardcoded string comparison

Strong Test Coverage

The new tests/parsers/dataflow-php.test.js provides comprehensive validation:

  • PHP variable reference extraction ($a, $b in return statements)
  • Method vs function parameter handling
  • Function call argument detection
  • Graceful fallback when PHP parser unavailable

No Configuration Concerns

  • No linting rules relaxed or disabled
  • No test timeouts increased or thresholds lowered
  • No build configuration changes
  • Biome config remains unchanged

Architectural Soundness

  • Changes isolated to dataflow functionality only
  • Maintains existing JS/TS identifier handling
  • Follows established parser extractor patterns
  • Clear separation of concerns

Minor Observations

  • Could add more language identifier types (Go, Rust, etc.) but current scope is appropriate
  • Test gracefully handles missing PHP parser via early return

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: This is a well-implemented bug fix that solves a real language support issue. The code is clean, properly tested, and follows good engineering practices. The centralized IDENT_TYPES approach makes future language additions straightforward. No security, performance, or configuration concerns identified.

Approval Conditions: None - PR is ready to merge as-is.


@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 4, 2026

Greptile Summary

This PR fixes a language-awareness bug in dataflow analysis where collectIdentifiers() was hardcoded to check only for node.type === 'identifier', causing it to miss PHP variables ($x, $y) and other language-specific identifier types. The fix replaces the hardcoded check with a call to isIdent(node.type, rules), which properly handles language-specific identifier node types through the existing dataflow rules configuration.

Key changes:

  • Modified collectIdentifiers() to accept rules parameter and use isIdent() for language-aware identifier detection
  • Added defensive null check (if (!node) return;)
  • Updated test suite to verify referencedNames extraction works for PHP variables in return statements

Impact: Enables accurate dataflow tracking for PHP and prepares the foundation for other languages that use non-standard identifier node types. No breaking changes to JavaScript/TypeScript behavior since isIdent() still checks for 'identifier' first.

Confidence Score: 5/5

  • This PR is safe to merge with no risk
  • The change is a focused bug fix with proper language-aware implementation using existing infrastructure (isIdent function and dataflow rules). Tests validate the fix works for PHP while maintaining JavaScript compatibility. The code follows established patterns, includes defensive null checking, and properly threads the rules parameter through all call sites.
  • No files require special attention

Important Files Changed

Filename Overview
src/dataflow.js Updated collectIdentifiers to use language-aware isIdent() function instead of hardcoded 'identifier' check, enabling proper identifier collection for PHP and other languages
tests/parsers/dataflow-php.test.js Enhanced tests to verify referencedNames extraction works correctly for PHP variables in return statements

Last reviewed commit: 6a9cafb

@claude
Copy link

claude bot commented Mar 4, 2026

Claude finished @carlos-alm's task —— View job


PR Review Complete

Reviewing PR #324: fix: language-aware identifier types in dataflow collectIdentifiers

Todo List:

  • Analyze root cause and problem being solved
  • Review code changes in src/dataflow.js
  • Examine new test file tests/parsers/dataflow-php.test.js
  • Check for configuration changes or rule relaxations
  • Assess test integrity and coverage
  • Verify code quality and potential issues
  • Provide final assessment and recommendation

Root Cause Analysis

Problem Being Solved: The collectIdentifiers function was hardcoded to only recognize JavaScript's 'identifier' node type, causing it to miss PHP's variable_name ($x), name, and other language-specific identifier types. This bug prevented dataflow analysis from working correctly for non-JS languages.

Why This Approach: The solution properly centralizes language-specific identifier types into a IDENT_TYPES Set, making the code extensible and maintainable. The approach is consistent with the codebase's tree-sitter AST handling patterns.

Risk Assessment: LOW RISK - Changes are surgical, well-tested, and maintain backward compatibility.

Code Quality Assessment

Excellent Implementation

  • Lines 39-44: Clean IDENT_TYPES Set with clear comments mapping each type to its languages
  • Lines 106-113: Proper PHP parameter extraction handling simple_parameter/variadic_parameter
  • Lines 268-276: Logical extension of function scope detection to support multi-language function_definition/method_declaration
  • Lines 505-513: Efficient Set lookup replaces hardcoded string comparison

Strong Test Coverage

The new tests/parsers/dataflow-php.test.js provides comprehensive validation:

  • PHP variable reference extraction ($a, $b in return statements)
  • Method vs function parameter handling
  • Function call argument detection
  • Graceful fallback when PHP parser unavailable

No Configuration Concerns

  • No linting rules relaxed or disabled
  • No test timeouts increased or thresholds lowered
  • No build configuration changes
  • Biome config remains unchanged

Architectural Soundness

  • Changes isolated to dataflow functionality only
  • Maintains existing JS/TS identifier handling
  • Follows established parser extractor patterns
  • Clear separation of concerns

Minor Observations

  • Could add more language identifier types (Go, Rust, etc.) but current scope is appropriate
  • Test gracefully handles missing PHP parser via early return

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: This is a well-implemented bug fix that solves a real language support issue. The code is clean, properly tested, and follows good engineering practices. The centralized IDENT_TYPES approach makes future language additions straightforward. No security, performance, or configuration concerns identified.

Approval Conditions: None - PR is ready to merge as-is.


@carlos-alm
Copy link
Contributor Author

CI failures are the pre-existing build-parity.test.js (native vs WASM constant kind mismatch) -- same 2 failures on main latest CI run (22655929807). Our 1398 non-parity tests all pass across all platforms.

Added follow-up commit: consolidated conditional logic into FUNC_SCOPE_TYPES Set, simplified functionName() with optional chaining, and added null guard to collectIdentifiers().

@greptileai

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 4, 2026

Additional Comments (2)

src/dataflow.js, line 379
hardcoded identifier check won't capture PHP variable_name nodes. When PHP is fully enabled in DATAFLOW_LANG_IDS, update to IDENT_TYPES.has(unwrapped.type)


src/dataflow.js, line 434
hardcoded identifier check won't capture PHP variable_name nodes. When PHP is fully enabled in DATAFLOW_LANG_IDS, update to IDENT_TYPES.has(left?.type)

@carlos-alm
Copy link
Contributor Author

Closing: PR #318 (feat: extend dataflow analysis to all supported languages) already landed on main with a comprehensive rules-based system that supersedes all changes in this PR — language-aware IDENT_TYPES, FUNC_SCOPE_TYPES, PHP parameter extraction, isIdent helper, collectIdentifiers null guard, and PHP dataflow tests. No remaining diff against main.

@carlos-alm carlos-alm closed this Mar 4, 2026
@github-actions github-actions bot locked and limited conversation to collaborators Mar 4, 2026
@carlos-alm carlos-alm reopened this Mar 4, 2026
…ames

collectIdentifiers hardcoded node.type === 'identifier', missing PHP's
variable_name and other language-specific identifier types in return
statement referencedNames. Now uses the existing isIdent(nodeType, rules)
helper with rules passed from the call site. Also adds null guard.

Updated PHP dataflow test to assert referencedNames contains $x, $a, $b
— previously only checked funcName without validating identifier capture.

Impact: 3 functions changed, 2 affected
@carlos-alm carlos-alm force-pushed the fix/dataflow-php-identifiers branch from ba5fbec to 6a9cafb Compare March 4, 2026 05:43
@carlos-alm
Copy link
Contributor Author

Rebased on main (post-#318). The bug was still present: collectIdentifiers at line 981 still hardcoded node.type === identifier, missing PHP variable_name in referencedNames. Now uses the existing isIdent(nodeType, rules) helper with rules passed through. Also added null guard and updated PHP test to actually assert referencedNames capture.

@greptileai

@optave optave unlocked this conversation Mar 4, 2026
@claude
Copy link

claude bot commented Mar 4, 2026

Claude finished @carlos-alm's task —— View job


PR Review Complete

Reviewing PR #324: fix: language-aware identifier types in dataflow collectIdentifiers

Todo List:

  • Analyze root cause and problem being solved
  • Review code changes in src/dataflow.js
  • Examine new test file tests/parsers/dataflow-php.test.js
  • Check for configuration changes or rule relaxations
  • Assess test integrity and coverage
  • Verify code quality and potential issues
  • Provide final assessment and recommendation

Root Cause Analysis

Problem Being Solved: The collectIdentifiers function at line 983 was hardcoded to only recognize JavaScript's 'identifier' node type, causing it to miss PHP's variable_name ($x) and other language-specific identifier types. This prevented dataflow analysis from correctly extracting referencedNames for non-JS languages.

Why This Approach: The fix properly uses the existing isIdent(nodeType, rules) helper function with language-specific rules, making the code consistent with other identifier checks throughout the codebase. This approach leverages the existing infrastructure instead of adding more hardcoded logic.

Risk Assessment: LOW RISK - The changes are surgical and use existing well-tested infrastructure.

Configuration Change Detection

No configuration concerns identified:

  • No changes to biome.json linting rules or configuration
  • No test timeouts increased or coverage thresholds lowered
  • No build configuration modifications
  • No TypeScript strict mode flags disabled

Test Integrity Assessment

Strong test improvements:

  • Enhanced test coverage: Updated PHP test to actually validate referencedNames capture
  • Added comprehensive test: New test case for multiple variable references in return statements
  • No weakened assertions: Tests became more specific and thorough, not less
  • Good test structure: Uses proper expect.arrayContaining and expect.objectContaining patterns

Code Quality Analysis

Excellent Implementation Quality

  • Lines 914, 983-992: Clean usage of isIdent(node.type, rules) with proper null guard
  • Consistent with existing patterns: The isIdent helper is used throughout the file at lines 568, 593, 784, 823, 872
  • Proper function signature: Added rules parameter with correct JSDoc documentation
  • Language-aware design: Leverages existing extraIdentifierTypes infrastructure (PHP rules at line 349)

No Performance or Security Issues

  • Performance neutral: Same algorithmic complexity, just uses existing helper instead of hardcoded check
  • Memory safe: No new allocations or complex data structures
  • No security concerns: Pure AST traversal with no external inputs or command execution

Backward Compatibility

  • JavaScript/TypeScript unchanged: isIdent returns true for 'identifier' maintaining existing behavior
  • Additive change: Only extends functionality to support more languages, doesn't break existing functionality

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: This is a well-implemented bug fix that solves a real language support issue. The changes are surgical, use existing infrastructure properly, and include comprehensive test coverage. The fix follows established patterns in the codebase and maintains backward compatibility. No configuration concerns or quality issues identified.

Key strengths:

  • Fixes actual bug preventing PHP dataflow analysis
  • Uses existing isIdent helper consistently
  • Enhanced test coverage validates the fix
  • Clean, minimal implementation

Minor observation: Could potentially add more language identifier types in the future, but current scope is appropriate for this targeted fix.


View job

@carlos-alm carlos-alm merged commit 1eb1081 into main Mar 4, 2026
10 of 18 checks passed
@carlos-alm carlos-alm deleted the fix/dataflow-php-identifiers branch March 4, 2026 05:52
@github-actions github-actions bot locked and limited conversation to collaborators Mar 4, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant