feat: extend dataflow analysis to all supported languages #318
carlos-alm merged 5 commits into main
Greptile Summary
Extended dataflow analysis from JavaScript/TypeScript to all 8 supported languages, adding Python, Go, Rust, Java, C#, PHP, and Ruby. Implemented a rule-based system following the CFG/Complexity pattern.
Key changes:
Issue found:
Confidence Score: 4/5
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
Start[extractDataflow called with langId] --> GetRules[Lookup rules from DATAFLOW_RULES Map]
GetRules --> CheckRules{Rules exist for language?}
CheckRules -->|No| ReturnEmpty[Return empty arrays]
CheckRules -->|Yes| InitState[Initialize parameters, returns, assignments, argFlows, mutations arrays]
InitState --> Traverse[Walk AST with visit function]
Traverse --> CheckNode{Node type?}
CheckNode -->|Function node| EnterScope[enterScope: extract function name & params using rules]
EnterScope --> VisitChildren1[Visit children]
VisitChildren1 --> ExitScope[exitScope]
CheckNode -->|Return node| CollectRefs[collectIdentifiers for referencedNames]
CollectRefs --> RecordReturn[Push to returns array]
CheckNode -->|Variable declarator| HandleVar[handleVarDeclarator: extract assignments from calls]
HandleVar --> VisitChildren2[Visit children]
CheckNode -->|Call node| HandleCall[handleCallExpr: track argFlows]
HandleCall --> VisitChildren3[Visit children]
CheckNode -->|Assignment node| HandleAssign[handleAssignment: detect mutations & captures]
HandleAssign --> VisitChildren4[Visit children]
CheckNode -->|Expression statement| CheckMutation[handleExprStmtMutation: detect mutating methods]
CheckMutation --> VisitChildren5[Visit children]
CheckNode -->|Other| DefaultVisit[Visit all children]
ExitScope --> MoreNodes{More nodes?}
RecordReturn --> MoreNodes
VisitChildren2 --> MoreNodes
VisitChildren3 --> MoreNodes
VisitChildren4 --> MoreNodes
VisitChildren5 --> MoreNodes
DefaultVisit --> MoreNodes
MoreNodes -->|Yes| CheckNode
MoreNodes -->|No| ReturnData[Return extracted dataflow data]
ReturnEmpty --> End[End]
ReturnData --> End
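The traversal in the flowchart can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: `sketchRules`, `mockNode`, and `extractDataflowSketch` are hypothetical names, the node-type strings are placeholders, and only the function, return, and call branches are modeled against mock nodes shaped like tree-sitter nodes (`type`, `text`, `namedChildren`).

```javascript
// Hypothetical rule object; the node-type names are placeholders,
// not the rules shipped in this PR.
const sketchRules = {
  functionNodes: new Set(['function_declaration']),
  returnNode: 'return_statement',
  callNode: 'call_expression',
};

// Build a minimal stand-in for a tree-sitter node.
function mockNode(type, text, namedChildren = []) {
  return { type, text, namedChildren };
}

function extractDataflowSketch(root, rules) {
  const out = { parameters: [], returns: [], assignments: [], argFlows: [], mutations: [] };
  if (!rules) return out; // no rules for this language: return empty arrays

  const scopeStack = [];
  const currentFunc = () => (scopeStack.length ? scopeStack[scopeStack.length - 1] : null);

  function visit(node) {
    if (rules.functionNodes.has(node.type)) {
      scopeStack.push(node.text); // enterScope (the mock stores the name in `text`)
      node.namedChildren.forEach(visit);
      scopeStack.pop(); // exitScope
      return;
    }
    if (node.type === rules.returnNode) {
      // Record the return against the enclosing function, if any.
      out.returns.push({ funcName: currentFunc(), expression: node.text });
      return;
    }
    if (node.type === rules.callNode) {
      out.argFlows.push({ callerFunc: currentFunc(), call: node.text });
    }
    node.namedChildren.forEach(visit); // default: keep walking
  }

  visit(root);
  return out;
}

const tree = mockNode('program', '', [
  mockNode('function_declaration', 'f', [
    mockNode('call_expression', 'g(x)'),
    mockNode('return_statement', 'return x'),
  ]),
]);
const result = extractDataflowSketch(tree, sketchRules);
console.log(result.returns);  // [ { funcName: 'f', expression: 'return x' } ]
console.log(result.argFlows); // [ { callerFunc: 'f', call: 'g(x)' } ]
```

Passing `undefined` as the rules argument takes the "Return empty arrays" branch, which mirrors what the flowchart shows for a language without rules.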
Last reviewed commit: 40a2b01
Additional Comments (1)
And update call site at line 914:
Dataflow extraction was limited to JS/TS/TSX. This adds rules-based support for Python, Go, Rust, Java, C#, PHP, and Ruby following the established CFG/Complexity pattern.
- DATAFLOW_DEFAULTS + makeDataflowRules() validation factory
- Per-language rule objects mapping AST node types for functions, calls, returns, parameters, member access, and mutations
- DATAFLOW_RULES Map + DATAFLOW_EXTENSIONS Set from LANGUAGE_REGISTRY
- extractDataflow() accepts langId, all helpers use rules
- buildDataflowEdges() uses DATAFLOW_EXTENSIONS instead of hardcoded extension checks, resolves langId from symbols._langId or extToLang
- Language-specific handling: Go expression_list unwrapping, C# direct child initializer scanning, Java combined call+member mutations, PHP extra identifier types and argument wrappers
- 7 new per-language test files (57 total tests across 8 languages)
Impact: 25 functions changed, 14 affected
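A minimal sketch of how a defaults object, a validating factory, and the derived Map and Set described in this commit message could fit together. The names `DATAFLOW_DEFAULTS`, `makeDataflowRules`, `DATAFLOW_RULES`, `DATAFLOW_EXTENSIONS`, and `LANGUAGE_REGISTRY` come from the commit message. Everything else is an assumption made for illustration: the field names, the registry entry shape, the validation checks, and the node-type strings.

```javascript
// Assumed defaults; the field names are illustrative, not the PR's schema.
const DATAFLOW_DEFAULTS = {
  functionNodes: new Set(),
  returnNode: null,
  callNode: null,
  identifierTypes: new Set(['identifier']),
};

// Merge overrides onto the defaults and reject obviously invalid rule objects.
function makeDataflowRules(overrides) {
  for (const key of Object.keys(overrides)) {
    if (!(key in DATAFLOW_DEFAULTS)) {
      throw new Error(`Unknown dataflow rule key: ${key}`);
    }
  }
  const rules = { ...DATAFLOW_DEFAULTS, ...overrides };
  if (rules.functionNodes.size === 0) {
    throw new Error('functionNodes must not be empty');
  }
  return Object.freeze(rules);
}

// Hypothetical registry entries (assumed shape: id plus file extensions).
const LANGUAGE_REGISTRY = [
  { id: 'python', extensions: ['.py'] },
  { id: 'php', extensions: ['.php'] },
  { id: 'lua', extensions: ['.lua'] }, // no rules below, so it is skipped
];

// Per-language rule objects; node-type names are illustrative.
const DATAFLOW_RULES = new Map([
  ['python', makeDataflowRules({
    functionNodes: new Set(['function_definition']),
    returnNode: 'return_statement',
    callNode: 'call',
  })],
  ['php', makeDataflowRules({
    functionNodes: new Set(['function_definition']),
    returnNode: 'return_statement',
    callNode: 'function_call_expression',
    identifierTypes: new Set(['name', 'variable_name']),
  })],
]);

// Derive the supported extensions from the registry instead of hardcoding them.
const DATAFLOW_EXTENSIONS = new Set();
for (const lang of LANGUAGE_REGISTRY) {
  if (!DATAFLOW_RULES.has(lang.id)) continue;
  for (const ext of lang.extensions) DATAFLOW_EXTENSIONS.add(ext);
}

console.log([...DATAFLOW_EXTENSIONS]); // [ '.py', '.php' ]
```

Validating keys in the factory means a typo in a per-language rule object fails at module load rather than silently falling back to a default.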
40a2b01 to 851f060
Claude finished @carlos-alm's task
PR Review: Dataflow Multi-Language Support
Reviewing this pull request that extends dataflow analysis to all supported languages.
Root Cause Analysis
Problem Being Solved: Dataflow analysis was previously limited to JavaScript/TypeScript only. This PR extends it to all 8 supported languages (adding Python, Go, Rust, Java, C#, PHP, Ruby), enabling complete dataflow tracking across multi-language codebases.
Why This Approach: The implementation follows the established pattern used in CFG/Complexity modules - using a […]
Risk Assessment: Low risk - this is a feature extension following established patterns, not a fundamental architectural change.
Critical Concerns
1. Language-Specific Identifier Bug (Line 981-988) 🐛
The function […]
function collectIdentifiers(node, out) {
  if (node.type === 'identifier') { // ❌ Missing PHP's 'variable_name', etc.
    out.push(node.text);
    return;
  }
  for (const child of node.namedChildren) {
    collectIdentifiers(child, out);
  }
}
Impact: This causes incomplete […]
Evidence: PHP tests at line 36 don't assert […]
Fix Required: Use the existing […]
function collectIdentifiers(node, out, rules) {
  if (isIdent(node.type, rules)) {
    out.push(node.text);
    return;
  }
  // ... rest unchanged
}
2. Missing Test Coverage for PHP
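To make the reported difference concrete, here is a small runnable sketch comparing the hardcoded check with a rule-aware one on a mock PHP-like return statement. The call shape `isIdent(node.type, rules)` is taken from the review; its body, the `identifierTypes` field, the `phpRules` object, and the mock node shapes are assumptions made for illustration only.

```javascript
// Assumed rule shape: a set of node types that count as identifiers.
const phpRules = { identifierTypes: new Set(['name', 'variable_name']) };

// Assumed body for the helper named in the review.
function isIdent(nodeType, rules) {
  return rules.identifierTypes.has(nodeType);
}

// Version quoted in the review: only matches the literal 'identifier' type.
function collectIdentifiersHardcoded(node, out) {
  if (node.type === 'identifier') {
    out.push(node.text);
    return;
  }
  for (const child of node.namedChildren) {
    collectIdentifiersHardcoded(child, out);
  }
}

// Rule-aware version: identifier types come from the language's rules.
function collectIdentifiers(node, out, rules) {
  if (isIdent(node.type, rules)) {
    out.push(node.text);
    return;
  }
  for (const child of node.namedChildren) {
    collectIdentifiers(child, out, rules);
  }
}

// Mock AST for `return $a + $b;` (shaped like tree-sitter nodes).
const returnStmt = {
  type: 'return_statement',
  text: 'return $a + $b;',
  namedChildren: [{
    type: 'binary_expression',
    text: '$a + $b',
    namedChildren: [
      { type: 'variable_name', text: '$a', namedChildren: [] },
      { type: 'variable_name', text: '$b', namedChildren: [] },
    ],
  }],
};

const hardcoded = [];
collectIdentifiersHardcoded(returnStmt, hardcoded);
const fixed = [];
collectIdentifiers(returnStmt, fixed, phpRules);

console.log(hardcoded); // []
console.log(fixed);     // [ '$a', '$b' ]
```

On this mock input the hardcoded version finds nothing because no node has type `identifier`, while the rule-aware version collects both variables.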
This reverts commit 0d4cfd2.
Summary
- DATAFLOW_DEFAULTS + makeDataflowRules() factory + per-language rule objects + DATAFLOW_RULES Map + DATAFLOW_EXTENSIONS Set
- extractDataflow() now accepts langId and uses rules throughout; all AST helpers (functionName, extractParams, resolveCalleeName, memberReceiver) are parameterized by rules
- buildDataflowEdges() now uses DATAFLOW_EXTENSIONS instead of hardcoded extension checks
- Language-specific handling: Go expression_list unwrapping, C# direct-child initializer scanning, Java combined call+member mutations, PHP extra identifier types and argument wrappers
Test plan
- npx vitest run tests/parsers/dataflow-: all 57 tests pass across 8 language files
- npx vitest run tests/integration/dataflow.test.js: integration tests pass
- npm test: full suite green (1404 passed)
- node src/cli.js build . --dataflow: builds without errors, 259 edges inserted
- node src/cli.js dataflow buildGraph -T: shows dataflow edges correctly
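The extension gating and langId resolution mentioned for buildDataflowEdges() might look something like the sketch below. The commit message says only that langId is resolved from symbols._langId or extToLang. The Map shape of `extToLang`, the helper names `fileExtension` and `resolveLangId`, and the sample extensions are assumptions.

```javascript
// Assumed contents; in the PR these are derived from LANGUAGE_REGISTRY.
const DATAFLOW_EXTENSIONS = new Set(['.js', '.ts', '.py', '.go']);
const extToLang = new Map([
  ['.js', 'javascript'],
  ['.ts', 'typescript'],
  ['.py', 'python'],
  ['.go', 'go'],
]);

// Extension of the file's base name, or '' if there is none.
function fileExtension(filePath) {
  const base = filePath.slice(filePath.lastIndexOf('/') + 1);
  const dot = base.lastIndexOf('.');
  return dot <= 0 ? '' : base.slice(dot);
}

// Returns the language id to use for dataflow extraction,
// or null when the file should be skipped.
function resolveLangId(filePath, symbols) {
  const ext = fileExtension(filePath);
  if (!DATAFLOW_EXTENSIONS.has(ext)) return null; // replaces hardcoded extension checks
  // Prefer the id recorded at parse time, then fall back to the extension.
  return (symbols && symbols._langId) || extToLang.get(ext) || null;
}

console.log(resolveLangId('src/app.py', {}));                  // python
console.log(resolveLangId('src/view.ts', { _langId: 'tsx' })); // tsx
console.log(resolveLangId('README.md', { _langId: 'x' }));     // null
```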