diff --git a/docs/roadmap/BACKLOG.md b/docs/roadmap/BACKLOG.md
index ee1570b9..d5da94a6 100644
--- a/docs/roadmap/BACKLOG.md
+++ b/docs/roadmap/BACKLOG.md
@@ -144,6 +144,8 @@ These address fundamental limitations in the parsing and resolution pipeline tha
 | 71 | Basic type inference for typed languages | Extract type annotations from TypeScript and Java AST nodes (variable declarations, function parameters, return types, generics) to resolve method calls through typed references. Currently `const x: Router = express.Router(); x.get(...)` produces no edge because `x.get` can't be resolved without knowing `x` is a `Router`. Tree-sitter already parses type annotations — we just don't use them for resolution. Start with declared types (no flow inference), which covers the majority of TS/Java code. | Resolution | Dramatically improves call graph completeness for TypeScript and Java — the two languages where developers annotate types explicitly and expect tooling to use them. Directly prevents hallucinated "no callers" results for methods called through typed variables | ✓ | ✓ | 5 | No | — |
 | 72 | Interprocedural dataflow analysis | Extend the existing intraprocedural dataflow (ID 14) to propagate `flows_to`/`returns`/`mutates` edges across function boundaries. When function A calls B with argument X, and B's dataflow shows X flows to its return value, connect A's call site to the downstream consumers of B's return. Requires stitching per-function dataflow summaries at call edges — no new parsing, just graph traversal over existing `dataflow` + `edges` tables. Start with single-level propagation (caller↔callee), not transitive closure. | Analysis | Current dataflow stops at function boundaries, missing the most important flows — data passing through helper functions, middleware chains, and factory patterns. Single-function scope means `dataflow` can't answer "where does this user input end up?" across call boundaries. Cross-function propagation is the difference between toy dataflow and useful taint-like analysis | ✓ | ✓ | 5 | No | 14 |
 | 73 | Improved dynamic call resolution | Upgrade the current "best-effort" dynamic dispatch resolution for Python, Ruby, and JavaScript. Three concrete improvements: **(a)** receiver-type tracking — when `x = SomeClass()` is followed by `x.method()`, resolve `method` to `SomeClass.method` using the assignment chain (leverages existing `ast_nodes` + `dataflow` tables); **(b)** common pattern recognition — resolve `EventEmitter.on('event', handler)` callback registration, `Promise.then/catch` chains, `Array.map/filter/reduce` with named function arguments, and decorator/annotation patterns; **(c)** confidence-tiered edges — mark dynamically-resolved edges with a confidence score (high for direct assignment, medium for pattern match, low for heuristic) so consumers can filter by reliability. | Resolution | In Python/Ruby/JS, 30-60% of real calls go through dynamic dispatch — method calls on variables, callbacks, event handlers, higher-order functions. The current best-effort resolution misses most of these, leaving massive gaps in the call graph for the languages where codegraph is most commonly used. Even partial improvement here has outsized impact on graph completeness | ✓ | ✓ | 5 | No | — |
+| 81 | Track dynamic `import()` and re-exports as graph edges | Extract `import()` expressions as `dynamic-imports` edges in both WASM extraction paths (query-based and walk-based). Destructured names (`const { a } = await import(...)`) feed into `importedNames` for call resolution. **Partially done:** WASM JS/TS extraction works (PR #389). Remaining: **(a)** native Rust engine support — `crates/codegraph-core/src/extractors/javascript.rs` doesn't extract `import()` calls; **(b)** non-static paths (`import(\`./plugins/${name}.js\`)`, `import(variable)`) are skipped with a debug warning; **(c)** re-export consumer counting in `exports --unused` only checks `calls` edges, not `imports`/`dynamic-imports` — symbols consumed only via import edges show as zero-consumer false positives. | Resolution | Fixes false "zero consumers" reports for symbols consumed via dynamic imports. 95 `dynamic-imports` edges found in codegraph's own codebase — these were previously invisible to impact analysis, exports audit, and dead-export hooks | ✓ | ✓ | 5 | No | — |
+| 82 | Extract names from `import().then()` callback patterns | `extractDynamicImportNames` only extracts destructured names from `const { a } = await import(...)` (walks up to `variable_declarator`). The `.then()` pattern — `import('./foo.js').then(({ a, b }) => ...)` — produces an edge with empty names because the destructured parameters live in the `.then()` callback, not a `variable_declarator`. Detect when an `import()` call's parent is a `member_expression` with `.then`, find the arrow/function callback in `.then()`'s arguments, and extract parameter names from its destructuring pattern. | Resolution | `.then()`-style dynamic imports are common in older codebases and lazy-loading patterns (React.lazy, Webpack code splitting). Without name extraction, these produce file-level edges only — no symbol-level `calls` edges, so the imported symbols still appear as zero-consumer false positives | ✓ | ✓ | 4 | No | 81 |
 
 ### Tier 1i — Search, navigation, and monitoring improvements
 
diff --git a/src/builder.js b/src/builder.js
index 2710de48..835aa576 100644
--- a/src/builder.js
+++ b/src/builder.js
@@ -1041,7 +1041,13 @@ export async function buildGraph(rootDir, opts = {}) {
         const resolvedPath = getResolved(path.join(rootDir, relPath), imp.source);
         const targetRow = getNodeId.get(resolvedPath, 'file', resolvedPath, 0);
         if (targetRow) {
-          const edgeKind = imp.reexport ? 'reexports' : imp.typeOnly ? 'imports-type' : 'imports';
+          const edgeKind = imp.reexport
+            ? 'reexports'
+            : imp.typeOnly
+              ? 'imports-type'
+              : imp.dynamicImport
+                ? 'dynamic-imports'
+                : 'imports';
           allEdgeRows.push([fileNodeId, targetRow.id, edgeKind, 1.0, 0]);
 
           if (!imp.reexport && isBarrelFile(resolvedPath)) {
@@ -1060,7 +1066,11 @@ export async function buildGraph(rootDir, opts = {}) {
                 allEdgeRows.push([
                   fileNodeId,
                   actualRow.id,
-                  edgeKind === 'imports-type' ? 'imports-type' : 'imports',
+                  edgeKind === 'imports-type'
+                    ? 'imports-type'
+                    : edgeKind === 'dynamic-imports'
+                      ? 'dynamic-imports'
+                      : 'imports',
                   0.9,
                   0,
                 ]);
diff --git a/src/extractors/javascript.js b/src/extractors/javascript.js
index 1770d191..b59c5db9 100644
--- a/src/extractors/javascript.js
+++ b/src/extractors/javascript.js
@@ -1,3 +1,4 @@
+import { debug } from '../logger.js';
 import { findChild, nodeEndLine } from './helpers.js';
 
 /**
@@ -173,6 +174,9 @@ function extractSymbolsQuery(tree, query) {
   // Extract top-level constants via targeted walk (query patterns don't cover these)
   extractConstantsWalk(tree.rootNode, definitions);
 
+  // Extract dynamic import() calls via targeted walk (query patterns don't match `import` function type)
+  extractDynamicImportsWalk(tree.rootNode, imports);
+
   return { definitions, calls, imports, classes, exports: exps };
 }
 
@@ -224,6 +228,41 @@ function extractConstantsWalk(rootNode, definitions) {
   }
 }
 
+/**
+ * Recursive walk to find dynamic import() calls.
+ * Query patterns match call_expression with identifier/member_expression/subscript_expression
+ * functions, but import() has function type `import` which none of those patterns cover.
+ */
+function extractDynamicImportsWalk(node, imports) {
+  if (node.type === 'call_expression') {
+    const fn = node.childForFieldName('function');
+    if (fn && fn.type === 'import') {
+      const args = node.childForFieldName('arguments') || findChild(node, 'arguments');
+      if (args) {
+        const strArg = findChild(args, 'string');
+        if (strArg) {
+          const modPath = strArg.text.replace(/['"]/g, '');
+          const names = extractDynamicImportNames(node);
+          imports.push({
+            source: modPath,
+            names,
+            line: node.startPosition.row + 1,
+            dynamicImport: true,
+          });
+        } else {
+          debug(
+            `Skipping non-static dynamic import() at line ${node.startPosition.row + 1} (template literal or variable)`,
+          );
+        }
+      }
+      return; // no need to recurse into import() children
+    }
+  }
+  for (let i = 0; i < node.childCount; i++) {
+    extractDynamicImportsWalk(node.child(i), imports);
+  }
+}
+
 function handleCommonJSAssignment(left, right, node, imports) {
   if (!left || !right) return;
   const leftText = left.text;
@@ -455,11 +494,36 @@ function extractSymbolsWalk(tree) {
       case 'call_expression': {
         const fn = node.childForFieldName('function');
         if (fn) {
-          const callInfo = extractCallInfo(fn, node);
-          if (callInfo) calls.push(callInfo);
-          if (fn.type === 'member_expression') {
-            const cbDef = extractCallbackDefinition(node, fn);
-            if (cbDef) definitions.push(cbDef);
+          // Dynamic import(): import('./foo.js') → extract as an import entry
+          if (fn.type === 'import') {
+            const args = node.childForFieldName('arguments') || findChild(node, 'arguments');
+            if (args) {
+              const strArg = findChild(args, 'string');
+              if (strArg) {
+                const modPath = strArg.text.replace(/['"]/g, '');
+                // Extract destructured names from parent context:
+                //   const { a, b } = await import('./foo.js')
+                // (standalone import('./foo.js').then(...) calls produce an edge with empty names)
+                const names = extractDynamicImportNames(node);
+                imports.push({
+                  source: modPath,
+                  names,
+                  line: node.startPosition.row + 1,
+                  dynamicImport: true,
+                });
+              } else {
+                debug(
+                  `Skipping non-static dynamic import() at line ${node.startPosition.row + 1} (template literal or variable)`,
+                );
+              }
+            }
+          } else {
+            const callInfo = extractCallInfo(fn, node);
+            if (callInfo) calls.push(callInfo);
+            if (fn.type === 'member_expression') {
+              const cbDef = extractCallbackDefinition(node, fn);
+              if (cbDef) definitions.push(cbDef);
+            }
           }
         }
         break;
@@ -941,3 +1005,64 @@ function extractImportNames(node) {
   scan(node);
   return names;
 }
+
+/**
+ * Extract destructured names from a dynamic import() call expression.
+ *
+ * Handles:
+ *   const { a, b } = await import('./foo.js')   → ['a', 'b']
+ *   const mod = await import('./foo.js')         → ['mod']
+ *   import('./foo.js')                            → [] (no names extractable)
+ *
+ * Walks up the AST from the call_expression to find the enclosing
+ * variable_declarator and reads the name/object_pattern.
+ */
+function extractDynamicImportNames(callNode) {
+  // Walk up: call_expression → await_expression → variable_declarator
+  let current = callNode.parent;
+  // Skip await_expression wrapper if present
+  if (current && current.type === 'await_expression') current = current.parent;
+  // We should now be at a variable_declarator (or not, if standalone import())
+  if (!current || current.type !== 'variable_declarator') return [];
+
+  const nameNode = current.childForFieldName('name');
+  if (!nameNode) return [];
+
+  // const { a, b } = await import(...) → object_pattern
+  if (nameNode.type === 'object_pattern') {
+    const names = [];
+    for (let i = 0; i < nameNode.childCount; i++) {
+      const child = nameNode.child(i);
+      if (child.type === 'shorthand_property_identifier_pattern') {
+        names.push(child.text);
+      } else if (child.type === 'pair_pattern') {
+        // { a: localName } → use localName (the alias) for the local binding,
+        // but use the key (original name) for import resolution
+        const key = child.childForFieldName('key');
+        if (key) names.push(key.text);
+      }
+    }
+    return names;
+  }
+
+  // const mod = await import(...) → identifier (namespace-like import)
+  if (nameNode.type === 'identifier') {
+    return [nameNode.text];
+  }
+
+  // const [a, b] = await import(...) → array_pattern (rare but possible)
+  if (nameNode.type === 'array_pattern') {
+    const names = [];
+    for (let i = 0; i < nameNode.childCount; i++) {
+      const child = nameNode.child(i);
+      if (child.type === 'identifier') names.push(child.text);
+      else if (child.type === 'rest_pattern') {
+        const inner = child.namedChild(0); // child(0) would be the '...' token, never an identifier
+        if (inner && inner.type === 'identifier') names.push(inner.text);
+      }
+    }
+    return names;
+  }
+
+  return [];
+}
diff --git a/src/kinds.js b/src/kinds.js
index 60d363fc..3f469c43 100644
--- a/src/kinds.js
+++ b/src/kinds.js
@@ -33,6 +33,7 @@ export const ALL_SYMBOL_KINDS = CORE_SYMBOL_KINDS;
 export const CORE_EDGE_KINDS = [
   'imports',
   'imports-type',
+  'dynamic-imports',
   'reexports',
   'calls',
   'extends',
diff --git a/tests/engines/query-walk-parity.test.js b/tests/engines/query-walk-parity.test.js
index 65638daa..2556af08 100644
--- a/tests/engines/query-walk-parity.test.js
+++ b/tests/engines/query-walk-parity.test.js
@@ -46,6 +46,7 @@ function normalize(symbols) {
         ...(i.reexport ? { reexport: true } : {}),
         ...(i.wildcardReexport ? { wildcardReexport: true } : {}),
         ...(i.typeOnly ? { typeOnly: true } : {}),
+        ...(i.dynamicImport ? { dynamicImport: true } : {}),
       }))
       .sort((a, b) => a.line - b.line),
     classes: (symbols.classes || [])
@@ -178,6 +179,16 @@ export class Server {
 fn.call(null, arg);
 obj.apply(undefined, args);
 method.bind(ctx);
+`,
+  },
+  {
+    name: 'dynamic import() expressions',
+    file: 'test.js',
+    code: `
+const { readFile } = await import('fs/promises');
+const { readFile: rf } = await import('node:fs/promises');
+const mod = await import('./utils.js');
+import('./side-effect.js');
 `,
   },
   // TypeScript-specific
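
Reviewer note: the nested ternary added to `src/builder.js` encodes a precedence among the import flags, with `reexport` checked first, then `typeOnly`, then `dynamicImport`. The sketch below restates that ordering as a standalone function so it is easier to reason about. `edgeKindFor` is a hypothetical name used only for illustration and does not exist in the codebase, and the entry shape is assumed from what `extractDynamicImportsWalk` pushes.

```javascript
// Hypothetical helper (not part of this diff): mirrors the ternary chain
// that buildGraph() uses to choose an edge kind for an import entry.
function edgeKindFor(imp) {
  if (imp.reexport) return 'reexports'; // checked first, wins over every other flag
  if (imp.typeOnly) return 'imports-type'; // wins over dynamicImport
  if (imp.dynamicImport) return 'dynamic-imports';
  return 'imports'; // plain static import
}

// Entry shape assumed from what extractDynamicImportsWalk pushes.
const entry = { source: './utils.js', names: ['mod'], line: 4, dynamicImport: true };
console.log(edgeKindFor(entry)); // prints "dynamic-imports"
```

One consequence of this ordering is that an entry carrying both `typeOnly` and `dynamicImport` would be recorded as `imports-type`. The extractors in this diff never set both flags on the same entry, so that case is theoretical here.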