fix: emit CALLS edges for module-scope code (closes #284) #285
Open
michael-denyer wants to merge 1 commit into tirth8205:main
The parser gated CALLS edge emission on `enclosing_func` being set, so calls made from module scope (top-level script glue, CLI entrypoints, `if __name__ == "__main__"` blocks, and Jupyter/Databricks notebook cells) produced zero CALLS edges. Any function invoked only from those contexts was flagged as dead by `find_dead_code`, even when the function was the entire reason the script existed.

Notebooks are particularly affected because every cell is module-scope by definition, so the existing notebook parser (PR tirth8205#69) emitted nodes and IMPORTS_FROM edges but no CALLS edges, which made the dead-code detector's notebook coverage vacuous.

Fix: when `enclosing_func` is None, attribute the CALLS edge to the File node instead of dropping it. This matches the existing convention used by `_extract_value_references` and CONTAINS edges. Applied to all 5 gated emission sites: generic Python/JS/TS path, JSX components, Elixir, Solidity `emit`, and R.

Downstream: `detect_entry_points` now filters File-sourced CALLS via `get_all_call_targets(include_file_sources=False)` so script-only callees remain detectable as entry points (otherwise `run_job()` called from `script.py` module scope would look "called" by `script.py` and disappear from flow analysis).

Verified end-to-end against a Databricks `.ipynb` that calls `Predict.extract_data_from_sample_ids()` from cell-level code: the edge count went from 0 to 14 CALLS edges, and `find_dead_code` no longer flags the method.

Tests:
- `test_module_scope_calls_attributed_to_file`: bare `.py` script
- `test_module_scope_calls_in_notebook`: `.ipynb` file
- `test_detect_entry_points_module_scope_caller_is_still_root`: flow analysis treats File-sourced CALLS correctly
- `test_module_scope_caller_prevents_dead_code_flag`: end-to-end parse → store → find_dead_code
- `test_if_main_block_caller_prevents_dead_code_flag`: same for `__main__` block
Summary
Closes #284.
`_extract_calls` and 4 sibling helpers gated CALLS edge emission on `enclosing_func` being set, so module-scope calls (top-level script glue, CLI entrypoints, `if __name__ == '__main__'` blocks, Jupyter/Databricks notebook cells) produced zero CALLS edges. Any function invoked only from those contexts was flagged as dead by `find_dead_code`.

Notebooks were hit hardest: PR #69 added node + IMPORTS_FROM extraction, but every cell is module-scope by definition, so notebooks emitted no CALLS edges at all, which made the dead-code detector's notebook coverage vacuous.
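To make the gating concrete, here is a minimal sketch of call extraction with a File-node fallback. It uses Python's standard `ast` module rather than this project's parser, and `collect_calls`, `_callee_name`, and the `file::function` naming are hypothetical, chosen only for illustration.

```python
import ast


def _callee_name(func):
    """Best-effort callee name for simple calls; hypothetical helper."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def collect_calls(source: str, file_path: str) -> list[tuple[str, str]]:
    """Return (caller, callee) pairs for every call found in `source`.

    Calls made outside any function are attributed to `file_path`
    (standing in for the File node) instead of being dropped.
    """
    edges = []

    def visit(node, enclosing_func):
        for child in ast.iter_child_nodes(node):
            scope = enclosing_func
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Assumed qualified-name format, for illustration only.
                scope = f"{file_path}::{child.name}"
            elif isinstance(child, ast.Call):
                callee = _callee_name(child.func)
                if callee is not None:
                    # The old behaviour would skip the edge when
                    # enclosing_func is None; here we fall back to the file.
                    caller = enclosing_func if enclosing_func is not None else file_path
                    edges.append((caller, callee))
            visit(child, scope)

    visit(ast.parse(source), None)
    return edges


SAMPLE = '''
def run_job():
    helper()

def helper():
    pass

if __name__ == "__main__":
    run_job()
'''

edges = collect_calls(SAMPLE, "script.py")
for caller, callee in edges:
    print(f"{caller} -CALLS-> {callee}")
```

With the fallback in place, `run_job` gains an incoming edge from `script.py`, so a dead-code pass that looks for functions with no callers would no longer report it.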
What changed
Parser (5 emission sites): when `enclosing_func` is None, attribute the CALLS edge to the File node instead of dropping it. This matches the existing convention used by `_extract_value_references` and CONTAINS edges. The sites are `_extract_calls` (the main path, generic Python/JS/TS), JSX components, Elixir, Solidity `emit`, and R.

Downstream fix in `detect_entry_points`: without filtering, a script's module-scope calls would be attributed to the script's own File node, making script-only callees look "called by the script" and hiding them from flow analysis. Added `get_all_call_targets(include_file_sources=False)` so `detect_entry_points` excludes File-sourced CALLS. The implementation joins against `nodes.kind = 'File'` rather than pattern-matching `source_qualified`, so future changes to File-node naming can't silently miscategorize edges.

End-to-end verification
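Real-world repro: a Databricks notebook (production inference pipeline) calling `Predict.extract_data_from_sample_ids()`.

Before: 0 CALLS edges, and `find_dead_code` flagged the method as dead.

After: 14 CALLS edges, and `find_dead_code` no longer flags the method.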
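The File-source filter described under "What changed" can be sketched as a join on node kind. This is not the PR's actual implementation: the `edges` table name and the `qualified_name` and `target_qualified` columns are assumptions made for the example. Only `nodes.kind`, `source_qualified`, and the `get_all_call_targets(include_file_sources=...)` signature come from the description above.

```python
import sqlite3


def get_all_call_targets(conn, include_file_sources=True):
    """Return the set of qualified names that are targets of CALLS edges.

    With include_file_sources=False, edges whose source node has
    kind 'File' are ignored, so callees reached only from module scope
    still look uncalled to entry-point detection.
    """
    if include_file_sources:
        rows = conn.execute(
            "SELECT DISTINCT target_qualified FROM edges WHERE kind = 'CALLS'"
        )
    else:
        # Join on the node's kind rather than pattern-matching the
        # source name, so File-node naming can change safely.
        rows = conn.execute(
            """
            SELECT DISTINCT e.target_qualified
            FROM edges AS e
            JOIN nodes AS n ON n.qualified_name = e.source_qualified
            WHERE e.kind = 'CALLS' AND n.kind != 'File'
            """
        )
    return {row[0] for row in rows}


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nodes (qualified_name TEXT PRIMARY KEY, kind TEXT);
    CREATE TABLE edges (kind TEXT, source_qualified TEXT, target_qualified TEXT);
    """
)
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?)",
    [
        ("script.py", "File"),
        ("script.py::run_job", "Function"),
        ("script.py::helper", "Function"),
    ],
)
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [
        ("CALLS", "script.py", "script.py::run_job"),  # module-scope call
        ("CALLS", "script.py::run_job", "script.py::helper"),
    ],
)

all_targets = get_all_call_targets(conn)
non_file_targets = get_all_call_targets(conn, include_file_sources=False)
```

In this toy data, `run_job` is in `all_targets` (so a dead-code check sees it as called) but not in `non_file_targets` (so entry-point detection can still treat it as a root).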
Tests
5 new tests, all passing:
- `test_parser.py::test_module_scope_calls_attributed_to_file`: bare `.py` script
- `test_parser.py::test_module_scope_calls_in_notebook`: `.ipynb` file
- `test_flows.py::test_detect_entry_points_module_scope_caller_is_still_root`: flow analysis treats File-sourced CALLS correctly
- `test_refactor.py::test_module_scope_caller_prevents_dead_code_flag`: end-to-end parse → store → `find_dead_code`
- `test_refactor.py::test_if_main_block_caller_prevents_dead_code_flag`: same for `__main__` block

Full impacted suite: 318 passed, 0 failures (parser, refactor, flows, multilang, notebook).
Test plan
- Module-scope calls are attributed to the File node (`.py`)
- Module-scope calls in a notebook (`.ipynb`)
- `detect_entry_points` excludes File-sourced CALLS so script-only callees remain roots
- `find_dead_code` does not flag module-scope-called functions
- `find_dead_code` does not flag `__main__`-block-called functions

Not addressed (scope kept tight)
Reviewed all dead-code-related PRs/issues (#104, #108, #154, #158, #160, #247, #249) — none address module-scope CALLS emission. The other 4 helper sites (Elixir, JSX, Solidity, R) had the same gating shape so are fixed in the same PR for consistency, even though the original repro only needed the Python path.