fix: emit CALLS edges for module-scope code (closes #284)#285

Open
michael-denyer wants to merge 1 commit into tirth8205:main from michael-denyer:fix/module-scope-calls
@michael-denyer
Contributor

Summary

Closes #284.

_extract_calls and 4 sibling helpers gated CALLS edge emission on enclosing_func being set, so module-scope calls (top-level script glue, CLI entrypoints, if __name__ == '__main__' blocks, Jupyter/Databricks notebook cells) produced zero CALLS edges. Any function invoked only from those contexts was flagged as dead by find_dead_code.

Notebooks were hit hardest: PR #69 added node + IMPORTS_FROM extraction, but every cell is module-scope by definition, so notebooks emitted no CALLS edges at all — making the dead-code detector's notebook coverage vacuous.

What changed

Parser (5 emission sites): when enclosing_func is None, attribute the CALLS edge to the File node instead of dropping it. Matches the existing convention used by _extract_value_references and CONTAINS edges.

| Site | Language(s) |
| --- | --- |
| _extract_calls (the main path) | Python, JS, TS, generic |
| Elixir call path | Elixir |
| JSX component invocation | TSX/JSX |
| Solidity emit | Solidity |
| R call path | R |

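The fallback can be illustrated with a minimal sketch. It uses the standard-library `ast` module as a stand-in for the project's real parser, and the `extract_calls` signature, the `file_path::name` qualified-name format, and the `_callee_name` helper are all hypothetical names chosen for illustration, not the code in this PR.

```python
import ast


def _callee_name(func):
    """Best-effort callee name for a call target (hypothetical helper)."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def extract_calls(source, file_path):
    """Return (caller, callee) pairs for every call in `source`.

    Calls inside a function are attributed to that function. Calls at
    module scope (enclosing_func is None) are attributed to the file
    instead of being dropped.
    """
    edges = []

    def visit(node, enclosing_func):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Assumed qualified-name format, for illustration only.
                visit(child, f"{file_path}::{child.name}")
                continue
            if isinstance(child, ast.Call):
                callee = _callee_name(child.func)
                if callee is not None:
                    # The fallback: use the file as the caller at module scope.
                    caller = enclosing_func if enclosing_func is not None else file_path
                    edges.append((caller, callee))
            visit(child, enclosing_func)

    visit(ast.parse(source), None)
    return edges
```

With the old gating, the `run_job()` call inside an `if __name__ == '__main__'` block would have produced no edge at all; in this sketch it yields an edge whose caller is the file path.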
Downstream fix in detect_entry_points: without filtering, a script's module-scope calls would attribute to the script's own File node, making script-only callees look "called by the script" and hiding them from flow analysis. Added get_all_call_targets(include_file_sources=False) so detect_entry_points excludes File-sourced CALLS. Implementation joins against nodes.kind = 'File' rather than pattern-matching source_qualified so future changes to File-node naming can't silently miscategorize edges.
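A rough sketch of the filter, assuming a SQLite store with hypothetical `nodes(qualified_name, kind)` and `edges(kind, source_qualified, target_qualified)` tables. Only `nodes.kind`, `source_qualified`, and the `include_file_sources` flag come from the description above; the remaining column names and the default value of the flag are assumptions.

```python
import sqlite3


def get_all_call_targets(conn, include_file_sources=True):
    """Return the set of qualified names that are the target of a CALLS edge.

    With include_file_sources=False, edges whose source node has
    kind 'File' are ignored. The check joins on the nodes table rather
    than pattern-matching the source name.
    """
    if include_file_sources:
        rows = conn.execute(
            "SELECT DISTINCT target_qualified FROM edges WHERE kind = 'CALLS'"
        )
    else:
        rows = conn.execute(
            """
            SELECT DISTINCT e.target_qualified
            FROM edges AS e
            LEFT JOIN nodes AS n ON n.qualified_name = e.source_qualified
            WHERE e.kind = 'CALLS'
              AND (n.kind IS NULL OR n.kind != 'File')
            """
        )
    return {row[0] for row in rows}
```

The `LEFT JOIN` keeps edges whose source has no matching node row, so a missing node never hides a real caller. Only edges that are positively identified as File-sourced are excluded.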

End-to-end verification

Real-world repro: a Databricks notebook (production inference pipeline) calling Predict.extract_data_from_sample_ids().

Before:

>>> CodeParser().parse_file(Path('ML_wpredict_apply_v1.0.ipynb'))
nodes: 1, edges: 3 (all IMPORTS_FROM, zero CALLS)
>>> find_dead_code(...) → flags extract_data_from_sample_ids, extract_data_from_files

After:

nodes: 1, edges: 17 (3 IMPORTS_FROM, 14 CALLS)
>>> find_dead_code(...) → no longer flags either method
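The effect on dead-code detection can be mimicked on a toy graph. This sketch assumes that a function counts as dead when no CALLS edge targets it; `find_dead` is a hypothetical stand-in for illustration and not the project's `find_dead_code`.

```python
def find_dead(functions, calls_edges):
    """Return the functions that are not the target of any CALLS edge.

    `calls_edges` is an iterable of (source, target) pairs.
    """
    called = {target for _source, target in calls_edges}
    return sorted(name for name in functions if name not in called)


functions = ["extract_data_from_files", "extract_data_from_sample_ids"]
notebook = "ML_wpredict_apply_v1.0.ipynb"

# Before: module-scope calls were dropped, so there are no CALLS edges.
before = find_dead(functions, [])

# After: each notebook-cell call becomes a CALLS edge sourced at the File node.
after = find_dead(functions, [(notebook, name) for name in functions])
```

Here `before` lists both methods as dead, while `after` is empty because each method now has an incoming edge from the notebook's File node.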

Tests

5 new tests, all passing:

  • test_parser.py::test_module_scope_calls_attributed_to_file — bare .py script
  • test_parser.py::test_module_scope_calls_in_notebook — .ipynb file
  • test_flows.py::test_detect_entry_points_module_scope_caller_is_still_root — flow analysis treats File-sourced CALLS correctly
  • test_refactor.py::test_module_scope_caller_prevents_dead_code_flag — end-to-end parse → store → find_dead_code
  • test_refactor.py::test_if_main_block_caller_prevents_dead_code_flag — same for __main__ block

Full impacted suite (parser, refactor, flows, multilang, notebook): 318 tests ran with 0 failures (316 passed, 2 xpassed).

$ uv run pytest tests/test_parser.py tests/test_refactor.py tests/test_notebook.py tests/test_multilang.py tests/test_flows.py
================== 316 passed, 2 xpassed, 1 warning ===================

Test plan

  • Parser emits CALLS edges from module-scope code (Python .py)
  • Parser emits CALLS edges from notebook cells (.ipynb)
  • detect_entry_points excludes File-sourced CALLS so script-only callees remain roots
  • find_dead_code does not flag module-scope-called functions
  • find_dead_code does not flag __main__-block-called functions
  • No regressions in existing parser/refactor/flows/multilang/notebook tests

Not addressed (scope kept tight)

Reviewed all dead-code-related PRs/issues (#104, #108, #154, #158, #160, #247, #249) — none address module-scope CALLS emission. The other 4 helper sites (Elixir, JSX, Solidity, R) had the same gating shape so are fixed in the same PR for consistency, even though the original repro only needed the Python path.

The parser gated CALLS edge emission on `enclosing_func` being set, so
calls made from module scope (top-level script glue, CLI entrypoints,
`if __name__ == "__main__"` blocks, and Jupyter/Databricks notebook
cells) produced zero CALLS edges. Any function invoked only from those
contexts was flagged as dead by `find_dead_code`, even when the
function was the entire reason the script existed.

Notebooks are particularly affected because every cell is module-scope
by definition, so the existing notebook parser (PR tirth8205#69) emitted nodes
and IMPORTS_FROM edges but no CALLS edges — making the dead-code
detector's notebook coverage vacuous.

Fix: when `enclosing_func` is None, attribute the CALLS edge to the
File node instead of dropping it. Matches the existing convention used
by `_extract_value_references` and CONTAINS edges. Applied to all 5
gated emission sites: generic Python/JS/TS path, JSX components,
Elixir, Solidity `emit`, and R.

Downstream: `detect_entry_points` now filters File-sourced CALLS via
`get_all_call_targets(include_file_sources=False)` so script-only
callees remain detectable as entry points (otherwise `run_job()`
called from `script.py` module scope would look "called" by `script.py`
and disappear from flow analysis).

Verified end-to-end against a Databricks `.ipynb` that calls
`Predict.extract_data_from_sample_ids()` from cell-level code: edge
count went from 0 to 14 CALLS edges, and `find_dead_code` no longer
flags the method.

Tests:
- `test_module_scope_calls_attributed_to_file` — bare `.py` script
- `test_module_scope_calls_in_notebook` — `.ipynb` file
- `test_detect_entry_points_module_scope_caller_is_still_root` — flow
  analysis treats File-sourced CALLS correctly
- `test_module_scope_caller_prevents_dead_code_flag` — end-to-end
  parse → store → find_dead_code
- `test_if_main_block_caller_prevents_dead_code_flag` — same for
  `__main__` block
Development

Successfully merging this pull request may close these issues.

Module-scope calls (notebooks, scripts, __main__) emit no CALLS edges → find_dead_code false positives

1 participant