ENH: Replace custom LazyITKModule with native PEP 562 lazy loading #6183

Merged
hjmjohnson merged 18 commits into InsightSoftwareConsortium:main from thewtex:python-lazy-mechanism
May 5, 2026

Conversation

@thewtex
Member

@thewtex thewtex commented May 1, 2026

The custom LazyITKModule(types.ModuleType) subclass that powered
ITK's Python lazy-loading layer is replaced with the standard-library
PEP 562 (Python 3.7+) module-level __getattr__ / __dir__ protocol.
The same observable contract is preserved: lazy first-touch resolution,
thread-safe first-load via a single recursive lock, factory-loading
hook gated on itkConfig.DefaultFactoryLoading, PEP 366 __package__,
and pickle / cloudpickle round-trip identity through sys.modules
registration.

Motivation:

  • The legacy LazyITKModule.__getattribute__ intercepted every
    attribute read just to check a sentinel; PEP 562 only fires on
    miss, so steady-state reads now hit the standard module fast path
    for free.
  • The custom ITKLazyLoadLock combined threading.RLock and
    multiprocessing.RLock. Constructing the multiprocessing half pinned
    the multiprocessing start method at import itk time, breaking
    downstream callers that wanted multiprocessing.set_start_method(...)
    after import. The new implementation uses only threading.RLock;
    process boundary isolation already handles the cross-process case
    (each child process re-runs itk/__init__.py against its own
    sys.modules).
  • Custom pickle reconstructor (_lazy_itk_module_reconstructor),
    __getstate__, __setstate__, and the eager-mode dispatch branch
    are deleted; standard module pickling and sys.modules['itk.<Mod>']
    registration cover the round-trip with a single 3-line
    __reduce_ex__ shim per synthetic submodule (vanilla
    pickle.dumps(types.ModuleType(...)) raises on CPython 3.12+).
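The "fires only on miss" behaviour can be illustrated with a minimal sketch. It is not ITK's code: `lazy_pkg`, `_loaders`, and `load_count` are hypothetical stand-ins, and the loaders return placeholder strings instead of triggering a SWIG load.

```python
import threading
import types

# Hypothetical stand-in for a lazily populated package; not ITK's real code.
lazy_pkg = types.ModuleType("lazy_pkg")
_lock = threading.RLock()
_loaders = {
    "Image": lambda: "Image-placeholder",
    "Mesh": lambda: "Mesh-placeholder",
}
load_count = {name: 0 for name in _loaders}


def _module_getattr(name):
    # PEP 562: called only when the normal module __dict__ lookup fails.
    loader = _loaders.get(name)
    if loader is None:
        raise AttributeError(f"module 'lazy_pkg' has no attribute {name!r}")
    with _lock:
        namespace = vars(lazy_pkg)
        if name not in namespace:  # another thread may have loaded it already
            load_count[name] += 1
            namespace[name] = loader()
        return namespace[name]


def _module_dir():
    # Advertise lazy names without forcing any load.
    return sorted({*vars(lazy_pkg), *_loaders})


lazy_pkg.__getattr__ = _module_getattr
lazy_pkg.__dir__ = _module_dir

first = lazy_pkg.Image   # miss: routed through _module_getattr
second = lazy_pkg.Image  # hit: plain __dict__ lookup, the hook is not called
```

After the first access the value lives in the module `__dict__`, so later reads never reach the hook; `Mesh` stays listed by `dir(lazy_pkg)` but unloaded.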

What changes:

  • Wrapping/Generators/Python/itk/__init__.py rewritten with
    module-level __getattr__ / __dir__ (PEP 562) and a single
    threading.RLock. Reload-guard branch and eager-mode branch
    removed.
  • New Wrapping/Generators/Python/itk/support/_lazy_submodule.py
    exports _make_itk_lazy_submodule(module_name, lazy_attributes, lazy_load_lock) returning a plain types.ModuleType carrying
    PEP 562 __getattr__ / __dir__ / __reduce_ex__.
  • Wrapping/Generators/Python/itk/support/lazy.py deleted (181 lines:
    custom subclass, lock, reconstructor, sentinel).
  • Wrapping/Generators/Python/Tests/nolazy.py deleted; the eager
    branch it exercised no longer exists.
  • itkConfig.LazyLoading and the ITK_PYTHON_LAZYLOADING
    environment-variable reader removed from
    itkConfig.template.in.py. Lazy is now the single mode.
  • Tests Tests/lazy.py, Tests/multiprocess_lazy_loading.py, and
    Modules/Filtering/ImageIntensity/wrapping/test/itkImageFilterNumPyInputsTest.py
    drop itkConfig.LazyLoading = ... overrides.
  • Tests/lazy.py extended to cover PEP 562 __dir__ (visible
    before load), the factory-loading hook, and stdlib pickle
    round-trip on itk.ITKCommon.
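A rough sketch of what a builder like `_make_itk_lazy_submodule` might look like follows. The helper's body is not shown on this page, so `make_lazy_submodule`, the `demo_pkg.Common` name, and the loader callables are assumptions for illustration; only the overall shape (plain `types.ModuleType`, `__getattr__` / `__dir__` / `__reduce_ex__`, `sys.modules` registration) comes from the description above.

```python
import importlib
import pickle
import sys
import types


def make_lazy_submodule(module_name, lazy_attributes):
    """Hypothetical builder; lazy_attributes maps a name to a loader callable."""
    module = types.ModuleType(module_name)
    module.__package__ = module_name.rpartition(".")[0]

    def module_getattr(name):
        try:
            loader = lazy_attributes[name]
        except KeyError:
            raise AttributeError(
                f"module {module_name!r} has no attribute {name!r}"
            ) from None
        value = loader()
        vars(module)[name] = value  # cache so the hook is not hit again
        return value

    def module_dir():
        return sorted({*vars(module), *lazy_attributes})

    def module_reduce_ex(protocol):
        # Pickle by dotted name: unpickling re-imports from sys.modules.
        return (importlib.import_module, (module_name,))

    module.__getattr__ = module_getattr
    module.__dir__ = module_dir
    module.__reduce_ex__ = module_reduce_ex
    sys.modules[module_name] = module
    return module


common = make_lazy_submodule("demo_pkg.Common", {"Image": lambda: "Image-placeholder"})
restored = pickle.loads(pickle.dumps(common))
```

Because the reducer only records the dotted name, `restored` is the very object registered in `sys.modules`, which is the identity property the extended `Tests/lazy.py` asserts for `itk.ITKCommon`.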

Net diff: 10 files changed, +267 / −335 (net -68 lines).
Validated end-to-end with pixi run --as-is build-python and
pixi run --as-is ctest --test-dir build-python -R Python -E "PythonNoLazyModule" — 172/172 tests pass, including the
PythonLazyModule, PythonMultiprocessLazyLoad, and
PythonGILReleaseSafetyTest regression triplet.

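Lazy-only is now the single mode: callers that previously set
itkConfig.LazyLoading = False should remove the assignment. The
assignment itself is silently ignored; it is a read of the flag with
no prior assignment that fails, raising AttributeError until the
deprecation shim added later in this PR, which returns True with a
DeprecationWarning.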

@github-actions github-actions Bot added the type:Infrastructure, type:Enhancement, area:Python wrapping, type:Testing, area:Filtering, and area:Documentation labels May 1, 2026
@thewtex thewtex requested review from SimonRit and hjmjohnson May 1, 2026 19:06
@greptile-apps

This comment was marked as resolved.

Comment thread Wrapping/Generators/Python/itk/support/_lazy_submodule.py Outdated
@thewtex
Member Author

thewtex commented May 1, 2026

Addressed Greptile's two P2 findings in ca131c5:

  1. Wrapping/Generators/Python/itk/support/_lazy_submodule.py: loaded_modules dead code. Removed both the loaded_modules: set[str] = set() declaration and the loaded_modules.add(target) call. Confirmed via grep that nothing reads it; the new __reduce_ex__ shim delegates to importlib.import_module rather than replaying loads through a set. PythonLazyModule and PythonMultiprocessLazyLoad pass post-removal.

  2. Documentation/docs/migration_guides/itk_6_migration_guide.md — misleading AttributeError comment. Greptile is correct: in the original two-line example, the preceding itkConfig.LazyLoading = False assignment had set the attribute on the module object, so the subsequent read returned False rather than raising. Reworked the example into separately commented write and read patterns and clarified that the AttributeError only occurs on a fresh import itkConfig with no prior assignment.

@thewtex thewtex force-pushed the python-lazy-mechanism branch from ca131c5 to 9e33a81 Compare May 1, 2026 19:54
Comment thread Wrapping/Generators/Python/itkConfig.template.in.py
@hjmjohnson
Member

Pushed four follow-up commits addressing the audit items raised on this PR (HEAD c8df1a651e):

  • 12cc7d8a26 — Replace bare assert __package__ == "itk" with explicit RuntimeError so the guard survives python -O.
  • 9806e78a5f — Add stdlib pickle support to top-level itk (module-level __reduce_ex__); covered in Tests/lazy.py.
  • 629d8e5fa7 — Attach a PEP 451 ModuleSpec to synthetic itk.<Module> namespaces via importlib.util.spec_from_loader so inspect, pkgutil, and IDE introspection see a normal module surface.
  • c8df1a651e — One-release deprecation shim for itkConfig.LazyLoading (module-level __getattr__ returns True with DeprecationWarning) and for ITK_PYTHON_LAZYLOADING (warns at itkConfig import time when set). Avoids a hard AttributeError API break for downstream readers.

Validated standalone that the __reduce_ex__ and spec_from_loader patterns round-trip / resolve correctly. CI on the rebuild will exercise the full Tests/lazy.py path including the new pickle.loads(pickle.dumps(itk)) is itk assertion.

@hjmjohnson hjmjohnson force-pushed the python-lazy-mechanism branch from c8df1a6 to beae673 Compare May 4, 2026 19:01
@hjmjohnson
Member

@thewtex I used agentic review tools to find (perceived) weaknesses in your solution. The 4 commits are the result of that investigation. If you agree with them, they could be left or squashed into the original commits.

I like the idea of this PR, and it looks like the correct solution, but it is not in my typical experience to review this. I'm inclined to accept this and then make it a priority to address issues that may arise quickly.

@dzenanz
Member

dzenanz commented May 4, 2026

I had a quick glance, and have no objections. But I do have questions: does this impact import speed, and how much? What is the main motivation for this refactoring?

thewtex and others added 13 commits May 4, 2026 18:13
Replace the LazyITKModule sys.modules-swap orchestration in
itk/__init__.py with a native PEP 562 implementation: module-level
__getattr__ and __dir__ resolve symbols on first access, gated by a
single threading.RLock. itkConfig.LazyLoading is no longer consulted;
lazy is the only mode. multiprocessing.RLock is deliberately avoided so
the multiprocessing start method can still be set after `import itk`.

Per-submodule LazyITKModule instances and the __init_<modulename>__.py
discovery hook are preserved unchanged so the build stays green; Phase
03 will migrate them. itk_base_global_lazy_attributes (consumed by
support/template_class._LoadModules) is still populated with the full
set of owners per attribute, alongside the new first-owner-only
_lazy_attribute_to_module that drives the top-level __getattr__.

Validated via py_compile and a mock-driver smoke harness covering PEP
366 __package__, dir-without-load, lazy-load-with-cache, first-owner
precedence, AttributeError on miss, dunder short-circuit, and a 6-thread
first-touch race.

The new PEP 562 module-level __dir__ called the unqualified `set` name,
which Python resolves through `itk.__dict__`. Once any SWIG submodule is
loaded (e.g. ITKCommon on first `itk.Image` access), `itk.set` is
populated as an `itkTemplate` binding `std::set` and shadows the
builtin. Subsequent calls to `dir(itk)` then raised
`TemplateTypeError: itk.set is not wrapped for input type None`.

This is the same alias hazard that the existing _initialize_module
already guards against by importing `_builtin_set` from `builtins`.

Switch __dir__ to a set-literal form `{*globals().keys(), ...}` which
constructs through the C-level set type with no name lookup, so it is
unaffected by names introduced into module globals at lazy-load time.
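The alias hazard described in this commit message can be reproduced in isolation. The sketch below is an assumption-laden stand-in: `demo` is a throwaway module, the shadowing `set` raises a plain `TypeError` instead of ITK's `TemplateTypeError`, and the two `dir_*` helpers are hypothetical names.

```python
import types

demo = types.ModuleType("demo")

# Code executed with the module's globals, so that a late binding named
# `set` shadows the builtin exactly as a lazily loaded wrapper would.
source = '''
_lazy_names = {"Image", "Mesh"}

def dir_with_name_lookup():
    # Resolves `set` through module globals first, so it finds the shadow.
    return sorted(set(globals()) | _lazy_names)

def dir_with_set_literal():
    # A set display is built by the interpreter directly; no name lookup.
    return sorted({*globals().keys(), *_lazy_names})

def set(*args):
    # Stand-in for a wrapped template class bound to the name `set`.
    raise TypeError("shadowed set is not the builtin")
'''
exec(source, vars(demo))

safe_listing = demo.dir_with_set_literal()
try:
    demo.dir_with_name_lookup()
    lookup_outcome = "ok"
except TypeError as exc:
    lookup_outcome = f"TypeError: {exc}"
```

The set-display form keeps working no matter which names lazy loading later injects into the module globals.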
Introduce itk.support._lazy_submodule with a builder that returns a
plain types.ModuleType for itk.<Module> wired with module-level
__getattr__ / __dir__ closures, registered in
sys.modules['itk.<Module>'], and carrying a one-line __reduce_ex__
shim for pickle / cloudpickle by-name round-trip. This is the
replacement target for the legacy LazyITKModule subclass; the next
commit swaps the construction site in itk/__init__.py.

Replace the LazyITKModule(types.ModuleType) construction in
_initialize_module() with the PEP 562 helper added in the previous
commit. Each itk.<Module> namespace is now a plain types.ModuleType
with __getattr__/__dir__ closures and a sys.modules registration, so
cloudpickle round-trips by dotted name (Tests/lazy.py).

The shared _lazy_load_lock is passed into the helper so top-level and
per-submodule lazy loads continue to serialise on a single RLock.

After this change __init__.py no longer references LazyITKModule; the
class itself remains in support/lazy.py for now and will be removed in
a follow-up commit.

The PEP 562 mechanism is now the only lazy-loading path, so the legacy
itkConfig.LazyLoading flag and its ITK_PYTHON_LAZYLOADING environment
variable are no-ops. Drop the flag definition, the docstring paragraph
documenting it, and the three test-side overrides that toggled it.

Specifically:

* Wrapping/Generators/Python/itkConfig.template.in.py: drop the
  LazyLoading docstring paragraph and the LazyLoading bool
  assignment driven by ITK_PYTHON_LAZYLOADING.
* Wrapping/Generators/Python/itk/__init__.py: refresh a leading
  comment that used LazyLoading as the canonical example of a
  mutable itkConfig flag; use DefaultFactoryLoading instead so the
  example still names a real attribute.
* Wrapping/Generators/Python/Tests/lazy.py: remove the
  itkConfig.LazyLoading=True override at the top of the test and
  refresh the now-stale "PEP 366 compliance of LazyITKModule"
  comment to reference the per-submodule namespaces that replaced
  the class.
* Wrapping/Generators/Python/Tests/multiprocess_lazy_loading.py:
  remove the itkConfig.LazyLoading=True override and the now-unused
  import itkConfig. The descriptive header comments referencing the
  *concept* of lazy loading remain intact.
* Modules/Filtering/ImageIntensity/wrapping/test/itkImageFilterNumPyInputsTest.py:
  remove the itkConfig.LazyLoading=False override and the
  associated import itkConfig (otherwise the test would raise
  AttributeError once the flag is gone).

Tests/nolazy.py was the eager-mode counterpart to Tests/lazy.py and
forced itkConfig.LazyLoading=False to exercise the non-lazy import
path. With the LazyLoading flag removed in the previous commit the
PEP 562 lazy mechanism is the only import path, so the eager test is
no longer meaningful and would raise AttributeError on import. Delete
the file and its CMake registration.

* Wrapping/Generators/Python/Tests/nolazy.py: deleted.
* Wrapping/Generators/Python/Tests/CMakeLists.txt: drop the
  itk_python_add_test(NAME PythonNoLazyModule ...) block.

Remove the legacy custom-subclass-of-types.ModuleType implementation now
that the itk package and its per-submodule namespaces resolve attributes
via module-level __getattr__ / __dir__.

- Delete Wrapping/Generators/Python/itk/support/lazy.py (LazyITKModule,
  ITKLazyLoadLock, _lazy_itk_module_reconstructor, not_loaded sentinel).
- Drop support/lazy from ITK_PYTHON_SUPPORT_MODULES in
  Wrapping/Generators/Python/CMakeLists.txt so the wheel no longer
  installs the deleted file.

A repo-wide grep confirms no live import of from itk.support import lazy,
from itk.support.lazy, support.lazy, _lazy., or LazyITKModule remains.

Extend Tests/lazy.py beyond PEP 366 + cloudpickle to assert the three
remaining contracts the PEP 562 lazy mechanism must preserve:

- PEP 562 __dir__: "Image" appears in dir(itk) and dir(ITKCommon) but
  is absent from vars(itk) / vars(ITKCommon) until first access -- the
  lazy attribute map is enumerable without forcing a SWIG load.
- Factory hook: under DefaultFactoryLoading, accessing a class whose
  module declares a needed factory (ITKIOImageBase -> ImageIO) grows
  ObjectFactoryBase.GetRegisteredFactories(); the disabled path is
  already covered by nodefaultfactories.py.
- stdlib pickle: pickle.loads(pickle.dumps(itk.ITKCommon)) returns the
  same instance, exercising the per-submodule __reduce_ex__ shim wired
  by _make_itk_lazy_submodule (a bare types.ModuleType is unpicklable
  on CPython 3.12+ without it).

Remove unused symbols:

- _lazy_submodule.py: docstring named the deleted LazyITKModule
  subclass; rephrased to describe the module's behavior directly.
- multiprocess_lazy_loading.py: header comments used CamelCase
  "LazyLoading" as a symbol; rephrased to "lazy module loading" /
  "PEP 562 __getattr__ hook" while keeping the threading contract
  the test still enforces.
- Wrapping/Generators/Python/CMakeLists.txt: directory-layout
  comment listed itk(...|LazyLoading|...).py as a static support
  file; updated to the actual filenames currently shipped.

Comment- and docstring-only; no functional change. PythonLazyModule,
PythonMultiprocessLazyLoad, and PythonLazyLoadingImage all still pass.

Add a section to the ITK 6 migration guide describing the removal of
itkConfig.LazyLoading and the ITK_PYTHON_LAZYLOADING environment
variable, which were dropped when the Python lazy-loading mechanism
was rewritten on top of PEP 562. The section distinguishes per-symbol
behavior: assignment to itkConfig.LazyLoading is silently ignored,
any subsequent read raises AttributeError, and the environment
variable is no longer consulted.

Two follow-ups from the Greptile review of PR 6183:

* Drop the dead `loaded_modules: set[str]` declaration and its
  `loaded_modules.add(target)` write in `_make_itk_lazy_submodule`.
  The set was a vestige of `LazyITKModule.__getstate__`/`__setstate__`;
  the new `__reduce_ex__` shim delegates to `importlib.import_module`
  and never consults it.
* Reword the migration-guide example for `itkConfig.LazyLoading`. The
  previous inline comment claimed the second line raised
  `AttributeError`, which was wrong: the preceding assignment had
  already set the attribute, so the read returned `False`. Split the
  example into separately commented write and read patterns and
  clarify that the `AttributeError` only occurs on a fresh
  `import itkConfig` with no prior assignment.

…heck

assert is silently stripped under `python -O`, leaving the invariant
unchecked. Replace with an explicit `if/raise` so the guard survives
optimization mode.
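A small sketch of why the explicit `if/raise` matters. It uses `compile(..., optimize=1)` to mimic `python -O` inside one process; the guard strings and the `run_guard` helper are illustrative, not the code in `itk/__init__.py`.

```python
# Two spellings of the same invariant, kept as source text so they can be
# compiled at different optimization levels.
guard_with_assert = 'assert package == "itk", "unexpected package"'
guard_with_raise = (
    'if package != "itk":\n'
    '    raise RuntimeError("unexpected package")'
)


def run_guard(source, package, optimize):
    """Compile and run a guard; report which exception, if any, it raised."""
    code = compile(source, "<guard>", "exec", optimize=optimize)
    try:
        exec(code, {"package": package})
    except (AssertionError, RuntimeError) as exc:
        return type(exc).__name__
    return "no error"


assert_normal = run_guard(guard_with_assert, "other", optimize=0)
assert_optimized = run_guard(guard_with_assert, "other", optimize=1)
raise_optimized = run_guard(guard_with_raise, "other", optimize=1)
```

With `optimize=1` the assert statement is compiled away and the bad value passes unnoticed, while the `if/raise` form still fails loudly.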
Bare types.ModuleType has no reducer, so pickle.dumps(itk) raises
TypeError: cannot pickle 'module' object on CPython 3.12+. Define a
module-level __reduce_ex__ that delegates to importlib.import_module,
matching the per-submodule shim. Cover the round-trip in Tests/lazy.py.
hjmjohnson added 2 commits May 4, 2026 18:13
Synthetic submodules previously shipped with __loader__=None and no
__spec__, which importlib.util.find_spec, inspect.getsourcefile,
pkgutil, and IDE introspection tools treat as a broken module. Build
a placeholder spec via importlib.util.spec_from_loader so the modules
present a normal PEP 451 surface.
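The effect of a missing spec can be seen with `importlib.util.find_spec`, which rejects a `sys.modules` entry whose `__spec__` is `None`. The sketch below is an assumption about the general pattern, not the merged code: `attach_placeholder_spec`, the `"synthetic"` origin string, the `None` loader, and the module name are all hypothetical.

```python
import importlib.util
import sys
import types


def attach_placeholder_spec(module):
    """Give a synthetic module a PEP 451 spec (loader choice is an assumption)."""
    spec = importlib.util.spec_from_loader(module.__name__, None, origin="synthetic")
    module.__spec__ = spec
    module.__loader__ = spec.loader
    return module


name = "demo_synthetic_module"
module = types.ModuleType(name)  # starts with __spec__ = None
sys.modules[name] = module

try:
    importlib.util.find_spec(name)
    before = "ok"
except ValueError:
    before = "ValueError"

attach_placeholder_spec(module)
after = importlib.util.find_spec(name)
```

Once the spec is attached, `find_spec` simply returns it, so tools that start from the spec no longer treat the module as broken.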
Removing the LazyLoading attribute outright is a hard API break:
downstream code that reads itkConfig.LazyLoading raises AttributeError
on import. Add a one-release shim:

  - itkConfig.__getattr__ returns True for LazyLoading reads and emits
    DeprecationWarning so legacy `if itkConfig.LazyLoading:` keeps
    working.
  - ITK_PYTHON_LAZYLOADING is no longer consulted, but its presence in
    the environment now emits DeprecationWarning at itkConfig import
    time so launch scripts get a single audible heads-up.

Both shims are intended for ITK 6.x and may be removed in 7.0.
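A minimal sketch of such a read-side shim, assuming a module-level `__getattr__` on a stand-in module. `make_config`, the module name `demo_config`, and the warning text are hypothetical; the real `itkConfig` template is not reproduced here.

```python
import types
import warnings


def make_config():
    """Build a stand-in for itkConfig with a deprecation shim (hypothetical)."""
    config = types.ModuleType("demo_config")

    def config_getattr(name):
        if name == "LazyLoading":
            warnings.warn(
                "LazyLoading is deprecated; lazy loading is always enabled.",
                DeprecationWarning,
                stacklevel=2,
            )
            return True
        raise AttributeError(f"module 'demo_config' has no attribute {name!r}")

    config.__getattr__ = config_getattr
    return config


# Read with no prior assignment: the shim answers True and warns.
fresh = make_config()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fresh_value = fresh.LazyLoading

# Write then read: the assignment lands in the module __dict__, so the
# shim is never consulted and the caller's own value comes back.
assigned = make_config()
assigned.LazyLoading = False
with warnings.catch_warnings(record=True) as caught_after_write:
    warnings.simplefilter("always")
    assigned_value = assigned.LazyLoading
```

The second half mirrors the point raised in the Greptile review: a prior assignment masks the shim, because PEP 562 only fires when the name is missing from the module `__dict__`.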
@hjmjohnson hjmjohnson force-pushed the python-lazy-mechanism branch from beae673 to e1c6adf Compare May 4, 2026 23:14
@hjmjohnson hjmjohnson self-requested a review May 4, 2026 23:18
hjmjohnson added a commit to thewtex/ITK that referenced this pull request May 4, 2026
PR InsightSoftwareConsortium#6183's first PEP 562 lazy-loading conversion serialised every
first-touch SWIG load through one process-wide threading.RLock.  When
parallel-worker code (multiprocessing.pool.ThreadPool, joblib threading backend, dask
local cluster) does first-touch `itk.X` from one thread and
first-touch `itk.Y` from another, the second thread blocks until
the first's load finishes — even though X and Y share no state.

Replace the single `_lazy_load_lock` with a per-SWIG-module dict of
RLocks created lazily on first lookup.  Per-module serialisation is
preserved (template registration and factory-loading hooks remain
race-free for any single SWIG module), but unrelated modules now
load in parallel.

Also refactors module-attribute access from `globals()[name]` /
`g.update(namespace)` to `getattr(this_module, name)` /
`setattr(this_module, attr, value)`.  Functionally equivalent (both
ultimately update the same module `__dict__`) but avoids non-literal
indexing of `globals()` that static analyzers (semgrep CWE-96) flag
as a code-injection foot-gun.  The same refactor lands on the
per-submodule path in support/_lazy_submodule.py.

The `_make_itk_lazy_submodule(...)` signature changes from
`lazy_load_lock` (a single RLock) to `get_module_load_lock` (a
callable returning the RLock for a given target SWIG module name)
so the per-submodule `__getattr__` can lock on the actual target
rather than the containing submodule.
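One way to picture the per-module lock registry is the sketch below. Only the name `get_module_load_lock` and the idea of lazily created RLocks come from the commit message; the guard lock, the dict name, and the demo thread are assumptions.

```python
import threading

_registry_guard = threading.Lock()  # protects the dict itself
_module_locks = {}                  # SWIG module name -> RLock


def get_module_load_lock(module_name):
    """Return the RLock for one module, creating it on first request."""
    with _registry_guard:
        lock = _module_locks.get(module_name)
        if lock is None:
            lock = _module_locks[module_name] = threading.RLock()
        return lock


results = {}


def worker():
    # Runs while the main thread holds the ITKCommon lock.
    other = get_module_load_lock("ITKIOImageBase")
    results["other_module"] = other.acquire(blocking=False)
    if results["other_module"]:
        other.release()
    same = get_module_load_lock("ITKCommon")
    results["same_module"] = same.acquire(blocking=False)
    if results["same_module"]:
        same.release()


with get_module_load_lock("ITKCommon"):
    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
```

The worker can take the lock of an unrelated module immediately, but not the one the main thread is holding, which is the intended trade-off between parallelism and per-module serialisation.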
hjmjohnson added a commit to thewtex/ITK that referenced this pull request May 4, 2026
Provides a debug escape hatch missing from PR InsightSoftwareConsortium#6183 after the
deprecation of `itkConfig.LazyLoading`.  When the user sets
`ITK_EAGER_IMPORT` to `1`, `true`, `yes`, or `on` in the
environment, `import itk` walks every SWIG module in the
lazy-attribute map and triggers a single first-touch `__getattr__`
on each.  Any import-time failure (missing C-extension dependency,
broken factory, version mismatch) surfaces synchronously at
`import itk` rather than at the first `itk.<thing>` attribute
access in user code, which is invaluable for triaging
"why does my itk import fail in this Docker image?".

Mirrors SPEC 1's `EAGER_IMPORT` convention used by
`scientific-python/lazy-loader`, `scikit-image`, NetworkX,
MNE-Python.  Default behaviour is unchanged (env var unset =>
fully lazy as in PR InsightSoftwareConsortium#6183).  Leaves the lazy machinery in place so
subsequent accesses still hit the fast cached path.
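A hedged sketch of how the environment switch and the eager walk could fit together. The accepted values come from the commit message; case-insensitive matching, whitespace stripping, and the helper names `eager_import_requested` and `eager_walk` are assumptions.

```python
import os

_TRUTHY = frozenset({"1", "true", "yes", "on"})


def eager_import_requested(environ=None):
    """True when ITK_EAGER_IMPORT holds one of the accepted truthy values."""
    environ = os.environ if environ is None else environ
    # Lower-casing and stripping are assumptions, not confirmed behaviour.
    return environ.get("ITK_EAGER_IMPORT", "").strip().lower() in _TRUTHY


def eager_walk(module, lazy_names):
    """Touch every lazy name once so load failures surface immediately."""
    for name in lazy_names:
        getattr(module, name)  # any exception propagates to the importer


enabled = eager_import_requested({"ITK_EAGER_IMPORT": "yes"})
disabled = eager_import_requested({"ITK_EAGER_IMPORT": "0"})
unset = eager_import_requested({})
```

Letting exceptions propagate out of the walk is what turns a late first-access failure into an immediate failure at import time.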
hjmjohnson added a commit to thewtex/ITK that referenced this pull request May 4, 2026
PR InsightSoftwareConsortium#6183's motivation cites cold-start improvements from the
PEP 562 conversion but ships zero quantitative measurements.  PEP 810
(accepted for Python 3.15) sets the ecosystem expectation at 50-70%
startup-time reduction; without an in-CI gate, ITK has no way to
detect a future change that accidentally re-introduces an eager
SWIG load at `import itk` time.

This test runs `python -X importtime -c 'import itk'` in a
subprocess, parses the cumulative cost from the bundled tracer's
output, and asserts:

  - cumulative `import itk` time stays below 5000 ms
  - `len(sys.modules)` after `import itk` stays below 400

Both ceilings are intentionally generous so the test does not flap
on slow CI runners; their job is to surface a hard regression
(e.g., a stray `import itk.ITKCommon` at module top-level that
would force the SWIG load) rather than gate ordinary noise.  The
`ITK_EAGER_IMPORT=0` env override on each subprocess defeats any
caller-set EAGER_IMPORT mode so the benchmark always measures the
lazy path.
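The gate described above might be sketched as follows. The parser works on the text that `-X importtime` writes to stderr; `cumulative_ms`, `measure_import`, and the sample text are illustrative assumptions, and the sample numbers are synthetic rather than a recorded run.

```python
import os
import re
import subprocess
import sys

# Data rows look like: "import time:   <self us> | <cumulative us> | <name>"
_ROW = re.compile(r"^import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s+(\S.*)$")


def cumulative_ms(importtime_stderr, module_name):
    """Return the cumulative import cost of module_name in milliseconds."""
    for line in importtime_stderr.splitlines():
        match = _ROW.match(line)
        if match and match.group(3).strip() == module_name:
            return int(match.group(2)) / 1000.0
    raise LookupError(f"{module_name!r} not found in importtime output")


def measure_import(module_name):
    """Run a fresh interpreter with the lazy path forced and time the import."""
    env = dict(os.environ, ITK_EAGER_IMPORT="0")
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module_name}"],
        env=env, capture_output=True, text=True, check=True,
    )
    return cumulative_ms(proc.stderr, module_name)


# Synthetic sample in the tracer's format, for illustration only.
sample = (
    "import time: self [us] | cumulative | imported package\n"
    "import time:       210 |        210 |   itk.support\n"
    "import time:      3900 |     180500 | itk\n"
)
sample_cost = cumulative_ms(sample, "itk")
```

A gate in this style would then assert something like `measure_import("itk") < 5000`, using the generous ceiling quoted in the commit message.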
hjmjohnson added 2 commits May 4, 2026 19:12
hjmjohnson added a commit to thewtex/ITK that referenced this pull request May 5, 2026
@hjmjohnson hjmjohnson force-pushed the python-lazy-mechanism branch from d4d359e to 9bbe050 Compare May 5, 2026 00:18
@hjmjohnson
Member

hjmjohnson commented May 5, 2026

@thewtex @dzenanz

A careful review is needed, but this might be ready to go.

Local timing benchmark on the fix-folded tip (force-pushed d4d359e1e5 → 9bbe050991, four-line getattr → vars().get repair squashed into the PERF/RLock commit). BEFORE = main at the merge-base (ed56b57df1); AFTER = python-lazy-mechanism tip; both built with full Python wrapping (Linux x86_64, conda Python 3.13, ccache shared, 48 cores). Median of 9 runs (warmup dropped) per fresh interpreter.

| Operation | BEFORE | AFTER | Δ |
| --- | --- | --- | --- |
| Cold import itk | 185.4 ms | 159.4 ms | −14.0% |
| First access itk.MedianImageFilter | 1884 ms | 1827 ms | −3.0% |
| First access itk.GradientMagnitudeImageFilter | 1928 ms | 1854 ms | −3.8% |
| dir(itk) | 0.22 ms | 0.52 ms | +0.30 ms |
| First access itk.ITKCommon (lazy submodule) | ≈0 ms | ≈0 ms | |
| Peak RSS after import itk | 33.5 MiB | 33.2 MiB | −0.7% |
| pickle.dumps(itk) round-trip | FAIL TypeError: cannot pickle 'module' object | OK, m is itk | regression repaired |
| pickle.dumps(itk.ITKCommon) | OK but new module (same=False) | OK and same=True | identity preserved |

Cold import itk is the headline win: −14% on a single-thread interpreter start. The first-attribute-access numbers reflect the SWIG-load cost of an entire module (ITKImageFilterBase, ITKGradientFilters, …) — the small AFTER speedup is per-module RLock + lazy bookkeeping; the real benefit shows up under threaded first-touch (not measured here, but follows directly from the PERF commit's intent). dir(itk) regression (+0.3 ms) is the lazy __dir__ building a sorted union of globals() ∪ lazy-attr keys per call — trivial in absolute terms, only visible because BEFORE's dir() was just __dict__.

RecursionError that the just-folded fix repairs

The PERF commit's refactor (semgrep CWE-96 hardening) replaced globals()[name] reads with getattr(this_module, name, _MISSING). PEP 562 __getattr__ re-fires on getattr(module, ...) when the name isn't in __dict__, so the cache check at the top of __getattr__ looped through itself until stack exhaustion on the very first attribute access (itk.Version, itk.MedianImageFilter, …).

File ".../itk/__init__.py", line 96, in __getattr__
    cached = getattr(this_module, name, _MISSING)
[Previous line repeated 995 more times]
RecursionError: maximum recursion depth exceeded

Four call sites (two in itk/__init__.py, two in itk/support/_lazy_submodule.py) now use vars(this_module).get(name, _MISSING) — direct __dict__ lookup, bypasses the descriptor protocol and PEP 562, preserves the semgrep-friendly read pattern (no non-literal globals indexing). Folded into 749c6a61d1 PERF commit so the bisect history stays clean.
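The failure mode and its repair can be reproduced on a throwaway module. This is a sketch under stated assumptions: `demo`, `_lazy`, and the placeholder value are hypothetical, and only the `getattr(...)` versus `vars(...).get(...)` contrast is taken from the comment above.

```python
import types

_MISSING = object()
demo = types.ModuleType("demo_recursion")
_lazy = {"Version": lambda: "version-placeholder"}


def broken_getattr(name):
    # getattr() on the module misses __dict__ and re-enters this same hook.
    cached = getattr(demo, name, _MISSING)
    return cached


def fixed_getattr(name):
    # Direct __dict__ lookup: never triggers the PEP 562 hook.
    cached = vars(demo).get(name, _MISSING)
    if cached is not _MISSING:
        return cached
    loader = _lazy.get(name)
    if loader is None:
        raise AttributeError(f"module 'demo_recursion' has no attribute {name!r}")
    value = loader()
    vars(demo)[name] = value
    return value


demo.__getattr__ = broken_getattr
try:
    demo.Version
    broken_outcome = "ok"
except RecursionError:
    broken_outcome = "RecursionError"

demo.__getattr__ = fixed_getattr
fixed_value = demo.Version
```

The default argument to `getattr` only swallows `AttributeError`, so it offers no protection here; the hook keeps calling itself until the interpreter's recursion limit is hit.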

importtime tree top entries

BEFORE itk self-time 23.9 ms cum 168.1 ms; AFTER itk self-time 3.9 ms cum 180.5 ms. The cumulative-time delta is dominated by numpy walking 71.7 → 118.2 ms between runs (likely cold-page-cache flake, not a PR effect — numpy.lib._arraypad_impl etc. show up only on the slower run). The ITK self-time drop (23.9 → 3.9 ms = −84%) is the genuine PEP-562 savings: the top-level module no longer eagerly enumerates SWIG modules.

Test environment
  • BEFORE source+build: /home/johnsonhj/src/ITK HEAD ed56b57df1 (merge PR BUG: SWIG typechecks reject foreign wrapped types #5310, the merge-base of upstream/main and python-lazy-mechanism)
  • AFTER source+build: git worktree at python-lazy-mechanism tip 9bbe050991
  • Both: pixi run configure-python && pixi run build-python (default ITK_WRAP_* settings, full Python wrapping)
  • Python: /home/johnsonhj/src/ITK/.pixi/envs/python/bin/python (3.13, conda-forge), shared between BEFORE and AFTER
  • Hardware: Linux x86_64, 48 cores, shared ccache (~/.cache/ccache)
  • Each timing run: fresh subprocess, time.perf_counter() deltas captured inside Python; /usr/bin/time -v for RSS
  • 10 runs per experiment, first dropped as warmup, median + IQR (Q1–Q3) reported
  • Raw 9-run arrays + importtime trees + build logs preserved locally at .devlocal/pr6183-timing/

@hjmjohnson hjmjohnson force-pushed the python-lazy-mechanism branch from 9bbe050 to ed35abe Compare May 5, 2026 00:50
@hjmjohnson
Member

Ran the full Python-wrapping ctest suite against the fix-folded tip and pushed one follow-up to keep the new cold-start gate in budget.

Suite result: 3477 / 3477 pass (ctest --test-dir build-python -j8, 227 sec wall-clock, Linux x86_64, full Python wrapping). One test (PythonLazyImportTime) initially failed: the sys.modules ceiling of 400 set in the original d4d359e1e5 was empirically too low for full wrapping. Folded a ceiling bump (400 → 750) plus an explanatory docstring into the gate commit; force-pushed 9bbe050991 → ed35abeda1 (lease pinned).

Why 400 was too tight

import itk with full Python wrapping legitimately produces ~520 entries in sys.modules:

| Source | Count |
| --- | --- |
| Stdlib baseline | ~35 |
| numpy + transitive | ~130 |
| itk.support.* | 9 |
| itk.<SWIG-module> synthetic placeholders (one per wrapped module) | ~100 |
| itk.Configuration.<Module>{Config,_snake_case} data modules (two per wrapped module) | ~200 |
| Stdlib import trampolines (importlib, pkgutil, runpy, …) | ~50 |
| Total | ~524 |

The itk.Configuration.* files are pure-Python data tables that populate _lazy_attribute_to_module — required at import-time so that itk.MedianImageFilter knows it lives in itk.ITKImageFilterBase. They are not eager SWIG loads; they are the lookup-table the lazy mechanism builds. The original docstring's "30-50 entries" estimate corresponds to a tiny WRAP_* set (a handful of modules), not the full default wrap.

Forward-compatibility note added to the gate

The new docstring spells out the linear scaling so the next person who hits this knows to bump rather than spelunk:

The ceiling is set generously so slow CI runners do not flap. It scales linearly with the wrapped-module count: each new SWIG module added to the default wrap contributes roughly 3 entries (one synthetic submodule + two Configuration data modules), so the ceiling must be raised when new modules are added to the default wrap. Tighten if the headroom shrinks; raise if a wrap-set expansion (e.g. new remote-module ingest) takes the live count within ~50 of the ceiling.

Headroom at 750 vs current 524: 226 entries ≈ ~75 new SWIG modules of cushion before the gate trips again.
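The headroom arithmetic can be checked directly. The figures are the approximate counts quoted in this comment; the variable names and the `needs_bump` rule (within 50 entries of the ceiling) are an illustrative reading of the docstring, not code from the test.

```python
# Approximate sys.modules contributions quoted in the comment above.
counts = {
    "stdlib baseline": 35,
    "numpy + transitive": 130,
    "itk.support.*": 9,
    "synthetic itk.<Module> placeholders": 100,
    "itk.Configuration data modules": 200,
    "stdlib import trampolines": 50,
}

ceiling = 750
entries_per_new_module = 3  # one placeholder + two Configuration modules

current = sum(counts.values())
headroom = ceiling - current
modules_of_cushion = headroom // entries_per_new_module
needs_bump = headroom < 50  # "within ~50 of the ceiling"
```

With these inputs the totals match the comment: 524 entries today, 226 of headroom, and room for about 75 more wrapped modules.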

Other test categories (all green)
  • C++ unit tests, GoogleTest drivers
  • All Python* lazy/eager/multiprocess/GIL-release tests (PythonLazyModule, PythonNoDefaultFactories, PythonMultiprocessLazyLoad, PythonGILReleaseTest, PythonGILReleaseSafetyTest, PythonModifiedTimeTest, …)
  • All itk*PythonTest wrappers
  • Pickle/unpickle of itk and itk.<submodule> in lazy.py
  • Skipped (Linux-irrelevant): NumericLocale.WorksWithDifferentInitialLocale

Full ctest log preserved locally at .devlocal/pr6183-timing/test-after/ctest.log.

@dzenanz
Member

dzenanz commented May 5, 2026

I have been reviewing so much lately, I can't spare time to carefully review this. I leave it to Matt, Brad, and Simon.

@dzenanz dzenanz requested a review from blowekamp May 5, 2026 15:08

@SimonRit SimonRit left a comment

My knowledge is too limited on this topic to provide a meaningful review... LGTM!

@thewtex
Member Author

thewtex commented May 5, 2026

@hjmjohnson @dzenanz @SimonRit thanks for the reviews 🙏 @hjmjohnson thanks for the follow-ups and extended testing.

This is expected to give a modest performance bump and also fix bugs, as Hans' testing noted. It should also address additional performance and import issues in a more significant way, such as the use case in the recent Discourse thread (use under multithreading).

In terms of impact, cleaning up the module dependencies and DAG will matter much more for performance.

I will not have time to follow-up with issues until next week at the earliest, so I will defer merge until then. Others are welcome to merge before then as long as they follow up with other issues that arise. :-)

@hjmjohnson hjmjohnson merged commit ceffae2 into InsightSoftwareConsortium:main May 5, 2026
19 checks passed
@hjmjohnson hjmjohnson added this to the ITK 6.0.0 milestone May 5, 2026

Labels

area:Documentation, area:Filtering, area:Python wrapping, type:Enhancement, type:Infrastructure, type:Testing

4 participants