WIP: Add v4 remote-module ingestion (two-phase, per-commit hooks)#6204
Draft
hjmjohnson wants to merge 19 commits intoInsightSoftwareConsortium:mainfrom
Draft
WIP: Add v4 remote-module ingestion (two-phase, per-commit hooks)#6204hjmjohnson wants to merge 19 commits intoInsightSoftwareConsortium:mainfrom
hjmjohnson wants to merge 19 commits intoInsightSoftwareConsortium:mainfrom
Conversation
dzenanz
reviewed
May 4, 2026
| @@ -0,0 +1,25 @@ | |||
| # IOMeshSTL — files that migrate into ITK at Modules/IO/MeshSTL/ | |||
Member
There was a problem hiding this comment.
Is there reason this list is kept per remote?
| @@ -0,0 +1,539 @@ | |||
| #!/usr/bin/env python3 | |||
| """sanitize-history.py — Apply ITK formatting and commit-prefix conventions | |||
Member
There was a problem hiding this comment.
It feels like "sanitize history" script already exists somewhere.
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 4, 2026
Many ITK remote modules' itk-module.cmake reads
file(READ "${MY_CURRENT_DIR}/README.rst" DOCUMENTATION)
to populate the CMake DESCRIPTION variable from the upstream README.
The v4 whitelist intentionally excludes the upstream README (the
operator ships a new README.md as part of the ingest PR), so without
this rewrite every commit's CMakeLists fails to configure post-ingest
until the operator manually patches itk-module.cmake.
sanitize-history.py now rewrites the file(READ ... README.rst ...)
reference to README.md as part of the cmake-blob transform, so every
historical commit on the rewritten branch is independently buildable.
Surfaced on the IOMeshSTL ingest (PR following PR InsightSoftwareConsortium#6204).
This was referenced May 4, 2026
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 5, 2026
….txt In-tree, ITK's top-level CMakeLists pins the CMake minimum, so the per-module declaration is dead code. Pre-emptive cleanup matching the v4 ingest-pipeline rule codified in PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required).
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 5, 2026
…mage In-tree, ITK_SOURCE_DIR is always defined and ITK's top-level CMakeLists pins the CMake minimum version. The per-module cmake_minimum_required line and if(NOT ITK_SOURCE_DIR) ... else() itk_module_impl() endif() guard are dead code. Pre-emptive cleanup matching the v4 ingest-pipeline rules codified in PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required and :patch_standalone_build_guard).
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 5, 2026
In-tree, ITK's top-level CMakeLists pins the CMake minimum, so the per-module declaration is dead code. Matches the v4 ingest-pipeline rule in PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required).
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 5, 2026
In-tree, ITK's top-level CMakeLists pins the CMake minimum, so the per-module declaration is dead code. Matches the v4 ingest-pipeline rule in PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required).
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 5, 2026
…shMZ3
In-tree, ITK_SOURCE_DIR is always defined and ITK's top-level
CMakeLists pins the CMake minimum version. The per-module
cmake_minimum_required line and if(NOT ITK_SOURCE_DIR) ... else()
itk_module_impl() endif() guard are dead code. The 'set(ITK_DIR
${CMAKE_BINARY_DIR})' line that was inside the in-tree else branch
is preserved (now unconditional) since it actually had effect during
the in-tree build path. Matches the v4 ingest-pipeline rules in
PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required and
:patch_standalone_build_guard).
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 5, 2026
…shMZ3
In-tree, ITK_SOURCE_DIR is always defined and ITK's top-level
CMakeLists pins the CMake minimum version. The per-module
cmake_minimum_required line and if(NOT ITK_SOURCE_DIR) ... else()
itk_module_impl() endif() guard are dead code. The 'set(ITK_DIR
${CMAKE_BINARY_DIR})' line that was inside the in-tree else branch
is preserved (now unconditional) since it actually had effect during
the in-tree build path. Matches the v4 ingest-pipeline rules in
PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required and
:patch_standalone_build_guard).
hjmjohnson
added a commit
that referenced
this pull request
May 5, 2026
Four already-merged remote-module ingests still carry their
upstream-original boilerplate at the top of their module CMakeLists:
- cmake_minimum_required(VERSION X.Y.Z) — redundant; ITK's top-level
CMakeLists pins a higher minimum.
- if(NOT ITK_SOURCE_DIR) ... find_package(ITK) ... else()
itk_module_impl() endif() — dead code in-tree because
ITK_SOURCE_DIR is always defined.
Strip both from:
Modules/Filtering/AnisotropicDiffusionLBR
Modules/Filtering/Cuberille
Modules/Filtering/LabelErodeDilate
Modules/Registration/Montage
Lines that had effect inside the in-tree else() branch
(set(ITK_DIR ${CMAKE_BINARY_DIR}) on AnisotropicDiffusionLBR and
Montage) are preserved unconditionally.
The other four merged ingests (GenericLabelInterpolator, MGHIO,
FastBilateral, MeshNoise) were already cleaned up during their
respective ingest PRs.
Matches the v4 ingest-pipeline rules now codified in PR #6204
(sanitize-history.py:patch_drop_cmake_minimum_required and
:patch_standalone_build_guard); future ingests will not introduce
this idiom.
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 5, 2026
…mage In-tree, ITK_SOURCE_DIR is always defined and ITK's top-level CMakeLists pins the CMake minimum version. The per-module cmake_minimum_required line and if(NOT ITK_SOURCE_DIR) ... else() itk_module_impl() endif() guard are dead code. Pre-emptive cleanup matching the v4 ingest-pipeline rules codified in PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required and :patch_standalone_build_guard).
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 5, 2026
…mage In-tree, ITK_SOURCE_DIR is always defined and ITK's top-level CMakeLists pins the CMake minimum version. The per-module cmake_minimum_required line and if(NOT ITK_SOURCE_DIR) ... else() itk_module_impl() endif() guard are dead code. Pre-emptive cleanup matching the v4 ingest-pipeline rules codified in PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required and :patch_standalone_build_guard).
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 5, 2026
In-tree, ITK's top-level CMakeLists pins the CMake minimum, so the per-module declaration is dead code. Matches the v4 ingest-pipeline rule in PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required).
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 6, 2026
In-tree, ITK's top-level CMakeLists pins the CMake minimum, so the per-module declaration is dead code. Matches the v4 ingest-pipeline rule in PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required).
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 6, 2026
Four already-merged remote-module ingests still carry their
upstream-original boilerplate at the top of their module CMakeLists:
- cmake_minimum_required(VERSION X.Y.Z) — redundant; ITK's top-level
CMakeLists pins a higher minimum.
- if(NOT ITK_SOURCE_DIR) ... find_package(ITK) ... else()
itk_module_impl() endif() — dead code in-tree because
ITK_SOURCE_DIR is always defined.
Strip both from:
Modules/Filtering/AnisotropicDiffusionLBR
Modules/Filtering/Cuberille
Modules/Filtering/LabelErodeDilate
Modules/Registration/Montage
Lines that had effect inside the in-tree else() branch
(set(ITK_DIR ${CMAKE_BINARY_DIR}) on AnisotropicDiffusionLBR and
Montage) are preserved unconditionally.
The other four merged ingests (GenericLabelInterpolator, MGHIO,
FastBilateral, MeshNoise) were already cleaned up during their
respective ingest PRs.
Matches the v4 ingest-pipeline rules now codified in PR InsightSoftwareConsortium#6204
(sanitize-history.py:patch_drop_cmake_minimum_required and
:patch_standalone_build_guard); future ingests will not introduce
this idiom.
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 6, 2026
….txt In-tree, ITK's top-level CMakeLists pins the CMake minimum, so the per-module declaration is dead code. Pre-emptive cleanup matching the v4 ingest-pipeline rule codified in PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required).
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 6, 2026
…shMZ3
In-tree, ITK_SOURCE_DIR is always defined and ITK's top-level
CMakeLists pins the CMake minimum version. The per-module
cmake_minimum_required line and if(NOT ITK_SOURCE_DIR) ... else()
itk_module_impl() endif() guard are dead code. The 'set(ITK_DIR
${CMAKE_BINARY_DIR})' line that was inside the in-tree else branch
is preserved (now unconditional) since it actually had effect during
the in-tree build path. Matches the v4 ingest-pipeline rules in
PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required and
:patch_standalone_build_guard).
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 6, 2026
…mage In-tree, ITK_SOURCE_DIR is always defined and ITK's top-level CMakeLists pins the CMake minimum version. The per-module cmake_minimum_required line and if(NOT ITK_SOURCE_DIR) ... else() itk_module_impl() endif() guard are dead code. Pre-emptive cleanup matching the v4 ingest-pipeline rules codified in PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required and :patch_standalone_build_guard).
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 6, 2026
In-tree, ITK's top-level CMakeLists pins the CMake minimum, so the per-module declaration is dead code. Matches the v4 ingest-pipeline rule in PR InsightSoftwareConsortium#6204 (sanitize-history.py:patch_drop_cmake_minimum_required).
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 8, 2026
Phase A v4 ingest of the standalone remote module ntustison/ITKAdaptiveDenoising into Modules/Filtering/AdaptiveDenoising/. Whitelisted upstream history was sanitized via the v4 pipeline (see InsightSoftwareConsortium#6204), then stripped of all raw .nrrd test fixtures via 'git filter-repo --invert-paths' so that no raw binary blobs land in ITK's git history. Sibling .cid stubs are added in a follow-up commit; their fetch path is published via ITKTestingData PR InsightSoftwareConsortium#48. Stacked on the pixi 0.68 multi-line cmd compatibility fix from InsightSoftwareConsortium#6236.
This was referenced May 8, 2026
Splits v3's single-driver flow into two phases. Phase A (ingest-module-v4.sh): local + reversible; no upstream pushes. Phase B (archive-remote-module.sh): only after Phase A merges; pushes upstream removal commit + promotes MIGRATION_README.md to README.md + flips archived=true. Each historical commit re-formatted by sanitize-history.py (clang-format / black / gersemi / trailing-ws / EOF / line-ending) and ghostflow-conformed (subject <=78, blank line, exec-bit cleared on text). Persistent artifact: per-module whitelist.
Strips Dockerfile*, .github/*, azure-pipelines*, appveyor.yml, circle.yml, .gitlab-ci.yml, .travis.yml from history so CI scaffolding doesn't leak into the ingested subtree.
Validated on Cuberille (InsightSoftwareConsortium#6205): ghostflow now reports only the unavoidable upstream root-commit error. - truncate_subject_to_body() for subjects >78 chars - ensure_blank_line_after_subject() inserts missing blank line - commit_callback clears exec-bit on text-classified blobs - deny-pass strips *.orig / *.rej / *.BACKUP.* / *.LOCAL.* / *.REMOTE.* / *.BASE.*
Many remote modules' itk-module.cmake reads README.rst via file(READ). After ingest the .rst is stripped (README.md takes over per the archival convention), so sanitize rewrites the file(READ) target to README.md to keep itk-module.cmake configuring cleanly.
…terns Documents the two recurring review patterns: - ghostflow's single unfixable root-commit error per ingest - Greptile draft-review request before promoting to ready-for-review
Strip ExternalData/ caches from history so cached binaries don't leak through the whitelist boundary.
Many remote modules wrap their CMakeLists.txt in 'if(NOT ITK_SOURCE_DIR) ... else() ... endif()' so the standalone build differs from the ITK-embedded build. After ingest the standalone branch is dead code; sanitize unwraps the else-branch (the ITK-embedded path) and drops the guard.
Adds a table separating what sanitize-history handles automatically from what the operator must fix manually after Phase A merges.
After ingest the module's cmake_minimum_required is dead (top-level ITK CMakeLists owns it). Sanitize strips it so multiple floors don't compete.
…cmake
Replace 'file(READ ${...}/README.md DESCRIPTION)' in module
CMakeLists with a static set(DOCUMENTATION ...) so the module
configures cleanly even before the README is laid down.
c7768a5 to
d43cb90
Compare
Adds five fixups that reviewers have flagged on every recent ingest (PRs InsightSoftwareConsortium#6238 TextureFeatures, InsightSoftwareConsortium#6240 PrincipalComponentsAnalysis): * itk_module_examples() — examples/ is not ingested * cmake_policy(CMP...) — ITK top-level pins all policies * cmake_dependent_option(Module_${X}_BUILD_DOCUMENTATION ...) and the matching if(...) add_subdirectory(doc) endif() block * per-module LICENSE / COPYING — ITK's root LICENSE applies in-tree * ${DOCUMENTATION} kept as-is when sourced from a literal set(DOCUMENTATION ...) block; only the file(READ README.md ...) form is replaced by a static one-liner.
This was referenced May 9, 2026
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 9, 2026
Replace the set(DOCUMENTATION "...") + DESCRIPTION "${DOCUMENTATION}"
indirection with a static one-liner literal.
itk_module() is a CMake macro, so ${ARGN} re-tokenizes its arguments
and list-splits any embedded ";". A future edit adding a semicolon or
"[" to the DOCUMENTATION string would silently produce spurious
"Unknown argument" AUTHOR_WARNINGs from CMake/ITKModuleMacros.cmake:111
on every configure (see PRs InsightSoftwareConsortium#6220, InsightSoftwareConsortium#6245).
The v4 ingestion pipeline (PR InsightSoftwareConsortium#6204) enforces this via
sanitize-history.py:patch_dynamic_description.
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 9, 2026
Replace the set(DOCUMENTATION "...") + DESCRIPTION "${DOCUMENTATION}"
indirection with a static one-liner literal.
itk_module() is a CMake macro, so ${ARGN} re-tokenizes its arguments
and list-splits any embedded ";". A future edit adding a semicolon or
"[" to the DOCUMENTATION string would silently produce spurious
"Unknown argument" AUTHOR_WARNINGs from CMake/ITKModuleMacros.cmake:111
on every configure (see PRs InsightSoftwareConsortium#6220, InsightSoftwareConsortium#6245).
The v4 ingestion pipeline (PR InsightSoftwareConsortium#6204) enforces this via
sanitize-history.py:patch_dynamic_description.
hjmjohnson
added a commit
to hjmjohnson/ITK
that referenced
this pull request
May 9, 2026
Replace the set(DOCUMENTATION "...") + DESCRIPTION "${DOCUMENTATION}"
indirection with a static one-liner literal.
itk_module() is a CMake macro, so ${ARGN} re-tokenizes its arguments
and list-splits any embedded ";". A future edit adding a semicolon or
"[" to the DOCUMENTATION string would silently produce spurious
"Unknown argument" AUTHOR_WARNINGs from CMake/ITKModuleMacros.cmake:111
on every configure (see PRs InsightSoftwareConsortium#6220, InsightSoftwareConsortium#6245).
The v4 ingestion pipeline (PR InsightSoftwareConsortium#6204) enforces this via
sanitize-history.py:patch_dynamic_description.
Two ghostflow-check-main violations slipped through v4 sanitization on TextureFeatures ingest (PR InsightSoftwareConsortium#6238): 1. .sha512 / .cid content-link sidecars without trailing newline. is_skip_content() returned True for any single-line hex hash blob, bypassing apply_universal_text_fixers (and therefore fix_end_of_file). Hash content has zero risk from the universal fixers (rstrip + single '\n' append), so drop the hex-hash skip branch and let those blobs flow through the normal text-fixer path. 2. Editor backup files (*~) survived ingestion. Adds **/*~ to the filter-repo --invert-paths deny-pass alongside *.orig / *.rej / *.BACKUP.* / *.LOCAL.* / *.REMOTE.* / *.BASE.*. Documented both in INGESTION_STRATEGY_v4.md's sanitizer table.
hjmjohnson
added a commit
that referenced
this pull request
May 9, 2026
Drop the file(READ README.md DOCUMENTATION) preamble and replace
DESCRIPTION "${DOCUMENTATION}" with a static one-liner.
itk_module() is a CMake macro, so ARGN re-tokenizes its arguments and
list-splits any embedded ';' characters. The PolarTransform README
contains 4 semicolons, producing 4 spurious "Unknown argument"
AUTHOR_WARNINGs from CMake/ITKModuleMacros.cmake:111 on every
configure.
Same canonical fix applied in PR #6220 to RLEImage, SplitComponents,
IOFDF, IOMeshMZ3, IOMeshSTL; PolarTransform was missed because it
landed in parallel. The v4 ingestion pipeline (PR #6204) prevents
this regression via sanitize-history.py:patch_dynamic_description.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
v4 remote-module ingestion pipeline. Splits the v3 single-driver flow into two phases — Phase A is local + reversible; Phase B publishes upstream only after the Phase A PR has merged. Every historical commit is independently reformatted (clang-format / black / gersemi / trailing-ws / EOF / line-ending) and ghostflow-conformed (subject ≤ 78 chars, blank line after subject, exec-bit cleared on text), so each commit independently passes ITK's CI gates.
WIP — design feedback welcome. Validated on Cuberille (#6205), SubdivisionQuadEdgeMeshFilter (#6229, merged), AdaptiveDenoising (#6235), TextureFeatures (#6238), PrincipalComponentsAnalysis (#6240). Ghostflow now reports only the unavoidable upstream-root-commit error.
Two-phase rationale
ingest-module-v4.sharchive-remote-module.shMIGRATION_README.mdtoREADME.md+ flipsarchived=truePhases share one persistent artifact: the whitelist at
Utilities/Maintenance/RemoteModuleIngest/whitelists/<Module>.list. Correctness property: upstream is never touched until ITK has accepted the migration.Pre-commit + ghostflow coverage
clang-format(.c .cc .cxx .h .hxx)black(.py)gersemi(.cmake .wrap CMakeLists.txt)trailing-whitespace,end-of-file-fixer,mixed-line-endingtruncate_subject_to_body()— overflow moves to body, prefers word-breakensure_blank_line_after_subject()*.orig/*.rej/*.BACKUP.*/*.LOCAL.*/*.REMOTE.*/*.BASE.*commit_callbackclears100755 → 100644for known text extsDockerfile*,.github/*,azure-pipelines*,appveyor.yml, etc.Heuristic commit-prefix mapping
When a historical commit subject lacks a recognized ITK prefix,
sanitize-history.pyauto-prepends one based on keywords in the subject. Decisions logged for operator audit.fix,bug,crash,segfault,leak,regression,corrupt,deadlockBUG:cmake,compil,warning,build,ci,clang,gcc,msvcCOMP:doc,readme,comment,license,sphinx,doxygenDOC:style,whitespace,format,rename,lint,clean up,reorderSTYLE:perf,optim,speed,faster,efficienPERF:ENH:(default)Files added
Utilities/Maintenance/RemoteModuleIngest/INGESTION_STRATEGY_v4.md— designUtilities/Maintenance/RemoteModuleIngest/ingest-module-v4.sh— Phase A driverUtilities/Maintenance/RemoteModuleIngest/archive-remote-module.sh— Phase B driverUtilities/Maintenance/RemoteModuleIngest/sanitize-history.py—git_filter_repocallback driverUtilities/Maintenance/RemoteModuleIngest/whitelists/<Module>.list— per-module whitelist (one per consumer PR)V3 scripts retained until v4 has produced two successful ingests.
Open review questions
STYLE:follow-up. Acceptable, or verify version match before running?check-shebang-scripts-are-executabledeferred to post-merge gate. Add now?