COMP: Deprecate linearizing ingest helper; add merge-preserving rewriter #6162
| Filename | Overview |
|---|---|
| Utilities/Maintenance/RemoteModuleIngest/rewrite-history-merge-preserving.py | New merge-preserving rewriter; all 7 previously flagged issues resolved. One remaining edge case: `normalize()` converts empty files to a single newline, which may diverge from pre-commit's `end-of-file-fixer` behavior for intentionally empty files. |
| Utilities/Maintenance/RemoteModuleIngest/normalize-ingest-commits.py | Deprecation guard added correctly; returns exit code 3 with a clear notice unless `--i-understand-this-linearizes` is passed. Implementation is clean. |
| Utilities/Maintenance/RemoteModuleIngest/INGESTION_STRATEGY.md | Adds mandatory merge-topology requirements, a "Linearization is forbidden" section with an allowed-tool table, and an explicit deprecation notice for `normalize-ingest-commits.py`. |
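Greptile's remaining edge case (empty files turning into a single newline) could be handled by short-circuiting empty blobs before the EOF fix. A minimal sketch, assuming LF line endings and a NUL-byte heuristic for detecting binary blobs; the rules below are illustrative, not the rewriter's actual code. To our understanding, pre-commit's `end-of-file-fixer` leaves empty files untouched and empties files that contain only newlines, which is the behavior this mirrors.

```python
def normalize(data: bytes) -> bytes:
    """Illustrative text-blob normalizer: trailing-whitespace strip + EOF fix.

    Assumes LF line endings and treats any blob containing a NUL byte as
    binary. Empty blobs stay empty instead of becoming a lone newline.
    """
    if not data or b"\0" in data:
        # Empty or (heuristically) binary: leave the blob exactly as it is.
        return data
    # Strip spaces and tabs at the end of every line.
    text = b"\n".join(line.rstrip(b" \t") for line in data.split(b"\n"))
    # Drop all trailing newlines, then add exactly one back.
    body = text.rstrip(b"\n")
    if not body:
        # Whitespace- or newline-only content collapses to an empty blob.
        return b""
    return body + b"\n"
```

Inside a `git filter-repo --blob-callback` body, where the blob contents are exposed as `blob.data` (bytes), the call would be `blob.data = normalize(blob.data)`; the function definition itself would have to live in the callback body or in a module it imports.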
Reviews (5): Last reviewed commit: "COMP: Deprecate linearizing ingest helpe..."
Updated branch from 262ff53 to 8b70516.
Addressed greptile P1 in 8b70516: added
@greptileai re-review the merge-preserving rewriter.
Updated branch from 8b70516 to 66267cb.
Addressed greptile's three new P1s in 66267cb:
@greptileai re-review.
Updated branch from 66267cb to cfd667c.
Addressed greptile's round-3 P1 in cfd667c: dropped
@greptileai re-review.
Updated branch from cfd667c to 3345974.
Addressed greptile's round-4 P1 + cleanup in 3345974:
@greptileai re-review.
The normalize-ingest-commits.py script replays each non-merge upstream commit via cherry-pick and skips merges entirely. The result loses the upstream merge topology, which the strategy doc has always required to be preserved. Reviewers flagged this on PRs InsightSoftwareConsortium#6135, InsightSoftwareConsortium#6137, InsightSoftwareConsortium#6159, and InsightSoftwareConsortium#6161 before it became a rule violation.

This commit:

1. Adds a hard refuse-to-run guard to normalize-ingest-commits.py (exit 3 with a deprecation notice unless --i-understand-this-linearizes is passed). The guard names the replacement and points at the strategy doc.
2. Adds rewrite-history-merge-preserving.py, a git-filter-repo-driven rewriter that walks every commit (merges included) and applies uniform text-blob normalization (trailing-whitespace strip + EOF-newline fix). It implements the user-mandated Phase 1 reference / Phase 2 traversal / Phase 3 verification architecture.
3. Updates INGESTION_STRATEGY.md with the "Linearization is forbidden" subsection, an allowed-tool table for per-commit re-formatting, and the new operator check that the merge count satisfies 1 + UPSTREAM_INTERNAL_MERGES.

Language-specific formatters (clang-format, gersemi) are intentionally NOT applied per blob: they require the destination's current config, and applying them retroactively across upstream history would produce spurious diffs. The recommended pattern is a single "STYLE: Apply current formatters" commit at the rewritten tip after Phase 2 succeeds.
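Item 1's refuse-to-run guard can be sketched with `argparse`. This is a hypothetical reconstruction: only the flag name, the exit code 3, and what the notice mentions come from this PR; the `main()` layout and the `DEPRECATION_NOTICE` wording are assumptions.

```python
import argparse
import sys

# Hypothetical wording; the real notice names the replacement and the strategy doc.
DEPRECATION_NOTICE = (
    "normalize-ingest-commits.py is deprecated: it linearizes upstream merge "
    "topology. Use rewrite-history-merge-preserving.py instead and see "
    "INGESTION_STRATEGY.md. Pass --i-understand-this-linearizes to override."
)


def main(argv=None) -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--i-understand-this-linearizes",
        action="store_true",
        help="Run the deprecated, linearizing cherry-pick replay anyway.",
    )
    # parse_known_args lets the legacy script's other options pass through.
    args, _legacy_args = parser.parse_known_args(argv)
    if not args.i_understand_this_linearizes:
        print(DEPRECATION_NOTICE, file=sys.stderr)
        return 3  # refuse to run
    # The legacy cherry-pick replay would run here.
    return 0
```

A real entry point would call `sys.exit(main())`. Returning the code from `main()` instead of exiting inside it keeps the guard easy to exercise in tests.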
Updated branch from 3345974 to 1f41862.
@dzenanz The goal is to be more rigorous about not linearizing the history.
dzenanz left a comment:
I like the intent. I did not look at the new script. As this is maintenance-related code, it does not need to be highly scrutinized. If someone does take a look, they can suggest changes via a PR.
Locks the door on the bug that linearized the topology of the GenericLabelInterpolator (#6135), MGHIO (#6137), FastBilateral (#6159), and MeshNoise (#6161) ingest PRs. The old `normalize-ingest-commits.py` is now a hard refuse-to-run, and a new merge-preserving rewriter implements the Phase 1 / Phase 2 / Phase 3 / Phase 4 architecture that #6160 references.

### Why
`normalize-ingest-commits.py` replayed each upstream non-merge commit via `git cherry-pick` and explicitly skipped merges ("Skipping N merge commit(s); their content arrives via the non-merge commits"). That traded the upstream merge topology for a linear cherry-pick chain. The strategy doc has always said merge topology is mandatory; the script silently violated that. After repeated reviewer pushback, the author asked for a hard guard plus a replacement.

### What changed
- `Utilities/Maintenance/RemoteModuleIngest/normalize-ingest-commits.py`: hard refuse-to-run guard; exits with code 3 and a deprecation notice unless `--i-understand-this-linearizes` is passed. The notice names the replacement and points at the strategy doc.
- `Utilities/Maintenance/RemoteModuleIngest/rewrite-history-merge-preserving.py`: new. Uses `git filter-repo --blob-callback` to apply uniform text-blob normalization (trailing-whitespace strip + EOF-newline fix) across every blob in `<base>..<branch>`. Merges are preserved by default. After the rewrite, it asserts that the merge count is at least `--expected-merges`. Phase-1-only mode (`--phase-1`) clones upstream, applies the destination's pre-commit hooks once, and emits the reference tree SHA1 for the operator to compare against.
- `Utilities/Maintenance/RemoteModuleIngest/INGESTION_STRATEGY.md`: adds the "Linearization is forbidden" subsection, an allowed-tool table for per-commit re-formatting, the operator merge-count check, and an explicit deprecation notice for `normalize-ingest-commits.py`.

### What this intentionally doesn't do
- Does not apply language-specific formatters (clang-format, gersemi) per blob. The recommended pattern is a single `STYLE: Apply current formatters` commit at the rewritten tip after Phase 2 succeeds. PRs #6159 (ENH: Ingest ITKFastBilateral into Modules/Filtering, supersedes #5134) and #6161 (ENH: Ingest ITKMeshNoise into Modules/Filtering, closes #5174) used this pattern successfully.
- Does not modify `ingest-remote-module.sh`. This script is a pure history-rewrite tool that the ingest script (or a manual operator) can invoke.
- Does not delete `normalize-ingest-commits.py`. Keeping the file with the deprecation guard preserves backlinks and lets anyone passing `--i-understand-this-linearizes` recover the old behavior in an emergency.

Closes the linearization-at-ingest class of bugs flagged on the prior four ingest PRs. Master tracker: #6160.
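The post-rewrite checks (merge count of at least `1 + UPSTREAM_INTERNAL_MERGES`, and the rewritten tip matching the Phase 1 reference tree SHA1) might look roughly like this. It is a sketch only: the helper names are hypothetical, it assumes the operator compares the root tree of the rewritten tip before any `STYLE:` commit is added, and it relies only on the standard `git rev-list --merges --count` and `git rev-parse <rev>^{tree}` commands.

```python
import subprocess
from typing import List


def _git(repo: str, *args: str) -> str:
    """Run a git command in `repo` and return its stripped stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def count_merges(base: str, branch: str, repo: str = ".") -> int:
    """Number of merge commits reachable from `branch` but not from `base`."""
    return int(_git(repo, "rev-list", "--merges", "--count", f"{base}..{branch}"))


def tree_sha(rev: str, repo: str = ".") -> str:
    """SHA1 of the root tree of `rev`."""
    return _git(repo, "rev-parse", f"{rev}^{{tree}}")


def verification_problems(
    merge_count: int,
    upstream_internal_merges: int,
    tip_tree: str,
    reference_tree: str,
) -> List[str]:
    """Return human-readable failures; an empty list means the rewrite passes."""
    problems = []
    # Assumed meaning of "1 +": the ingest merge plus upstream's own merges.
    expected = 1 + upstream_internal_merges
    if merge_count < expected:
        problems.append(f"expected at least {expected} merges, found {merge_count}")
    if tip_tree != reference_tree:
        problems.append(f"tip tree {tip_tree} != Phase 1 reference {reference_tree}")
    return problems
```

An operator script would feed `count_merges(...)` and `tree_sha(...)` into `verification_problems(...)` and exit non-zero when the list is non-empty. Keeping the comparison logic separate from the git calls makes it testable without a repository.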