COMP: Use pzero() for SelfadjointMatrixVector accumulators (silences GCC -O3 IPA-CP false positives) #6166
| Filename | Overview |
|---|---|
| Modules/ThirdParty/Eigen3/src/itkeigen/Eigen/src/Core/products/SelfadjointMatrixVector.h | Replaces pset1<Packet>(t2/t3) with pzero(Packet{}) for zero-initialized packet accumulators; semantically identical but eliminates GCC IPA-CP false-positive -Wmaybe-uninitialized warnings at -O3. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["selfadjoint_matrix_vector_product::run()"] --> B["Initialize ptmp0 = pset1(t0)\nInitialize ptmp1 = pset1(t1)"]
B --> C["Initialize ptmp2 = pzero(Packet{})\n(was: pset1(t2) where t2=0)\nInitialize ptmp3 = pzero(Packet{})\n(was: pset1(t3) where t3=0)"]
C --> D{"FirstTriangular?"}
D -- Yes --> E["Accumulate t3 scalar\nAccumulate ptmp2/ptmp3 packets\nvia aligned vectorized loop"]
D -- No --> F["Accumulate t2 scalar\nAccumulate ptmp2/ptmp3 packets\nvia aligned vectorized loop"]
E --> G["Reduce packets → scalars\nt2 += predux(ptmp2)\nt3 += predux(ptmp3)"]
F --> G
G --> H["Write results back to res[j], res[j+1]"]
GCC -O3 IPA-CP emits false-positive -Wmaybe-uninitialized on the
cloned `selfadjoint_matrix_vector_product::run.constprop.isra`:
`pset1<Packet>(Scalar(0))` is opaque to interprocedural constant
propagation, while `pzero(Packet{})` (used by upstream Eigen master)
is visible as a constant. Backport just the ptmp2/ptmp3 init lines.
e4799ec to 91f6232
527b6c7 into InsightSoftwareConsortium:main
Follow-up to #6166. On ubuntu-24.04 + GCC 13/14 + C++20 (the toolchain that #6164 introduces), GCC -O3 IPA-CP emits additional false-positive -Wmaybe-uninitialized at the outer selfadjoint_product_impl::run() call site, attributing the read to "argument 3 of type const float *" of the cloned selfadjoint_matrix_vector_product::run.constprop.isra. Add pzero(Packet{}) declare-then-assign init for the four Packet locals (A0i, A1i, Bi, Xi) loaded from the lhs/rhs/res pointers inside the inner loop. Mirrors the #6166 ptmp2/ptmp3 pattern: pzero() is visible to GCC's IPA-CP as a definite zero starting state, while a bare declaration plus subsequent ploadu/pload is opaque and lets the optimizer trace an uninit path through cloned variants.
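The declare-then-assign pattern described in this follow-up can be sketched as below. This is a minimal illustration, not the vendored kernel: `Packet4`, `load4`, `madd4`, `reduce4`, and `dot_blocks` are hypothetical stand-ins for a SIMD packet type and for the roles that `ploadu`, a multiply-add, and `predux` play in Eigen. The sketch shows only the shape of the change: each packet local starts from a definite zero value and is then overwritten by the load.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-lane packet and helpers; illustrative stand-ins, not Eigen's API.
struct Packet4 {
  std::array<float, 4> lane;
};

// Unaligned-load stand-in (the role ploadu plays in the kernel).
inline Packet4 load4(const float* p) {
  return Packet4{{p[0], p[1], p[2], p[3]}};
}

// Lane-wise multiply-add stand-in: acc + a * b.
inline Packet4 madd4(const Packet4& a, const Packet4& b, const Packet4& acc) {
  Packet4 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r.lane[i] = acc.lane[i] + a.lane[i] * b.lane[i];
  }
  return r;
}

// Horizontal-sum stand-in (the role predux plays in the kernel).
inline float reduce4(const Packet4& p) {
  return p.lane[0] + p.lane[1] + p.lane[2] + p.lane[3];
}

// Dot product over n floats; n is assumed to be a multiple of 4.
inline float dot_blocks(const float* a, const float* x, std::size_t n) {
  Packet4 acc{};  // zero accumulator
  for (std::size_t i = 0; i < n; i += 4) {
    // Declare-then-assign: each local starts from a definite zero state...
    Packet4 Ai{};
    Packet4 Xi{};
    // ...and is overwritten by the load before it is read.
    Ai = load4(a + i);
    Xi = load4(x + i);
    acc = madd4(Ai, Xi, acc);
  }
  return reduce4(acc);
}
```

The zero initialization is redundant at run time because the load always overwrites it. Its only purpose in this sketch is to make the initial state explicit to the compiler.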
Switch the two `ptmp2`/`ptmp3` packet-accumulator initializations in `Eigen/src/Core/products/SelfadjointMatrixVector.h` from `pset1<Packet>(t)` to `pzero(Packet{})`, matching upstream Eigen master. This silences four false-positive `-Wmaybe-uninitialized` warnings that GCC emits at `-O3` after IPA-CP clones the function as `run.constprop.isra`.
Must merge before #6164, which switches `ITK.Linux.Python` CI from `MinSizeRel` to `Release` and is currently red because of these exact warnings.
Why the existing in-header diagnostic-pop doesn't help
`Modules/ThirdParty/Eigen3/src/itkeigen/Eigen/src/Core/products/SelfadjointMatrixVector.h` already wraps its body in a diagnostic push/pop suppression. That suppression is stateful at parse time: it applies while GCC parses this header. The actual warnings are emitted from `selfadjoint_matrix_vector_product::run.constprop.isra`, a cloned function GCC creates during interprocedural constant propagation at `-O2`/`-O3`. The diagnostic state at the point GCC analyses the clone is whatever the call-site translation unit had active, not the suppression baked into the header. This is a long-standing GCC interaction with `-fipa-cp-clone`.
The proper fix is therefore at the source: give GCC IPA-CP a clearly-zero starting packet so that it never builds a "maybe uninit" hypothesis in the first place.
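For readers unfamiliar with the mechanism, a header-scoped suppression of the kind discussed above generally has the following shape. This is a sketch under assumptions: the pragma set shown is not copied from the vendored Eigen header, and `sum_pairs` is a hypothetical function standing in for the templated `run()` body.

```cpp
// Sketch of a header-scoped suppression; the pragmas here are an assumption
// and are not copied from the vendored Eigen header.
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
#endif

// Hypothetical kernel standing in for the templated run() body.
// Sums the elements of data in pairs; a trailing odd element is ignored.
inline float sum_pairs(const float* data, int n) {
  float t0 = 0.0f;
  float t1 = 0.0f;
  for (int i = 0; i + 1 < n; i += 2) {
    t0 += data[i];
    t1 += data[i + 1];
  }
  return t0 + t1;
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic pop
#endif
```

The push/pop pair limits the suppression to the code between the two pragmas. According to the explanation above, the warning is reported against an optimizer-generated clone at the call site, so a region like this one does not cover it.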
What changed (2-line diff)
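As a rough illustration of the substitution (not the diff itself), the sketch below compares a broadcast of zero with a value-initialized packet. `Packet4`, `broadcast`, and `zero_like` are hypothetical stand-ins for a SIMD packet type, `pset1<Packet>`, and `pzero`. They are not Eigen's implementations.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-lane float packet standing in for a real SIMD type;
// it is NOT Eigen's Packet type.
struct Packet4 {
  std::array<float, 4> lane;
};

// Stand-in for a broadcast such as pset1<Packet>(x): every lane gets x.
inline Packet4 broadcast(float x) {
  return Packet4{{x, x, x, x}};
}

// Stand-in for pzero(Packet{}): returns an all-zero packet of the deduced
// type. The argument is used only for type deduction.
template <typename P>
inline P zero_like(const P&) {
  return P{};  // value-initialization zeroes every lane
}

// Returns true when both initializations produce the same all-zero packet.
inline bool zero_init_forms_agree() {
  const Packet4 old_form = broadcast(0.0f);       // old style: broadcast a zero scalar
  const Packet4 new_form = zero_like(Packet4{});  // new style: explicit zero packet
  for (std::size_t i = 0; i < 4; ++i) {
    if (old_form.lane[i] != new_form.lane[i] || new_form.lane[i] != 0.0f) {
      return false;
    }
  }
  return true;
}
```

Calling `zero_init_forms_agree()` from a small `main` is enough to check that the two forms agree lane by lane in this toy model.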
Algorithmic behaviour is unchanged: `pset1<Packet>(0)` and `pzero(Packet{})` both produce a zero packet. Upstream Eigen master uses the latter form because:
- `Packet{}` is value-initialization (explicit zero, visible to the optimizer as a constant).
- `pzero` is a small inline that returns a zero packet of the deduced type.
- `pzero` already exists in our vendored Eigen at `Modules/ThirdParty/Eigen3/src/itkeigen/Eigen/src/Core/GenericPacketMath.h:425`, so no new symbols are needed.
Upstream Eigen master also did a separate, larger refactor of this same kernel (2-column → 4-column processing, four accumulators `ptmp4`–`ptmp7`). That refactor is intentionally not included here: it is a bigger change with its own performance trade-offs and reviewer surface, orthogonal to fixing the warning.
Why this affects ITK.Linux.Python specifically
`ITK.Linux.Python` builds the SuperBuild with `ITK_WRAP_PYTHON=ON`, which generates many SWIG wrapper translation units that include Eigen and instantiate `selfadjoint_matrix_vector_product`. The combination of (a) `-O3` (Release), (b) GCC's `-fipa-cp-clone` enabled at that level, and (c) the high template instantiation count is what makes the warnings reproducible there. `MinSizeRel` (`-Os`) typically disables IPA-CP cloning, which is why the warnings only appeared after #6164's MinSizeRel→Release switch.
Empirical CDash evidence from #6164 build #14600: 0 errors, 4 warnings, 175 tests passed. In the warning text, `_28` and `result_90` are GCC SSA temporaries inside the cloned function; they correspond to `ptmp2`/`ptmp3` after IPA-CP rewriting.
Test plan
- `ITK.Linux.Python` (Release) builds with 0 warnings on this branch (this PR's CI).
- `LinuxPython` red should clear.