Skip to content

ENH: VTI ImageIO follow-ups: scoping, exception-safety, GTest, F-011#6178

Closed
hjmjohnson wants to merge 35 commits intoInsightSoftwareConsortium:mainfrom
hjmjohnson:vti-pr6032-fixups
Closed

ENH: VTI ImageIO follow-ups: scoping, exception-safety, GTest, F-011#6178
hjmjohnson wants to merge 35 commits intoInsightSoftwareConsortium:mainfrom
hjmjohnson:vti-pr6032-fixups

Conversation

@hjmjohnson
Copy link
Copy Markdown
Member

Fix-ups on top of #6032 (vtiSupport — VTK XML ImageData IO). Rebased onto current main so the diff includes both @dzenanz's vtiSupport stack and my 18 follow-up commits.

Coordinate with #6032: this branch is meant to supersede or stack on that PR — happy to do whichever the maintainers prefer. If #6032 merges first, drop my branch's first 17 commits (hashes 5a7cc61d32c87c33fa) and replay the remaining 18 on top of merged main.

Bug fixes (5 commits)
SHA Subject
b635df110d BUG: Scope VTI <DataArray> reader to <PointData> children only
b7b21d62b1 BUG: TensorRemapGuard only remaps on the successful Read() exit
818b8b2676 BUG: Reject Direction attribute with trailing non-whitespace junk
233bd59e3b BUG: Emit explicit F-011 diagnostic for CellData-only VTI files
a096adcff0 ENH: Warn on non-orthonormal Direction in VTI files
Documentation (2 commits)
SHA Subject
4cbd588a9e DOC: List all F-NNN deferred items in VTIImageIO header
94c55abd9e DOC: Document VTI single-image / first-wins DataArray semantics
Test additions (4 commits)
SHA Subject
5e29900e84 ENH: Test PointData/CellData scoping in VTI reader
7b13f89a5c ENH: Test rejection of <AppendedData> without _ marker
ac5c93c195 ENH: Test rejection of non-numeric NumberOfComponents in VTI

(Direction trailing-junk + non-orthonormal regression tests included with their respective bug commits.)

GTest conversion of 4 no-arg tests (8 commits, rename + content per test)
Old test New
itkVTIImageIOSwapBufferTest itkVTIImageIOSwapBufferGTest (6 TEST blocks)
itkVTIImageIOFutureFeaturesTest itkVTIImageIOFutureFeaturesGTest (4 TEST blocks for F-NNN guards)
itkVTIImageIODirectionTest itkVTIImageIODirectionGTest (single TEST_F using ::testing::TempDir())
itkVTIImageIOGeneratedFixturesTest itkVTIImageIOGeneratedFixturesGTest (5 TEST_F with RegisterRequiredFactories fixture)

Stands up creategoogletestdriver(ITKIOVTK …) scaffolding + a VTI_TEST_INPUT_DIR compile definition so the GTest driver doesn't need command-line arg passthrough.

Each conversion is a rename-only commit followed by a content commit so git log --follow walks back through the rename to the original test history.

Test plan
  • All 23 VTI tests pass locally (cxx build)
  • pixi run build-python clean — 0 warnings introduced
  • pre-commit run --all-files clean against final tip
  • No new compile warnings vs baseline (4 pre-existing warnings in unrelated remote modules)
  • CI green

hjmjohnson and others added 30 commits May 1, 2026 07:46
Closes InsightSoftwareConsortium#6030's parent issue thread; replaces the WIP implementation in
the original PR with one that addresses both the CI failures and the
review feedback.

== Why a rewrite ==

The original WIP commit on this branch failed CI on every C++ platform
and on Python wrapping, and Bradley Lowekamp's review asked for the XML
parsing to use ITK's existing expat library rather than ad-hoc string
matching.  Rather than patch the WIP implementation, this commit
restarts the file from the same upstream/main base with:

  - expat-based XML parsing for the file header
  - Python wrapping registration
  - KWStyle-clean doxygen comments
  - an ENH: prefix that satisfies kwrobot/ghostflow
  - a comprehensive round-trip and corner-case test suite

== Algorithm ==

VTIImageIO inherits from itk::ImageIOBase and supports the
serial-piece subset of the VTK XML ImageData (.vti) format.

Reading
  1. The whole file is slurped into memory.
  2. If the file contains an `<AppendedData>` element, the XML view fed
     to expat is truncated at that element and replaced with a
     self-closing `<AppendedData/></VTKFile>` so the parser sees a
     well-formed document.  The byte offset of the `_` marker that
     introduces the raw binary block is recorded for later seek-and-read.
  3. The truncated XML view is parsed with expat (XML_Parser /
     XML_SetElementHandler / XML_SetCharacterDataHandler).  Element
     handlers populate a VTIParseState with the VTKFile, ImageData,
     PointData, and active DataArray attributes, and the character data
     handler captures inline ASCII or base64 contents.
  4. After parsing, geometry/spacing/origin/component-type/pixel-type
     are populated on the ImageIOBase from the captured state.
  5. Read() then routes to one of three branches:
       - ASCII: parse the cached character data via ReadBufferAsASCII
       - base64: decode the cached base64 string, strip the
         UInt32/UInt64 block-size header, memcpy into the user buffer,
         then byte-swap if file != host endianness
       - raw appended: open the file, seek to
         m_AppendedDataOffset + m_DataArrayOffset + headerBytes, read,
         then byte-swap if necessary

The expat-based reader handles the things string matching cannot:
attribute reordering, optional whitespace, XML comments at any depth,
the standalone <?xml ... ?> declaration, and PointData with multiple
DataArrays where only one is the active scalar/vector/tensor.

Writing
  - ASCII or binary (base64) DataMode, selectable via SetFileType().
  - SymmetricSecondRankTensor pixels are expanded from ITK's 6-component
    storage to VTK's full 9-component layout for ASCII output.  Binary
    tensor writing is intentionally rejected with an exception because
    the on-disk NumberOfComponents="9" header would silently disagree
    with a 6-component memory buffer; this is verified by a test.
  - Output is always written in the host byte order; the file declares
    its byte_order accordingly.
  - The block-size header for base64 binary is currently UInt32 (the
    UInt64 reader path is exercised by a hand-crafted test file below).

Compression (zlib/lz4) and the VTK direction matrix are intentionally
out of scope for this PR.

== CI failures fixed ==

  - itkVTIImageIO.h:124  KWStyle:  comment doesn't have \class
      The DataEncoding `enum class` had a `/** ... */` doxygen comment;
      KWStyle's class-comment rule treats `enum class` like a class.
      Replaced with a plain `//` comment.

  - KeyError: 'VTIImageIOFactory' (Python tests)
      Added Modules/IO/VTK/wrapping/itkVTIImageIO.wrap registering both
      VTIImageIO and VTIImageIOFactory with the auto-loaded wrap module.

  - ghostflow-check-main: 'WIP:' prefix not in allowed list
      Commit message now uses ENH: prefix.

  - Module dependency: ITKExpat added as a PRIVATE_DEPENDS of ITKIOVTK
    so the new XML parser actually links.  Updated the FACTORY_NAMES to
    include `ImageIO::VTI`.

== Test strategy and coverage ==

The new itkVTIImageIOTest.cxx exercises the filter through 3 classes
of cases:

1. Round-trip tests via ImageFileWriter -> ImageFileReader for the
   common pixel type and dimensionality combinations:
     uchar 1D / 2D, short 3D, float 3D, double 3D, RGB 2D, vector<float,3>
     in 3D, both ASCII and binary (base64) encodings where applicable.
     Each round-trip checks region, spacing, origin, and per-pixel
     bit-equivalence.

2. Behavior tests:
     - Symmetric tensor ASCII round-trip (writes 9-component layout,
       reads back into 6-component ITK layout).
     - Symmetric tensor binary write must throw (silent layout
       corruption guard).

3. Hand-crafted-file readability tests for code paths the writer never
   produces but the reader must support:
     - XML robustness: comments at multiple positions, attribute
       reordering, multiple DataArrays in PointData with the active
       Scalars selector pointing at the second array.  Verifies
       dimensions, spacing, origin, AND that the correct active array
       is selected.
     - header_type="UInt64": base64 file with an 8-byte block-size
       header; verifies the dual UInt32/UInt64 header path.
     - format="appended" with raw binary in <AppendedData>: verifies
       the file-truncation + offset-seek path.
     - byte_order="BigEndian": base64 file with the data and the
       block-size header pre-swapped to big-endian; verifies the
       byte-swap-on-read path.
     - CanReadFile / CanWriteFile sanity.

This matches the relevant subset of VTK's own
TestDataObjectXMLIO.cxx coverage matrix (DataMode x ByteOrder x
HeaderType x DataObjectType) for the features this PR claims.  ZLIB/LZ4
compression and the VTK direction matrix are intentionally not exercised
since this PR does not claim those features.

== Local results ==

  $ cmake --build build-ssim -j48 --target ITKIOVTKTestDriver
  [4/4] Linking CXX executable bin/ITKIOVTKTestDriver

  $ ctest -R itkVTIImageIOTest --output-on-failure
  Test InsightSoftwareConsortium#1259: itkVTIImageIOTest .... Passed   0.03 sec
  100% tests passed, 0 tests failed out of 1

  pre-commit (gersemi, clang-format, kw-pre-commit) clean on every
  touched file.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This appears when legacy code is removed. The exact error message was:

Modules/IO/VTK/test/itkVTIImageIOTest.cxx:100:56: warning: 'itk::ImageConstIterator::IndexType itk::ImageConstIterator::GetIndex() const [with TImage = itk::Image<unsigned char, 1>; itk::ImageConstIterator::IndexType = itk::Index<1>]' is deprecated: Please use ComputeIndex() instead, or use an iterator with index, like ImageIteratorWithIndex! [-Wdeprecated-declarations]
…river

- Parse and emit Direction cosines (VTK 9+ row-major 3x3 attribute)
- Canonicalize symmetric-tensor layout to VTK [XX,YY,ZZ,XY,YZ,XZ] convention
- Fix byte-swap for LE-on-BE host; expose static SwapBufferForByteOrder for testing
- Emit version="1.0" and UInt64 block headers matching ParaView 5.7+
- Reject files with DOCTYPE/ENTITY declarations (XXE/billion-laughs mitigation)
- Add F-001..F-010 guard exceptions for deferred features with F-NNN tags
- Add RGBA round-trip coverage
- Add VTI fixture generator (Python stdlib) and synthetic round-trip fixtures
- Add hand-crafted oblique-direction fixture with MHD oracle
- Add unit tests: direction round-trip, SwapBufferForByteOrder, guard tests, fixture suite
- Rewrite class Doxygen with supported features and deferred F-NNN items
- Replace manual ExceptionObject with itkGenericExceptionMacro in static methods
- Stream-parse XML via expat chunks; suspend at <AppendedData> to avoid loading
  binary payload into memory (large-file performance improvement)
- Add appended-raw + vtkZLibDataCompressor writer via SetUseCompression(true);
  produces smallest on-disk files; matches ParaView's default write format

Disable itkVTIImageIOReadWriteTestVHFColorZLib until ExternalData upload is
restored (tracked at ITK#4340); equivalent ZLib code-path coverage provided
by itkVTIImageIOGeneratedFixturesTest (vector3_f32_zlib_appended fixture).
Base64 can have some padding = characters somewhere in the middle.
Decode in chunks. This avoids duplicating the input buffer.
With this approach, "compact" buffer is not needed.
The expat handler accepted any <DataArray> regardless of its parent
container, so files containing both <CellData> and <PointData> (or only
<CellData>) could silently bind to a cell-centered array as if it were
the image's point data, depending on element order.

Add an inPointData flag set on <PointData> start and cleared on
</PointData> end; gate the DataArray-capture branch on it.  Files
without any PointData arrays still produce the existing 'No DataArray
element found' diagnostic; the only behavioural change is that
CellData/FieldData arrays are no longer mistakenly chosen.

active{Scalars,Vectors,Tensors} hints from <PointData> attributes are
intentionally not cleared on </PointData>; they are consumed in the
post-parse pixel-type derivation in ReadImageInformation, and the
captured DataArray itself was already gated on inPointData at match
time.
The destructor used to remap the caller's buffer in place whenever the
image was a symmetric tensor, regardless of whether Read() reached its
successful exit point.  If Read() threw partway through (truncated
payload, decompression mismatch, swap failure), the partial bytes that
had already landed in the buffer were silently re-permuted, scrambling
whatever the caller had pre-existing or partially populated.

Add an explicit commit() flag set at the end of each successful encoding
path; the destructor remaps only when both active and committed are
true, so on exception the buffer is left as the writes left it -- which
is undefined per ImageIOBase::Read contract, but at least no longer
actively corrupted by a remap on top.
The PR body claimed git grep F-NNN locates every deferred-feature
identifier, but the header listed only the six items with active guard
exceptions (F-001/002/005/007/009/010).  F-003 (appended-raw writer),
F-004 (streaming read), F-008 (appended-base64 writer), and F-009
(MetaDataDictionary round-trip) have no public API surface today and
so cannot be triggered without a code change, but they belong in the
findable list anyway.

Add a (guard) vs (latent) annotation so a reader can tell at a glance
which items already throw on misuse and which are just placeholders
for the follow-up PR.  Also add F-011 for <CellData>-only images
(silently rejected today via the post-parse 'No DataArray element
found' diagnostic) so the limitation is explicit.
Companion regression coverage for the preceding scoping fix.  Two new
cases extend itkVTIImageIOTest:

  * CellData-only file is rejected with the existing 'No DataArray
    element found' diagnostic instead of being silently misread as
    point data.
  * Mixed CellData-then-PointData file picks the PointData array even
    when the CellData block appears first in the document, exercising
    the strong form of the inPointData scoping guarantee.
std::istream::operator>> reads exactly the requested 9 floats and then
silently leaves the rest of the attribute in the stream; the prior
failbit check fires only on parse failure within the first 9 reads, so
'1 0 0 0 1 0 0 0 1 garbage' was accepted as if it parsed cleanly.

Probe one more token after the 9-float loop and reject any non-empty
trailing word with a clear diagnostic.  Add a regression case in
itkVTIImageIOTest.
A truncated VTI file whose <AppendedData> block lacks the conventional
'_' byte before the binary payload exercises the EOF-during-marker-scan
path in ReadImageInformation.  The reader already throws there with
'Missing _ marker in <AppendedData> section', but no test pinned that
behaviour, so a future refactor of the offset-scan loop could regress
silently to reading past EOF or returning a successful zero-length
read.  Add a regression case.
Translate the four (componentSize, byte-order) blocks plus the
known-value regression and zero-component edge case into six
TEST(VTIImageIOSwapBuffer, ...) blocks, each calling
SwapBufferForByteOrder and asserting EXPECT_EQ on the resulting byte
buffer.  Failures now print the full operand vectors instead of the
prior '<label>: byte i got=0xNN want=0xNN' diagnostic.

Also stand up the ITKIOVTKGTests scaffolding via a single
creategoogletestdriver(ITKIOVTK ...) call so subsequent GTest
conversions in this directory just append to the list.
Each F-NNN guard becomes its own TEST(VTIImageIOFutureFeatures, ...)
block (F-001/002/005/010), so a regression in any single guard prints
exactly which one failed instead of being lost in a status |= batch.

The test driver consumes the input fixture directory through a
VTI_TEST_INPUT_DIR compile definition rather than argv[1], so the
GTest framework's lack of CLI passthrough does not cost any test
coverage.  The path is set in CMakeLists via target_compile_definitions
on ITKIOVTKGTestDriver.
The test becomes a single TEST(VTIImageIODirection, RoundTrip) that
loads the oblique-direction fixture, asserts the read-back matrix
matches, writes the image back out, and re-asserts after re-read.
EXPECT_NEAR replaces the per-element if/return diagnostic so failures
print the offending operand on the spot.

The output path uses ::testing::TempDir() (gtest-managed) instead of
the prior argv[2] pass-through, and the fixture path comes from the
build-system-injected VTI_TEST_INPUT_DIR macro.
hjmjohnson added 5 commits May 1, 2026 07:46
Each of the five fixture comparisons becomes its own
TEST_F(VTIImageIOGeneratedFixtures, ...) under a fixture struct whose
SetUp() calls RegisterRequiredFactories() so MetaIO can resolve the
.mhd oracle.  CompareImages now returns ::testing::AssertionResult so
EXPECT_TRUE() prints the offending pixel index/values directly on
failure instead of relying on std::cerr scaffolding.

Compressed-write coverage continues to be exercised in four of the
five tests via the same WriteImage(...,true) + read-back loop the
legacy test had; the tensor case remains skipped (F-007 binary-tensor
writer is deferred).  Input fixture paths come from the
build-system-injected VTI_TEST_INPUT_DIR macro.
A file with <DataArray> children only inside <CellData> previously
fell through to the generic 'No DataArray element found' message,
giving the user no clue that the file is structurally valid VTI but
exercises a code path the reader does not yet support.

Track <CellData> presence + at least one DataArray child in the
parser state, then emit a tagged F-011 exception in the post-parse
diagnostic when no PointData arrays were captured but a CellData
array was seen.  Promote F-011 in the header doxygen from (latent)
to (guard) and update the existing CellData-only regression test in
itkVTIImageIOTest to assert the new tag.
The header listed the supported encodings and pixel types but never
spelled out that VTIImageIO yields exactly one image per file, that
the active DataArray inside <PointData> is selected by name (Scalars/
Vectors/Tensors hint or first child if the hint is absent), and that
sibling DataArrays are silently ignored.  Add a 'Single-image
semantics' paragraph after the supported-on-read list so callers
who expect to walk through every DataArray learn the model up-front
and reach for the F-NNN list rather than filing a defect.
The reader wraps std::stoul in try/catch and rethrows as
itk::ExceptionObject so downstream pipelines can catch the parse
failure with the same handler used for every other malformed-input
case in this IO.  No regression test pinned that wrapping, so a
future refactor that returns the raw std::invalid_argument would
slip through.  Add a fixture with NumberOfComponents="banana" and
assert the read raises itk::ExceptionObject.
ITK's geometry pipelines (resampling, registration, physical-space
conversion) assume orthonormal image Direction matrices.  VTK and
ParaView always write rotations or axis permutations, but a tool that
conflates 'Direction' with raw column vectors can emit a degenerate
3x3 that VTIImageIO previously accepted silently and propagated into
the image header, breaking downstream geometry without any signal to
the user.

Compute D * D^T after parsing the Direction attribute and itkWarningMacro
when it deviates from identity by more than 1e-6.  The warning is
non-fatal so legacy files still load; users who care can promote it
to a hard failure via OutputWindow filtering.  Add a regression test
asserting that a deliberately non-orthonormal fixture still loads
without an exception.
@github-actions github-actions Bot added type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Enhancement Improvement of existing methods or implementation area:Python wrapping Python bindings for a class type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct area:Core Issues affecting the Core module area:IO Issues affecting the IO module labels May 1, 2026
@hjmjohnson
Copy link
Copy Markdown
Member Author

Superseded — pushed the 18 fixup commits directly onto dzenanz:vtiSupport (PR #6032), per @dzenanz collaboration. Closing this duplicate.

@hjmjohnson hjmjohnson closed this May 1, 2026
@hjmjohnson hjmjohnson deleted the vti-pr6032-fixups branch May 1, 2026 13:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:Core Issues affecting the Core module area:IO Issues affecting the IO module area:Python wrapping Python bindings for a class type:Enhancement Improvement of existing methods or implementation type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants