ENH: VTI ImageIO follow-ups: scoping, exception-safety, GTest, F-011#6178
Closed
hjmjohnson wants to merge 35 commits intoInsightSoftwareConsortium:mainfrom
Closed
ENH: VTI ImageIO follow-ups: scoping, exception-safety, GTest, F-011#6178hjmjohnson wants to merge 35 commits intoInsightSoftwareConsortium:mainfrom
hjmjohnson wants to merge 35 commits intoInsightSoftwareConsortium:mainfrom
Conversation
Closes InsightSoftwareConsortium#6030's parent issue thread; replaces the WIP implementation in the original PR with one that addresses both the CI failures and the review feedback. == Why a rewrite == The original WIP commit on this branch failed CI on every C++ platform and on Python wrapping, and Bradley Lowekamp's review asked for the XML parsing to use ITK's existing expat library rather than ad-hoc string matching. Rather than patch the WIP implementation, this commit restarts the file from the same upstream/main base with: - expat-based XML parsing for the file header - Python wrapping registration - KWStyle-clean doxygen comments - an ENH: prefix that satisfies kwrobot/ghostflow - a comprehensive round-trip and corner-case test suite == Algorithm == VTIImageIO inherits from itk::ImageIOBase and supports the serial-piece subset of the VTK XML ImageData (.vti) format. Reading 1. The whole file is slurped into memory. 2. If the file contains an `<AppendedData>` element, the XML view fed to expat is truncated at that element and replaced with a self-closing `<AppendedData/></VTKFile>` so the parser sees a well-formed document. The byte offset of the `_` marker that introduces the raw binary block is recorded for later seek-and-read. 3. The truncated XML view is parsed with expat (XML_Parser / XML_SetElementHandler / XML_SetCharacterDataHandler). Element handlers populate a VTIParseState with the VTKFile, ImageData, PointData, and active DataArray attributes, and the character data handler captures inline ASCII or base64 contents. 4. After parsing, geometry/spacing/origin/component-type/pixel-type are populated on the ImageIOBase from the captured state. 5. Read() then routes to one of three branches: - ASCII: parse the cached character data via ReadBufferAsASCII - base64: decode the cached base64 string, strip the UInt32/UInt64 block-size header, memcpy into the user buffer, then byte-swap if file != host endianness - raw appended: open the file, seek to m_AppendedDataOffset + m_DataArrayOffset + headerBytes, read, then byte-swap if necessary The expat-based reader handles the things string matching cannot: attribute reordering, optional whitespace, XML comments at any depth, the standalone <?xml ... ?> declaration, and PointData with multiple DataArrays where only one is the active scalar/vector/tensor. Writing - ASCII or binary (base64) DataMode, selectable via SetFileType(). - SymmetricSecondRankTensor pixels are expanded from ITK's 6-component storage to VTK's full 9-component layout for ASCII output. Binary tensor writing is intentionally rejected with an exception because the on-disk NumberOfComponents="9" header would silently disagree with a 6-component memory buffer; this is verified by a test. - Output is always written in the host byte order; the file declares its byte_order accordingly. - The block-size header for base64 binary is currently UInt32 (the UInt64 reader path is exercised by a hand-crafted test file below). Compression (zlib/lz4) and the VTK direction matrix are intentionally out of scope for this PR. == CI failures fixed == - itkVTIImageIO.h:124 KWStyle: comment doesn't have \class The DataEncoding `enum class` had a `/** ... */` doxygen comment; KWStyle's class-comment rule treats `enum class` like a class. Replaced with a plain `//` comment. - KeyError: 'VTIImageIOFactory' (Python tests) Added Modules/IO/VTK/wrapping/itkVTIImageIO.wrap registering both VTIImageIO and VTIImageIOFactory with the auto-loaded wrap module. - ghostflow-check-main: 'WIP:' prefix not in allowed list Commit message now uses ENH: prefix. - Module dependency: ITKExpat added as a PRIVATE_DEPENDS of ITKIOVTK so the new XML parser actually links. Updated the FACTORY_NAMES to include `ImageIO::VTI`. == Test strategy and coverage == The new itkVTIImageIOTest.cxx exercises the filter through 3 classes of cases: 1. Round-trip tests via ImageFileWriter -> ImageFileReader for the common pixel type and dimensionality combinations: uchar 1D / 2D, short 3D, float 3D, double 3D, RGB 2D, vector<float,3> in 3D, both ASCII and binary (base64) encodings where applicable. Each round-trip checks region, spacing, origin, and per-pixel bit-equivalence. 2. Behavior tests: - Symmetric tensor ASCII round-trip (writes 9-component layout, reads back into 6-component ITK layout). - Symmetric tensor binary write must throw (silent layout corruption guard). 3. Hand-crafted-file readability tests for code paths the writer never produces but the reader must support: - XML robustness: comments at multiple positions, attribute reordering, multiple DataArrays in PointData with the active Scalars selector pointing at the second array. Verifies dimensions, spacing, origin, AND that the correct active array is selected. - header_type="UInt64": base64 file with an 8-byte block-size header; verifies the dual UInt32/UInt64 header path. - format="appended" with raw binary in <AppendedData>: verifies the file-truncation + offset-seek path. - byte_order="BigEndian": base64 file with the data and the block-size header pre-swapped to big-endian; verifies the byte-swap-on-read path. - CanReadFile / CanWriteFile sanity. This matches the relevant subset of VTK's own TestDataObjectXMLIO.cxx coverage matrix (DataMode x ByteOrder x HeaderType x DataObjectType) for the features this PR claims. ZLIB/LZ4 compression and the VTK direction matrix are intentionally not exercised since this PR does not claim those features. == Local results == $ cmake --build build-ssim -j48 --target ITKIOVTKTestDriver [4/4] Linking CXX executable bin/ITKIOVTKTestDriver $ ctest -R itkVTIImageIOTest --output-on-failure Test InsightSoftwareConsortium#1259: itkVTIImageIOTest .... Passed 0.03 sec 100% tests passed, 0 tests failed out of 1 pre-commit (gersemi, clang-format, kw-pre-commit) clean on every touched file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This appears when legacy code is removed. The exact error message was: Modules/IO/VTK/test/itkVTIImageIOTest.cxx:100:56: warning: 'itk::ImageConstIterator::IndexType itk::ImageConstIterator::GetIndex() const [with TImage = itk::Image<unsigned char, 1>; itk::ImageConstIterator::IndexType = itk::Index<1>]' is deprecated: Please use ComputeIndex() instead, or use an iterator with index, like ImageIteratorWithIndex! [-Wdeprecated-declarations]
…river - Parse and emit Direction cosines (VTK 9+ row-major 3x3 attribute) - Canonicalize symmetric-tensor layout to VTK [XX,YY,ZZ,XY,YZ,XZ] convention - Fix byte-swap for LE-on-BE host; expose static SwapBufferForByteOrder for testing - Emit version="1.0" and UInt64 block headers matching ParaView 5.7+ - Reject files with DOCTYPE/ENTITY declarations (XXE/billion-laughs mitigation) - Add F-001..F-010 guard exceptions for deferred features with F-NNN tags - Add RGBA round-trip coverage - Add VTI fixture generator (Python stdlib) and synthetic round-trip fixtures - Add hand-crafted oblique-direction fixture with MHD oracle - Add unit tests: direction round-trip, SwapBufferForByteOrder, guard tests, fixture suite - Rewrite class Doxygen with supported features and deferred F-NNN items - Replace manual ExceptionObject with itkGenericExceptionMacro in static methods - Stream-parse XML via expat chunks; suspend at <AppendedData> to avoid loading binary payload into memory (large-file performance improvement) - Add appended-raw + vtkZLibDataCompressor writer via SetUseCompression(true); produces smallest on-disk files; matches ParaView's default write format Disable itkVTIImageIOReadWriteTestVHFColorZLib until ExternalData upload is restored (tracked at ITK#4340); equivalent ZLib code-path coverage provided by itkVTIImageIOGeneratedFixturesTest (vector3_f32_zlib_appended fixture).
Base64 can have some padding = characters somewhere in the middle. Decode in chunks. This avoids duplicating the input buffer. With this approach, "compact" buffer is not needed.
The expat handler accepted any <DataArray> regardless of its parent
container, so files containing both <CellData> and <PointData> (or only
<CellData>) could silently bind to a cell-centered array as if it were
the image's point data, depending on element order.
Add an inPointData flag set on <PointData> start and cleared on
</PointData> end; gate the DataArray-capture branch on it. Files
without any PointData arrays still produce the existing 'No DataArray
element found' diagnostic; the only behavioural change is that
CellData/FieldData arrays are no longer mistakenly chosen.
active{Scalars,Vectors,Tensors} hints from <PointData> attributes are
intentionally not cleared on </PointData>; they are consumed in the
post-parse pixel-type derivation in ReadImageInformation, and the
captured DataArray itself was already gated on inPointData at match
time.
The destructor used to remap the caller's buffer in place whenever the image was a symmetric tensor, regardless of whether Read() reached its successful exit point. If Read() threw partway through (truncated payload, decompression mismatch, swap failure), the partial bytes that had already landed in the buffer were silently re-permuted, scrambling whatever the caller had pre-existing or partially populated. Add an explicit commit() flag set at the end of each successful encoding path; the destructor remaps only when both active and committed are true, so on exception the buffer is left as the writes left it -- which is undefined per ImageIOBase::Read contract, but at least no longer actively corrupted by a remap on top.
The PR body claimed git grep F-NNN locates every deferred-feature identifier, but the header listed only the six items with active guard exceptions (F-001/002/005/007/009/010). F-003 (appended-raw writer), F-004 (streaming read), F-008 (appended-base64 writer), and F-009 (MetaDataDictionary round-trip) have no public API surface today and so cannot be triggered without a code change, but they belong in the findable list anyway. Add a (guard) vs (latent) annotation so a reader can tell at a glance which items already throw on misuse and which are just placeholders for the follow-up PR. Also add F-011 for <CellData>-only images (silently rejected today via the post-parse 'No DataArray element found' diagnostic) so the limitation is explicit.
Companion regression coverage for the preceding scoping fix. Two new
cases extend itkVTIImageIOTest:
* CellData-only file is rejected with the existing 'No DataArray
element found' diagnostic instead of being silently misread as
point data.
* Mixed CellData-then-PointData file picks the PointData array even
when the CellData block appears first in the document, exercising
the strong form of the inPointData scoping guarantee.
std::istream::operator>> reads exactly the requested 9 floats and then silently leaves the rest of the attribute in the stream; the prior failbit check fires only on parse failure within the first 9 reads, so '1 0 0 0 1 0 0 0 1 garbage' was accepted as if it parsed cleanly. Probe one more token after the 9-float loop and reject any non-empty trailing word with a clear diagnostic. Add a regression case in itkVTIImageIOTest.
A truncated VTI file whose <AppendedData> block lacks the conventional '_' byte before the binary payload exercises the EOF-during-marker-scan path in ReadImageInformation. The reader already throws there with 'Missing _ marker in <AppendedData> section', but no test pinned that behaviour, so a future refactor of the offset-scan loop could regress silently to reading past EOF or returning a successful zero-length read. Add a regression case.
Translate the four (componentSize, byte-order) blocks plus the known-value regression and zero-component edge case into six TEST(VTIImageIOSwapBuffer, ...) blocks, each calling SwapBufferForByteOrder and asserting EXPECT_EQ on the resulting byte buffer. Failures now print the full operand vectors instead of the prior '<label>: byte i got=0xNN want=0xNN' diagnostic. Also stand up the ITKIOVTKGTests scaffolding via a single creategoogletestdriver(ITKIOVTK ...) call so subsequent GTest conversions in this directory just append to the list.
Each F-NNN guard becomes its own TEST(VTIImageIOFutureFeatures, ...) block (F-001/002/005/010), so a regression in any single guard prints exactly which one failed instead of being lost in a status |= batch. The test driver consumes the input fixture directory through a VTI_TEST_INPUT_DIR compile definition rather than argv[1], so the GTest framework's lack of CLI passthrough does not cost any test coverage. The path is set in CMakeLists via target_compile_definitions on ITKIOVTKGTestDriver.
The test becomes a single TEST(VTIImageIODirection, RoundTrip) that loads the oblique-direction fixture, asserts the read-back matrix matches, writes the image back out, and re-asserts after re-read. EXPECT_NEAR replaces the per-element if/return diagnostic so failures print the offending operand on the spot. The output path uses ::testing::TempDir() (gtest-managed) instead of the prior argv[2] pass-through, and the fixture path comes from the build-system-injected VTI_TEST_INPUT_DIR macro.
Each of the five fixture comparisons becomes its own TEST_F(VTIImageIOGeneratedFixtures, ...) under a fixture struct whose SetUp() calls RegisterRequiredFactories() so MetaIO can resolve the .mhd oracle. CompareImages now returns ::testing::AssertionResult so EXPECT_TRUE() prints the offending pixel index/values directly on failure instead of relying on std::cerr scaffolding. Compressed-write coverage continues to be exercised in four of the five tests via the same WriteImage(...,true) + read-back loop the legacy test had; the tensor case remains skipped (F-007 binary-tensor writer is deferred). Input fixture paths come from the build-system-injected VTI_TEST_INPUT_DIR macro.
A file with <DataArray> children only inside <CellData> previously fell through to the generic 'No DataArray element found' message, giving the user no clue that the file is structurally valid VTI but exercises a code path the reader does not yet support. Track <CellData> presence + at least one DataArray child in the parser state, then emit a tagged F-011 exception in the post-parse diagnostic when no PointData arrays were captured but a CellData array was seen. Promote F-011 in the header doxygen from (latent) to (guard) and update the existing CellData-only regression test in itkVTIImageIOTest to assert the new tag.
The header listed the supported encodings and pixel types but never spelled out that VTIImageIO yields exactly one image per file, that the active DataArray inside <PointData> is selected by name (Scalars/ Vectors/Tensors hint or first child if the hint is absent), and that sibling DataArrays are silently ignored. Add a 'Single-image semantics' paragraph after the supported-on-read list so callers who expect to walk through every DataArray learn the model up-front and reach for the F-NNN list rather than filing a defect.
The reader wraps std::stoul in try/catch and rethrows as itk::ExceptionObject so downstream pipelines can catch the parse failure with the same handler used for every other malformed-input case in this IO. No regression test pinned that wrapping, so a future refactor that returns the raw std::invalid_argument would slip through. Add a fixture with NumberOfComponents="banana" and assert the read raises itk::ExceptionObject.
ITK's geometry pipelines (resampling, registration, physical-space conversion) assume orthonormal image Direction matrices. VTK and ParaView always write rotations or axis permutations, but a tool that conflates 'Direction' with raw column vectors can emit a degenerate 3x3 that VTIImageIO previously accepted silently and propagated into the image header, breaking downstream geometry without any signal to the user. Compute D * D^T after parsing the Direction attribute and itkWarningMacro when it deviates from identity by more than 1e-6. The warning is non-fatal so legacy files still load; users who care can promote it to a hard failure via OutputWindow filtering. Add a regression test asserting that a deliberately non-orthonormal fixture still loads without an exception.
Member
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix-ups on top of #6032 (vtiSupport — VTK XML ImageData IO). Rebased onto current
mainso the diff includes both @dzenanz's vtiSupport stack and my 18 follow-up commits.Coordinate with #6032: this branch is meant to supersede or stack on that PR — happy to do whichever the maintainers prefer. If #6032 merges first, drop my branch's first 17 commits (hashes
5a7cc61d…32c87c33fa) and replay the remaining 18 on top of merged main.Bug fixes (5 commits)
b635df110d<DataArray>reader to<PointData>children onlyb7b21d62b1818b8b2676233bd59e3ba096adcff0Documentation (2 commits)
4cbd588a9e94c55abd9eTest additions (4 commits)
5e29900e847b13f89a5c<AppendedData>without_markerac5c93c195(Direction trailing-junk + non-orthonormal regression tests included with their respective bug commits.)
GTest conversion of 4 no-arg tests (8 commits, rename + content per test)
::testing::TempDir())Stands up
creategoogletestdriver(ITKIOVTK …)scaffolding + aVTI_TEST_INPUT_DIRcompile definition so the GTest driver doesn't need command-line arg passthrough.Each conversion is a rename-only commit followed by a content commit so
git log --followwalks back through the rename to the original test history.Test plan
pixi run build-pythonclean — 0 warnings introducedpre-commit run --all-filesclean against final tip