
Add support for reading/writing VTK XML ImageData (.vti) format #6032

Merged
hjmjohnson merged 1 commit into InsightSoftwareConsortium:main from dzenanz:vtiSupport
May 2, 2026

Conversation

@dzenanz
Member

@dzenanz dzenanz commented Apr 9, 2026

VTK XML ImageData (.vti) reader + writer for ITK. Original scaffold from @dzenanz (Sonnet 4.6 draft); correctness + guard-test pass on top by @hjmjohnson. Every @blowekamp and @greptileai P1/P2 concern is addressed or explicitly deferred to a follow-up PR with a tagged guard in the source.

Status — 13 new commits added today; CI running. Draft; ready for re-review once CI turns green.

One known test is disabled: itkVTIImageIOReadWriteTestVHFColorZLib — the ParaView-produced fixture cannot be published while the ExternalData upload tool is down (see below). Equivalent code-path coverage is provided by a synthetic Vector<float,3> ZLib-appended fixture, so nothing ships with red CI.

Reviewer comments addressed (with commit pointers)

| # | Reviewer | Comment | Status | Landed in |
|---|---|---|---|---|
| 1 | @blowekamp | "It would be good to use an XML parsing library like expat which is already in ITK." | Addressed in the earlier rewrite (ddf579b15f). Parser is now expat with <!DOCTYPE>/<!ENTITY> rejection added on top. | ddf579b15f, hardened in 96352cdcf1 |
| 2 | @dzenanz | "We should add a few test files converted into .vti format by ParaView, and regression test them against the existing .nrrd/.mha versions." | Intent addressed via a different path: ITK's content-link-upload.itk.org is down (ITK#4340, storacha→Pinata migration), so 294 KB+ ParaView fixtures cannot be published right now. Replaced with a Python-stdlib fixture generator producing sub-KB synthetic .vti / .mhd pairs covering every encoding combination; itkVTIImageIOGeneratedFixturesTest reads each and pixel-compares to its MetaIO oracle. When upload returns, real ParaView fixtures will be added alongside; both serve as useful cross-impl validation. | bda507863f, bd2a11e6db, 944bcbdcc6 |
| 3 | @dzenanz | GetIndex() deprecation warning in itkVTIImageIOTest.cxx:100 | Addressed by @dzenanz in b884beed42 before this re-review pass. | b884beed42 |
| 4 | greptileai | [P1, itkVTIImageIO.cxx:777] "Tensor round-trip produces wrong values": writer emits 9 ASCII components per pixel, reader calls SetNumberOfComponents(6), so only 6 of the 9 stream values are consumed per pixel → every subsequent pixel's components shift. | Fixed. Canonicalised both ends to VTK's 6-component symmetric-tensor layout [XX, YY, ZZ, XY, YZ, XZ]. Reader rejects NumberOfComponents!="6" for tensor arrays and remaps to ITK's [e00, e01, e02, e11, e12, e22] via a TensorRemapGuard scope guard after every encoding path. Writer emits 6 components in canonical order and the test's promised pixel-wise comparison is now real. | 45b8815de7 |
| 5 | greptileai | [P1, itkVTIImageIO.cxx:749] "Byte-swap is a no-op when reading a little-endian file on a big-endian system": SwapRangeFromSystemToBigEndian on a BE host does nothing, leaving LE file bytes un-swapped. | Fixed. SwapBufferIfNeeded replaced with a public static SwapBufferForByteOrder(buffer, componentSize, numComponents, fileByteOrder, targetByteOrder) that byte-reverses each component via std::reverse unconditionally when the orders differ. 17 unit-test cases cover componentSize ∈ {1,2,4,8} × (file × target) ∈ {LE,BE}² on any host, so no BE CI runner is needed to exercise the fix. | 24e030c4ec |
| 6 | greptileai | [P2, itkVTIImageIO.cxx:245] "Base64-encoded appended data silently misread as raw binary": encoding attribute of <AppendedData> was not captured. | Addressed by @dzenanz in ea5add7c78 (base64 appended support added); reader now captures encoding="base64" vs encoding="raw" explicitly and dispatches accordingly. | ea5add7c78 |
| 7 | greptileai | [P2, itkVTIImageIO.cxx:513] "Entire file slurped into memory for raw-appended paths" | Deferred (F-004). @dzenanz already noted the content buffer is a local in InternalReadImageInformation (bounded peak; goes out of scope before Read()). Full streaming read requires inheriting from StreamingImageIOBase, which is a separate, larger change scoped to the follow-up PR. A code comment at the class docstring now documents the Phase-2 plan for streaming-read via the appended-raw byte offset. | Phase 2 F-004 |
| 8 | greptileai | [P2, itkVTIImageIO.cxx:1013] "uint32_t block-size header silently truncates for images >~4 GB" | Fixed. Writer now emits header_type="UInt64" + a uint64_t block-size prefix unconditionally. version="1.0" pairs with this to match ParaView 5.7+ defaults. No overflow check needed because the header integer is wide enough for all platform SizeType values. | b2590b1db5 |
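
The tensor fix in row 4 comes down to a fixed permutation of six components per pixel. A minimal Python sketch of that remap, assuming only the two layouts quoted above; the function name and the flat-list buffer are illustrative and are not the PR's C++ TensorRemapGuard:

```python
# Position of each ITK component inside the VTK on-disk order.
# VTK: [XX, YY, ZZ, XY, YZ, XZ]  ->  ITK: [e00, e01, e02, e11, e12, e22]
VTK_TO_ITK = (0, 3, 5, 1, 4, 2)


def remap_vtk_tensor_to_itk(flat):
    """Reorder a flat buffer of 6-component tensors from VTK to ITK order."""
    if len(flat) % 6 != 0:
        raise ValueError("buffer length must be a multiple of 6")
    out = []
    for start in range(0, len(flat), 6):
        pixel = flat[start:start + 6]
        out.extend(pixel[i] for i in VTK_TO_ITK)
    return out


# One symbolic pixel makes the permutation easy to check by eye.
itk_pixel = remap_vtk_tensor_to_itk(["XX", "YY", "ZZ", "XY", "YZ", "XZ"])
print(itk_pixel)
```

Because the permutation is applied per pixel, a component-count mismatch (9 written, 6 read) would shift every later pixel, which is the failure mode the reviewer described.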

Additional defects discovered during the audit that did not originate from a reviewer comment but were fixed in this pass:

  • Silent Direction cosines loss (both read and write). Every ParaView-produced .vti carries Direction="..." but the reader ignored it and the writer emitted none; every round-trip through this IO silently reset orientation to identity. Unacceptable for ITK's medical-imaging use case. Fixed in fb3815f64c with a hand-crafted oblique-rotation fixture + round-trip test.
  • Lying tensor test. The existing tensor test comment promised pixelwise comparison but only ran ITK_TRY_EXPECT_NO_EXCEPTION. Replaced with real ImagesEqual assertion in 45b8815de7.
  • RGBA round-trip advertised but untested. Added in 5257040c63.
  • No defence against <!DOCTYPE> / <!ENTITY> attacks (billion-laughs, XXE). Added a pre-parse rejection and XML_SetParamEntityParsing(NEVER) in 96352cdcf1.
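
The DOCTYPE/ENTITY defence in the last bullet can be illustrated with Python's binding to the same expat library. This is a sketch under assumptions: `parse_untrusted_xml` is a hypothetical name, and the simple substring scan stands in for whatever pre-parse check the C++ code performs.

```python
import xml.parsers.expat


def parse_untrusted_xml(text: bytes):
    """Reject DTD-bearing input, then parse and return the element names seen."""
    # First line of defence: refuse the document before expat ever sees it.
    for token in (b"<!DOCTYPE", b"<!ENTITY"):
        if token in text:
            raise ValueError(f"refusing XML containing {token.decode()}")
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # Second line of defence: never process parameter entities.
    parser.SetParamEntityParsing(xml.parsers.expat.XML_PARAM_ENTITY_PARSING_NEVER)
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(text, True)
    return names


benign = b'<VTKFile type="ImageData"><ImageData/></VTKFile>'
print(parse_untrusted_xml(benign))
```

A payload such as `<!DOCTYPE lolz [<!ENTITY lol "lol">]><VTKFile/>` is rejected by the scan before any parsing takes place.
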
Commit list (13 new on top of @dzenanz's vtiSupport tip)

| # | SHA | Subject |
|---|---|---|
| 1 | 3698b6196c | ENH: Add hand-crafted oblique-direction VTI fixture and MHD oracle |
| 2 | bda507863f | ENH: Add VTI fixture generator and synthetic round-trip / guard fixtures |
| 3 | fb3815f64c | BUG: Parse and emit Direction cosines in VTIImageIO |
| 4 | 45b8815de7 | BUG: Canonicalize symmetric-tensor layout to VTK 6-component convention |
| 5 | 24e030c4ec | BUG: Correct byte-swap for LE-on-BE host and unit-test the swap helper |
| 6 | b2590b1db5 | ENH: Write version="1.0" and UInt64 block headers to match ParaView |
| 7 | 5257040c63 | ENH: Add 2D RGBA round-trip coverage to itkVTIImageIOTest |
| 8 | e5879bea75 | ENH: Add F-001..F-010 deferred-feature guard exceptions and guard tests |
| 9 | 96352cdcf1 | BUG: Reject VTI files with DOCTYPE or ENTITY declarations |
| 10 | c24cfea37c | STYLE: Replace manual ExceptionObject with itkGenericExceptionMacro |
| 11 | 8c61028f09 | DOC: Rewrite VTIImageIO class Doxygen for locked Phase A scope |
| 12 | bd2a11e6db | ENH: Read generator-produced VTI fixtures and compare to MHD oracles |
| 13 | 944bcbdcc6 | ENH: Disable broken VHFColorZLib test and cover same codepath synthetically |

Every commit compiled and its associated test(s) ran locally before the next was staged. Diff footprint: +1662 / −87 across 30 files, all scoped to Modules/IO/VTK/.

Test status: 9 of 9 VTI tests pass locally
Test project .../build
    Start 1669: itkVTIImageIOTest                         Passed
    Start 1670: itkVTIImageIOReadWriteTestCTHead1         Passed
    Start 1671: itkVTIImageIOReadWriteTestVisibleWomanEye Passed
    Start 1672: itkVTIImageIOReadWriteTestHeadMRVolume    Passed
    Start 1673: itkVTIImageIOReadWriteTestVHFColor        Passed
    Start 1674: itkVTIImageIODirectionTest                Passed   (new)
    Start 1675: itkVTIImageIOSwapBufferTest               Passed   (new, 17 sub-cases)
    Start 1676: itkVTIImageIOFutureFeaturesTest           Passed   (new, F-NNN guards)
    Start 1677: itkVTIImageIOGeneratedFixturesTest        Passed   (new, 5 fixtures)
100% tests passed, 0 tests failed out of 9

The itkVTIImageIOReadWriteTestVHFColorZLib registration (which has been red since c283dfd86d because its ParaView-generated input fixture was never committed) is now disabled with a TODO in Modules/IO/VTK/test/CMakeLists.txt pointing at ITK#4340 and cmake-w3-externaldata-upload#3. When the upload tool is restored, un-comment the block and publish Input/VHFColorZLib.vti.cid alongside it.

Equivalent code-path coverage (3-component Float32 ZLib-compressed appended-raw with UInt64 header, dispatched to IOPixelEnum::VECTOR) is provided by the new vector3_f32_zlib_appended case in itkVTIImageIOGeneratedFixturesTest — driven by a 4×4×2 Vector<float,3> fixture produced by the Python-stdlib generator (no ITK code participates in data generation, giving genuine cross-impl validation).

Scope boundary: features deferred to the follow-up PR (F-NNN guards)

Every deferred feature has a tagged guard exception in the source so git grep 'F-NNN' locates the guard, its test, and the commit message explaining why. When the follow-up PR lands each feature, it flips the guard test from "expect exception" to "expect success + pixel-compare" in the same commit that removes the guard — a visible red → green transition in git history.

| Tag | Feature | How it surfaces today | Fixture |
|---|---|---|---|
| F-001 | LZ4 decompressor (read) | itkExceptionMacro on compressor="vtkLZ4DataCompressor" | VTI_guard_lz4.vti |
| F-002 | LZMA decompressor (read) | same mechanism for vtkLZMADataCompressor | VTI_guard_lzma.vti |
| F-003 | Appended-raw writer | Latent (no public API today) | |
| F-004 | Streaming read for appended-raw | Latent (inherits from ImageIOBase, not StreamingImageIOBase) | |
| F-005 | Multi-<Piece> read | itkExceptionMacro when pieceCount > 1 | VTI_guard_multipiece.vti |
| F-007 | Binary symmetric-tensor writer | itkExceptionMacro in Write() | |
| F-008 | Appended-base64 writer | Latent | |
| F-009 | MetaDataDictionary round-trip | Latent (would be a warning, not an exception) | |
| F-010 | Unknown-compressor catch-all | itkExceptionMacro for any non-{zlib,lz4,lzma} compressor | VTI_guard_unknown_compressor.vti |

No ZSTD guard is needed — VTK has never shipped a vtkZSTDDataCompressor. F-006 (UInt32 overflow guard on write) was subsumed by emitting UInt64 unconditionally.
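
The tagged-guard idea can be sketched in a few lines of Python. Everything named here (`DeferredFeatureError`, `check_compressor`, the two lookup tables) is hypothetical and only mirrors the compressor rows of the table above; the real guards are itkExceptionMacro calls in the C++ source.

```python
SUPPORTED = {"vtkZLibDataCompressor"}
GUARDED = {
    "vtkLZ4DataCompressor": "F-001",
    "vtkLZMADataCompressor": "F-002",
}


class DeferredFeatureError(RuntimeError):
    """Raised when a file needs a feature that is deliberately not implemented yet."""


def check_compressor(name):
    """Return normally for supported compressors, raise a tagged error otherwise."""
    if name is None or name in SUPPORTED:
        return
    tag = GUARDED.get(name, "F-010")  # F-010 is the unknown-compressor catch-all
    raise DeferredFeatureError(f"[{tag}] compressor '{name}' is not supported")
```

A guard test then asserts that the exception fires and that its message carries the tag, so searching for the tag finds both the guard and its test.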

Synthetic fixture generator

Modules/IO/VTK/test/generate_vti_fixtures.py is ~200 lines of Python stdlib (xml, struct, base64, zlib) that produces deterministic sub-KB .vti fixtures plus matching MetaIO .mhd / .raw oracles. Runs idempotently:

python3 Modules/IO/VTK/test/generate_vti_fixtures.py

| Output | Encoding | Pixel type | Test that consumes it |
|---|---|---|---|
| VTI_oblique_direction.{vti,mhd,raw} | ASCII | uint8 scalar with 45° Z-rotation Direction | itkVTIImageIODirectionTest |
| VTI_scalar_u8_appended_raw.{vti,mhd,raw} | appended-raw | uint8 scalar | itkVTIImageIOGeneratedFixturesTest |
| VTI_scalar_f32_zlib_appended.{vti,mhd,raw} | ZLib appended-raw | Float32 scalar | itkVTIImageIOGeneratedFixturesTest |
| VTI_rgba_u8_appended_raw.{vti,mhd,raw} | appended-raw | RGBA<uint8> | itkVTIImageIOGeneratedFixturesTest |
| VTI_vector3_f32_zlib_appended.{vti,mhd,raw} | ZLib appended-raw | Vector<float,3> | itkVTIImageIOGeneratedFixturesTest (replaces the disabled VHFColorZLib coverage) |
| VTI_tensor_f32_ascii.{vti,mhd,raw} | ASCII | SymmetricSecondRankTensor<float,3> (VTK canonical [XX,YY,ZZ,XY,YZ,XZ] on disk) | itkVTIImageIOGeneratedFixturesTest |
| VTI_guard_{lz4,lzma,unknown_compressor,multipiece}.vti | n/a | (header-only; triggers F-NNN guard) | itkVTIImageIOFutureFeaturesTest |

Rationale for shipping the generator alongside the fixtures: we need sub-100 KB fixtures to pass the kw-pre-commit hook cap; ParaView-produced fixtures exceed that and ExternalData upload is unavailable today; and passing a reader test against a Python-stdlib writer's output is genuinely independent cross-implementation validation that the other round-trip tests do not provide.
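
To give a feel for how small such a generator can be, here is a sketch that builds an appended-raw uint8 .vti in memory using only `struct`. It is not the PR's generate_vti_fixtures.py: `make_appended_raw_vti` is a hypothetical name, and the attribute set is an assumption based on the commonly documented VTK XML ImageData layout (a UInt64 byte count directly after the `_` marker, followed by the raw pixels).

```python
import struct


def make_appended_raw_vti(pixels: bytes, dims=(4, 4, 2)) -> bytes:
    """Build a minimal appended-raw uint8 .vti document in memory."""
    nx, ny, nz = dims
    if len(pixels) != nx * ny * nz:
        raise ValueError("pixel count does not match dims")
    extent = f"0 {nx - 1} 0 {ny - 1} 0 {nz - 1}"
    head = (
        '<?xml version="1.0"?>\n'
        '<VTKFile type="ImageData" version="1.0" '
        'byte_order="LittleEndian" header_type="UInt64">\n'
        f'  <ImageData WholeExtent="{extent}" Origin="0 0 0" Spacing="1 1 1">\n'
        f'    <Piece Extent="{extent}">\n'
        '      <PointData Scalars="scalars">\n'
        '        <DataArray type="UInt8" Name="scalars" NumberOfComponents="1" '
        'format="appended" offset="0"/>\n'
        '      </PointData>\n'
        '    </Piece>\n'
        '  </ImageData>\n'
        '  <AppendedData encoding="raw">\n   _'
    ).encode("ascii")
    # Little-endian UInt64 byte count, then the raw payload.
    block = struct.pack("<Q", len(pixels)) + pixels
    tail = b"\n  </AppendedData>\n</VTKFile>\n"
    return head + block + tail


doc = make_appended_raw_vti(bytes(range(32)))
```

Writing `doc` to disk next to a MetaIO header that points at the same 32 bytes would give a reader test an oracle that no ITK code helped produce.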

Implementation highlights worth a targeted review
  1. Direction cosines (fb3815f64c) — expat parser adds a direction state field, reader builds a 3×3 row-major matrix and calls SetDirection(axis, v) per image axis, writer composes an identity-padded 3×3 from GetDirection(axis). Column j of Direction is axis-j's world-space direction, matching both VTK's convention and ITK's m_Direction[axis][world] storage.

  2. Symmetric tensor canonicalization (45b8815de7): reader uses a scope guard (TensorRemapGuard) that remaps [XX, YY, ZZ, XY, YZ, XZ] → [e00, e01, e02, e11, e12, e22] in place regardless of which encoding path populated the buffer. One remap site, five exit points, no ifs.

  3. Endian fix (24e030c4ec) — std::reverse-based swap makes the code work on any host and lets the unit test generate both "file=LE, target=BE" and "file=BE, target=LE" combinations from either endianness.

  4. Security (96352cdcf1) — pre-parse scan for <!DOCTYPE / <!ENTITY tokens (billion-laughs + XXE) + XML_SetParamEntityParsing(NEVER) — defence-in-depth with a test that round-trips a malicious payload and asserts refusal.

  5. F-NNN discoverability (e5879bea75) — git grep 'F-001' (etc.) from any clone lands on the guard, the test, and the commit message. Designed so the follow-up PR authors can pick up work items without reading this PR description.
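
The endian fix in item 3 is easiest to see in miniature. The following Python sketch reverses each component whenever the file and target orders differ, independent of the host; it is an analogue of the idea, not the C++ SwapBufferForByteOrder (the simplified signature without a component count is an assumption made for brevity).

```python
import struct


def swap_buffer_for_byte_order(buf: bytearray, component_size: int,
                               file_order: str, target_order: str) -> None:
    """Reverse every component in place when the two byte orders differ."""
    if file_order == target_order or component_size == 1:
        return  # nothing to do: same order, or single-byte components
    if len(buf) % component_size != 0:
        raise ValueError("buffer is not a whole number of components")
    for start in range(0, len(buf), component_size):
        end = start + component_size
        buf[start:end] = buf[start:end][::-1]


# Two little-endian uint32 values become big-endian after the swap.
data = bytearray(struct.pack("<2I", 1, 0x01020304))
swap_buffer_for_byte_order(data, 4, "LE", "BE")
```

Because the decision depends only on the two declared orders, the same test cases can run on little-endian and big-endian machines alike.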

@github-actions github-actions bot added the type:Infrastructure and area:IO labels Apr 9, 2026
@blowekamp
Member

It would be good to use an XML parsing library like expat which is already in ITK.

@github-actions github-actions bot added the area:Python wrapping and type:Testing labels Apr 10, 2026
@hjmjohnson
Member

Force-pushed ddf579b15f (replacing the prior 1c67ef7c34). This is a focused rewrite that addresses the CI failures and @blowekamp's review comment.

CI failures fixed

| Failure | Root cause | Fix |
|---|---|---|
| KWStyle (ARMBUILD-*, Pixi-Cxx (windows-2022), etc.): itkVTIImageIO.h:124: error: comment doesn't have \class | The enum class DataEncoding had a /** ... */ doxygen block; KWStyle's class-comment rule treats enum class like a class. | Replaced with a plain // comment. |
| KeyError: 'VTIImageIOFactory' (ARMBUILD-Python, ITK.{Linux,macOS}.Python) | The Python wrapping registration was missing entirely. | Added Modules/IO/VTK/wrapping/itkVTIImageIO.wrap registering both VTIImageIO and VTIImageIOFactory (matching itkVTKImageIO.wrap). |
| ghostflow-check-main | Commit subject started with WIP:, not in kwrobot's allowed prefix list. | New commit uses ENH:. |
| (Linker would fail once parser was added) | ITKExpat was not in the module deps. | Added ITKExpat as PRIVATE_DEPENDS of ITKIOVTK. Added ImageIO::VTI to FACTORY_NAMES. |

Review feedback addressed

@blowekamp: It would be good to use an XML parsing library like expat which is already in ITK.

The XML header parsing is now done by expat (the same library Modules/IO/XML/src/itkXMLFile.cxx already uses). The InternalReadImageInformation() flow:

  1. Slurps the file once.
  2. If the file contains <AppendedData>, the XML view fed to expat is truncated at that element and replaced with a self-closing <AppendedData/></VTKFile> so the parser sees a well-formed document. The byte offset of the _ marker is recorded for later seek-and-read of the binary block.
  3. The truncated XML view is parsed with XML_ParserCreate + XML_SetElementHandler + XML_SetCharacterDataHandler. Element handlers populate a VTIParseState with the VTKFile, ImageData, PointData, and active DataArray attributes; the character-data handler captures inline ASCII or base64 contents.
  4. After parsing, geometry / pixel type / encoding are populated on ImageIOBase.

Switching to expat removes all the ad-hoc content.find("<...") scans the previous version had, and fixes a class of latent bugs (attribute ordering, comments, whitespace, multiple DataArrays). The only remaining string scan is for the <AppendedData> boundary itself, which is unavoidable: the XML-illegal raw binary inside that element would crash any XML parser and is read directly via seek instead.
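
The truncate-then-parse flow described above can be sketched with Python's `xml.parsers.expat`, which wraps the same library. `parse_vti_header` is a hypothetical name, and the sketch keeps only element attributes; it is meant to show the shape of the approach, not the C++ handlers.

```python
import xml.parsers.expat


def parse_vti_header(blob: bytes):
    """Parse the XML part of a .vti blob; return (attributes by element, data offset)."""
    marker = blob.find(b"<AppendedData")
    offset = None
    xml_view = blob
    if marker != -1:
        underscore = blob.find(b"_", blob.find(b">", marker))
        if underscore == -1:
            raise ValueError("AppendedData has no '_' marker")
        offset = underscore + 1  # first byte after the marker
        # Hide the XML-illegal binary from the parser.
        xml_view = blob[:marker] + b"<AppendedData/></VTKFile>"
    elements = {}

    def on_start(name, attrs):
        elements.setdefault(name, []).append(dict(attrs))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_view, True)
    return elements, offset


payload = struct_free_payload = bytes([5, 0, 0, 0, 0, 0, 0, 0]) + b"\x00\xff<&\x01"
blob = (b'<VTKFile type="ImageData" byte_order="LittleEndian">'
        b'<ImageData WholeExtent="0 4 0 0 0 0"/>'
        b'<AppendedData encoding="raw">_' + payload + b"</AppendedData></VTKFile>")
elements, offset = parse_vti_header(blob)
```

Note that the replacement element drops the original attributes of `<AppendedData>`, so a sketch like this cannot tell raw from base64 appended data without capturing `encoding` separately.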

Tests added

The previous commit shipped no VTI tests at all. This commit adds Modules/IO/VTK/test/itkVTIImageIOTest.cxx with three classes of cases:

1. Round-trip via ImageFileWriter → ImageFileReader (region, spacing, origin, per-pixel bit-equivalence):

  • unsigned char 1D / 2D, short 3D, float 3D, double 3D, RGBPixel<uchar> 2D, Vector<float,3> 3D — both ASCII and binary (base64) encodings where applicable.

2. Behavior tests:

  • Symmetric tensor ASCII round-trip (writes 9-component layout, reads back into ITK's 6-component layout).
  • Symmetric tensor binary write must throw (silent layout corruption guard: verifies the on-disk NumberOfComponents="9" header doesn't get paired with a 6-component memory buffer).

3. Hand-crafted-file readability tests for code paths the writer never produces but the reader must support (cross-checked against VTK's TestDataObjectXMLIO.cxx coverage matrix):

  • XML robustness: comments at multiple positions, attribute reordering, multiple DataArrays in PointData with Scalars="..." pointing at the second array. Verifies the active-array selector logic.
  • header_type="UInt64" base64 file: exercises the 8-byte block-size header path.
  • format="appended" with raw binary in <AppendedData>: exercises the file-truncation + offset-seek path.
  • byte_order="BigEndian" base64 file with data and block-size header pre-swapped: exercises the byte-swap-on-read path.
  • CanReadFile / CanWriteFile sanity.

This matches the relevant subset of VTK's TestDataObjectXMLIO.cxx coverage matrix (DataMode × ByteOrder × HeaderType × DataObjectType) for the features this PR claims. ZLIB/LZ4 compression and the VTK direction matrix are intentionally not exercised because this PR does not claim those features (separate follow-ups).

Local results

$ cmake --build build-ssim -j48 --target ITKIOVTKTestDriver ITKIOVTKHeaderTest1
[20/20] Linking CXX executable bin/ITKIOVTKTestDriver

$ ctest -R itkVTIImageIOTest --output-on-failure
Test #1259: itkVTIImageIOTest .... Passed   0.03 sec
100% tests passed, 0 tests failed out of 1

Test output (every assertion passes):

  Round-trip OK: vti_uchar2d_binary.vti
  Round-trip OK: vti_uchar2d_ascii.vti
  Round-trip OK: vti_short3d_binary.vti
  Round-trip OK: vti_short3d_ascii.vti
  Round-trip OK: vti_float3d_binary.vti
  Round-trip OK: vti_float3d_ascii.vti
  Round-trip OK: vti_double3d_binary.vti
  Round-trip OK: vti_rgb2d_binary.vti
  Round-trip OK: vti_vec3d_binary.vti
  Round-trip OK: vti_uchar1d_binary.vti
  Round-trip OK: vti_uchar1d_ascii.vti
  Tensor ASCII round-trip parsed without exception
  Binary tensor write correctly rejected
  XML robustness OK: comments, attribute reordering, multi-DataArray active selector
  UInt64 header_type base64 read OK
  Raw-appended-data read OK
  BigEndian byte-swap read OK

pre-commit (gersemi, clang-format, kw-pre-commit) is clean on every touched file.

Out-of-scope items (potential follow-ups)

  • ZLIB / LZ4 compressed <AppendedData> blocks
  • Direction matrix attribute (VTK 9 added it to <ImageData>)
  • Streaming reads of pieces (VTK splits a file into multiple <Piece> elements)

@dzenanz
Member Author

dzenanz commented Apr 10, 2026

We should add a few test files converted into .vti format by ParaView, and regression test them against the existing .nrrd/.mha versions. And of course, manually review this.

@dzenanz
Member Author

dzenanz commented Apr 10, 2026

Legacy removed tests failed:

Modules/IO/VTK/test/itkVTIImageIOTest.cxx:100:56: warning: 'itk::ImageConstIterator::IndexType itk::ImageConstIterator::GetIndex() const [with TImage = itk::Image<unsigned char, 1>; itk::ImageConstIterator::IndexType = itk::Index<1>]' is deprecated: Please use ComputeIndex() instead, or use an iterator with index, like ImageIteratorWithIndex! [-Wdeprecated-declarations]

@dzenanz
Member Author

dzenanz commented Apr 13, 2026

@greptileai review this.

@greptile-apps
Contributor

greptile-apps Bot commented Apr 13, 2026

Greptile Summary

This PR adds a new VTIImageIO class for reading and writing VTK XML ImageData (.vti) files, covering ASCII, inline base64, appended raw/base64, and ZLib-compressed encoding paths, with expat-based XML parsing, XXE/billion-laughs rejection, direction-cosine round-trip, symmetric-tensor canonicalization, and a SwapBufferForByteOrder helper that correctly handles all four endianness combinations. Previous P1/P2 findings (tensor layout, byte-swap no-op on BE readers, uint32 overflow, base64-appended dispatch) have been fixed, and deferred features are documented with F-NNN guard exceptions and matching tests.

Only two P2 notes remain after this pass: the VTK multi-block ZLib compression header integers (nblocks, per-block sizes) are read via plain memcpy without applying the file's declared byte order, so cross-endian ZLib decompression would misread the header and likely throw std::bad_alloc; and the writer comment claiming "always write little-endian binary" is factually wrong on big-endian hosts (the no-op swap there correctly produces BE data paired with byte_order="BigEndian", but the comment will mislead future maintainers).
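
The first note can be made concrete with a small Python sketch. It assumes the commonly documented VTK compressed-block header of UInt64 words (block count, nominal block size, last block size, then one compressed size per block) and handles a single block only; the function names are hypothetical.

```python
import struct
import zlib

HEADER_WORDS = 4  # nblocks, block size, last block size, one compressed size


def pack_zlib_block(raw: bytes, order: str = "<", block_size: int = 32768) -> bytes:
    """Compress `raw` as one block and prepend a UInt64 header in `order`."""
    if not 0 < len(raw) <= block_size:
        raise ValueError("sketch handles a single non-empty block only")
    compressed = zlib.compress(raw)
    header = struct.pack(f"{order}{HEADER_WORDS}Q",
                         1, block_size, len(raw) % block_size, len(compressed))
    return header + compressed


def unpack_zlib_block(blob: bytes, order: str = "<") -> bytes:
    """Read the header using the declared byte order, then decompress."""
    nblocks, block_size, last_size, csize = struct.unpack_from(
        f"{order}{HEADER_WORDS}Q", blob, 0)
    if nblocks != 1:
        raise ValueError(f"implausible block count: {nblocks}")
    start = 8 * HEADER_WORDS
    data = zlib.decompress(blob[start:start + csize])
    if len(data) != (last_size or block_size):
        raise ValueError("decompressed size does not match header")
    return data


blob = pack_zlib_block(b"abc" * 10, order="<")
# A little-endian block count of 1, read as big-endian, becomes 2**56.
misread = struct.unpack_from(">Q", blob, 0)[0]
```

A reader that sized its allocations from `misread` would request an absurd amount of memory, which is consistent with the allocation failure the summary anticipates.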

Confidence Score: 5/5

Safe to merge; only P2 findings remain, both scoped to cross-endian edge cases that are explicitly deferred to the follow-up PR

All previously identified P1 defects (tensor layout, byte-swap no-op, uint32 overflow, base64-appended misread) have been fixed with tests. Remaining findings are P2: the ZLib header endianness gap requires a big-endian system to trigger and is consistent with the explicitly deferred BE work; the misleading writer comment is documentation-only. No security issues beyond the already-addressed XXE/billion-laughs mitigations.

Modules/IO/VTK/src/itkVTIImageIO.cxx — ZLib decompression header byte-order handling and the misleading write-path comment warrant attention before the BE follow-up PR lands

Important Files Changed

| Filename | Overview |
|---|---|
| Modules/IO/VTK/src/itkVTIImageIO.cxx | ~1400-line new ImageIO implementation; expat-based XML parsing, all major encoding paths (ASCII, base64, appended raw/base64, zlib), byte-swap fix, tensor remap, direction cosines; two P2 notes around ZLib header endianness and a misleading comment in the writer |
| Modules/IO/VTK/include/itkVTIImageIO.h | Clean header; public SwapBufferForByteOrder, private encode/decode helpers, DataEncoding enum, and well-documented deferred-feature list |
| Modules/IO/VTK/test/itkVTIImageIOTest.cxx | Comprehensive round-trip test covering scalar, RGB, RGBA, vector, tensor, ASCII, binary, appended, and compressed paths with real pixel comparisons |
| Modules/IO/VTK/test/itkVTIImageIOSwapBufferTest.cxx | 17-case unit test for SwapBufferForByteOrder covering all component sizes and endianness combinations without needing a BE runner |
| Modules/IO/VTK/test/itkVTIImageIODirectionTest.cxx | Round-trip test for oblique direction cosines against a 45° Z-rotation fixture with per-element tolerance checking |
| Modules/IO/VTK/test/itkVTIImageIOGeneratedFixturesTest.cxx | Cross-implementation validation: reads Python-stdlib-generated fixtures (scalar, RGBA, vector, tensor, zlib) and pixel-compares to MetaIO oracles |
| Modules/IO/VTK/test/itkVTIImageIOFutureFeaturesTest.cxx | Guard tests for F-001/F-002/F-005/F-010 deferred features; each asserts the tagged exception fires on purpose-built fixture files |
| Modules/IO/VTK/test/generate_vti_fixtures.py | Pure Python-stdlib fixture generator producing sub-KB .vti + MetaIO oracle pairs for 6 encoding/pixel-type combinations including guard-only fixtures |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[ReadImageInformation] --> B{Security pre-scan\nDOCTYPE/ENTITY?}
    B -- reject --> ERR1[itkExceptionMacro]
    B -- ok --> C[Expat chunk parser]
    C --> D{AppendedData\nencountered?}
    D -- yes --> E[XML_StopParser\nSuspended]
    D -- no,EOF --> F[Parse complete]
    E --> G[Seek to _ marker\nrecord m_AppendedDataOffset]
    G --> H{encoding=base64?}
    H -- yes --> I[Read m_AppendedBase64Content]
    H -- no --> J[RawAppended offset stored]
    F --> K[Populate m_DataEncoding\ngeometry/pixel-type fields]
    subgraph Read
        K --> L{m_DataEncoding}
        L -- ASCII --> M[ReadBufferAsASCII]
        L -- Base64/ZLibBase64 --> N[DecodeBase64 + memcpy\nor DecompressZLib]
        L -- Base64Appended/ZLibBase64Appended --> O[DecodeBase64 m_AppendedBase64Content\nextract at m_DataArrayOffset]
        L -- ZLibAppended --> P[Seek in file stream header\nstream payload DecompressZLib]
        L -- RawAppended --> Q[Seek in file read directly]
        M & N & O & P & Q --> R[SwapBufferIfNeeded\npixel byte-order]
        R --> S{Tensor pixel type?}
        S -- yes --> T[TensorRemapGuard\nVTK XX,YY,ZZ,XY,YZ,XZ to ITK e00..e22]
        S -- no --> U[return]
        T --> U
    end
    subgraph Write
        V[Write] --> W{FileType}
        W -- ASCII --> X[WriteBufferAsASCII\ntensor remap VTK canonical]
        W -- Binary+UseCompression --> Y[CompressZLibVTK\nAppendedData encoding=raw]
        W -- Binary inline --> Z[UInt64 header + EncodeBase64]
    end

Reviews (4): Last reviewed commit: "ENH: Exercise compression for multiple p..."

Member Author

@dzenanz dzenanz left a comment


Having this work correctly would be great!

@dzenanz
Member Author

dzenanz commented Apr 17, 2026

Thank you for pushing this forward Hans!

@hjmjohnson hjmjohnson force-pushed the vtiSupport branch 2 times, most recently from 2e4b57c to d290d3b April 17, 2026 19:48
@hjmjohnson
Member

@dzenanz Thank you for your response, and for allowing me to "see what happens" as I try to build out improved tools for using agentic-coding as an accelerator for developers. I do NOT want to become a VTK vti file format expert. Still, I am willing to apply the "research first," "test second," "code last" pattern so that you can hopefully get this to a point where you have very little effort to apply your expert knowledge and use-case knowledge to take it to the last mile.

Your comments and suggestions are GREATLY appreciated.

@hjmjohnson
Member

@greptileai review this draft before I make it official

@dzenanz
Member Author

dzenanz commented Apr 21, 2026

David Thompson says about decimal separator problems: I believe that is intentionally disabled by forcing the "C" locale.

@dzenanz dzenanz marked this pull request as ready for review April 23, 2026 18:51
@dzenanz
Member Author

dzenanz commented Apr 23, 2026

I manually checked only a small number of the possible combinations of compression on/off and pixel types. I will use this in the coming week to compare hundreds of "reconstructed" images in .vti format with some ground truth in .nrrd format. That should give us basic confidence in reading images written by VTK. Exact writing code:

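import vtk
from vtk.util import numpy_support

# `out` (NumPy array) and `fbase` (output file base name) come from the surrounding script, which is not shown
vtk_data = numpy_support.numpy_to_vtk(num_array=out.ravel(order='F'), deep=True, array_type=vtk.VTK_FLOAT)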

# Create image data
image = vtk.vtkImageData()
image.SetDimensions(out.shape)
image.GetPointData().SetScalars(vtk_data)

# Write to .vti
writer = vtk.vtkXMLImageDataWriter()
writer.SetFileName(fbase+'.vti')
writer.SetInputData(image)
writer.Write()

@hjmjohnson
Member

will use this in the coming week to compare hundreds of "reconstructed" images in .vti format with some ground truth in .nrrd format.

Let me know when you have finished with the comparisons, and I'll do a final code review.

@hjmjohnson
Member

@greptile re-review

@dzenanz
Member Author

dzenanz commented Apr 30, 2026

@hjmjohnson This is ready for another pass by you. 700 images in .vti format were read without crashing, and some similarity metric (MMI) produced reasonable, non-zero results.

Member Author

@dzenanz dzenanz left a comment


Only one remark / question.

@hjmjohnson
Member

hjmjohnson commented May 1, 2026

@dzenanz Final triage checklist before this PR ships — items I think are still open, plus housekeeping suggestions.

Provenance — most of the recent commits

The 18 fixup commits I pushed today (everything from 1c0a3a515c onwards on top of your 8201cb15f6) were produced by asking Claude Opus 4.7 (1M context, xhigh effort) to do a deep code-review of the current PR state after CI went green and all explicit review threads were resolved. The model walked the source line-by-line, found four real bugs the existing review missed, two doc/scope-boundary gaps, and converted the four no-arg legacy tests to GoogleTest. Every commit was built (cxx + python wrapping) and tested locally before stacking. Two findings from the review were retracted after a second pass — I'd recommended a std::stoul exception-wrap (already present, lines 754–762) and an F-007 partial-file cleanup (the guard at line 1537 already fires before the ofstream opens at 1576) — neither shipped.

This is disclosed up-front so reviewers can give the AI-generated portion the closer scrutiny it deserves.

Deferred tasks (not in this PR; tracked for follow-up)

| Tag | Item | Why deferred |
|---|---|---|
| F-001 | vtkLZ4DataCompressor read | guard exception today; needs LZ4 decompressor wired into the appended-raw path |
| F-002 | vtkLZMADataCompressor read | guard exception today; needs LZMA decompressor |
| F-003 | Appended-raw writer (no compression) | latent; no public API surfaces this today |
| F-004 | Streaming read for appended-raw | latent; needs StreamingImageIOBase inheritance instead of ImageIOBase |
| F-005 | Multi-<Piece> images | guard exception; needs Piece assembly logic |
| F-007 | Binary symmetric-tensor write | guard exception; needs binary-side mirror of the existing ASCII remap |
| F-008 | Appended-base64 writer | latent |
| F-009 | MetaDataDictionary round-trip | latent; would be a warning, not an exception |
| F-010 | Catch-all unknown compressor | guard exception (active) |
| F-011 | <CellData>-only images | guard exception (active, added in 685cfc77f4) |
| | Big-endian round-trip CI job | needs qemu-user s390x runner; the SwapBuffer unit tests cover the helper |
| | Re-enable itkVTIImageIOReadWriteTestVHFColorZLib | gated on ITK#4340 + cmake-w3-externaldata-upload#3 (storacha→Pinata migration) |

Suggestion: squash to a few logical commits before merge

35 commits is a lot for review. The natural groupings:

| Bucket | Suggested message | Includes |
|---|---|---|
| 1. Initial impl + Bug fixes | ENH: Add VTIImageIO (your original 13 commits squashed) + BUG: VTIImageIO scoping, exception-safety, and direction validation | the 5 BUG commits + the F-NNN doc commits |
| 2. Tests | ENH: VTIImageIO test coverage | the 6 ENH/test commits + the 8 GTest-conversion commits |

Two commits, one logical topic each, each independently revertable. Happy to do the squash on a separate branch if you want to review the squashed shape before force-pushing.

Housekeeping: convert in-tree data files to ExternalData CIDs

Modules/IO/VTK/test/Input/ currently holds 24 binary/text fixture files committed directly to git (added by the recent ingestion + my generator outputs):

reco2D_16line.{mha,vti}
VTI_guard_{lz4,lzma,multipiece,unknown_compressor}.vti
VTI_oblique_direction.{mhd,raw,vti}
VTI_rgba_u8_appended_raw.{mhd,raw,vti}
VTI_scalar_f32_zlib_appended.{mhd,raw,vti}
VTI_scalar_u8_appended_raw.{mhd,raw,vti}
VTI_tensor_f32_ascii.{mhd,raw,vti}
VTI_vector3_f32_zlib_appended.{mhd,raw,vti}

These should migrate to ExternalData *.cid content-link pointers like fibers.vtk.cid/ironProt.vtk.cid/matrix.vtk.cid already in the same directory. The migration is gated on the ExternalData upload tool returning (ITK#4340 / cmake-w3-externaldata-upload#3, storacha→Pinata). Once it's available:

  1. Upload each binary fixture, get its CID.
  2. Replace Foo.vti → Foo.vti.cid (one line containing the CID).
  3. Update itk_add_test / creategoogletestdriver references — the DATA{} macro already handles this transparently for legacy targets; the GTest driver's VTI_TEST_INPUT_DIR compile definition will need a small tweak to either resolve *.cid files at config time or read the resolved binaries from the build tree.
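
Step 2 above could be scripted roughly as follows. This is only a sketch: the helper name `replace_with_cid_link` is hypothetical, and the CID string is assumed to have been obtained from the upload tool in step 1 (this code does not compute or verify it).

```python
from pathlib import Path


def replace_with_cid_link(fixture: Path, cid: str) -> Path:
    """Swap a committed fixture for a one-line `<name>.cid` content link.

    `cid` is assumed to come from the (currently unavailable) upload tool.
    """
    cid = cid.strip()
    if not cid:
        raise ValueError("empty CID; upload the fixture first")
    # Foo.vti -> Foo.vti.cid, next to the original file.
    link = fixture.with_name(fixture.name + ".cid")
    link.write_text(cid + "\n", encoding="ascii")
    # Remove the binary only after the link file was written successfully.
    fixture.unlink()
    return link
```

In a real migration the deletion would presumably be done through git so that the removal and the new `.cid` file land in the same commit.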

At the same time, ITK has historically used *.md5 / *.sha512 pointer files in Input/ directories elsewhere in the tree. Those are also scheduled for replacement with *.cid once the upload tool returns; that is a repo-wide migration, not specific to this module. Mentioning it here so we don't add fresh .md5 / .sha512 files in the meantime; everything new should go straight to .cid.

@hjmjohnson hjmjohnson self-requested a review May 1, 2026 17:50
@hjmjohnson
Member

Force-pushed c06d53f8f6 → c49151a361 to fix the gersemi formatting on the new ExternalData/staging block in Modules/IO/VTK/test/CMakeLists.txt. My fault — I ran the pre-commit gate before adding the CMakeLists edit and didn't re-run after; the canonical pre-commit-mandatory rule violation. No code or test changes from the prior tip; gersemi-only reformat of the add_custom_command block (multi-line COMMAND collapsed to a single line, OUTPUT split across two lines). Pre-commit clean locally now.

Member

@hjmjohnson hjmjohnson left a comment

@dzenanz I'll let you take this over the finish line. I am happy with the state of this code. It addresses all the issues identified.

I do not plan to revisit this PR unless you explicitly ask me to. Good Luck!

@dzenanz
Member Author

dzenanz commented May 1, 2026

I removed the following paragraph from the commit message:

A migration-guide entry documents the breaking change in
image_from_array semantics: image_from_array(arr) and
image_from_array(arr.T) no longer produce the same image -- the
shape rule is uniform regardless of the input's memory layout
(itk.size(image) == array.shape[::-1]).

as it is unrelated to this PR. I rebased on top of current main. Good to go from me.

hjmjohnson added a commit to dzenanz/ITK that referenced this pull request May 1, 2026
end-of-file-fixer pre-commit hook flagged a trailing empty line.
One-character fix to unblock CI on PR InsightSoftwareConsortium#6032.
@hjmjohnson
Member

Pushed 96822b2f2a — strips a trailing blank line from Documentation/docs/releases/5.4.6.md that the end-of-file-fixer pre-commit hook was flagging. Standalone STYLE: commit (didn't fixup-into the introducing commit because that path led into a heavy interactive rebase across the merge history; one trailing line wasn't worth it). CI should go green once pre-commit re-runs.

@github-actions github-actions Bot added the area:Documentation Issues affecting the Documentation module label May 1, 2026
ImageIO module for reading and writing VTK XML ImageData (.vti) files.
expat-backed XML header parser; supports inline ASCII, inline base64,
appended raw, appended base64, and zlib-compressed appended encodings.
Handles scalar / Vector(3) / RGB(3) / RGBA(4) / symmetric-tensor(6)
pixel types; the symmetric-tensor on-disk layout is VTK-canonical
[XX,YY,ZZ,XY,YZ,XZ] and is remapped to ITK's [e00,e01,e02,e11,e12,e22]
(upper-triangular row-major) on read.
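
The component remap described above can be illustrated with a small Python sketch. Only the two component orders are taken from this commit message; the names `VTK_TO_ITK` and `remap_tensors_vtk_to_itk` are hypothetical and do not refer to the C++ implementation.

```python
# On-disk VTK order and in-memory ITK order, as stated in the commit message.
VTK_ORDER = ("XX", "YY", "ZZ", "XY", "YZ", "XZ")
ITK_ORDER = ("XX", "XY", "XZ", "YY", "YZ", "ZZ")  # e00, e01, e02, e11, e12, e22

# For each ITK slot, the index of the VTK component that belongs there.
VTK_TO_ITK = tuple(VTK_ORDER.index(name) for name in ITK_ORDER)


def remap_tensors_vtk_to_itk(flat):
    """Reorder a flat list of 6-component tensors from VTK to ITK layout."""
    if len(flat) % 6 != 0:
        raise ValueError("buffer length must be a multiple of 6")
    out = []
    for base in range(0, len(flat), 6):
        out.extend(flat[base + k] for k in VTK_TO_ITK)
    return out
```

The permutation swaps slots 1 and 3 and slots 2 and 5 and leaves 0 and 4 alone, so it is its own inverse: applying the same function twice returns the original buffer, and the same table would serve for the write direction.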

Writer emits version="1.0" / header_type="UInt64" / 3x3 row-major
Direction attribute; appended-raw writes pair with vtkZLibDataCompressor
when SetUseCompression(true).

Hardening:
  * <!DOCTYPE> / <!ENTITY> declarations are rejected up-front to
    mitigate billion-laughs and XXE attacks.
  * <DataArray> consumption is scoped to <PointData> children only;
    arrays inside <CellData>, <FieldData> etc. are not consumed
    (F-011 explicit guard).
  * Direction attribute parser rejects fewer-than-9 floats and
    trailing non-whitespace junk; warns when the matrix is not
    orthonormal (D^T * D != I) since ITK geometry pipelines assume
    orthonormality.
  * TensorRemapGuard remaps only on the successful Read() exit so a
    throw mid-decode does not scramble the caller's buffer.
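
The Direction checks in the list above could look roughly like this in Python. It is a sketch under stated assumptions: the function name `parse_direction` and the tolerance `tol` are hypothetical, and the C++ parser may tokenize and compare differently.

```python
def parse_direction(text, tol=1e-6):
    """Parse a 3x3 row-major Direction string.

    Returns (matrix, is_orthonormal). Raises ValueError when the string does
    not hold exactly nine numeric tokens. `tol` is an assumed tolerance.
    """
    tokens = text.split()
    if len(tokens) < 9:
        raise ValueError("Direction needs 9 floats, got %d" % len(tokens))
    if len(tokens) > 9:
        raise ValueError("trailing junk after 9 floats: %r" % tokens[9])
    values = [float(tok) for tok in tokens]  # ValueError on non-numeric tokens
    d = [values[0:3], values[3:6], values[6:9]]

    # Entry (i, j) of D^T * D is the dot product of columns i and j of D.
    is_orthonormal = True
    for i in range(3):
        for j in range(3):
            dot = sum(d[k][i] * d[k][j] for k in range(3))
            expected = 1.0 if i == j else 0.0
            if abs(dot - expected) > tol:
                is_orthonormal = False
    return d, is_orthonormal
```

A caller would treat the `ValueError` cases as hard failures and a `False` second return value as a warning only, matching the reject/warn split in the list.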

Tests cover the round-trip for all encodings and pixel types, the
F-NNN deferred-feature guards (LZ4/LZMA decompression, multi-Piece,
binary symmetric-tensor write, unknown-compressor catch-all,
CellData-only), malformed-input rejection (truncated AppendedData,
non-numeric NumberOfComponents, Direction trailing junk,
non-orthonormal Direction, DOCTYPE/ENTITY rejection), and
pixel-equality round-trip against MetaIO oracles produced by an
independent Python-stdlib generator.  Four no-arg legacy CTests are
delivered as GoogleTest blocks via a new ITKIOVTKGTests driver.
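
The DOCTYPE/ENTITY rejection exercised by the malformed-input tests can be illustrated with Python's standard expat binding. This is one possible approach using expat's declaration handlers, not a description of how the merged C++ reader does it; `ForbiddenDeclaration` and `parse_rejecting_dtd` are hypothetical names.

```python
import xml.parsers.expat


class ForbiddenDeclaration(ValueError):
    """Raised when the input carries a DOCTYPE or ENTITY declaration."""


def parse_rejecting_dtd(xml_bytes):
    """Parse XML, returning element names in document order.

    Any <!DOCTYPE> or <!ENTITY> declaration aborts the parse.
    """
    elements = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise ForbiddenDeclaration("<!DOCTYPE> is not allowed")

    def on_entity(name, is_parameter_entity, value, base,
                  system_id, public_id, notation_name):
        # Entity declarations only occur inside a DOCTYPE, so on_doctype
        # normally fires first; this handler is a second line of defense.
        raise ForbiddenDeclaration("<!ENTITY> is not allowed")

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.EntityDeclHandler = on_entity
    parser.StartElementHandler = lambda name, attrs: elements.append(name)
    parser.Parse(xml_bytes, True)  # True marks the final chunk of input
    return elements
```

Because the handler raises as soon as the declaration is seen, no entity is ever expanded, which is the property that matters for billion-laughs style inputs.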

Original scaffold + zlib/base64/appended-data + vtiSupport branch:
@dzenanz (with Sonnet 4.6 draft assistance).  Correctness review,
test coverage extension, F-NNN guard system, and migration-guide
documentation: @hjmjohnson (with Claude Opus 4.7 1M context xhigh
review).

Co-Authored-By: Hans J. Johnson <hans-johnson@uiowa.edu>
@hjmjohnson
Member

Squashed to a single commit (28d4e32157). Folded the prior STYLE: EOF fix into the main ENH: commit; no tree changes (git diff of the pre-squash and post-squash tips is empty). Original ENH: commit message preserved.

hjmjohnson added a commit to dzenanz/ITK that referenced this pull request May 2, 2026
end-of-file-fixer pre-commit hook flags a trailing empty line on this
upstream-tracked file.  Same one-character fix as PR InsightSoftwareConsortium#6032; included
here so PR InsightSoftwareConsortium#4221's pre-commit CI can pass without waiting on that PR
to merge first.
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 2, 2026
Same upstream-inherited end-of-file-fixer hit blocking PR InsightSoftwareConsortium#6032 / InsightSoftwareConsortium#4221.
One-character fix included here so this PR's pre-commit CI can pass.
hjmjohnson added a commit to hjmjohnson/ITK that referenced this pull request May 2, 2026
`Documentation/docs/releases/5.4.6.md` carries a trailing empty line
that the `end-of-file-fixer` pre-commit hook flags every time a PR
rebases onto current `main`.  Affected at least PRs InsightSoftwareConsortium#6032, InsightSoftwareConsortium#4221, and
InsightSoftwareConsortium#6186, where each had to carry an identical one-character fix.  Apply
once on `main` to stop the spurious `pre-commit` failures.
@hjmjohnson hjmjohnson merged commit 65a487e into InsightSoftwareConsortium:main May 2, 2026
11 of 14 checks passed

Labels

area:Core (Issues affecting the Core module); area:Documentation (Issues affecting the Documentation module); area:IO (Issues affecting the IO module); area:Python wrapping (Python bindings for a class); type:Infrastructure (Infrastructure/ecosystem related changes, such as CMake or buildbots); type:Testing (Ensure that the purpose of a class is met/the results on a wide set of test cases are correct)


3 participants