CPU fp_predictor_decode gives wrong pixels for multi-band predictor=3 TIFFs #1247

@brendancol

Describe the bug

The CPU decode path for TIFF predictor=3 (floating-point predictor) mis-handles multi-sample chunky data. Reading an externally-written GeoTIFF with Predictor=3, PlanarConfiguration=1 (chunky), and SamplesPerPixel > 1 via open_geotiff() returns garbage pixel values.

The writer is unaffected because it never emits predictor=3, only predictor=2.

Root cause

_reader.py:290-291 calls:

chunk = _apply_predictor(chunk, pred, width, height, bytes_per_sample * samples)

which routes to:

fp_predictor_decode(chunk, width, height, bytes_per_sample * samples)

Inside _fp_predictor_decode_row(row_data, width, bps), the combined value arrives as bps, so the row is treated as width super-samples, each bytes_per_sample * samples bytes wide, and is de-interleaved into bytes_per_sample * samples byte lanes of length width.

The TIFF Technical Note 3 spec (used by libtiff and GDAL) says the row should be de-interleaved into bps lanes of length width * samples. The GPU path at _gpu_decode.py:1350-1351 and :1598-1599 does this correctly:

_fp_predictor_decode_kernel[bpg, tpb](
    d_decomp, d_tmp, tile_width * samples, total_rows, dtype.itemsize)

So CPU and GPU decode diverge for multi-band predictor=3 files, and the CPU output does not match libtiff or GDAL.

Reproducer

For a 4-pixel-wide, 3-band float32 image with 2 rows (48 bytes per row, 96 bytes in total), manually TN3-encoding the buffer and then decoding it via fp_predictor_decode with the current arguments gives 56 of 96 bytes wrong.
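
A minimal NumPy sketch of the same kind of check, independent of the xrspatial code. The helpers `fp_encode_row` and `fp_decode_row` are hypothetical stand-ins written for this illustration, not the library's functions. They assume little-endian input and a byte-wise differencing stride of 1, so they only exercise the lane layout described above; a real codec may use a different differencing stride for multi-sample data. The input values are arbitrary, so the mismatch count is not expected to reproduce the 56 of 96 figure.

```python
import numpy as np

def fp_encode_row(row_bytes, n_values, bps):
    """Hypothetical TN3-style encoder for one row of little-endian floats."""
    # Split into bps byte lanes of length n_values, most significant byte first.
    lanes = row_bytes.reshape(n_values, bps)[:, ::-1].T
    flat = lanes.reshape(-1).copy()
    # Byte-wise horizontal differencing, modulo 256 (stride of 1 is an assumption).
    flat[1:] = flat[1:] - flat[:-1]
    return flat

def fp_decode_row(enc, n_values, bps):
    """Inverse of fp_encode_row for the same (n_values, bps) layout."""
    acc = np.cumsum(enc, dtype=np.uint8)       # undo the differencing, modulo 256
    lanes = acc.reshape(bps, n_values)         # bps lanes, MSB lane first
    return lanes.T[:, ::-1].reshape(-1)        # re-interleave as little-endian

width, height, samples, bps = 4, 2, 3, 4
data = (np.arange(height * width * samples, dtype="<f4") * 1.5 + 0.25)
raw = data.view(np.uint8).reshape(height, width * samples * bps)

# Encode each row with bps lanes of length width * samples.
enc = np.stack([fp_encode_row(r, width * samples, bps) for r in raw])

# Proposed call shape: width * samples values of bps bytes each.
good = np.stack([fp_decode_row(r, width * samples, bps) for r in enc])
# Current call shape: width values of bps * samples bytes each.
bad = np.stack([fp_decode_row(r, width, bps * samples) for r in enc])

print("bytes wrong with (width * samples, bps):", int((good != raw).sum()))
print("bytes wrong with (width, bps * samples):", int((bad != raw).sum()))
```

The first decode round-trips the buffer exactly, while the second returns the right number of bytes in the wrong order, which is why the failure raises no error.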

Expected behavior

open_geotiff(path) on a GDAL-written multi-band float32 TIFF with predictor=3 should return the same pixel values as open_geotiff(path, gpu=True) and as GDAL.

Fix

In _reader.py:_apply_predictor, when pred == 3 use

```python
fp_predictor_decode(chunk, width * samples, height, bytes_per_sample)
```

instead of

```python
fp_predictor_decode(chunk, width, height, bytes_per_sample * samples)
```

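This matches the GPU path and the TN3 spec. Predictor=2 keeps its current call: horizontal differencing works on whole samples with a stride of one pixel, and for chunky data one pixel is bytes_per_sample * samples bytes, so the existing argument is already correct for that path.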
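
To illustrate why the predictor=2 stride is a full pixel, here is a small NumPy sketch of horizontal differencing on chunky data. `hdiff_encode` and `hdiff_decode` are hypothetical helpers for illustration only and do not reflect how `_apply_predictor` is implemented; the uint16 dtype and the 3-band layout are assumptions.

```python
import numpy as np

def hdiff_encode(row, samples):
    """Difference each sample against the same band of the previous pixel."""
    px = row.reshape(-1, samples)          # (width, samples), chunky layout
    out = px.copy()
    out[1:] = px[1:] - px[:-1]             # unsigned arithmetic wraps modulo 2**16
    return out.reshape(-1)

def hdiff_decode(enc, samples):
    """Undo hdiff_encode with a running sum along the pixel axis."""
    px = enc.reshape(-1, samples)
    return np.cumsum(px, axis=0, dtype=enc.dtype).reshape(-1)

samples = 3
row = np.array([10, 500, 65000, 12, 498, 3, 9, 505, 7, 11, 1, 65535],
               dtype=np.uint16)

decoded = hdiff_decode(hdiff_encode(row, samples), samples)
stride_bytes = row.itemsize * samples      # bytes_per_sample * samples

print("round trip ok:", bool(np.array_equal(decoded, row)))
print("stride in bytes:", stride_bytes)
```

Here the neighbour used for differencing sits exactly bytes_per_sample * samples bytes earlier in the row, which is the value the predictor=2 call already passes.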

Severity

HIGH. Multi-band float32 TIFFs with predictor=3 are common in GDAL output. The failure is silent: no error, just wrong numbers.

Scope

  • xrspatial/geotiff/_reader.py: fix the dispatch.
  • xrspatial/geotiff/tests/test_predictor_multisample.py: add a regression test that decodes a TN3-encoded multi-band predictor=3 buffer.

Found by the /sweep-accuracy geotiff audit.
