Fix integer COG overview poisoning by sentinel value #1691

Merged: brendancol merged 3 commits into main from deep-sweep-accuracy-geotiff-2026-05-12-v2 on May 12, 2026

@brendancol
Contributor

Summary

  • to_geotiff(int_data, cog=True, nodata=N) produced overview pyramids that mixed the sentinel into surrounding valid pixels for mean, min, max, and median resampling.
  • The reader can't mask poisoned pixels back to NaN because they don't equal the sentinel, so the user silently sees garbage at every zoom level above 0.
  • Fix masks the sentinel to NaN before the nan-aware reduction, gated on representability in the source integer dtype, and rewrites all-sentinel blocks back to the sentinel before the integer cast.
  • GPU mirror in _block_reduce_2d_gpu gets the same fix for byte parity with CPU.
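The "gated on representability" condition above can be sketched as a small range check. This is only an illustration: `sentinel_is_representable` is a hypothetical name, not the library's helper, and the exact rules the PR applies (for example, how float-valued sentinels are treated) are assumptions here.

```python
import numpy as np

def sentinel_is_representable(nodata, dtype):
    """Hypothetical helper: True if `nodata` is an exact value of integer `dtype`."""
    dtype = np.dtype(dtype)
    # Only integer dtypes are of interest; floats take a different code path.
    if nodata is None or dtype.kind not in "iu":
        return False
    value = float(nodata)
    # NaN, inf, and fractional sentinels can never equal an integer pixel.
    if not np.isfinite(value) or value != int(value):
        return False
    info = np.iinfo(dtype)
    return info.min <= int(value) <= info.max

# -9999 fits in int16, so masking would apply; it cannot occur in uint8 data,
# so masking would be a no-op there.
print(sentinel_is_representable(-9999, np.int16))  # True
print(sentinel_is_representable(-9999, np.uint8))  # False
```

When the check fails, no pixel in the source array can equal the sentinel, so skipping the mask leaves the output unchanged.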

Repro

import numpy as np, xarray as xr, tempfile
from xrspatial.geotiff import to_geotiff, open_geotiff

H, W = 256, 256
data = np.full((H, W), 100, dtype=np.int16)
data[64:129, 64:129] = -9999

da = xr.DataArray(data, dims=('y', 'x'),
                  coords={'y': np.arange(H, dtype=np.float64),
                          'x': np.arange(W, dtype=np.float64)},
                  attrs={'crs': 4326})

with tempfile.NamedTemporaryFile(suffix='.tif', delete=False) as f:
    path = f.name

to_geotiff(da, path, nodata=-9999, cog=True,
           overview_levels=[1], overview_resampling='mean')

# Before this PR:
# Level 1 unique: [-9999 -4950 -2425   100]
# The -4950 and -2425 are sentinel-poisoned averages.
#
# After this PR:
# Level 1 unique: [-9999  100]

Test plan

  • pytest xrspatial/geotiff/tests/test_cog_int_overview_nodata_2026_05_12.py -- 38 new tests pass.
  • pytest xrspatial/geotiff/tests/test_cog_overview_nodata_1613.py xrspatial/geotiff/tests/test_cog_cubic_overview_nodata_1623.py -- regression coverage for the float and cubic cases still passes.
  • Full geotiff suite (1606 tests pre-PR) stays green.
  • CPU and GPU integer overview output is byte-equivalent (covered by test_gpu_cpu_int_overview_byte_match).

Notes

  • Found via /sweep-accuracy on the geotiff module (pass 20). State CSV updated in this PR.
  • No new public API. Internal helpers _block_reduce_2d and _block_reduce_2d_gpu accept a more permissive nodata kwarg; existing callers (_make_overview / make_overview_gpu / write()) pass it through unchanged.
  • Deferred LOW for a future PR: _read_cog_http accepts band=-1 and silently returns the last channel, while the local / dask / GPU paths raise IndexError (added in #1673, "Local eager band parameter accepts negative and out-of-range values silently"). Separate backend-parity gap; opening as a follow-up per the one-finding-per-PR policy.

@github-actions github-actions Bot added the performance label (PR touches performance-sensitive code) May 12, 2026
@brendancol brendancol requested a review from Copilot May 12, 2026 17:57
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes integer COG overview “sentinel poisoning” during overview pyramid generation by ensuring integer nodata sentinel values are masked to NaN before nan-aware reductions, and that all-sentinel blocks are rewritten back to the sentinel prior to casting back to integer. This aligns CPU and GPU overview generation behavior and prevents corrupted overview values that the reader cannot later re-mask.

Changes:

  • Update _block_reduce_2d (CPU) to mask representable integer sentinels to NaN for nan-aware overview reductions, and rewrite all-NaN reduced blocks back to the sentinel before integer casting.
  • Apply the same integer-sentinel masking + NaN rewrite logic in _block_reduce_2d_gpu for byte-equivalent CPU/GPU overviews.
  • Add a comprehensive regression test suite covering unit-level reducers, end-to-end COG round-trips, and CPU/GPU parity.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| xrspatial/geotiff/_writer.py | Fix integer overview reduction by sentinel-masking before nan-aware aggregation and restoring sentinel for all-missing blocks pre-cast; update related docstrings. |
| xrspatial/geotiff/_gpu_decode.py | Mirror the integer sentinel masking + all-missing rewrite logic on GPU for CPU/GPU parity. |
| xrspatial/geotiff/tests/test_cog_int_overview_nodata_2026_05_12.py | Add extensive regression coverage for integer nodata behavior across methods/dtypes, end-to-end COG, and CPU/GPU byte parity. |
| .claude/sweep-accuracy-state.csv | Update sweep accuracy state to record the finding/fix and test coverage notes. |

Comment thread xrspatial/geotiff/_writer.py Outdated
Comment on lines 163 to 168
float case). The sentinel is ignored for ``nearest`` and ``mode``
methods (those pick existing values rather than synthesise new
averages). The ``cubic`` branch honours ``nodata`` by masking the
sentinel to NaN, running cubic with ``prefilter=False`` to keep the
kernel local, and rewriting any NaN in the output back to the
sentinel before returning (issue #1623).
brendancol added a commit that referenced this pull request May 12, 2026
Per Copilot review on PR #1691: the docstring previously claimed the
sentinel is "ignored" for mode and nearest, but neither method applies
any nodata masking. nearest returns the top-left pixel of each 2x2
block, so the sentinel survives if it's in that corner. mode runs over
raw values, so the sentinel can be selected as the overview pixel if
it's the most frequent value in the block. Clarify the docstring to
reflect actual behavior without claiming masking that doesn't happen.

Docstring-only change. The existing tests already pin mode/nearest at
their actual (sentinel-passthrough) behavior, so no test updates needed.
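
The passthrough behaviour described in this commit message can be illustrated on a single 2x2 block. This is a rough sketch, not the library's code: the top-left pick follows the commit message, while the `np.unique`-based mode (including its tie-breaking) is an assumption made for illustration.

```python
import numpy as np

NODATA = -9999

# One 2x2 block in which the sentinel sits in the top-left corner
# and is also the most frequent value.
block = np.array([[NODATA, 100],
                  [NODATA, 7]], dtype=np.int16)

# "nearest" as described above: take the top-left pixel of the block.
nearest = block[0, 0]

# "mode" over raw values: the most frequent value wins, sentinel included.
values, counts = np.unique(block, return_counts=True)
mode = values[np.argmax(counts)]

print(nearest, mode)  # -9999 -9999
```

Both results equal the sentinel exactly, so a reader can still mask them, unlike the averaged values that `mean` used to produce.
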
`to_geotiff(int_data, cog=True, nodata=N)` produced overview pyramids that
mixed the sentinel into surrounding valid pixels for `mean`, `min`, `max`,
and `median` resampling. The reader can't mask the poisoned values back to
NaN because they don't equal the sentinel, so the user silently sees
garbage at every zoom level above 0.

Root cause: `_block_reduce_2d` (`_writer.py:258-264`) and
`_block_reduce_2d_gpu` (`_gpu_decode.py:3027-3028`) promoted the integer
block to `float64` but never masked the sentinel to NaN before calling
`nanmean` / `nanmin` / `nanmax` / `nanmedian`. The reduction then averaged
the sentinel as if it were signal: `(-9999 + 100 + 100 + 100) / 4 = -2424.75`,
which lands at -2425 after the cast back to `int16`.
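
The arithmetic in the root cause can be reproduced with plain NumPy on one 2x2 block. This is a minimal sketch of the failure mode only; it does not call the library's `_block_reduce_2d`.

```python
import numpy as np

NODATA = -9999

# One 2x2 block with a single sentinel pixel among valid pixels.
block = np.array([[NODATA, 100],
                  [100, 100]], dtype=np.int16)

promoted = block.astype(np.float64)

# Without masking, nanmean sees no NaN and averages the sentinel as signal.
poisoned = np.nanmean(promoted)

# With the sentinel masked to NaN first, only valid pixels contribute.
masked = np.where(promoted == NODATA, np.nan, promoted)
clean = np.nanmean(masked)

print(poisoned, clean)  # -2424.75 100.0
```
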

Fix: apply the same sentinel-to-NaN mask the float branch already uses,
gated on the sentinel being representable in the source integer dtype
(mirrors `_int_nodata_in_range` in `_reader.py`). After the reduction,
all-sentinel blocks come back as NaN; rewrite those to the sentinel before
the integer dtype cast so the cast is well-defined. The caller's
post-overview rewrite loop in `write()` only runs for floats, so the integer
branch closes the loop itself. GPU mirror gets the same treatment for byte
parity with CPU (the contract from #1623).
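
The mask, reduce, rewrite, cast sequence described above might look roughly like the following for the `mean` case. This is a sketch under stated assumptions: `reduce_int_blocks_mean` is a hypothetical standalone function, not `_block_reduce_2d`, and rounding before the cast is an assumption about how the integer cast is done.

```python
import warnings
import numpy as np

def reduce_int_blocks_mean(arr, nodata):
    """Illustrative 2x2 mean reduction of an integer array with a nodata sentinel."""
    h2, w2 = arr.shape[0] // 2, arr.shape[1] // 2
    # Promote to float64 and group each 2x2 block along the last axis.
    blocks = (arr[:h2 * 2, :w2 * 2]
              .astype(np.float64)
              .reshape(h2, 2, w2, 2)
              .transpose(0, 2, 1, 3)
              .reshape(h2, w2, 4))
    # Step 1: mask the sentinel to NaN so it cannot contribute to the mean.
    blocks[blocks == nodata] = np.nan
    # Step 2: nan-aware reduction; all-sentinel blocks come back as NaN.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        reduced = np.nanmean(blocks, axis=2)
    # Step 3: rewrite NaN back to the sentinel so the integer cast is well-defined.
    reduced = np.where(np.isnan(reduced), nodata, reduced)
    # Step 4: cast back to the source dtype (rounding is an assumption here).
    return np.round(reduced).astype(arr.dtype)

data = np.full((4, 4), 100, dtype=np.int16)
data[0:2, 0:2] = -9999   # an all-sentinel block
data[0, 2] = -9999       # a block with one sentinel pixel
overview = reduce_int_blocks_mean(data, -9999)
print(overview)
```

In this example the all-sentinel block stays at the sentinel and the partially covered block averages only its valid pixels, so the overview contains no intermediate values.
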

Tests: 38 cases in test_cog_int_overview_nodata_2026_05_12.py covering the
`_block_reduce_2d` per-dtype / per-method matrix (uint8 / uint16 / int16 /
int32 x mean / min / max / median), the all-sentinel block case, no-nodata
regression, out-of-range sentinel no-op, end-to-end uint16 + int16
round-trip, 3-band integer COG, GPU per-dtype / per-method matrix, and
CPU/GPU byte-match parity. All 1606 existing geotiff tests still pass.

Found via /sweep-accuracy on the geotiff module (pass 20). State CSV
updated.
@brendancol brendancol force-pushed the deep-sweep-accuracy-geotiff-2026-05-12-v2 branch from df5a50d to 447f20f Compare May 12, 2026 18:28
@brendancol brendancol merged commit 2bc44f6 into main May 12, 2026
10 checks passed
@brendancol brendancol deleted the deep-sweep-accuracy-geotiff-2026-05-12-v2 branch May 15, 2026 04:38