Fix COG overview poisoned by nodata sentinel (#1613) #1618

Merged
brendancol merged 2 commits into main from deep-sweep-accuracy-geotiff-2026-05-11-d
May 11, 2026

Conversation

@brendancol
Contributor

Summary

  • Fixes COG overview generation poisoned by nodata sentinel #1613. to_geotiff(..., cog=True, nodata=<finite>) produced corrupted overview pyramids because the NaN-to-sentinel rewrite ran before _make_overview / make_overview_gpu. np.nanmean (and the GPU equivalents) then saw the sentinel as a real value and biased every reduced pixel toward it.
  • Threads nodata through _block_reduce_2d, _block_reduce_2d_gpu, _make_overview, and make_overview_gpu so each reducer masks the sentinel back to NaN before aggregating. The writer's overview loop rewrites any all-sentinel reductions (NaN from the reducer) back to the sentinel for the on-disk pyramid.
  • Fixes both the CPU and GPU writer paths. The fix has the same shape in each, so the two backends stay in lockstep.

Reproduction (before fix)

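import numpy as np
import xarray as xr
# Import path assumed from the files touched in this PR (xrspatial/geotiff/__init__.py).
from xrspatial.geotiff import open_geotiff, to_geotiff

arr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [np.nan, np.nan, np.nan, np.nan],
    [10.0, 20.0, 30.0, 40.0],
    [10.0, 20.0, 30.0, 40.0],
], dtype=np.float32)
# Write a COG with a finite nodata sentinel and one mean-resampled overview level.
to_geotiff(xr.DataArray(arr, dims=['y', 'x']), 'out.tif',
           nodata=-9999.0, cog=True, tile_size=2,
           overview_levels=[1], overview_resampling='mean')
open_geotiff('out.tif', overview_level=1).data
# [[-4998.75 -4997.75]   # poisoned by sentinel
#  [   15.      35.  ]]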

After fix: [[1.5, 3.5], [15.0, 35.0]] (matches np.nanmean on the original NaN-keyed data).
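
The poisoned values can be checked without writing a file. The following is a minimal NumPy-only sketch, not the library's code: `block_mean_2x2` is a hypothetical stand-in for a 2x2 mean reducer, and the explicit NaN-to-sentinel `np.where` imitates the rewrite the writer performed before building overviews. It shows that (1 + 2 - 9999 - 9999) / 4 = -4998.75, and that masking the sentinel back to NaN recovers 1.5.

```python
import numpy as np

NODATA = -9999.0

arr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [np.nan, np.nan, np.nan, np.nan],
    [10.0, 20.0, 30.0, 40.0],
    [10.0, 20.0, 30.0, 40.0],
], dtype=np.float32)


def block_mean_2x2(a, nodata=None):
    """Hypothetical 2x2 mean reducer (illustration only, not the library's code)."""
    a = a.astype(np.float64)
    # The fix in a nutshell: turn the sentinel back into NaN before reducing.
    if nodata is not None and not np.isnan(nodata):
        a = np.where(a == nodata, np.nan, a)
    h, w = a.shape
    # Gather each 2x2 block into the last axis, then reduce over it.
    blocks = (a.reshape(h // 2, 2, w // 2, 2)
               .transpose(0, 2, 1, 3)
               .reshape(h // 2, w // 2, 4))
    return np.nanmean(blocks, axis=2)


# Imitate the writer's NaN -> sentinel rewrite that ran before the overview step.
sentinel_filled = np.where(np.isnan(arr), NODATA, arr)

poisoned = block_mean_2x2(sentinel_filled)               # sentinel treated as data
fixed = block_mean_2x2(sentinel_filled, nodata=NODATA)   # sentinel masked to NaN

print(poisoned)  # top row is dragged toward -9999
print(fixed)     # top row matches np.nanmean on the NaN-keyed data
```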

Test plan

  • New regression tests in test_cog_overview_nodata_1613.py (11 tests, 8 CPU + 3 GPU): mean / min / max / median ignore the sentinel, partial-NaN blocks reduce correctly, all-NaN blocks reduce to NaN then get rewritten to the sentinel, integer dtype passthrough unchanged, CPU and GPU produce identical pyramids.
  • Existing TestOverviewResampling suite still passes (12 tests).
  • All 235 nodata/overview/cog tests in the geotiff suite pass.

The NaN-to-sentinel rewrite in to_geotiff and write_geotiff_gpu ran
before _make_overview / make_overview_gpu, so np.nanmean and the GPU
counterparts saw the sentinel as a finite value and biased every
overview pixel. A raster with NaN pixels and nodata=-9999 produced
overview cells like -4998.75 where the correct nan-aware mean was 1.5.

Thread a nodata kwarg through the reducers so they mask the sentinel
back to NaN before aggregating. The writer's overview loop passes
nodata in, then rewrites any all-sentinel cells (which surface as NaN
from the reducer) back to the sentinel for the on-disk pyramid.
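
A rough sketch of that reduce-then-rewrite loop follows, assuming each level is built from the previous one and that the base array already carries the sentinel. The names `reduce_2x2_mean` and `build_pyramid` are hypothetical and do not correspond to functions in the PR. The mean is computed from a sum and a valid count, so an all-sentinel block yields NaN without going through `np.nanmean`.

```python
import numpy as np


def reduce_2x2_mean(a, nodata):
    """Mask the sentinel to NaN, then take a NaN-aware 2x2 mean (hypothetical helper)."""
    masked = np.where(a == nodata, np.nan, a)
    h, w = masked.shape
    blocks = (masked.reshape(h // 2, 2, w // 2, 2)
                    .transpose(0, 2, 1, 3)
                    .reshape(h // 2, w // 2, 4))
    valid = ~np.isnan(blocks)
    count = valid.sum(axis=2)
    total = np.where(valid, blocks, 0.0).sum(axis=2)
    # Blocks with no valid pixels stay NaN; the caller rewrites them.
    mean = np.full(count.shape, np.nan)
    np.divide(total, count, out=mean, where=count > 0)
    return mean


def build_pyramid(base, nodata, levels):
    """Build `levels` overviews; each stored level uses the sentinel, never NaN."""
    pyramid = []
    current = base
    for _ in range(levels):
        reduced = reduce_2x2_mean(current, nodata)
        # All-sentinel blocks surface as NaN; rewrite them for the on-disk pyramid.
        current = np.where(np.isnan(reduced), nodata, reduced)
        pyramid.append(current)
    return pyramid


base = np.array([
    [1.0, 2.0, -9999.0, -9999.0],
    [-9999.0, -9999.0, -9999.0, -9999.0],
    [10.0, 20.0, 30.0, 40.0],
    [10.0, 20.0, 30.0, 40.0],
])
pyr = build_pyramid(base, nodata=-9999.0, levels=2)
```

In this sketch the top-right block of the first level is entirely sentinel, so it is stored as -9999.0, and the second level then ignores it when averaging the remaining three cells.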

CPU and GPU paths both fixed. New regression tests cover mean / min /
max / median, partial-NaN blocks, all-NaN blocks, integer dtype
passthrough, and CPU-GPU agreement.
@github-actions (bot) added the "performance" label (PR touches performance-sensitive code) on May 11, 2026
@brendancol requested a review from Copilot on May 11, 2026 20:23
Contributor

Copilot AI left a comment

Pull request overview

Fixes COG overview generation when nodata=<finite> is used on float rasters containing NaNs by ensuring overview reducers ignore the nodata sentinel (CPU + GPU paths), and adds a focused regression test suite for issue #1613.

Changes:

  • Thread nodata through CPU overview helpers (_block_reduce_2d, _make_overview) and mask sentinel back to NaN during reduction.
  • Mirror the same nodata-aware masking behavior in GPU overview helpers (_block_reduce_2d_gpu, make_overview_gpu) and in the GPU COG overview loop.
  • Add regression tests covering CPU/GPU behavior and direct helper-level reduction semantics for issue #1613.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| xrspatial/geotiff/_writer.py | Adds nodata-aware masking in CPU overview reducers and rewrites NaNs back to sentinel in the overview loop. |
| xrspatial/geotiff/_gpu_decode.py | Adds nodata-aware masking in GPU overview reducers and threads nodata through make_overview_gpu. |
| xrspatial/geotiff/__init__.py | Passes nodata into GPU overview generation and rewrites reducer-produced NaNs back to the sentinel per level. |
| xrspatial/geotiff/tests/test_cog_overview_nodata_1613.py | New regression tests validating correct overview values with nodata sentinels on CPU + GPU. |
| .claude/sweep-accuracy-state.csv | Updates internal accuracy tracking notes to record the #1613 fix. |

Comment thread xrspatial/geotiff/_writer.py Outdated
Comment on lines +193 to +196
# and poisons the overview (issue #1613).
if (nodata is not None
and not np.isnan(nodata)
and np.isfinite(nodata)):
Comment thread xrspatial/geotiff/_writer.py Outdated
Comment on lines +1155 to +1158
# all-sentinel comes back as NaN; ``_write_tiled`` / ``_write_stripped``
# serialise that NaN to disk, where the eager reader will mask
# it (and a future writer pass could rewrite to ``nodata`` for
# external readers -- out of scope for this fix).
Comment thread xrspatial/geotiff/_writer.py Outdated
Comment on lines +215 to +217
# nanmean / nanmin / nanmax / nanmedian raise warnings on all-nan
# blocks; ``np.errstate`` would silence them but the resulting NaN is
# the desired output so we leave the warning visible.
Comment thread xrspatial/geotiff/_gpu_decode.py Outdated
Comment on lines +2926 to +2929
# honour it as missing-data (issue #1613).
if (nodata is not None
and not np.isnan(nodata)
and np.isfinite(nodata)):
- Drop redundant np.isfinite gate in _block_reduce_2d (CPU + GPU) so
  nodata=+/-inf is masked back to NaN like a finite sentinel, matching
  the upstream NaN->sentinel rewrite gate (`not np.isnan(nodata)` used
  at _writer.py:1171,1525,1620).
- Suppress RuntimeWarning from nanmean/nanmin/nanmax/nanmedian on
  all-NaN blocks locally; the all-NaN output is the desired signal
  that the overview loop rewrites to the sentinel, so the warning was
  noise on every nodata-border COG write.
- Fix the comment above the overview loop: NaN from an all-sentinel
  reduction is rewritten back to the sentinel before _write_tiled /
  _write_stripped runs, not serialised as NaN.
- Add regression tests covering nodata=inf (CPU + GPU) and the
  warning-suppression contract for all-NaN blocks.
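
The first two points can be illustrated with a small NumPy sketch. The helper names `mask_sentinel` and `quiet_nanmean` are hypothetical, and using `warnings.catch_warnings` is an assumption about how the suppression might be done, not a copy of the PR's code. The gate checks only `not np.isnan(nodata)`, so an infinite sentinel is masked like a finite one, and the all-NaN block reduces to NaN without emitting a RuntimeWarning.

```python
import warnings

import numpy as np


def mask_sentinel(a, nodata):
    """Mask the sentinel to NaN; no isfinite gate, so +/-inf is handled too."""
    if nodata is not None and not np.isnan(nodata):
        return np.where(a == nodata, np.nan, a)
    return a


def quiet_nanmean(blocks):
    """NaN-aware mean over the last axis with all-NaN warnings silenced locally."""
    with warnings.catch_warnings():
        # Only this call is silenced; the caller's warning filters are restored on exit.
        warnings.simplefilter("ignore", RuntimeWarning)
        return np.nanmean(blocks, axis=-1)


# Two flattened 2x2 blocks with nodata=+inf: one partially valid, one all sentinel.
blocks = np.array([
    [1.0, 3.0, np.inf, np.inf],
    [np.inf, np.inf, np.inf, np.inf],
])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = quiet_nanmean(mask_sentinel(blocks, np.inf))
```

Here the second entry of `result` is NaN, which is the signal the overview loop would rewrite to the sentinel, and `caught` stays empty.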
@brendancol merged commit dd907b8 into main on May 11, 2026
10 of 11 checks passed
@brendancol deleted the deep-sweep-accuracy-geotiff-2026-05-11-d branch on May 15, 2026 04:41
Labels

performance PR touches performance-sensitive code

Development

Successfully merging this pull request may close these issues.

COG overview generation poisoned by nodata sentinel

2 participants