diff --git a/.claude/sweep-accuracy-state.csv b/.claude/sweep-accuracy-state.csv
index 4698b929..bc765361 100644
--- a/.claude/sweep-accuracy-state.csv
+++ b/.claude/sweep-accuracy-state.csv
@@ -12,7 +12,7 @@ emerging_hotspots,2026-04-30,,MEDIUM,2;3,MEDIUM: threshold_90 uses int() (trunca
 fire,2026-04-30,,,,All ops per-pixel (no accumulation/stencil/projected distance). NaN handled via x!=x; CUDA bounds use strict <; rdnbr and ros divisions guarded; CPU/GPU/dask paths algorithmically identical. No accuracy issues found.
 flood,2026-04-30,,MEDIUM,2;5,"MEDIUM (not fixed): dask backend preserves float32 input dtype while numpy promotes to float64 in flood_depth and curve_number_runoff; DataArray inputs for curve_number, mannings_n bypass scalar > 0 (and CN <= 100) range validation, silently producing NaN/garbage."
 focal,2026-03-30T13:00:00Z,1092,,,
-geotiff,2026-05-11,1613,HIGH,2;5,"Pass 15 (2026-05-11): HIGH fixed -- issue #1613. to_geotiff(cog=True, nodata=<finite>) on a float raster with NaN produced a corrupted overview pyramid. The NaN-to-sentinel rewrite in __init__.py:1202 (CPU) and :2852 (GPU write_geotiff_gpu) ran BEFORE _make_overview / make_overview_gpu, so the nan-aware aggregations (np.nanmean/min/max/median, cupy.nanmean/min/max/median) saw the sentinel as a real number and biased every overview pixel. Reproduction with -9999 sentinel produced [[-4998.75,-4997.75],..] where np.nanmean gives [[1.5,3.5],..]. Both CPU and GPU paths affected; backend results matched each other but were both wrong (CAT 2 NaN propagation + CAT 5 documents the parity). Fix: _block_reduce_2d / _block_reduce_2d_gpu accept a nodata kwarg that masks the sentinel back to NaN for float dtypes before the reduction; the writer's overview loop passes nodata in, then rewrites all-sentinel reductions (which surface as NaN from the reducer) back to the sentinel for the on-disk pyramid. 11 regression tests in test_cog_overview_nodata_1613.py (CPU mean / partial-block / min/max/median / no-nodata passthrough / helper kwarg / all-sentinel block / GPU mean / GPU helper / CPU-GPU agreement). All 235 nodata/overview/cog tests still pass. | Pass 14 (2026-05-11): HIGH fixed -- issue #1611.  read_vrt(band=None) on a multi-band integer VRT with per-band <NoDataValue> tags only masks band 0's sentinel. __init__.py lines 2795-2809 in read_vrt apply vrt.bands[0].nodata to the full ndim==3 array; bands 1+ keep their integer sentinels as literal finite values (e.g. 65000 surfaces as 65000.0 after the dtype=float64 cast, not NaN). Float-VRT path masks per-band correctly in _vrt._read_data lines 296-297 + 347-351. PR #1602 fixed the single-band band=N case for issue #1598; the band=None multi-band case is the same class of bug. Repro: 2-band uint16 VRT with NoDataValue 65535 / 65000 returns r.values[1,1,1] == 65000.0 instead of NaN; r.values[1,1,0] is NaN (band 0 sentinel masked). Fix scope: in read_vrt, when band is None, iterate over vrt.bands and mask each arr[..., i] slice against its own <NoDataValue> (gated by the same _int_nodata_in_range guard PR #1583 introduced). Severity HIGH (Cat 2 NaN propagation + Cat 5 backend inconsistency: identical input semantics produce different masking outcomes based on dtype, with finite garbage values where NaN expected). Fix in PR #1612: walks vrt.bands when band is None and ndim==3, masks each arr[..., i] slice against its own <NoDataValue> via the refactored _sentinel_for_dtype helper (reuses PR #1583's range guard so out-of-range/non-finite/fractional sentinels are a no-op). attrs['nodata'] still carries band 0's sentinel for band=None reads (documented contract). 7 regression tests in test_vrt_multiband_int_nodata_1611.py: uint16 per-band, int32 negative, mixed presence, dtype preservation when no sentinel hit, out-of-range gating, band=N non-regression, attrs contract. 135 existing vrt/nodata geotiff tests still pass. | Pass 13 (2026-05-11): HIGH fixed -- issue #1599. write_geotiff_gpu (and to_geotiff gpu=True) emitted raw NaN bytes for missing pixels even when nodata=<finite> was supplied, while the CPU writer substituted NaN with the sentinel before encoding. xrspatial-only round-trips were unaffected (the reader masks both NaN and the sentinel), but external readers (rasterio/GDAL/QGIS) that mask only on the GDAL_NODATA tag saw NaN pixels as valid data -- rasterio reported 100% valid pixels on a 25-NaN file vs CPU's 25-invalid report. Root cause: __init__.py lines 2579-2587 jumped from shape/dtype resolution straight to compression, missing the equivalent of the CPU writer's NaN-to-sentinel rewrite at to_geotiff line ~1156. Fix: cupy.isnan + masked write on a defensive copy of arr, gated on np_dtype.kind=='f' and not np.isnan(float(nodata)). Caller's CuPy buffer preserved (copy before mutate). 7 regression tests in test_gpu_writer_nan_sentinel_1599.py: substitution lands as sentinel, CPU/GPU byte-equivalent, caller buffer not mutated, no-NaN no-op, NaN sentinel skips substitution, rasterio sees identical invalid count on CPU/GPU, multiband 3D path. All other GPU writer tests still pass (50 passed across band-first, attrs, nodata, dask+cupy, writer, nodata aliases). | Pass 12 (2026-05-11): HIGH fixed -- issue #1581. Reading a uint TIFF with a negative GDAL_NODATA sentinel (e.g. uint16 + -9999) raised OverflowError on every backend because the nodata-mask code did arr.dtype.type(int(nodata)) with no range check. Three identical cast sites in __init__.py (numpy eager, _apply_nodata_mask_gpu, _delayed_read_window) plus _resolve_masked_fill and _sparse_fill_value in _reader.py. Fix: _int_nodata_in_range helper gates the cast; out-of-range sentinels are a no-op for value matching (the file can never contain that value), file dtype is preserved, attrs['nodata'] still surfaces the original sentinel so write round-trips keep the GDAL_NODATA tag intact. Matches rasterio behavior. 8 regression tests in test_nodata_out_of_range_1581.py cover the helper, both eager and dask read paths, in-range sentinel non-regression, and GPU helper (cupy-gated). | Pass 11 (2026-05-10): CLEAN. Audited the one additional commit since pass 10 -- #1559 (PR 1548, Centralise GeoTIFF attrs population across all read backends). Refactor extracts _populate_attrs_from_geo_info helper and routes eager numpy, dask, GPU stripped, GPU tiled read paths through it; before the fix dask only emitted crs/transform/raster_type/nodata while numpy emitted the full attrs set including x/y_resolution, resolution_unit, image_description, extra_samples, GDAL metadata, and the CRS-description fields. No data-path arithmetic touched; only attrs dict population. Windowed origin math (origin_x + c0*pixel_width, origin_y + r0*pixel_height) verified to produce -98.0 / 48.75 origin for window=(10,20,50,70) on a (0.1,-0.125) pixel-size raster, with PixelIsArea half-pixel offset preserved on coord lookups (-97.95, 48.6875). Cross-backend attrs parity re-verified: numpy/dask/cupy all emit identical key set on deflate+predictor3+nodata round-trip (crs, crs_wkt, nodata, transform, x_resolution, y_resolution). Data bit-parity re-verified across numpy/dask/cupy on same payload (np.array_equal with equal_nan=True). test_attrs_parity_1548.py (5 tests), test_reader.py/test_writer.py/test_dask_cupy_combined.py (25 tests), GPU orientation/predictor2-BE/LERC-mask/nodata/byteswap suites (65 tests) all green. No accuracy or backend-divergence findings. | Pass 10 (2026-05-10): CLEAN. Audited 5 recent commits: #1558 drop-defensive-copies (frombuffer path still .copy()s before in-place predictor decode at _reader.py:778), #1556 fp-predictor ngjit (writer pre-ravels so 1-D slice arg is correct, float32/64 LE+BE bit-exact), #1552 batched D2H (OOM guard fires before cupy.concatenate, host_buf offsets correct), #1551 parallel-decode gate (>= vs > sends 256x256 default to parallel path, no value diff confirmed via partial-tile parity), #1549 nvjpeg constants (gray + RGB GPU JPEG decode pixel-identical to Pillow CPU, max diff = 0). Cross-backend parity re-verified clean: numpy/dask+numpy/cupy/dask+cupy equal .data/.dtype/.coords/nodata/NaN-mask on deflate+predictor3+nodata; orientations 1-8 numpy==GPU; partial edge tiles 100x150, 257x383, 512x257 numpy==GPU==dask; predictor2 LE/BE round-trip uint8/int16/uint16/int32/uint32 pass; predictor3 LE/BE float32/64 pass. Deferred LOW (pre-existing, not opened): float16 (bps=16, SampleFormat=3) absent from tiff_dtype_to_numpy map - writer never emits, asymmetric but unreachable. | Pass 9 (2026-05-09): TWO HIGH fixed -- (a) PR #1539 closes #1537: TIFF Orientation tag 2/3/4 (mirror flips) on georeferenced files left y/x coords computed from the un-flipped transform, so xarray label lookups returned the wrong pixel even though _apply_orientation flipped the buffer. PR #1521 only updated the transform for the 5-8 axis-swap branch. Fix updates origin and pixel-scale signs along whichever axes were flipped, for both PixelIsArea (origin shifts by N*step) and PixelIsPoint (shifts by (N-1)*step). 10 new tests in test_orientation.py. (b) PR #1546 closes #1540: read_geotiff_gpu ignored Orientation tag completely; CPU correctly applied 2-8 (PR #1521) but GPU returned the raw stored buffer. Cross-backend disagreement on every non-default orientation. Fix adds _apply_orientation_gpu (cupy slicing mirror of the CPU helper) and _apply_orientation_geo_info, threads them into the tiled GPU pipeline, reuses CPU-fallback geo_info for the stripped path to avoid double-applying. 28 new tests in test_orientation_gpu.py (every orientation, single-band tiled, single-band stripped, 3-band tiled, mirror-flip sel-fidelity, default no-tag passthrough). Re-confirmed clean: HTTP coalesce_ranges with overlapping ranges and zero-length ranges, parallel streaming write thread-safety (each tile gets independent buffer via copy or padded zeros), planar=2 + chunky GPU LERC mask propagation matches CPU, IFD chain cap MAX_IFDS=256, max_z_error round-trip on tiled write, _resolve_masked_fill float vs integer dtype semantics. Deferred LOW: per-sample LERC mask (3D mask (h,w,samples)) collapsed to per-pixel ""any sample invalid"" on GPU while CPU honours per-sample; LERC implementations rarely emit 3D masks (verified: lerc.encode with 2D mask on 3-band returns 2D mask). Documented planar=2 + LERC + GPU silently drops mask (rare in practice, source comment acknowledges). | Pass 8 (2026-05-07): HIGH fixed in fix-jpeg-tiff-disable -- to_geotiff(compression='jpeg') wrote files that no external reader can decode. The writer tags compression=7 (new-style JPEG) but emits a self-contained JFIF stream per tile/strip and never writes the JPEGTables tag (347) that the TIFF spec requires for that codec. libtiff/GDAL/rasterio all reject the file with TIFFReadEncodedStrip() failed; our reader round-trips because Pillow decodes the standalone JFIF, hiding the break. Pass-4 notes flagged the read side of the same JPEGTables gap and deferred it; pass-8 covers the write side. Fix: reject compression='jpeg' at the to_geotiff entry with a clear ValueError pointing at deflate/zstd/lzw. The internal _writer.write is untouched so the existing self-decoding tests still cover the codec; re-enabling the public path needs a JPEGTables-aware encoder. PR diffs reviewed but not merged: #1512 (BytesIO source) and #1513 (LERC max_z_error) -- both look correct; #1512 file-like read path goes through read_all() once so the per-call BytesIOSource lock is theoretical, and #1513 forwards max_z_error through every overview/tile/strip/streaming path including _write_vrt_tiled and _compress_block. No regressions found in either open PR. Other surfaces audited clean: predictor=3 with float16 (writer auto-promotes to float32 on both eager and streaming paths, value-exact round-trip); planar=2 multi-tile read uses band_idx*tiles_per_band offset so no cross-contamination between planes; _header.py multi-byte tag parsing uses bo (byte_order) consistently; Pillow YCbCr-vs-tagged-RGB photometric mismatch becomes moot once JPEG is disabled. Deferred (LOW/MEDIUM, not filed): JPEG2000 writer accepts arbitrary dtype with no validation (rare codec, narrow risk); float16 dtype not in tiff_dtype_to_numpy decode map (writer never emits it - asymmetric but unreachable); Orientation tag (274) still ignored on read (pass-4 deferral). | Pass 7 (2026-05-07): HIGH fixed in fix-mmap-cache-refcount-after-replace -- _MmapCache.release() looked up the cache entry by realpath, so a holder that acquired the OLD mmap before an os.replace and released it AFTER another caller had acquired the post-replace entry would decrement the new holder's refcount. Subsequent eviction (cache full, or another acquire) closed the still-in-use mmap, breaking reads with 'mmap closed or invalid'. Real exposure: any concurrent reader/writer pattern where to_geotiff replaces a file that another reader had just opened via open_geotiff with chunks= or via _FileSource. PR #1506 added stale-replacement detection but did not fix the refcount confusion across the pop. Fix: acquire returns an opaque entry token; release takes the token and decrements that exact entry, regardless of cache state. Orphaned (popped) entries close their fh+mmap when their own refcount hits zero. _FileSource updated to pass the token. Regression test test_release_after_path_replacement_does_not_clobber_new_holder added. All 665 geotiff tests pass; GPU path verified. | Pass 6 (2026-05-07) PR #1507: BE pred2 numba TypingError. | Pass 5 (2026-05-06) PR #1506: mmap cache stale after file replace. | Pass 4 (2026-05-06) PR #1501: sparse COG tiles. | Pass 3 (2026-05-06) PR #1500: predictor=3 byte order. | Pass 2 (2026-05-05) PR #1498: predictor=2 sample-wise. | Pass 1 (2026-04-23) PR #1247. Re-confirmed clean over passes 2-7: items 2 (writer always emits LE TIFFs - hardcoded b'II'), 3 (RowsPerStrip default = height when missing), 4 (StripByteCounts missing raises clear ValueError), 5 (TileWidth without TileLength caught by 'tw <= 0 or th <= 0' check at _reader.py:688), 9 (read determinism on compressed+tiled+multiband), 11 (predictor=2 with awkward sample stride round-trips), 18 (compression_level=99 raises ValueError 'out of range for deflate (valid: 1-9)'), 21 (concurrent writes serialize correctly via mkstemp+os.replace), 24 (uint16 dtype preserved on numpy backend, dask honors chunks param), 26 (chunks rounds correctly with remainder chunk for non-tile-aligned). Deferred: item 8 (BytesIO/file-like sources are not supported, source.lower() error) - documented as 'str' parameter, not a bug; item 19 (LERC max_z_error not user-exposed by to_geotiff) - missing feature, not a bug."
+geotiff,2026-05-11,1623,HIGH,2;5,"Pass 16 (2026-05-11): HIGH fixed -- issue #1623. to_geotiff(cog=True, overview_resampling='cubic', nodata=<finite>) on a float raster with NaN regions produced overview pixels with severe ringing artefacts near nodata borders. Same class of bug as #1613 but for the cubic branch: writer rewrites NaN to the sentinel upstream, then _block_reduce_2d(method=cubic) handed the sentinel-poisoned array straight to scipy.ndimage.zoom(order=3). The cubic spline blended the sentinel (e.g. -9999) into neighbouring cells, producing values like 1133.44, -10290.08 where the data was a constant 100. Repro on 16x16 float32 with a 4x4 NaN corner showed 18 polluted pixels in the 8x8 overview. Fix: when nodata is supplied on a float dtype and the sentinel is found, mask sentinel to NaN, run cubic with prefilter=False so a single NaN cannot poison the entire row/column (default B-spline prefilter is global), then rewrite any NaN in the result back to the sentinel. prefilter=False only fires when a sentinel is present so the non-nodata cubic semantics are unchanged. GPU side: _block_reduce_2d_gpu previously raised on method='cubic'; added a CPU fallback (same pattern as 'mode') so GPU writer produces byte-equivalent overviews. GPU_OVERVIEW_METHODS now includes 'cubic'. 12 regression tests in test_cog_cubic_overview_nodata_1623.py (helper no-ringing, poisoning repro, no-nodata unchanged, end-to-end round-trip, GPU fallback, CPU/GPU byte-match, +/-inf nodata mask, NaN-sentinel no-op, GPU_OVERVIEW_METHODS contract). All 1256 existing geotiff tests still pass (3 pre-existing matplotlib failures unrelated). | Pass 15 (2026-05-11): HIGH fixed -- issue #1613. to_geotiff(cog=True, nodata=<finite>) on a float raster with NaN produced a corrupted overview pyramid. The NaN-to-sentinel rewrite in __init__.py:1202 (CPU) and :2852 (GPU write_geotiff_gpu) ran BEFORE _make_overview / make_overview_gpu, so the nan-aware aggregations (np.nanmean/min/max/median, cupy.nanmean/min/max/median) saw the sentinel as a real number and biased every overview pixel. Reproduction with -9999 sentinel produced [[-4998.75,-4997.75],..] where np.nanmean gives [[1.5,3.5],..]. Both CPU and GPU paths affected; backend results matched each other but were both wrong (CAT 2 NaN propagation + CAT 5 documents the parity). Fix: _block_reduce_2d / _block_reduce_2d_gpu accept a nodata kwarg that masks the sentinel back to NaN for float dtypes before the reduction; the writer's overview loop passes nodata in, then rewrites all-sentinel reductions (which surface as NaN from the reducer) back to the sentinel for the on-disk pyramid. 11 regression tests in test_cog_overview_nodata_1613.py (CPU mean / partial-block / min/max/median / no-nodata passthrough / helper kwarg / all-sentinel block / GPU mean / GPU helper / CPU-GPU agreement). All 235 nodata/overview/cog tests still pass. | Pass 14 (2026-05-11): HIGH fixed -- issue #1611.  read_vrt(band=None) on a multi-band integer VRT with per-band <NoDataValue> tags only masks band 0's sentinel. __init__.py lines 2795-2809 in read_vrt apply vrt.bands[0].nodata to the full ndim==3 array; bands 1+ keep their integer sentinels as literal finite values (e.g. 65000 surfaces as 65000.0 after the dtype=float64 cast, not NaN). Float-VRT path masks per-band correctly in _vrt._read_data lines 296-297 + 347-351. PR #1602 fixed the single-band band=N case for issue #1598; the band=None multi-band case is the same class of bug. Repro: 2-band uint16 VRT with NoDataValue 65535 / 65000 returns r.values[1,1,1] == 65000.0 instead of NaN; r.values[1,1,0] is NaN (band 0 sentinel masked). Fix scope: in read_vrt, when band is None, iterate over vrt.bands and mask each arr[..., i] slice against its own <NoDataValue> (gated by the same _int_nodata_in_range guard PR #1583 introduced). Severity HIGH (Cat 2 NaN propagation + Cat 5 backend inconsistency: identical input semantics produce different masking outcomes based on dtype, with finite garbage values where NaN expected). Fix in PR #1612: walks vrt.bands when band is None and ndim==3, masks each arr[..., i] slice against its own <NoDataValue> via the refactored _sentinel_for_dtype helper (reuses PR #1583's range guard so out-of-range/non-finite/fractional sentinels are a no-op). attrs['nodata'] still carries band 0's sentinel for band=None reads (documented contract). 7 regression tests in test_vrt_multiband_int_nodata_1611.py: uint16 per-band, int32 negative, mixed presence, dtype preservation when no sentinel hit, out-of-range gating, band=N non-regression, attrs contract. 135 existing vrt/nodata geotiff tests still pass. | Pass 13 (2026-05-11): HIGH fixed -- issue #1599. write_geotiff_gpu (and to_geotiff gpu=True) emitted raw NaN bytes for missing pixels even when nodata=<finite> was supplied, while the CPU writer substituted NaN with the sentinel before encoding. xrspatial-only round-trips were unaffected (the reader masks both NaN and the sentinel), but external readers (rasterio/GDAL/QGIS) that mask only on the GDAL_NODATA tag saw NaN pixels as valid data -- rasterio reported 100% valid pixels on a 25-NaN file vs CPU's 25-invalid report. Root cause: __init__.py lines 2579-2587 jumped from shape/dtype resolution straight to compression, missing the equivalent of the CPU writer's NaN-to-sentinel rewrite at to_geotiff line ~1156. Fix: cupy.isnan + masked write on a defensive copy of arr, gated on np_dtype.kind=='f' and not np.isnan(float(nodata)). Caller's CuPy buffer preserved (copy before mutate). 7 regression tests in test_gpu_writer_nan_sentinel_1599.py: substitution lands as sentinel, CPU/GPU byte-equivalent, caller buffer not mutated, no-NaN no-op, NaN sentinel skips substitution, rasterio sees identical invalid count on CPU/GPU, multiband 3D path. All other GPU writer tests still pass (50 passed across band-first, attrs, nodata, dask+cupy, writer, nodata aliases). | Pass 12 (2026-05-11): HIGH fixed -- issue #1581. Reading a uint TIFF with a negative GDAL_NODATA sentinel (e.g. uint16 + -9999) raised OverflowError on every backend because the nodata-mask code did arr.dtype.type(int(nodata)) with no range check. Three identical cast sites in __init__.py (numpy eager, _apply_nodata_mask_gpu, _delayed_read_window) plus _resolve_masked_fill and _sparse_fill_value in _reader.py. Fix: _int_nodata_in_range helper gates the cast; out-of-range sentinels are a no-op for value matching (the file can never contain that value), file dtype is preserved, attrs['nodata'] still surfaces the original sentinel so write round-trips keep the GDAL_NODATA tag intact. Matches rasterio behavior. 8 regression tests in test_nodata_out_of_range_1581.py cover the helper, both eager and dask read paths, in-range sentinel non-regression, and GPU helper (cupy-gated). | Pass 11 (2026-05-10): CLEAN. Audited the one additional commit since pass 10 -- #1559 (PR 1548, Centralise GeoTIFF attrs population across all read backends). Refactor extracts _populate_attrs_from_geo_info helper and routes eager numpy, dask, GPU stripped, GPU tiled read paths through it; before the fix dask only emitted crs/transform/raster_type/nodata while numpy emitted the full attrs set including x/y_resolution, resolution_unit, image_description, extra_samples, GDAL metadata, and the CRS-description fields. No data-path arithmetic touched; only attrs dict population. Windowed origin math (origin_x + c0*pixel_width, origin_y + r0*pixel_height) verified to produce -98.0 / 48.75 origin for window=(10,20,50,70) on a (0.1,-0.125) pixel-size raster, with PixelIsArea half-pixel offset preserved on coord lookups (-97.95, 48.6875). Cross-backend attrs parity re-verified: numpy/dask/cupy all emit identical key set on deflate+predictor3+nodata round-trip (crs, crs_wkt, nodata, transform, x_resolution, y_resolution). Data bit-parity re-verified across numpy/dask/cupy on same payload (np.array_equal with equal_nan=True). test_attrs_parity_1548.py (5 tests), test_reader.py/test_writer.py/test_dask_cupy_combined.py (25 tests), GPU orientation/predictor2-BE/LERC-mask/nodata/byteswap suites (65 tests) all green. No accuracy or backend-divergence findings. | Pass 10 (2026-05-10): CLEAN. Audited 5 recent commits: #1558 drop-defensive-copies (frombuffer path still .copy()s before in-place predictor decode at _reader.py:778), #1556 fp-predictor ngjit (writer pre-ravels so 1-D slice arg is correct, float32/64 LE+BE bit-exact), #1552 batched D2H (OOM guard fires before cupy.concatenate, host_buf offsets correct), #1551 parallel-decode gate (>= vs > sends 256x256 default to parallel path, no value diff confirmed via partial-tile parity), #1549 nvjpeg constants (gray + RGB GPU JPEG decode pixel-identical to Pillow CPU, max diff = 0). Cross-backend parity re-verified clean: numpy/dask+numpy/cupy/dask+cupy equal .data/.dtype/.coords/nodata/NaN-mask on deflate+predictor3+nodata; orientations 1-8 numpy==GPU; partial edge tiles 100x150, 257x383, 512x257 numpy==GPU==dask; predictor2 LE/BE round-trip uint8/int16/uint16/int32/uint32 pass; predictor3 LE/BE float32/64 pass. Deferred LOW (pre-existing, not opened): float16 (bps=16, SampleFormat=3) absent from tiff_dtype_to_numpy map - writer never emits, asymmetric but unreachable. | Pass 9 (2026-05-09): TWO HIGH fixed -- (a) PR #1539 closes #1537: TIFF Orientation tag 2/3/4 (mirror flips) on georeferenced files left y/x coords computed from the un-flipped transform, so xarray label lookups returned the wrong pixel even though _apply_orientation flipped the buffer. PR #1521 only updated the transform for the 5-8 axis-swap branch. Fix updates origin and pixel-scale signs along whichever axes were flipped, for both PixelIsArea (origin shifts by N*step) and PixelIsPoint (shifts by (N-1)*step). 10 new tests in test_orientation.py. (b) PR #1546 closes #1540: read_geotiff_gpu ignored Orientation tag completely; CPU correctly applied 2-8 (PR #1521) but GPU returned the raw stored buffer. Cross-backend disagreement on every non-default orientation. Fix adds _apply_orientation_gpu (cupy slicing mirror of the CPU helper) and _apply_orientation_geo_info, threads them into the tiled GPU pipeline, reuses CPU-fallback geo_info for the stripped path to avoid double-applying. 28 new tests in test_orientation_gpu.py (every orientation, single-band tiled, single-band stripped, 3-band tiled, mirror-flip sel-fidelity, default no-tag passthrough). Re-confirmed clean: HTTP coalesce_ranges with overlapping ranges and zero-length ranges, parallel streaming write thread-safety (each tile gets independent buffer via copy or padded zeros), planar=2 + chunky GPU LERC mask propagation matches CPU, IFD chain cap MAX_IFDS=256, max_z_error round-trip on tiled write, _resolve_masked_fill float vs integer dtype semantics. Deferred LOW: per-sample LERC mask (3D mask (h,w,samples)) collapsed to per-pixel ""any sample invalid"" on GPU while CPU honours per-sample; LERC implementations rarely emit 3D masks (verified: lerc.encode with 2D mask on 3-band returns 2D mask). Documented planar=2 + LERC + GPU silently drops mask (rare in practice, source comment acknowledges). | Pass 8 (2026-05-07): HIGH fixed in fix-jpeg-tiff-disable -- to_geotiff(compression='jpeg') wrote files that no external reader can decode. The writer tags compression=7 (new-style JPEG) but emits a self-contained JFIF stream per tile/strip and never writes the JPEGTables tag (347) that the TIFF spec requires for that codec. libtiff/GDAL/rasterio all reject the file with TIFFReadEncodedStrip() failed; our reader round-trips because Pillow decodes the standalone JFIF, hiding the break. Pass-4 notes flagged the read side of the same JPEGTables gap and deferred it; pass-8 covers the write side. Fix: reject compression='jpeg' at the to_geotiff entry with a clear ValueError pointing at deflate/zstd/lzw. The internal _writer.write is untouched so the existing self-decoding tests still cover the codec; re-enabling the public path needs a JPEGTables-aware encoder. PR diffs reviewed but not merged: #1512 (BytesIO source) and #1513 (LERC max_z_error) -- both look correct; #1512 file-like read path goes through read_all() once so the per-call BytesIOSource lock is theoretical, and #1513 forwards max_z_error through every overview/tile/strip/streaming path including _write_vrt_tiled and _compress_block. No regressions found in either open PR. Other surfaces audited clean: predictor=3 with float16 (writer auto-promotes to float32 on both eager and streaming paths, value-exact round-trip); planar=2 multi-tile read uses band_idx*tiles_per_band offset so no cross-contamination between planes; _header.py multi-byte tag parsing uses bo (byte_order) consistently; Pillow YCbCr-vs-tagged-RGB photometric mismatch becomes moot once JPEG is disabled. Deferred (LOW/MEDIUM, not filed): JPEG2000 writer accepts arbitrary dtype with no validation (rare codec, narrow risk); float16 dtype not in tiff_dtype_to_numpy decode map (writer never emits it - asymmetric but unreachable); Orientation tag (274) still ignored on read (pass-4 deferral). | Pass 7 (2026-05-07): HIGH fixed in fix-mmap-cache-refcount-after-replace -- _MmapCache.release() looked up the cache entry by realpath, so a holder that acquired the OLD mmap before an os.replace and released it AFTER another caller had acquired the post-replace entry would decrement the new holder's refcount. Subsequent eviction (cache full, or another acquire) closed the still-in-use mmap, breaking reads with 'mmap closed or invalid'. Real exposure: any concurrent reader/writer pattern where to_geotiff replaces a file that another reader had just opened via open_geotiff with chunks= or via _FileSource. PR #1506 added stale-replacement detection but did not fix the refcount confusion across the pop. Fix: acquire returns an opaque entry token; release takes the token and decrements that exact entry, regardless of cache state. Orphaned (popped) entries close their fh+mmap when their own refcount hits zero. _FileSource updated to pass the token. Regression test test_release_after_path_replacement_does_not_clobber_new_holder added. All 665 geotiff tests pass; GPU path verified. | Pass 6 (2026-05-07) PR #1507: BE pred2 numba TypingError. | Pass 5 (2026-05-06) PR #1506: mmap cache stale after file replace. | Pass 4 (2026-05-06) PR #1501: sparse COG tiles. | Pass 3 (2026-05-06) PR #1500: predictor=3 byte order. | Pass 2 (2026-05-05) PR #1498: predictor=2 sample-wise. | Pass 1 (2026-04-23) PR #1247. Re-confirmed clean over passes 2-7: items 2 (writer always emits LE TIFFs - hardcoded b'II'), 3 (RowsPerStrip default = height when missing), 4 (StripByteCounts missing raises clear ValueError), 5 (TileWidth without TileLength caught by 'tw <= 0 or th <= 0' check at _reader.py:688), 9 (read determinism on compressed+tiled+multiband), 11 (predictor=2 with awkward sample stride round-trips), 18 (compression_level=99 raises ValueError 'out of range for deflate (valid: 1-9)'), 21 (concurrent writes serialize correctly via mkstemp+os.replace), 24 (uint16 dtype preserved on numpy backend, dask honors chunks param), 26 (chunks rounds correctly with remainder chunk for non-tile-aligned). Deferred: item 8 (BytesIO/file-like sources are not supported, source.lower() error) - documented as 'str' parameter, not a bug; item 19 (LERC max_z_error not user-exposed by to_geotiff) - missing feature, not a bug."
 glcm,2026-05-01,1408,HIGH,2,"angle=None averaged NaN as 0, masking no-valid-pairs as zero texture; fixed via nanmean-style averaging"
 hillshade,2026-04-10T12:00:00Z,,,,"Horn's method correct. All backends consistent. NaN propagation correct. float32 adequate for [0,1] output."
 hydro,2026-04-30,,LOW,1,Only LOW: twi log(0)=-inf if fa=0 (out-of-contract); MFD weighted sum no Kahan (negligible). No CRIT/HIGH issues.
diff --git a/xrspatial/geotiff/_gpu_decode.py b/xrspatial/geotiff/_gpu_decode.py
index f496450e..77731441 100644
--- a/xrspatial/geotiff/_gpu_decode.py
+++ b/xrspatial/geotiff/_gpu_decode.py
@@ -2888,7 +2888,8 @@ def gpu_compress_tiles(d_image, tile_width, tile_height,
 # GPU overview (pyramid) generation
 # ---------------------------------------------------------------------------
 
-GPU_OVERVIEW_METHODS = ('mean', 'nearest', 'min', 'max', 'median', 'mode')
+GPU_OVERVIEW_METHODS = ('mean', 'nearest', 'min', 'max', 'median', 'mode',
+                        'cubic')
 
 
 def _block_reduce_2d_gpu(arr2d, method, nodata=None):
@@ -2919,6 +2920,17 @@ def _block_reduce_2d_gpu(arr2d, method, nodata=None):
         cpu_result = _block_reduce_2d(cpu_arr, 'mode', nodata=nodata)
         return cupy.asarray(cpu_result)
 
+    if method == 'cubic':
+        # No native cupy cubic resampler that handles arbitrary zoom
+        # factors with the same prefilter=False NaN-safety the CPU
+        # helper uses for issue #1623. Fall back to CPU so cubic on
+        # the GPU writer path produces the same overview bytes as the
+        # CPU writer and so the sentinel handling matches.
+        cpu_arr = arr2d.get()
+        from ._writer import _block_reduce_2d
+        cpu_result = _block_reduce_2d(cpu_arr, 'cubic', nodata=nodata)
+        return cupy.asarray(cpu_result)
+
     # Block reshape for mean/min/max/median
     if arr2d.dtype.kind == 'f':
         blocks = cropped.reshape(oh, 2, ow, 2)
@@ -2966,13 +2978,15 @@ def make_overview_gpu(arr, method='mean', nodata=None):
         2D or 3D (height, width, bands) array on GPU.
     method : str
         Resampling method: 'mean', 'nearest', 'min', 'max', 'median',
-        or 'mode'.
+        'mode', or 'cubic'. ``mode`` and ``cubic`` fall back to the CPU
+        implementation in :mod:`xrspatial.geotiff._writer` so the GPU
+        writer path produces the same overview bytes as the CPU writer.
     nodata : scalar or None
         When supplied and ``arr`` is a float dtype, cells equal to the
         sentinel are masked back to NaN before the reduction so the
         sentinel does not bias the result. Required for COG output that
-        sets ``nodata=...`` (issue #1613). Ignored for integer arrays
-        and for ``nearest`` / ``mode``.
+        sets ``nodata=...`` (issue #1613, extended to ``cubic`` in
+        issue #1623). Ignored for integer arrays and for ``nearest``.
 
     Returns
     -------
diff --git a/xrspatial/geotiff/_writer.py b/xrspatial/geotiff/_writer.py
index 85f5d8d0..0a46a9cb 100644
--- a/xrspatial/geotiff/_writer.py
+++ b/xrspatial/geotiff/_writer.py
@@ -145,11 +145,15 @@ def _block_reduce_2d(arr2d, method, nodata=None):
 
     When ``nodata`` is supplied and ``arr2d`` is a float dtype, cells that
     equal the sentinel are treated as NaN during the reduction so the
-    ``nan*`` aggregation routines correctly skip them. The reduced output
-    keeps NaN wherever every contributing input cell was the sentinel
-    (so callers can rewrite that NaN back to the sentinel after the
-    reduction). The sentinel is ignored entirely for integer dtypes and
-    for non-aggregation methods (``nearest``, ``mode``, ``cubic``).
+    ``nan*`` aggregation routines correctly skip them. For the nan-aware
+    aggregation methods, the reduced output keeps NaN wherever every
+    contributing input cell was the sentinel (so callers can rewrite
+    that NaN back to the sentinel after the reduction). The sentinel is
+    ignored entirely for integer dtypes and for ``nearest`` and ``mode``
+    methods. The ``cubic`` branch honours ``nodata`` by masking the
+    sentinel to NaN, running cubic with ``prefilter=False`` to keep the
+    kernel local, and rewriting any NaN in the output back to the
+    sentinel before returning (issue #1623).
     """
     h, w = arr2d.shape
     h2 = (h // 2) * 2
@@ -168,6 +172,42 @@ def _block_reduce_2d(arr2d, method, nodata=None):
             raise ImportError(
                 "scipy is required for cubic overview resampling. "
                 "Install it with: pip install scipy")
+        # When ``nodata`` is supplied on a float array, the writer has
+        # already rewritten NaN to the sentinel value upstream. Feeding
+        # that sentinel-poisoned array straight into ``zoom`` blends the
+        # sentinel into neighbouring cells and produces ringing
+        # artefacts near nodata borders (issue #1623, same root cause
+        # as #1613 but for the cubic branch).
+        #
+        # Mask the sentinel back to NaN before the spline so the
+        # interpolation does not treat it as signal, run cubic with
+        # ``prefilter=False`` so a single NaN does not poison the entire
+        # row/column (the default B-spline prefilter is global), then
+        # rewrite any NaN in the result back to the sentinel so the
+        # on-disk overview keeps the same convention as the
+        # full-resolution band. The ``prefilter=False`` switch only
+        # fires when a sentinel was actually found in the input, so the
+        # default cubic semantics still apply to inputs without nodata.
+        if (nodata is not None
+                and arr2d.dtype.kind == 'f'
+                and not np.isnan(nodata)):
+            try:
+                sentinel = arr2d.dtype.type(nodata)
+            except (OverflowError, ValueError):
+                sentinel = None
+            if sentinel is not None:
+                mask = arr2d == sentinel
+                if mask.any():
+                    masked = np.where(mask, np.float64('nan'), arr2d)
+                    with warnings.catch_warnings():
+                        warnings.simplefilter('ignore', RuntimeWarning)
+                        result = zoom(masked, 0.5, order=3,
+                                      prefilter=False)
+                    nan_mask = np.isnan(result)
+                    if nan_mask.any():
+                        result = result.copy()
+                        result[nan_mask] = float(nodata)
+                    return result.astype(arr2d.dtype)
         return zoom(arr2d, 0.5, order=3).astype(arr2d.dtype)
 
     if method == 'mode':
@@ -253,8 +293,9 @@ def _make_overview(arr: np.ndarray, method: str = 'mean',
         When supplied and ``arr`` is a float dtype, cells equal to the
         sentinel are masked back to NaN before the reduction so the
         sentinel does not bias the result. Required for COG output that
-        sets ``nodata=...`` (issue #1613). Ignored for integer arrays
-        and for ``nearest`` / ``mode`` / ``cubic`` methods.
+        sets ``nodata=...`` (issue #1613, extended to ``cubic`` in
+        issue #1623). Ignored for integer arrays and for ``nearest`` /
+        ``mode`` methods.
 
     Returns
     -------
diff --git a/xrspatial/geotiff/tests/test_cog_cubic_overview_nodata_1623.py b/xrspatial/geotiff/tests/test_cog_cubic_overview_nodata_1623.py
new file mode 100644
index 00000000..d5be06b5
--- /dev/null
+++ b/xrspatial/geotiff/tests/test_cog_cubic_overview_nodata_1623.py
@@ -0,0 +1,281 @@
+"""COG cubic overview respects the nodata sentinel (issue #1623).
+
+Before the fix, ``to_geotiff(..., cog=True, nodata=<finite>,
+overview_resampling='cubic')`` produced wrong overview pixels near
+nodata borders on float rasters. The writer rewrote NaN to the
+sentinel before reduction; ``_block_reduce_2d(method='cubic')`` then
+ignored ``nodata`` and handed the sentinel-poisoned array straight to
+``scipy.ndimage.zoom(order=3)``. The cubic spline blended the sentinel
+into neighbouring cells (values like ``1133`` and ``-10290`` appeared
+where the data was a constant 100).
+
+The fix masks the sentinel to NaN, runs cubic with
+``prefilter=False`` so a single NaN does not poison the entire
+row/column, and rewrites any NaN in the output back to the sentinel.
+The GPU helper falls back to the CPU cubic path the same way it does
+for ``mode``.
+
+These tests pin:
+
+* the helper produces no ringing near a sentinel border,
+* the round-trip through ``to_geotiff`` writes a clean overview,
+* the no-nodata cubic path is unchanged,
+* the GPU writer routes cubic through the CPU helper and produces
+  byte-identical overview tiles.
+"""
+from __future__ import annotations
+
+import importlib.util
+
+import numpy as np
+import pytest
+import xarray as xr
+
+
+def _gpu_available() -> bool:
+    if importlib.util.find_spec("cupy") is None:
+        return False
+    try:
+        import cupy
+        return bool(cupy.cuda.is_available())
+    except Exception:
+        return False
+
+
+_HAS_GPU = _gpu_available()
+_gpu_only = pytest.mark.skipif(
+    not _HAS_GPU,
+    reason="cupy + CUDA required",
+)
+
+
+def _flat_with_corner_nan(side: int = 16, nan_side: int = 4):
+    """``side x side`` float32 ones with a ``nan_side x nan_side`` NaN corner."""
+    arr = np.ones((side, side), dtype=np.float32) * 100.0
+    arr[:nan_side, :nan_side] = np.nan
+    return arr
+
+
+def test_block_reduce_cubic_nodata_helper_no_ringing():
+    """Helper: cubic with nodata must not leak the sentinel into neighbours."""
+    pytest.importorskip("scipy")
+    from xrspatial.geotiff._writer import _block_reduce_2d
+
+    # Mimic what to_geotiff does: rewrite NaN to the sentinel before
+    # handing the array to the reducer.
+    arr = _flat_with_corner_nan()
+    arr[np.isnan(arr)] = -9999.0
+
+    out = _block_reduce_2d(arr, 'cubic', nodata=-9999.0)
+
+    # The valid region must still read ~100.  Without the fix the cells
+    # adjacent to the sentinel corner returned values like 1196.28 and
+    # -19.00 from the cubic blend.
+    valid = out != -9999.0
+    assert np.all(np.abs(out[valid] - 100.0) < 1e-3), (
+        f"ringing leaked into cubic output: {out[valid]}")
+
+    # Sentinel cells still mark the nodata region.
+    assert (out == -9999.0).any()
+
+
+def test_block_reduce_cubic_nodata_poisoning_repro():
+    """Without the fix the sentinel poisoned the cubic output.
+
+    Pin the failure mode by running cubic on the same array *without*
+    a nodata argument and confirming the documented buggy values
+    appear. This guards against a regression where ``nodata`` silently
+    stops being honoured.
+    """
+    pytest.importorskip("scipy")
+    from xrspatial.geotiff._writer import _block_reduce_2d
+
+    arr = _flat_with_corner_nan()
+    arr[np.isnan(arr)] = -9999.0
+
+    # nodata=None reproduces the pre-fix behaviour.
+    poisoned = _block_reduce_2d(arr, 'cubic')
+    # At least one cell outside the corner has a wildly wrong value.
+    valid_no_sentinel = (poisoned != -9999.0)
+    drift = np.abs(poisoned[valid_no_sentinel] - 100.0)
+    assert drift.max() > 50.0, (
+        "expected the no-nodata cubic path to ring; got a clean output "
+        f"with max drift {drift.max()}")
+
+
+def test_block_reduce_cubic_no_nodata_unchanged():
+    """Cubic on data without nodata stays at order=3 with prefilter."""
+    pytest.importorskip("scipy")
+    from xrspatial.geotiff._writer import _block_reduce_2d
+
+    arr = np.arange(256, dtype=np.float32).reshape(16, 16)
+    out_default = _block_reduce_2d(arr, 'cubic')
+    # The same array round-tripped through scipy zoom directly should
+    # match (since no sentinel is present the fix path is not taken).
+    from scipy.ndimage import zoom
+    expected = zoom(arr, 0.5, order=3).astype(arr.dtype)
+    np.testing.assert_array_equal(out_default, expected)
+
+
+def test_block_reduce_cubic_nodata_unset_is_zoom():
+    """nodata=None goes through the original zoom path, no prefilter change."""
+    pytest.importorskip("scipy")
+    from xrspatial.geotiff._writer import _block_reduce_2d
+    from scipy.ndimage import zoom
+
+    arr = np.linspace(0.0, 1.0, 64, dtype=np.float32).reshape(8, 8)
+    out = _block_reduce_2d(arr, 'cubic', nodata=None)
+    expected = zoom(arr, 0.5, order=3).astype(arr.dtype)
+    np.testing.assert_array_equal(out, expected)
+
+
+def test_to_geotiff_cog_cubic_nodata_round_trip(tmp_path):
+    """End-to-end: writing a COG with cubic + nodata produces a clean overview."""
+    pytest.importorskip("scipy")
+    from xrspatial.geotiff import to_geotiff, open_geotiff
+
+    arr = _flat_with_corner_nan()
+    da = xr.DataArray(arr, dims=['y', 'x'])
+    p = str(tmp_path / 'cog_cubic_nodata.tif')
+    to_geotiff(da, p, nodata=-9999.0, cog=True, compression='deflate',
+               tiled=True, tile_size=8, overview_levels=[1],
+               overview_resampling='cubic')
+
+    ov = open_geotiff(p, overview_level=1)
+    data = np.asarray(ov.data)
+
+    # No polluted pixels: every cell is either NaN (reader unmasked the
+    # sentinel back to NaN), the literal sentinel value (reader kept it),
+    # or ~100 (the source value).
+    polluted = (
+        (~np.isnan(data))
+        & (data != -9999.0)
+        & (np.abs(data - 100.0) > 1e-3)
+    )
+    assert not polluted.any(), (
+        f"polluted overview cells: {data[polluted]}")
+
+
+def test_to_geotiff_cog_cubic_no_nodata_round_trip(tmp_path):
+    """Regression guard: cubic without nodata still produces the same overview."""
+    pytest.importorskip("scipy")
+    from xrspatial.geotiff import to_geotiff, open_geotiff
+
+    arr = np.arange(256, dtype=np.float32).reshape(16, 16)
+    da = xr.DataArray(arr, dims=['y', 'x'])
+    p = str(tmp_path / 'cog_cubic_no_nodata.tif')
+    to_geotiff(da, p, cog=True, compression='deflate',
+               tiled=True, tile_size=8, overview_levels=[1],
+               overview_resampling='cubic')
+
+    ov = open_geotiff(p, overview_level=1)
+    assert ov.shape == (8, 8)
+    assert ov.dtype == np.float32
+    # Cubic on a monotonic ramp stays bounded by source range.
+    assert float(np.asarray(ov.data).min()) >= float(arr.min()) - 1.0
+    assert float(np.asarray(ov.data).max()) <= float(arr.max()) + 1.0
+
+
+def test_block_reduce_cubic_inf_nodata_is_masked():
+    """nodata=+/-inf must be masked just like a finite sentinel."""
+    pytest.importorskip("scipy")
+    from xrspatial.geotiff._writer import _block_reduce_2d
+
+    arr = np.ones((16, 16), dtype=np.float32) * 5.0
+    arr[:4, :4] = np.inf  # treat +inf as sentinel
+    out = _block_reduce_2d(arr, 'cubic', nodata=np.inf)
+    valid = ~np.isinf(out)
+    # Outside the masked region we still read ~5.0.
+    np.testing.assert_allclose(out[valid], 5.0, atol=1e-4)
+
+
+def test_block_reduce_cubic_nan_sentinel_skips_mask():
+    """nodata=NaN is a no-op (matches the existing nan-pass-through gate)."""
+    pytest.importorskip("scipy")
+    from xrspatial.geotiff._writer import _block_reduce_2d
+    from scipy.ndimage import zoom
+
+    arr = np.linspace(0.0, 1.0, 64, dtype=np.float32).reshape(8, 8)
+    out = _block_reduce_2d(arr, 'cubic', nodata=np.nan)
+    expected = zoom(arr, 0.5, order=3).astype(arr.dtype)
+    np.testing.assert_array_equal(out, expected)
+
+
+def test_gpu_overview_methods_includes_cubic():
+    """The GPU constant must list ``cubic`` so callers do not pre-validate
+    against the smaller pre-#1623 set."""
+    from xrspatial.geotiff._gpu_decode import GPU_OVERVIEW_METHODS
+    assert 'cubic' in GPU_OVERVIEW_METHODS
+
+
+@_gpu_only
+def test_gpu_block_reduce_cubic_falls_back_to_cpu():
+    """GPU cubic must route through the CPU helper and return cupy data."""
+    pytest.importorskip("scipy")
+    import cupy
+    from xrspatial.geotiff._gpu_decode import _block_reduce_2d_gpu
+    from xrspatial.geotiff._writer import _block_reduce_2d
+
+    arr = _flat_with_corner_nan()
+    arr[np.isnan(arr)] = -9999.0
+
+    gpu_arr = cupy.asarray(arr)
+    gpu_out = _block_reduce_2d_gpu(gpu_arr, 'cubic', nodata=-9999.0)
+    assert isinstance(gpu_out, cupy.ndarray)
+
+    cpu_out = _block_reduce_2d(arr, 'cubic', nodata=-9999.0)
+    np.testing.assert_array_equal(cupy.asnumpy(gpu_out), cpu_out)
+
+
+@_gpu_only
+def test_to_geotiff_cog_cubic_nodata_gpu_round_trip(tmp_path):
+    """End-to-end GPU writer: cubic + nodata produces a clean overview."""
+    pytest.importorskip("scipy")
+    import cupy
+    from xrspatial.geotiff import to_geotiff, open_geotiff
+
+    arr = _flat_with_corner_nan()
+    da = xr.DataArray(cupy.asarray(arr), dims=['y', 'x'])
+    p = str(tmp_path / 'cog_cubic_nodata_gpu.tif')
+    to_geotiff(da, p, nodata=-9999.0, cog=True, compression='deflate',
+               tiled=True, tile_size=8, overview_levels=[1],
+               overview_resampling='cubic')
+
+    ov = open_geotiff(p, overview_level=1)
+    data = np.asarray(ov.data)
+    polluted = (
+        (~np.isnan(data))
+        & (data != -9999.0)
+        & (np.abs(data - 100.0) > 1e-3)
+    )
+    assert not polluted.any(), (
+        f"GPU cubic overview leaked sentinel into neighbours: "
+        f"{data[polluted]}")
+
+
+@_gpu_only
+def test_gpu_cpu_cubic_overview_bytes_match(tmp_path):
+    """CPU and GPU writers produce the same cubic overview pixels."""
+    pytest.importorskip("scipy")
+    import cupy
+    from xrspatial.geotiff import to_geotiff, open_geotiff
+
+    arr = _flat_with_corner_nan()
+    cpu_da = xr.DataArray(arr, dims=['y', 'x'])
+    gpu_da = xr.DataArray(cupy.asarray(arr), dims=['y', 'x'])
+
+    cpu_path = str(tmp_path / 'cpu_cubic.tif')
+    gpu_path = str(tmp_path / 'gpu_cubic.tif')
+    to_geotiff(cpu_da, cpu_path, nodata=-9999.0, cog=True,
+               compression='deflate', tiled=True, tile_size=8,
+               overview_levels=[1], overview_resampling='cubic')
+    to_geotiff(gpu_da, gpu_path, nodata=-9999.0, cog=True,
+               compression='deflate', tiled=True, tile_size=8,
+               overview_levels=[1], overview_resampling='cubic')
+
+    cpu_ov = np.asarray(open_geotiff(cpu_path, overview_level=1).data)
+    gpu_ov = np.asarray(open_geotiff(gpu_path, overview_level=1).data)
+    # NaN-aware compare since the reader unmasks the sentinel.
+    np.testing.assert_array_equal(np.isnan(cpu_ov), np.isnan(gpu_ov))
+    finite = ~np.isnan(cpu_ov)
+    np.testing.assert_allclose(cpu_ov[finite], gpu_ov[finite], atol=1e-3)