diff --git a/.claude/sweep-accuracy-state.csv b/.claude/sweep-accuracy-state.csv index 2d962d4ba..47f627437 100644 --- a/.claude/sweep-accuracy-state.csv +++ b/.claude/sweep-accuracy-state.csv @@ -12,7 +12,7 @@ emerging_hotspots,2026-04-30,,MEDIUM,2;3,MEDIUM: threshold_90 uses int() (trunca fire,2026-04-30,,,,All ops per-pixel (no accumulation/stencil/projected distance). NaN handled via x!=x; CUDA bounds use strict <; rdnbr and ros divisions guarded; CPU/GPU/dask paths algorithmically identical. No accuracy issues found. flood,2026-04-30,,MEDIUM,2;5,"MEDIUM (not fixed): dask backend preserves float32 input dtype while numpy promotes to float64 in flood_depth and curve_number_runoff; DataArray inputs for curve_number, mannings_n bypass scalar > 0 (and CN <= 100) range validation, silently producing NaN/garbage." focal,2026-03-30T13:00:00Z,1092,,, -geotiff,2026-05-11,1634,MEDIUM,3;5,"Pass 17 (2026-05-11): MEDIUM fixed -- issue #1634. open_geotiff eager path windowed read produced confusing CoordinateValidationError when window extended past source extent. read_to_array clamped the window internally and returned a smaller array, but the eager code path used unclamped window indices for y/x coord generation (xrspatial/geotiff/__init__.py lines 562-572), so the coord array length differed from the data and xarray refused to construct the DataArray. Same bug affected the windowed transform shift in _populate_attrs_from_geo_info. The dask path (read_geotiff_dask) already validated up front since #1561, raising a clear ValueError with the format 'window=... is outside the source extent (HxW) or has non-positive size.' so the two backends diverged on the contract. Fix: validate the window up front in open_geotiff's eager branch via _read_geo_info (metadata-only read, no extra pixel cost) using the exact same condition the dask path uses, raising the same ValueError message format. Reproduction: 10x10 raster + window=(5,5,15,15) on eager raised CoordinateValidationError('conflicting sizes ... length 5 ... length 10'); now raises ValueError('window=(5, 5, 15, 15) is outside the source extent (10x10) or has non-positive size.'). Categories: Cat 3 (off-by-one / boundary handling) + Cat 5 (backend inconsistency). 12 regression tests in test_window_out_of_bounds_1634.py: negative start, past-right-edge, past-bottom-edge, past-both-edges, zero-size, inverted window, full-extent ok, interior subset, edge-aligned, eager-vs-dask parity, message-format parity, issue reproducer. All 1286 existing non-network geotiff tests still pass. | Pass 16 (2026-05-11): HIGH fixed -- issue #1623. to_geotiff(cog=True, overview_resampling='cubic', nodata=) on a float raster with NaN regions produced overview pixels with severe ringing artefacts near nodata borders. Same class of bug as #1613 but for the cubic branch: writer rewrites NaN to the sentinel upstream, then _block_reduce_2d(method=cubic) handed the sentinel-poisoned array straight to scipy.ndimage.zoom(order=3). The cubic spline blended the sentinel (e.g. -9999) into neighbouring cells, producing values like 1133.44, -10290.08 where the data was a constant 100. Repro on 16x16 float32 with a 4x4 NaN corner showed 18 polluted pixels in the 8x8 overview. Fix: when nodata is supplied on a float dtype and the sentinel is found, mask sentinel to NaN, run cubic with prefilter=False so a single NaN cannot poison the entire row/column (default B-spline prefilter is global), then rewrite any NaN in the result back to the sentinel. prefilter=False only fires when a sentinel is present so the non-nodata cubic semantics are unchanged. GPU side: _block_reduce_2d_gpu previously raised on method='cubic'; added a CPU fallback (same pattern as 'mode') so GPU writer produces byte-equivalent overviews. GPU_OVERVIEW_METHODS now includes 'cubic'. 12 regression tests in test_cog_cubic_overview_nodata_1623.py (helper no-ringing, poisoning repro, no-nodata unchanged, end-to-end round-trip, GPU fallback, CPU/GPU byte-match, +/-inf nodata mask, NaN-sentinel no-op, GPU_OVERVIEW_METHODS contract). All 1256 existing geotiff tests still pass (3 pre-existing matplotlib failures unrelated). | Pass 15 (2026-05-11): HIGH fixed -- issue #1613. to_geotiff(cog=True, nodata=) on a float raster with NaN produced a corrupted overview pyramid. The NaN-to-sentinel rewrite in __init__.py:1202 (CPU) and :2852 (GPU write_geotiff_gpu) ran BEFORE _make_overview / make_overview_gpu, so the nan-aware aggregations (np.nanmean/min/max/median, cupy.nanmean/min/max/median) saw the sentinel as a real number and biased every overview pixel. Reproduction with -9999 sentinel produced [[-4998.75,-4997.75],..] where np.nanmean gives [[1.5,3.5],..]. Both CPU and GPU paths affected; backend results matched each other but were both wrong (CAT 2 NaN propagation + CAT 5 documents the parity). Fix: _block_reduce_2d / _block_reduce_2d_gpu accept a nodata kwarg that masks the sentinel back to NaN for float dtypes before the reduction; the writer's overview loop passes nodata in, then rewrites all-sentinel reductions (which surface as NaN from the reducer) back to the sentinel for the on-disk pyramid. 11 regression tests in test_cog_overview_nodata_1613.py (CPU mean / partial-block / min/max/median / no-nodata passthrough / helper kwarg / all-sentinel block / GPU mean / GPU helper / CPU-GPU agreement). All 235 nodata/overview/cog tests still pass. | Pass 14 (2026-05-11): HIGH fixed -- issue #1611. read_vrt(band=None) on a multi-band integer VRT with per-band tags only masks band 0's sentinel. __init__.py lines 2795-2809 in read_vrt apply vrt.bands[0].nodata to the full ndim==3 array; bands 1+ keep their integer sentinels as literal finite values (e.g. 65000 surfaces as 65000.0 after the dtype=float64 cast, not NaN). Float-VRT path masks per-band correctly in _vrt._read_data lines 296-297 + 347-351. PR #1602 fixed the single-band band=N case for issue #1598; the band=None multi-band case is the same class of bug. Repro: 2-band uint16 VRT with NoDataValue 65535 / 65000 returns r.values[1,1,1] == 65000.0 instead of NaN; r.values[1,1,0] is NaN (band 0 sentinel masked). Fix scope: in read_vrt, when band is None, iterate over vrt.bands and mask each arr[..., i] slice against its own (gated by the same _int_nodata_in_range guard PR #1583 introduced). Severity HIGH (Cat 2 NaN propagation + Cat 5 backend inconsistency: identical input semantics produce different masking outcomes based on dtype, with finite garbage values where NaN expected). Fix in PR #1612: walks vrt.bands when band is None and ndim==3, masks each arr[..., i] slice against its own via the refactored _sentinel_for_dtype helper (reuses PR #1583's range guard so out-of-range/non-finite/fractional sentinels are a no-op). attrs['nodata'] still carries band 0's sentinel for band=None reads (documented contract). 7 regression tests in test_vrt_multiband_int_nodata_1611.py: uint16 per-band, int32 negative, mixed presence, dtype preservation when no sentinel hit, out-of-range gating, band=N non-regression, attrs contract. 135 existing vrt/nodata geotiff tests still pass. | Pass 13 (2026-05-11): HIGH fixed -- issue #1599. write_geotiff_gpu (and to_geotiff gpu=True) emitted raw NaN bytes for missing pixels even when nodata= was supplied, while the CPU writer substituted NaN with the sentinel before encoding. xrspatial-only round-trips were unaffected (the reader masks both NaN and the sentinel), but external readers (rasterio/GDAL/QGIS) that mask only on the GDAL_NODATA tag saw NaN pixels as valid data -- rasterio reported 100% valid pixels on a 25-NaN file vs CPU's 25-invalid report. Root cause: __init__.py lines 2579-2587 jumped from shape/dtype resolution straight to compression, missing the equivalent of the CPU writer's NaN-to-sentinel rewrite at to_geotiff line ~1156. Fix: cupy.isnan + masked write on a defensive copy of arr, gated on np_dtype.kind=='f' and not np.isnan(float(nodata)). Caller's CuPy buffer preserved (copy before mutate). 7 regression tests in test_gpu_writer_nan_sentinel_1599.py: substitution lands as sentinel, CPU/GPU byte-equivalent, caller buffer not mutated, no-NaN no-op, NaN sentinel skips substitution, rasterio sees identical invalid count on CPU/GPU, multiband 3D path. All other GPU writer tests still pass (50 passed across band-first, attrs, nodata, dask+cupy, writer, nodata aliases). | Pass 12 (2026-05-11): HIGH fixed -- issue #1581. Reading a uint TIFF with a negative GDAL_NODATA sentinel (e.g. uint16 + -9999) raised OverflowError on every backend because the nodata-mask code did arr.dtype.type(int(nodata)) with no range check. Three identical cast sites in __init__.py (numpy eager, _apply_nodata_mask_gpu, _delayed_read_window) plus _resolve_masked_fill and _sparse_fill_value in _reader.py. Fix: _int_nodata_in_range helper gates the cast; out-of-range sentinels are a no-op for value matching (the file can never contain that value), file dtype is preserved, attrs['nodata'] still surfaces the original sentinel so write round-trips keep the GDAL_NODATA tag intact. Matches rasterio behavior. 8 regression tests in test_nodata_out_of_range_1581.py cover the helper, both eager and dask read paths, in-range sentinel non-regression, and GPU helper (cupy-gated). | Pass 11 (2026-05-10): CLEAN. Audited the one additional commit since pass 10 -- #1559 (PR 1548, Centralise GeoTIFF attrs population across all read backends). Refactor extracts _populate_attrs_from_geo_info helper and routes eager numpy, dask, GPU stripped, GPU tiled read paths through it; before the fix dask only emitted crs/transform/raster_type/nodata while numpy emitted the full attrs set including x/y_resolution, resolution_unit, image_description, extra_samples, GDAL metadata, and the CRS-description fields. No data-path arithmetic touched; only attrs dict population. Windowed origin math (origin_x + c0*pixel_width, origin_y + r0*pixel_height) verified to produce -98.0 / 48.75 origin for window=(10,20,50,70) on a (0.1,-0.125) pixel-size raster, with PixelIsArea half-pixel offset preserved on coord lookups (-97.95, 48.6875). Cross-backend attrs parity re-verified: numpy/dask/cupy all emit identical key set on deflate+predictor3+nodata round-trip (crs, crs_wkt, nodata, transform, x_resolution, y_resolution). Data bit-parity re-verified across numpy/dask/cupy on same payload (np.array_equal with equal_nan=True). test_attrs_parity_1548.py (5 tests), test_reader.py/test_writer.py/test_dask_cupy_combined.py (25 tests), GPU orientation/predictor2-BE/LERC-mask/nodata/byteswap suites (65 tests) all green. No accuracy or backend-divergence findings. | Pass 10 (2026-05-10): CLEAN. Audited 5 recent commits: #1558 drop-defensive-copies (frombuffer path still .copy()s before in-place predictor decode at _reader.py:778), #1556 fp-predictor ngjit (writer pre-ravels so 1-D slice arg is correct, float32/64 LE+BE bit-exact), #1552 batched D2H (OOM guard fires before cupy.concatenate, host_buf offsets correct), #1551 parallel-decode gate (>= vs > sends 256x256 default to parallel path, no value diff confirmed via partial-tile parity), #1549 nvjpeg constants (gray + RGB GPU JPEG decode pixel-identical to Pillow CPU, max diff = 0). Cross-backend parity re-verified clean: numpy/dask+numpy/cupy/dask+cupy equal .data/.dtype/.coords/nodata/NaN-mask on deflate+predictor3+nodata; orientations 1-8 numpy==GPU; partial edge tiles 100x150, 257x383, 512x257 numpy==GPU==dask; predictor2 LE/BE round-trip uint8/int16/uint16/int32/uint32 pass; predictor3 LE/BE float32/64 pass. Deferred LOW (pre-existing, not opened): float16 (bps=16, SampleFormat=3) absent from tiff_dtype_to_numpy map - writer never emits, asymmetric but unreachable. | Pass 9 (2026-05-09): TWO HIGH fixed -- (a) PR #1539 closes #1537: TIFF Orientation tag 2/3/4 (mirror flips) on georeferenced files left y/x coords computed from the un-flipped transform, so xarray label lookups returned the wrong pixel even though _apply_orientation flipped the buffer. PR #1521 only updated the transform for the 5-8 axis-swap branch. Fix updates origin and pixel-scale signs along whichever axes were flipped, for both PixelIsArea (origin shifts by N*step) and PixelIsPoint (shifts by (N-1)*step). 10 new tests in test_orientation.py. (b) PR #1546 closes #1540: read_geotiff_gpu ignored Orientation tag completely; CPU correctly applied 2-8 (PR #1521) but GPU returned the raw stored buffer. Cross-backend disagreement on every non-default orientation. Fix adds _apply_orientation_gpu (cupy slicing mirror of the CPU helper) and _apply_orientation_geo_info, threads them into the tiled GPU pipeline, reuses CPU-fallback geo_info for the stripped path to avoid double-applying. 28 new tests in test_orientation_gpu.py (every orientation, single-band tiled, single-band stripped, 3-band tiled, mirror-flip sel-fidelity, default no-tag passthrough). Re-confirmed clean: HTTP coalesce_ranges with overlapping ranges and zero-length ranges, parallel streaming write thread-safety (each tile gets independent buffer via copy or padded zeros), planar=2 + chunky GPU LERC mask propagation matches CPU, IFD chain cap MAX_IFDS=256, max_z_error round-trip on tiled write, _resolve_masked_fill float vs integer dtype semantics. Deferred LOW: per-sample LERC mask (3D mask (h,w,samples)) collapsed to per-pixel ""any sample invalid"" on GPU while CPU honours per-sample; LERC implementations rarely emit 3D masks (verified: lerc.encode with 2D mask on 3-band returns 2D mask). Documented planar=2 + LERC + GPU silently drops mask (rare in practice, source comment acknowledges). | Pass 8 (2026-05-07): HIGH fixed in fix-jpeg-tiff-disable -- to_geotiff(compression='jpeg') wrote files that no external reader can decode. The writer tags compression=7 (new-style JPEG) but emits a self-contained JFIF stream per tile/strip and never writes the JPEGTables tag (347) that the TIFF spec requires for that codec. libtiff/GDAL/rasterio all reject the file with TIFFReadEncodedStrip() failed; our reader round-trips because Pillow decodes the standalone JFIF, hiding the break. Pass-4 notes flagged the read side of the same JPEGTables gap and deferred it; pass-8 covers the write side. Fix: reject compression='jpeg' at the to_geotiff entry with a clear ValueError pointing at deflate/zstd/lzw. The internal _writer.write is untouched so the existing self-decoding tests still cover the codec; re-enabling the public path needs a JPEGTables-aware encoder. PR diffs reviewed but not merged: #1512 (BytesIO source) and #1513 (LERC max_z_error) -- both look correct; #1512 file-like read path goes through read_all() once so the per-call BytesIOSource lock is theoretical, and #1513 forwards max_z_error through every overview/tile/strip/streaming path including _write_vrt_tiled and _compress_block. No regressions found in either open PR. Other surfaces audited clean: predictor=3 with float16 (writer auto-promotes to float32 on both eager and streaming paths, value-exact round-trip); planar=2 multi-tile read uses band_idx*tiles_per_band offset so no cross-contamination between planes; _header.py multi-byte tag parsing uses bo (byte_order) consistently; Pillow YCbCr-vs-tagged-RGB photometric mismatch becomes moot once JPEG is disabled. Deferred (LOW/MEDIUM, not filed): JPEG2000 writer accepts arbitrary dtype with no validation (rare codec, narrow risk); float16 dtype not in tiff_dtype_to_numpy decode map (writer never emits it - asymmetric but unreachable); Orientation tag (274) still ignored on read (pass-4 deferral). | Pass 7 (2026-05-07): HIGH fixed in fix-mmap-cache-refcount-after-replace -- _MmapCache.release() looked up the cache entry by realpath, so a holder that acquired the OLD mmap before an os.replace and released it AFTER another caller had acquired the post-replace entry would decrement the new holder's refcount. Subsequent eviction (cache full, or another acquire) closed the still-in-use mmap, breaking reads with 'mmap closed or invalid'. Real exposure: any concurrent reader/writer pattern where to_geotiff replaces a file that another reader had just opened via open_geotiff with chunks= or via _FileSource. PR #1506 added stale-replacement detection but did not fix the refcount confusion across the pop. Fix: acquire returns an opaque entry token; release takes the token and decrements that exact entry, regardless of cache state. Orphaned (popped) entries close their fh+mmap when their own refcount hits zero. _FileSource updated to pass the token. Regression test test_release_after_path_replacement_does_not_clobber_new_holder added. All 665 geotiff tests pass; GPU path verified. | Pass 6 (2026-05-07) PR #1507: BE pred2 numba TypingError. | Pass 5 (2026-05-06) PR #1506: mmap cache stale after file replace. | Pass 4 (2026-05-06) PR #1501: sparse COG tiles. | Pass 3 (2026-05-06) PR #1500: predictor=3 byte order. | Pass 2 (2026-05-05) PR #1498: predictor=2 sample-wise. | Pass 1 (2026-04-23) PR #1247. Re-confirmed clean over passes 2-7: items 2 (writer always emits LE TIFFs - hardcoded b'II'), 3 (RowsPerStrip default = height when missing), 4 (StripByteCounts missing raises clear ValueError), 5 (TileWidth without TileLength caught by 'tw <= 0 or th <= 0' check at _reader.py:688), 9 (read determinism on compressed+tiled+multiband), 11 (predictor=2 with awkward sample stride round-trips), 18 (compression_level=99 raises ValueError 'out of range for deflate (valid: 1-9)'), 21 (concurrent writes serialize correctly via mkstemp+os.replace), 24 (uint16 dtype preserved on numpy backend, dask honors chunks param), 26 (chunks rounds correctly with remainder chunk for non-tile-aligned). Deferred: item 8 (BytesIO/file-like sources are not supported, source.lower() error) - documented as 'str' parameter, not a bug; item 19 (LERC max_z_error not user-exposed by to_geotiff) - missing feature, not a bug." +geotiff,2026-05-11,1642,MEDIUM,3;5,"Pass 18 (2026-05-11): MEDIUM fixed -- issue #1642. PR #1641 (issue #1640) inherited level-0 georef on overview reads but kept the level-0 origin_x/origin_y unchanged. That is correct for PixelIsArea (origin = upper-left corner of pixel (0,0)) but wrong for PixelIsPoint (origin = center of pixel (0,0), GeoKey 1025 = 2). For a 1024x1024 PixelIsPoint COG with 10 m pixels and origin (0, 0), open_geotiff(overview_level=1) returned x[:3]=[0,20,40] instead of [5,25,45] (level-1 pixel 0 covers level-0 pixels 0-1 whose centers are 0 and 10, centroid 5); same for y. Downstream sel/interp/reproject silently snaps to the wrong pixel for any DEM-style PixelIsPoint COG (USGS, OpenTopography, Copernicus DEM). Categories: Cat 3 (off-by-one / boundary handling) + Cat 5 (raster_type-dependent backend convention). Fix: in extract_geo_info_with_overview_inheritance (_geotags.py), pick the effective raster_type first (overview-declared if non-default, otherwise inherited from parent), then when it is PixelIsPoint apply origin_shift = (scale - 1) * 0.5 * pixel_size_lvl0 along each axis before building the new GeoTransform. PixelIsArea path is byte-equivalent. 13 regression tests in test_overview_pixel_is_point_1642.py: centroid identity across all 4 backends, transform tuple across all 4 backends, uniform grid step, unit-level helper tests for both raster_types via stubbed extract_geo_info, own-geokeys-not-clobbered path on PixelIsPoint, and a PixelIsArea regression check. All 1397 existing non-network geotiff tests still pass (3 pre-existing matplotlib palette failures unrelated). Deferred LOW: non-power-of-two overview dimensions cause scale = base_w/ov_w to diverge from the true 2^level reduction (writer drops the right/bottom strip via h2=(h//2)*2; for h=1023 a level-1 overview has 511 rows so scale=2.0019 not 2.0). Fix would need to either (a) emit explicit geo tags on overview IFDs from the writer or (b) pass the level number into the inheritance helper; neither is a one-line change and the resulting coord error is sub-pixel of level 0. | Pass 17 (2026-05-11): MEDIUM fixed -- issue #1634. open_geotiff eager path windowed read produced confusing CoordinateValidationError when window extended past source extent. read_to_array clamped the window internally and returned a smaller array, but the eager code path used unclamped window indices for y/x coord generation (xrspatial/geotiff/__init__.py lines 562-572), so the coord array length differed from the data and xarray refused to construct the DataArray. Same bug affected the windowed transform shift in _populate_attrs_from_geo_info. The dask path (read_geotiff_dask) already validated up front since #1561, raising a clear ValueError with the format 'window=... is outside the source extent (HxW) or has non-positive size.' so the two backends diverged on the contract. Fix: validate the window up front in open_geotiff's eager branch via _read_geo_info (metadata-only read, no extra pixel cost) using the exact same condition the dask path uses, raising the same ValueError message format. Reproduction: 10x10 raster + window=(5,5,15,15) on eager raised CoordinateValidationError('conflicting sizes ... length 5 ... length 10'); now raises ValueError('window=(5, 5, 15, 15) is outside the source extent (10x10) or has non-positive size.'). Categories: Cat 3 (off-by-one / boundary handling) + Cat 5 (backend inconsistency). 12 regression tests in test_window_out_of_bounds_1634.py: negative start, past-right-edge, past-bottom-edge, past-both-edges, zero-size, inverted window, full-extent ok, interior subset, edge-aligned, eager-vs-dask parity, message-format parity, issue reproducer. All 1286 existing non-network geotiff tests still pass. | Pass 16 (2026-05-11): HIGH fixed -- issue #1623. to_geotiff(cog=True, overview_resampling='cubic', nodata=) on a float raster with NaN regions produced overview pixels with severe ringing artefacts near nodata borders. Same class of bug as #1613 but for the cubic branch: writer rewrites NaN to the sentinel upstream, then _block_reduce_2d(method=cubic) handed the sentinel-poisoned array straight to scipy.ndimage.zoom(order=3). The cubic spline blended the sentinel (e.g. -9999) into neighbouring cells, producing values like 1133.44, -10290.08 where the data was a constant 100. Repro on 16x16 float32 with a 4x4 NaN corner showed 18 polluted pixels in the 8x8 overview. Fix: when nodata is supplied on a float dtype and the sentinel is found, mask sentinel to NaN, run cubic with prefilter=False so a single NaN cannot poison the entire row/column (default B-spline prefilter is global), then rewrite any NaN in the result back to the sentinel. prefilter=False only fires when a sentinel is present so the non-nodata cubic semantics are unchanged. GPU side: _block_reduce_2d_gpu previously raised on method='cubic'; added a CPU fallback (same pattern as 'mode') so GPU writer produces byte-equivalent overviews. GPU_OVERVIEW_METHODS now includes 'cubic'. 12 regression tests in test_cog_cubic_overview_nodata_1623.py (helper no-ringing, poisoning repro, no-nodata unchanged, end-to-end round-trip, GPU fallback, CPU/GPU byte-match, +/-inf nodata mask, NaN-sentinel no-op, GPU_OVERVIEW_METHODS contract). All 1256 existing geotiff tests still pass (3 pre-existing matplotlib failures unrelated). | Pass 15 (2026-05-11): HIGH fixed -- issue #1613. to_geotiff(cog=True, nodata=) on a float raster with NaN produced a corrupted overview pyramid. The NaN-to-sentinel rewrite in __init__.py:1202 (CPU) and :2852 (GPU write_geotiff_gpu) ran BEFORE _make_overview / make_overview_gpu, so the nan-aware aggregations (np.nanmean/min/max/median, cupy.nanmean/min/max/median) saw the sentinel as a real number and biased every overview pixel. Reproduction with -9999 sentinel produced [[-4998.75,-4997.75],..] where np.nanmean gives [[1.5,3.5],..]. Both CPU and GPU paths affected; backend results matched each other but were both wrong (CAT 2 NaN propagation + CAT 5 documents the parity). Fix: _block_reduce_2d / _block_reduce_2d_gpu accept a nodata kwarg that masks the sentinel back to NaN for float dtypes before the reduction; the writer's overview loop passes nodata in, then rewrites all-sentinel reductions (which surface as NaN from the reducer) back to the sentinel for the on-disk pyramid. 11 regression tests in test_cog_overview_nodata_1613.py (CPU mean / partial-block / min/max/median / no-nodata passthrough / helper kwarg / all-sentinel block / GPU mean / GPU helper / CPU-GPU agreement). All 235 nodata/overview/cog tests still pass. | Pass 14 (2026-05-11): HIGH fixed -- issue #1611. read_vrt(band=None) on a multi-band integer VRT with per-band tags only masks band 0's sentinel. __init__.py lines 2795-2809 in read_vrt apply vrt.bands[0].nodata to the full ndim==3 array; bands 1+ keep their integer sentinels as literal finite values (e.g. 65000 surfaces as 65000.0 after the dtype=float64 cast, not NaN). Float-VRT path masks per-band correctly in _vrt._read_data lines 296-297 + 347-351. PR #1602 fixed the single-band band=N case for issue #1598; the band=None multi-band case is the same class of bug. Repro: 2-band uint16 VRT with NoDataValue 65535 / 65000 returns r.values[1,1,1] == 65000.0 instead of NaN; r.values[1,1,0] is NaN (band 0 sentinel masked). Fix scope: in read_vrt, when band is None, iterate over vrt.bands and mask each arr[..., i] slice against its own (gated by the same _int_nodata_in_range guard PR #1583 introduced). Severity HIGH (Cat 2 NaN propagation + Cat 5 backend inconsistency: identical input semantics produce different masking outcomes based on dtype, with finite garbage values where NaN expected). Fix in PR #1612: walks vrt.bands when band is None and ndim==3, masks each arr[..., i] slice against its own via the refactored _sentinel_for_dtype helper (reuses PR #1583's range guard so out-of-range/non-finite/fractional sentinels are a no-op). attrs['nodata'] still carries band 0's sentinel for band=None reads (documented contract). 7 regression tests in test_vrt_multiband_int_nodata_1611.py: uint16 per-band, int32 negative, mixed presence, dtype preservation when no sentinel hit, out-of-range gating, band=N non-regression, attrs contract. 135 existing vrt/nodata geotiff tests still pass. | Pass 13 (2026-05-11): HIGH fixed -- issue #1599. write_geotiff_gpu (and to_geotiff gpu=True) emitted raw NaN bytes for missing pixels even when nodata= was supplied, while the CPU writer substituted NaN with the sentinel before encoding. xrspatial-only round-trips were unaffected (the reader masks both NaN and the sentinel), but external readers (rasterio/GDAL/QGIS) that mask only on the GDAL_NODATA tag saw NaN pixels as valid data -- rasterio reported 100% valid pixels on a 25-NaN file vs CPU's 25-invalid report. Root cause: __init__.py lines 2579-2587 jumped from shape/dtype resolution straight to compression, missing the equivalent of the CPU writer's NaN-to-sentinel rewrite at to_geotiff line ~1156. Fix: cupy.isnan + masked write on a defensive copy of arr, gated on np_dtype.kind=='f' and not np.isnan(float(nodata)). Caller's CuPy buffer preserved (copy before mutate). 7 regression tests in test_gpu_writer_nan_sentinel_1599.py: substitution lands as sentinel, CPU/GPU byte-equivalent, caller buffer not mutated, no-NaN no-op, NaN sentinel skips substitution, rasterio sees identical invalid count on CPU/GPU, multiband 3D path. All other GPU writer tests still pass (50 passed across band-first, attrs, nodata, dask+cupy, writer, nodata aliases). | Pass 12 (2026-05-11): HIGH fixed -- issue #1581. Reading a uint TIFF with a negative GDAL_NODATA sentinel (e.g. uint16 + -9999) raised OverflowError on every backend because the nodata-mask code did arr.dtype.type(int(nodata)) with no range check. Three identical cast sites in __init__.py (numpy eager, _apply_nodata_mask_gpu, _delayed_read_window) plus _resolve_masked_fill and _sparse_fill_value in _reader.py. Fix: _int_nodata_in_range helper gates the cast; out-of-range sentinels are a no-op for value matching (the file can never contain that value), file dtype is preserved, attrs['nodata'] still surfaces the original sentinel so write round-trips keep the GDAL_NODATA tag intact. Matches rasterio behavior. 8 regression tests in test_nodata_out_of_range_1581.py cover the helper, both eager and dask read paths, in-range sentinel non-regression, and GPU helper (cupy-gated). | Pass 11 (2026-05-10): CLEAN. Audited the one additional commit since pass 10 -- #1559 (PR 1548, Centralise GeoTIFF attrs population across all read backends). Refactor extracts _populate_attrs_from_geo_info helper and routes eager numpy, dask, GPU stripped, GPU tiled read paths through it; before the fix dask only emitted crs/transform/raster_type/nodata while numpy emitted the full attrs set including x/y_resolution, resolution_unit, image_description, extra_samples, GDAL metadata, and the CRS-description fields. No data-path arithmetic touched; only attrs dict population. Windowed origin math (origin_x + c0*pixel_width, origin_y + r0*pixel_height) verified to produce -98.0 / 48.75 origin for window=(10,20,50,70) on a (0.1,-0.125) pixel-size raster, with PixelIsArea half-pixel offset preserved on coord lookups (-97.95, 48.6875). Cross-backend attrs parity re-verified: numpy/dask/cupy all emit identical key set on deflate+predictor3+nodata round-trip (crs, crs_wkt, nodata, transform, x_resolution, y_resolution). Data bit-parity re-verified across numpy/dask/cupy on same payload (np.array_equal with equal_nan=True). test_attrs_parity_1548.py (5 tests), test_reader.py/test_writer.py/test_dask_cupy_combined.py (25 tests), GPU orientation/predictor2-BE/LERC-mask/nodata/byteswap suites (65 tests) all green. No accuracy or backend-divergence findings. | Pass 10 (2026-05-10): CLEAN. Audited 5 recent commits: #1558 drop-defensive-copies (frombuffer path still .copy()s before in-place predictor decode at _reader.py:778), #1556 fp-predictor ngjit (writer pre-ravels so 1-D slice arg is correct, float32/64 LE+BE bit-exact), #1552 batched D2H (OOM guard fires before cupy.concatenate, host_buf offsets correct), #1551 parallel-decode gate (>= vs > sends 256x256 default to parallel path, no value diff confirmed via partial-tile parity), #1549 nvjpeg constants (gray + RGB GPU JPEG decode pixel-identical to Pillow CPU, max diff = 0). Cross-backend parity re-verified clean: numpy/dask+numpy/cupy/dask+cupy equal .data/.dtype/.coords/nodata/NaN-mask on deflate+predictor3+nodata; orientations 1-8 numpy==GPU; partial edge tiles 100x150, 257x383, 512x257 numpy==GPU==dask; predictor2 LE/BE round-trip uint8/int16/uint16/int32/uint32 pass; predictor3 LE/BE float32/64 pass. Deferred LOW (pre-existing, not opened): float16 (bps=16, SampleFormat=3) absent from tiff_dtype_to_numpy map - writer never emits, asymmetric but unreachable. | Pass 9 (2026-05-09): TWO HIGH fixed -- (a) PR #1539 closes #1537: TIFF Orientation tag 2/3/4 (mirror flips) on georeferenced files left y/x coords computed from the un-flipped transform, so xarray label lookups returned the wrong pixel even though _apply_orientation flipped the buffer. PR #1521 only updated the transform for the 5-8 axis-swap branch. Fix updates origin and pixel-scale signs along whichever axes were flipped, for both PixelIsArea (origin shifts by N*step) and PixelIsPoint (shifts by (N-1)*step). 10 new tests in test_orientation.py. (b) PR #1546 closes #1540: read_geotiff_gpu ignored Orientation tag completely; CPU correctly applied 2-8 (PR #1521) but GPU returned the raw stored buffer. Cross-backend disagreement on every non-default orientation. Fix adds _apply_orientation_gpu (cupy slicing mirror of the CPU helper) and _apply_orientation_geo_info, threads them into the tiled GPU pipeline, reuses CPU-fallback geo_info for the stripped path to avoid double-applying. 28 new tests in test_orientation_gpu.py (every orientation, single-band tiled, single-band stripped, 3-band tiled, mirror-flip sel-fidelity, default no-tag passthrough). Re-confirmed clean: HTTP coalesce_ranges with overlapping ranges and zero-length ranges, parallel streaming write thread-safety (each tile gets independent buffer via copy or padded zeros), planar=2 + chunky GPU LERC mask propagation matches CPU, IFD chain cap MAX_IFDS=256, max_z_error round-trip on tiled write, _resolve_masked_fill float vs integer dtype semantics. Deferred LOW: per-sample LERC mask (3D mask (h,w,samples)) collapsed to per-pixel ""any sample invalid"" on GPU while CPU honours per-sample; LERC implementations rarely emit 3D masks (verified: lerc.encode with 2D mask on 3-band returns 2D mask). Documented planar=2 + LERC + GPU silently drops mask (rare in practice, source comment acknowledges). | Pass 8 (2026-05-07): HIGH fixed in fix-jpeg-tiff-disable -- to_geotiff(compression='jpeg') wrote files that no external reader can decode. The writer tags compression=7 (new-style JPEG) but emits a self-contained JFIF stream per tile/strip and never writes the JPEGTables tag (347) that the TIFF spec requires for that codec. libtiff/GDAL/rasterio all reject the file with TIFFReadEncodedStrip() failed; our reader round-trips because Pillow decodes the standalone JFIF, hiding the break. Pass-4 notes flagged the read side of the same JPEGTables gap and deferred it; pass-8 covers the write side. Fix: reject compression='jpeg' at the to_geotiff entry with a clear ValueError pointing at deflate/zstd/lzw. The internal _writer.write is untouched so the existing self-decoding tests still cover the codec; re-enabling the public path needs a JPEGTables-aware encoder. PR diffs reviewed but not merged: #1512 (BytesIO source) and #1513 (LERC max_z_error) -- both look correct; #1512 file-like read path goes through read_all() once so the per-call BytesIOSource lock is theoretical, and #1513 forwards max_z_error through every overview/tile/strip/streaming path including _write_vrt_tiled and _compress_block. No regressions found in either open PR. Other surfaces audited clean: predictor=3 with float16 (writer auto-promotes to float32 on both eager and streaming paths, value-exact round-trip); planar=2 multi-tile read uses band_idx*tiles_per_band offset so no cross-contamination between planes; _header.py multi-byte tag parsing uses bo (byte_order) consistently; Pillow YCbCr-vs-tagged-RGB photometric mismatch becomes moot once JPEG is disabled. Deferred (LOW/MEDIUM, not filed): JPEG2000 writer accepts arbitrary dtype with no validation (rare codec, narrow risk); float16 dtype not in tiff_dtype_to_numpy decode map (writer never emits it - asymmetric but unreachable); Orientation tag (274) still ignored on read (pass-4 deferral). | Pass 7 (2026-05-07): HIGH fixed in fix-mmap-cache-refcount-after-replace -- _MmapCache.release() looked up the cache entry by realpath, so a holder that acquired the OLD mmap before an os.replace and released it AFTER another caller had acquired the post-replace entry would decrement the new holder's refcount. Subsequent eviction (cache full, or another acquire) closed the still-in-use mmap, breaking reads with 'mmap closed or invalid'. Real exposure: any concurrent reader/writer pattern where to_geotiff replaces a file that another reader had just opened via open_geotiff with chunks= or via _FileSource. PR #1506 added stale-replacement detection but did not fix the refcount confusion across the pop. Fix: acquire returns an opaque entry token; release takes the token and decrements that exact entry, regardless of cache state. Orphaned (popped) entries close their fh+mmap when their own refcount hits zero. _FileSource updated to pass the token. Regression test test_release_after_path_replacement_does_not_clobber_new_holder added. All 665 geotiff tests pass; GPU path verified. | Pass 6 (2026-05-07) PR #1507: BE pred2 numba TypingError. | Pass 5 (2026-05-06) PR #1506: mmap cache stale after file replace. | Pass 4 (2026-05-06) PR #1501: sparse COG tiles. | Pass 3 (2026-05-06) PR #1500: predictor=3 byte order. | Pass 2 (2026-05-05) PR #1498: predictor=2 sample-wise. | Pass 1 (2026-04-23) PR #1247. Re-confirmed clean over passes 2-7: items 2 (writer always emits LE TIFFs - hardcoded b'II'), 3 (RowsPerStrip default = height when missing), 4 (StripByteCounts missing raises clear ValueError), 5 (TileWidth without TileLength caught by 'tw <= 0 or th <= 0' check at _reader.py:688), 9 (read determinism on compressed+tiled+multiband), 11 (predictor=2 with awkward sample stride round-trips), 18 (compression_level=99 raises ValueError 'out of range for deflate (valid: 1-9)'), 21 (concurrent writes serialize correctly via mkstemp+os.replace), 24 (uint16 dtype preserved on numpy backend, dask honors chunks param), 26 (chunks rounds correctly with remainder chunk for non-tile-aligned). Deferred: item 8 (BytesIO/file-like sources are not supported, source.lower() error) - documented as 'str' parameter, not a bug; item 19 (LERC max_z_error not user-exposed by to_geotiff) - missing feature, not a bug." glcm,2026-05-01,1408,HIGH,2,"angle=None averaged NaN as 0, masking no-valid-pairs as zero texture; fixed via nanmean-style averaging" hillshade,2026-04-10T12:00:00Z,,,,"Horn's method correct. All backends consistent. NaN propagation correct. float32 adequate for [0,1] output." hydro,2026-04-30,,LOW,1,Only LOW: twi log(0)=-inf if fa=0 (out-of-contract); MFD weighted sum no Kahan (negligible). No CRIT/HIGH issues. diff --git a/xrspatial/geotiff/_geotags.py b/xrspatial/geotiff/_geotags.py index 34e2405a0..ca133ca2e 100644 --- a/xrspatial/geotiff/_geotags.py +++ b/xrspatial/geotiff/_geotags.py @@ -758,9 +758,42 @@ def extract_geo_info_with_overview_inheritance( scale_y = base_h / ov_h base_t = base_info.transform + + # Decide which raster_type the inherited info should carry, since the + # origin shift below depends on it. The overview IFD usually does not + # re-declare GeoKey 1025; when its own value is the default + # ``PixelIsArea`` (1), prefer the parent's setting (typically + # ``PixelIsPoint`` is only declared once at level 0). When the overview + # explicitly declared a different value, keep it. + if info.raster_type == RASTER_PIXEL_IS_AREA: + effective_raster_type = base_info.raster_type + else: + effective_raster_type = info.raster_type + + # ``origin_x`` / ``origin_y`` semantics depend on raster_type + # (GeoKey 1025): + # + # * ``PixelIsArea`` (the default): origin is the upper-left corner + # of pixel (0, 0). An overview pixel covering the first + # ``scale_x`` columns of level 0 has its upper-left corner in + # exactly the same place as level 0's, so we keep the origin + # unchanged. + # * ``PixelIsPoint`` (common for DEMs, GeoKey 1025 = 2): origin is + # the *center* of pixel (0, 0). The overview pixel-0 center sits + # at the centroid of the ``scale_x`` x ``scale_y`` level-0 pixels + # it covers, which is + # ``origin + (scale - 1) * 0.5 * pixel_size_lvl0`` along each + # axis (issue #1642). + if effective_raster_type == RASTER_PIXEL_IS_POINT: + origin_shift_x = (scale_x - 1.0) * 0.5 * base_t.pixel_width + origin_shift_y = (scale_y - 1.0) * 0.5 * base_t.pixel_height + else: + origin_shift_x = 0.0 + origin_shift_y = 0.0 + info.transform = GeoTransform( - origin_x=base_t.origin_x, - origin_y=base_t.origin_y, + origin_x=base_t.origin_x + origin_shift_x, + origin_y=base_t.origin_y + origin_shift_y, pixel_width=base_t.pixel_width * scale_x, pixel_height=base_t.pixel_height * scale_y, ) @@ -787,11 +820,11 @@ def extract_geo_info_with_overview_inheritance( info.vertical_units = base_info.vertical_units info.vertical_units_code = base_info.vertical_units_code info.model_type = base_info.model_type - # Keep ``raster_type`` from the overview unless it was the default - # AREA, in which case prefer the parent's setting (PixelIsPoint is - # almost never re-declared on overview IFDs). - if info.raster_type == RASTER_PIXEL_IS_AREA: - info.raster_type = base_info.raster_type + # Use the raster_type we already chose above so the value we wrote + # into the transform stays consistent with the value attached to + # the GeoInfo (PixelIsPoint is almost never re-declared on overview + # IFDs, so this normally inherits from the parent). + info.raster_type = effective_raster_type return info diff --git a/xrspatial/geotiff/tests/test_overview_pixel_is_point_1642.py b/xrspatial/geotiff/tests/test_overview_pixel_is_point_1642.py new file mode 100644 index 000000000..78be52e4b --- /dev/null +++ b/xrspatial/geotiff/tests/test_overview_pixel_is_point_1642.py @@ -0,0 +1,312 @@ +"""Regression tests for issue #1642. + +PR #1641 (issue #1640) inherits level-0 georef on overview reads, but +keeps the level-0 ``origin_x`` / ``origin_y`` unchanged. That is +correct for the default ``PixelIsArea`` raster_type, where the origin +is the upper-left corner of pixel (0, 0). It is *wrong* for +``PixelIsPoint`` (GeoKey 1025 = 2), where the origin is the **center** +of pixel (0, 0): an overview pixel that spans the first ``scale_x`` +columns of level 0 has its center at the centroid of those level-0 +pixels, i.e. ``origin + (scale - 1) * 0.5 * pixel_size_lvl0`` along +each axis. + +These tests pin the corrected behaviour across all four backends and +sanity-check the ``PixelIsArea`` path so the fix doesn't regress +issue #1640 itself. +""" +from __future__ import annotations + +import importlib.util + +import numpy as np +import pytest +import xarray as xr + +from xrspatial.geotiff import open_geotiff, to_geotiff +from xrspatial.geotiff._geotags import ( + GeoTransform, + RASTER_PIXEL_IS_AREA, + RASTER_PIXEL_IS_POINT, + GeoInfo, + extract_geo_info_with_overview_inheritance, +) + + +def _gpu_available() -> bool: + if importlib.util.find_spec("cupy") is None: + return False + try: + import cupy + return bool(cupy.cuda.is_available()) + except Exception: + return False + + +_HAS_GPU = _gpu_available() + + +def _make_pp_cog(path: str, size: int = 1024, pixel: float = 10.0) -> xr.DataArray: + """Write a PixelIsPoint COG with three overview levels. + + Pixel (0, 0) is centred at world (0, 0); each pixel is ``pixel`` + units wide. EPSG:32610 keeps the writer happy without inventing + geographic semantics. Returns the source DataArray. + """ + arr = np.arange(size * size, dtype=np.float32).reshape(size, size) + x = np.arange(size, dtype=np.float64) * pixel + y = -(np.arange(size, dtype=np.float64) * pixel) + da = xr.DataArray(arr, dims=['y', 'x'], + coords={'y': y, 'x': x}, + attrs={'crs': 'EPSG:32610', + 'raster_type': 'point'}) + to_geotiff(da, path, cog=True, overview_levels=[1, 2, 3]) + return da + + +def _make_pa_cog(path: str, size: int = 1024, pixel: float = 10.0) -> xr.DataArray: + """PixelIsArea companion of :func:`_make_pp_cog` for regression checks.""" + arr = np.arange(size * size, dtype=np.float32).reshape(size, size) + x = np.arange(size, dtype=np.float64) * pixel + 0.5 * pixel + y = -(np.arange(size, dtype=np.float64) * pixel + 0.5 * pixel) + da = xr.DataArray(arr, dims=['y', 'x'], + coords={'y': y, 'x': x}, + attrs={'crs': 'EPSG:32610'}) + to_geotiff(da, path, cog=True, overview_levels=[1, 2, 3]) + return da + + +# --------------------------------------------------------------------------- +# End-to-end cross-backend tests on a real COG. +# --------------------------------------------------------------------------- + +@pytest.mark.parametrize("backend_kwargs", [ + pytest.param({}, id="numpy"), + pytest.param({"chunks": 256}, id="dask+numpy"), + pytest.param({"gpu": True}, id="cupy", + marks=pytest.mark.skipif( + not _HAS_GPU, reason="cupy + CUDA required")), + pytest.param({"gpu": True, "chunks": 256}, id="dask+cupy", + marks=pytest.mark.skipif( + not _HAS_GPU, reason="cupy + CUDA required")), +]) +def test_point_overview_first_pixel_center_at_block_centroid(tmp_path, + backend_kwargs): + """For PixelIsPoint COGs, overview pixel-0 center is the level-0 centroid.""" + path = str(tmp_path / "pp_1642_centroid.tif") + _make_pp_cog(path) + + # Level-0 pixel (0, 0) center is at (0, 0). Level-1 pixel (0, 0) + # covers level-0 pixels with centers at x=0 and x=10, y=0 and y=-10; + # the centroid is (5, -5). Level-2 covers level-0 0..3 with centers + # 0, 10, 20, 30 -> centroid 15; same for y -> -15. Level-3 covers + # 0..7 -> centroid 35; y -> -35. + expected = { + 1: (5.0, -5.0), + 2: (15.0, -15.0), + 3: (35.0, -35.0), + } + for lvl, (exp_x, exp_y) in expected.items(): + da = open_geotiff(path, overview_level=lvl, **backend_kwargs) + x0 = float(np.asarray(da.coords['x'])[0]) + y0 = float(np.asarray(da.coords['y'])[0]) + assert abs(x0 - exp_x) < 1e-9, ( + f"backend={backend_kwargs}, lvl={lvl}: first pixel x={x0}, " + f"expected {exp_x}") + assert abs(y0 - exp_y) < 1e-9, ( + f"backend={backend_kwargs}, lvl={lvl}: first pixel y={y0}, " + f"expected {exp_y}") + # raster_type is inherited from level 0. + assert da.attrs.get('raster_type') == 'point' + + +@pytest.mark.parametrize("backend_kwargs", [ + pytest.param({}, id="numpy"), + pytest.param({"chunks": 256}, id="dask+numpy"), + pytest.param({"gpu": True}, id="cupy", + marks=pytest.mark.skipif( + not _HAS_GPU, reason="cupy + CUDA required")), + pytest.param({"gpu": True, "chunks": 256}, id="dask+cupy", + marks=pytest.mark.skipif( + not _HAS_GPU, reason="cupy + CUDA required")), +]) +def test_point_overview_transform_origin_shifted(tmp_path, backend_kwargs): + """``transform`` attr carries the shifted origin for PixelIsPoint overviews.""" + path = str(tmp_path / "pp_1642_transform.tif") + _make_pp_cog(path) + + # transform tuple: (pixel_width, 0, origin_x, 0, pixel_height, origin_y). + expected = { + 0: (10.0, 0.0, -10.0, 0.0), # level 0: no shift + 1: (20.0, 5.0, -20.0, -5.0), + 2: (40.0, 15.0, -40.0, -15.0), + 3: (80.0, 35.0, -80.0, -35.0), + } + for lvl, (exp_pw, exp_ox, exp_ph, exp_oy) in expected.items(): + da = open_geotiff(path, overview_level=lvl, **backend_kwargs) + t = da.attrs['transform'] + assert abs(t[0] - exp_pw) < 1e-9, ( + f"lvl={lvl}: pixel_width={t[0]}, expected {exp_pw}") + assert abs(t[2] - exp_ox) < 1e-9, ( + f"lvl={lvl}: origin_x={t[2]}, expected {exp_ox}") + assert abs(t[4] - exp_ph) < 1e-9, ( + f"lvl={lvl}: pixel_height={t[4]}, expected {exp_ph}") + assert abs(t[5] - exp_oy) < 1e-9, ( + f"lvl={lvl}: origin_y={t[5]}, expected {exp_oy}") + + +def test_point_overview_coords_are_uniform(tmp_path): + """The shifted origin still yields a regular grid -- step == pixel size.""" + path = str(tmp_path / "pp_1642_uniform.tif") + _make_pp_cog(path) + + for lvl in (0, 1, 2, 3): + da = open_geotiff(path, overview_level=lvl) + x = np.asarray(da.coords['x']) + y = np.asarray(da.coords['y']) + if x.size > 1: + dx = np.diff(x) + assert np.allclose(dx, dx[0], atol=1e-9), ( + f"lvl={lvl}: x is non-uniform: {dx[:5]}") + assert abs(float(dx[0]) - da.attrs['transform'][0]) < 1e-9 + if y.size > 1: + dy = np.diff(y) + assert np.allclose(dy, dy[0], atol=1e-9) + assert abs(float(dy[0]) - da.attrs['transform'][4]) < 1e-9 + + +# --------------------------------------------------------------------------- +# Direct helper test: synthetic GeoInfo round-trips. +# --------------------------------------------------------------------------- + +class _StubIFD: + """Minimal IFD-like stub for unit-testing the helper directly.""" + def __init__(self, subfile_type: int, width: int, height: int): + self.subfile_type = subfile_type + self.width = width + self.height = height + + +def test_helper_pixel_is_point_origin_shift_unit(monkeypatch): + """Unit-level: the helper applies the correct origin shift for + PixelIsPoint inheritance, given a stubbed ``extract_geo_info``. + + This exercises the math without going through the writer/reader so a + regression in the formula is caught even if the COG pipeline changes. + """ + from xrspatial.geotiff import _geotags as _gt + + base_ifd = _StubIFD(subfile_type=0, width=1024, height=1024) + ov_ifd = _StubIFD(subfile_type=1, width=128, height=128) + + base_info = GeoInfo( + transform=GeoTransform(origin_x=0.0, origin_y=0.0, + pixel_width=10.0, pixel_height=-10.0), + has_georef=True, + raster_type=RASTER_PIXEL_IS_POINT, + crs_epsg=32610, + ) + ov_info = GeoInfo(has_georef=False, + raster_type=RASTER_PIXEL_IS_AREA) + + calls = {'count': 0} + + def fake_extract(ifd, data, byte_order): + calls['count'] += 1 + if ifd is base_ifd: + return base_info + return ov_info + + monkeypatch.setattr(_gt, 'extract_geo_info', fake_extract) + + out = extract_geo_info_with_overview_inheritance( + ov_ifd, [base_ifd, ov_ifd], b'', '<') + + # scale = 1024 / 128 = 8. Shift = (8 - 1) * 0.5 * 10 = 35. + assert abs(out.transform.pixel_width - 80.0) < 1e-9 + assert abs(out.transform.pixel_height - (-80.0)) < 1e-9 + assert abs(out.transform.origin_x - 35.0) < 1e-9 + assert abs(out.transform.origin_y - (-35.0)) < 1e-9 + assert out.raster_type == RASTER_PIXEL_IS_POINT + assert out.has_georef + + +def test_helper_pixel_is_area_no_origin_shift_unit(monkeypatch): + """PixelIsArea path is unchanged: origin stays at the level-0 corner.""" + from xrspatial.geotiff import _geotags as _gt + + base_ifd = _StubIFD(subfile_type=0, width=1024, height=1024) + ov_ifd = _StubIFD(subfile_type=1, width=128, height=128) + + base_info = GeoInfo( + transform=GeoTransform(origin_x=100.0, origin_y=200.0, + pixel_width=0.5, pixel_height=-0.5), + has_georef=True, + raster_type=RASTER_PIXEL_IS_AREA, + crs_epsg=4326, + ) + ov_info = GeoInfo(has_georef=False, raster_type=RASTER_PIXEL_IS_AREA) + + def fake_extract(ifd, data, byte_order): + return base_info if ifd is base_ifd else ov_info + + monkeypatch.setattr(_gt, 'extract_geo_info', fake_extract) + + out = extract_geo_info_with_overview_inheritance( + ov_ifd, [base_ifd, ov_ifd], b'', '<') + + assert abs(out.transform.origin_x - 100.0) < 1e-9 + assert abs(out.transform.origin_y - 200.0) < 1e-9 + assert abs(out.transform.pixel_width - 4.0) < 1e-9 + assert abs(out.transform.pixel_height - (-4.0)) < 1e-9 + assert out.raster_type == RASTER_PIXEL_IS_AREA + + +def test_helper_point_overview_with_own_geokeys_not_shifted(monkeypatch): + """If the overview IFD carries its own georef, the shift never fires.""" + from xrspatial.geotiff import _geotags as _gt + + base_ifd = _StubIFD(subfile_type=0, width=1024, height=1024) + ov_ifd = _StubIFD(subfile_type=1, width=128, height=128) + + base_info = GeoInfo( + transform=GeoTransform(origin_x=0.0, origin_y=0.0, + pixel_width=10.0, pixel_height=-10.0), + has_georef=True, raster_type=RASTER_PIXEL_IS_POINT, crs_epsg=32610, + ) + own_ov_info = GeoInfo( + transform=GeoTransform(origin_x=999.0, origin_y=-999.0, + pixel_width=77.0, pixel_height=-77.0), + has_georef=True, raster_type=RASTER_PIXEL_IS_POINT, crs_epsg=32610, + ) + + def fake_extract(ifd, data, byte_order): + return base_info if ifd is base_ifd else own_ov_info + + monkeypatch.setattr(_gt, 'extract_geo_info', fake_extract) + + out = extract_geo_info_with_overview_inheritance( + ov_ifd, [base_ifd, ov_ifd], b'', '<') + + # Untouched: helper returns the overview's own info verbatim. + assert abs(out.transform.origin_x - 999.0) < 1e-9 + assert abs(out.transform.origin_y - (-999.0)) < 1e-9 + assert abs(out.transform.pixel_width - 77.0) < 1e-9 + + +# --------------------------------------------------------------------------- +# PixelIsArea regression: ensure the #1640 fix still passes. +# --------------------------------------------------------------------------- + +def test_area_overview_origin_unchanged_regression(tmp_path): + """PixelIsArea overview origin must still equal level-0 origin (#1640).""" + path = str(tmp_path / "pa_1642_regression.tif") + _make_pa_cog(path) + base = open_geotiff(path, overview_level=0) + base_t = base.attrs['transform'] + for lvl in (1, 2, 3): + da = open_geotiff(path, overview_level=lvl) + t = da.attrs['transform'] + assert abs(t[2] - base_t[2]) < 1e-9, ( + f"PixelIsArea lvl={lvl}: origin_x drifted from {base_t[2]} " + f"to {t[2]}") + assert abs(t[5] - base_t[5]) < 1e-9