read_geotiff_dask per-chunk astype copies even when dtype already matches #1624

@brendancol

Describe the bug

_delayed_read_window in xrspatial/geotiff/__init__.py:1871-1872 calls arr.astype(target_dtype) on every chunk. When target_dtype already equals arr.dtype (the common case for float source rasters), numpy.ndarray.astype still allocates a new buffer and copies because its default is copy=True.
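The copy behavior can be checked in isolation with plain NumPy. This is a minimal sketch, independent of xrspatial; the array shape is arbitrary and chosen only for illustration.

```python
import numpy as np

arr = np.zeros((256, 256), dtype=np.float32)

# Default astype (copy=True): a fresh buffer is allocated even though
# the requested dtype equals the array's current dtype.
copied = arr.astype(np.float32)
print(copied is arr)                  # False
print(np.shares_memory(copied, arr))  # False

# With copy=False, NumPy returns the input array itself when no
# conversion is required.
same = arr.astype(np.float32, copy=False)
print(same is arr)                    # True
```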

PR #1601 widened the call site to always pass target_dtype so that the dtype declared to dask and the dtype of each chunk agree. That fixed #1597 for integer rasters with an in-range nodata sentinel: the dask graph declared float64, but only chunks that hit the sentinel were actually promoted, so concatenation cast later chunks back to int and clobbered NaN with 0. The fix was correct, but the always-on cast now allocates a same-dtype copy on every chunk of every read.

Reproduction

import numpy as np, xarray as xr, tempfile, os
from xrspatial.geotiff import to_geotiff, read_geotiff_dask
import xrspatial.geotiff as gt

H, W = 1024, 1024
data = np.random.rand(H, W).astype(np.float32)
arr_in = xr.DataArray(data, dims=['y', 'x'],
                      coords={'y': np.arange(H), 'x': np.arange(W)})
tmp = tempfile.mkdtemp()
path = os.path.join(tmp, 'probe.tif')
to_geotiff(arr_in, path, compression='none')

# Wrap the per-chunk reader so each call records the target_dtype it receives.
orig = gt._delayed_read_window
trace = []
def patched(*args, **kwargs):
    trace.append(kwargs.get('target_dtype'))
    return orig(*args, **kwargs)
gt._delayed_read_window = patched

read_geotiff_dask(path, chunks=256)
# Every delayed call carries target_dtype=float32 even though no cast is needed.
assert all(t == np.dtype('float32') for t in trace if t is not None)

Expected behavior

Skip the cast when target_dtype == arr.dtype. A 30 TB float32 read with 100 MB chunks shouldn't pay an extra 100 MB allocation and memcpy per task for a no-op cast.
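To put rough numbers on that example, the arithmetic below is a back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and exactly one task per chunk; neither assumption comes from the library.

```python
TB = 10**12  # assumed decimal terabyte
MB = 10**6   # assumed decimal megabyte

total_bytes = 30 * TB
chunk_bytes = 100 * MB

# One task per chunk (assumption), each doing one redundant same-dtype copy.
n_tasks = total_bytes // chunk_bytes
extra_bytes_copied = n_tasks * chunk_bytes

print(n_tasks)                  # 300000
print(extra_bytes_copied / TB)  # 30.0
```

Under those assumptions the no-op cast copies the entire dataset one extra time, spread across 300,000 tasks.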

Fix

if target_dtype is not None and arr.dtype != target_dtype:
    arr = arr.astype(target_dtype)

The #1597 fix still holds. The integer-mask branch above promotes arr to float64 in place when sentinels hit, so when target_dtype == float64 and arr.dtype == float64, the guard skips the astype; when the dtypes differ (a caller-supplied dtype, or effective_dtype=float64 over an unmasked int chunk), the astype runs as before.
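The cases above can be exercised with a small stand-alone sketch. `cast_chunk` is a hypothetical helper written only for this illustration; it mirrors the guarded cast but is not the actual body of `_delayed_read_window`.

```python
import numpy as np

def cast_chunk(arr, target_dtype):
    """Hypothetical stand-in for the guarded cast; not the library's code."""
    if target_dtype is not None and arr.dtype != target_dtype:
        arr = arr.astype(target_dtype)
    return arr

# Chunk already promoted to float64: the guard skips the cast and
# the very same array object comes back, so nothing is allocated.
f64 = np.ones((4, 4), dtype=np.float64)
assert cast_chunk(f64, np.dtype("float64")) is f64

# Unmasked integer chunk with a float64 target: the cast still runs,
# so the chunk dtype matches what the graph declared.
i16 = np.ones((4, 4), dtype=np.int16)
out = cast_chunk(i16, np.dtype("float64"))
assert out.dtype == np.float64 and out is not i16

# No target dtype supplied: the chunk passes through untouched.
assert cast_chunk(i16, None) is i16
```

An unconditional `arr.astype(target_dtype, copy=False)` would give the same result for a non-None target; the explicit comparison just makes the intent visible at the call site.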

Context

Found during the 2026-05-11 geotiff performance sweep.

Labels: bug, performance
