Skip no-op astype in _delayed_read_window (#1624)#1626
Conversation
After #1601 widened the call site to always pass target_dtype, every dask chunk ran arr.astype(target_dtype) even when arr.dtype already matched. numpy.astype defaults to copy=True and so allocated a same- dtype chunk-sized buffer and memcpy on every read. Gate the cast on a real dtype mismatch; the #1597 mask path still promotes uint to float64 inline so every chunk lands in the dask-declared dtype. Regression test in test_dask_no_op_astype_1624.py wraps read_to_array to capture an ndarray subclass that records astype calls, then asserts no same-dtype call survives the per-chunk return path.
There was a problem hiding this comment.
Pull request overview
This PR addresses a performance regression in the GeoTIFF dask read path by avoiding redundant per-chunk astype() calls when the chunk dtype already matches the dask-declared dtype, while preserving the existing integer-nodata promotion behavior introduced for #1597/#1601.
Changes:
- Skip the per-chunk
arr.astype(target_dtype)in_delayed_read_windowwhenarr.dtype == target_dtypeto avoid unnecessary chunk-sized allocations/copies. - Add a regression test that instruments the per-chunk
read_to_arrayreturn value to detect same-dtypeastype()calls. - Add coverage to ensure the uint16 nodata-sentinel path still promotes to float64.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
xrspatial/geotiff/__init__.py |
Gates per-chunk astype() on a real dtype mismatch to avoid no-op copy costs. |
xrspatial/geotiff/tests/test_dask_no_op_astype_1624.py |
Adds regression tests to confirm no-op casts are skipped while nodata promotion remains intact. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Copilot review on #1626 flagged two unused-variable issues. Fixes: - Delete test_float32_chunks_avoid_redundant_copy. It declared same_dtype_casts and wrapped _delayed_read_window without recording anything, so it only asserted output dtype -- which does not validate the #1624 regression. test_astype_skipped_when_dtypes_match already covers the same surface properly via an ndarray subclass that tracks astype calls; verified it fails on the pre-fix code with "Same-dtype astype still runs per chunk". - In test_uint16_mask_path_still_promotes, exercise the previously- unused arr fixture: assert that pixels equal to the 65535 sentinel become NaN and every other pixel matches arr.astype(float64). Anchors the test to fixture values so regressions in the mask branch surface here, not just as dtype drift.
|
Addressed both Copilot review comments in 4c66bb1:
|
Closes #1624.
Summary
arr.astype(target_dtype)in_delayed_read_windowon a real dtype mismatch.target_dtype, every dask chunk ranastypeeven whenarr.dtype == target_dtype.numpy.astypedefaults tocopy=Trueand so allocated a chunk-sized same-dtype buffer and memcpy on every read.Test plan
test_dask_no_op_astype_1624.pywrapsread_to_arrayso it returns an ndarray subclass that recordsastypecalls, then asserts no same-dtypeastypesurvives the per-chunk return path.mainwith the expected error message; passes after the fix.test_dask_int_nodata_chunks_1597.py(the regression suite from Fix read_geotiff_dask losing nodata mask in non-first chunks (#1597) #1601) still passes -- the int-with-sentinel promotion is intact.xrspatial/geotiff/tests/suite passes apart from three pre-existingTestPalettefailures onmainthat are unrelated to this change.