geotiff: add mask_nodata kwarg to preserve integer source dtypes (closes #2052)#2058
Merged
Conversation
…dtypes (#2052) open_geotiff(path, dtype="uint16") used to raise on a uint16 file whose nodata sentinel matched real pixels. The masking block promoted the array to float64 before the dtype= cast ran, and the cast then rejected float-to-int. The docstring promised "Pass dtype=... to keep the source dtype", but for the common integer-nodata case that contract was unreachable. Add mask_nodata: bool = True to open_geotiff, read_geotiff_dask, read_geotiff_gpu, and read_vrt. mask_nodata=False skips the sentinel-to-NaN step (and the float64 promotion that comes with it) so the source dtype survives. attrs['nodata'] still carries the raw sentinel either way, so downstream code can mask explicitly. The existing local variable named mask_nodata in the eager reader is renamed to nodata_sentinel to free up the name. Docstrings on all four public readers document the new kwarg and its interaction with dtype=. Default behaviour is unchanged. Threads through every public reader entry, including the dask graph declared-dtype calculation, the GPU chunked GDS path, and the VRT chunked path. test_reader_kwarg_order_1935 picks up the new kwarg in its canonical-order list.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2052.
Problem
open_geotiff(path, dtype="uint16")on a uint16 file whose nodatasentinel matched real pixels raised
ValueError. The masking blockpromoted the array to
float64before thedtype=cast ran, and thecast then rejected the float-to-int conversion.
The docstring promised "Pass
dtype=...to keep the source dtype",but the most common integer-nodata case never reached the cast with
its source dtype intact.
Fix
Add
mask_nodata: bool = Truetoopen_geotiff,read_geotiff_dask,read_geotiff_gpu, andread_vrt. Whenmask_nodata=False:dtype=);attrs['nodata']still carries the raw sentinel so callers canmask themselves.
Default stays
True, so existing callers see no change.The eager reader's existing local named
mask_nodata(the resolvedsentinel value used for comparison) is renamed to
nodata_sentinelto free up the kwarg name. Docstrings on all four readers document
the new kwarg and the
dtype=interaction.Threading
The kwarg flows through every public reader entry, including:
open_geotiffdispatcher (forwards to all backends)read_geotiff_dask(gates theeffective_dtypepromotion and theper-chunk
_delayed_read_windowmask)read_geotiff_gpu(gates_apply_nodata_mask_gpuon the eagerpath; gates the chunked GDS
declared_dtypecalculation and theper-chunk task)
read_vrt(gates_apply_integer_sentinel_maskon eager andchunked paths)
The pre-existing kwarg-order canonical-list test (#1935) picks up
the new entry.
Tests
10 new tests in
test_mask_nodata_kwarg_2052.py:dtype="uint16"raised before, now works withmask_nodata=Falsemask_nodata=Truestill promotes to float64 + NaNdtype="uint32"integer-to-integer cast works with the opt-outmask_nodata=Falsedtype=+mask_nodata=Falseround-tripTest plan
pytest xrspatial/geotiff/tests/test_mask_nodata_kwarg_2052.py -xpytest xrspatial/geotiff/tests/ -k "nodata or dtype" -q(
test_predictor2_big_endian_gpu_1517,test_size_param_validation_gpu_vrt_1776)remain, and both reproduce on main.