The public rasterize() API accepts width, height, and resolution with no upper bound on the resulting raster size. A caller can request arbitrarily large output arrays, which triggers unbounded host or device memory allocation before any geometry work runs.
Where
xrspatial/rasterize.py:
- _run_numpy, lines 998-999: np.full((height, width), fill, dtype=np.float64) plus an int8 written mask, 9 bytes per pixel, with no guard.
- _run_cupy, lines 1336-1337: the same allocation on the device with cupy.full / cupy.zeros.
- _rasterize_tile_numpy / _rasterize_tile_cupy allocate per tile, but the full-raster size still drives tile count and filtering work.
- rasterize() public API, lines 2118-2137: both the explicit path (final_width, final_height = int(width), int(height)) and the resolution path (final_width = max(int(np.ceil((xmax - xmin) / x_res)), 1)) accept any positive size.
Reproducer
from xrspatial.rasterize import rasterize
from shapely.geometry import box

# ~8 GB float64 + ~1 GB int8 = ~9 GB, no guard
rasterize([(box(0, 0, 1, 1), 1.0)], width=31623, height=31623,
          bounds=(0, 0, 1, 1))

# Via resolution: 10^18 pixels requested
rasterize([(box(0, 0, 1, 1), 1.0)], resolution=1e-9,
          bounds=(0, 0, 1, 1))
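The sizes quoted in the reproducer comments can be checked with plain integer arithmetic. This is a minimal sketch, not code from xrspatial: estimated_bytes is a hypothetical helper, and the 9 bytes per pixel figure is taken from the float64 output plus int8 mask described under "Where".

```python
# Bytes per output pixel as described in the report:
# 8 for the float64 raster plus 1 for the int8 "written" mask.
BYTES_PER_PIXEL = 8 + 1


def estimated_bytes(width, height):
    """Hypothetical helper: bytes allocated for a width x height raster."""
    return width * height * BYTES_PER_PIXEL


# The two requests from the reproducer. The second uses the nominal
# 10^9 x 10^9 grid implied by resolution=1e-9 over a 1 x 1 extent.
requests = {
    "width=31623, height=31623": (31623, 31623),
    "resolution=1e-9 over (0, 0, 1, 1)": (10**9, 10**9),
}

for label, (w, h) in requests.items():
    pixels = w * h
    gigabytes = estimated_bytes(w, h) / 1e9
    print(f"{label}: {pixels:.3e} pixels, about {gigabytes:,.1f} GB")
```

Neither request can be served on ordinary hardware, and both are accepted today because the size is never compared against a limit.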
Severity
HIGH. The existing xrspatial/geotiff/_reader.py already sets MAX_PIXELS_DEFAULT = 1_000_000_000 and ships a _check_dimensions(width, height, samples, max_pixels) helper. The rasterize path does not use it, so a caller can request terabyte-scale allocations before any input is parsed.
Fix direction
- Add a max_pixels keyword to rasterize() defaulting to MAX_PIXELS_DEFAULT (shared with the geotiff reader).
- After resolving final_width and final_height, call _check_dimensions(final_width, final_height, 1, max_pixels) and raise a clear ValueError before allocating.
- Cover both the explicit width / height path and the resolution path.
- Add regression tests for oversize width/height and oversize resolution-derived dimensions, plus a passing test with a reasonable size and a test that a larger explicit max_pixels lets the call through.
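A rough sketch of the guard described in the list above, written standalone so it runs without xrspatial. Several things here are assumptions: the _check_dimensions body is a stand-in (only its signature and the MAX_PIXELS_DEFAULT value come from this report), resolve_dimensions is a hypothetical extract of the sizing step in rasterize(), and resolution is treated as a single scalar.

```python
import math

MAX_PIXELS_DEFAULT = 1_000_000_000  # value quoted from geotiff/_reader.py


def _check_dimensions(width, height, samples, max_pixels):
    """Stand-in for the geotiff helper; only the signature is from the report."""
    total = width * height * samples  # Python ints cannot overflow here
    if total > max_pixels:
        raise ValueError(
            f"requested raster {width} x {height} ({total:,} pixels) exceeds "
            f"max_pixels={max_pixels:,}; pass a larger max_pixels to override"
        )


def resolve_dimensions(bounds, width=None, height=None, resolution=None,
                       max_pixels=MAX_PIXELS_DEFAULT):
    """Hypothetical extract of the sizing step in rasterize(), guard added."""
    xmin, ymin, xmax, ymax = bounds
    if resolution is not None:
        # Resolution path (scalar resolution assumed for this sketch).
        final_width = max(int(math.ceil((xmax - xmin) / resolution)), 1)
        final_height = max(int(math.ceil((ymax - ymin) / resolution)), 1)
    else:
        # Explicit width / height path.
        final_width, final_height = int(width), int(height)
    # One check after both paths converge, before any array is allocated.
    _check_dimensions(final_width, final_height, 1, max_pixels)
    return final_width, final_height


def raises_value_error(**kwargs):
    """Return True if resolve_dimensions rejects the request."""
    try:
        resolve_dimensions((0, 0, 1, 1), **kwargs)
    except ValueError:
        return True
    return False


# The four regression cases named in the list above.
print(raises_value_error(width=31623, height=31623))   # oversize explicit
print(raises_value_error(resolution=1e-9))             # oversize via resolution
print(raises_value_error(width=256, height=256))       # reasonable size
print(raises_value_error(width=31623, height=31623,
                         max_pixels=2_000_000_000))    # explicit override
```

Placing the check after both sizing paths converge means a single call covers the explicit and the resolution-derived dimensions, and the error message can name the override keyword.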
Other findings in this audit
Also noted a MEDIUM int32 overflow risk in _build_row_csr_numba (total = row_ptr[height] with int32 accumulator) that can bite under extreme edge-row cell counts. Out of scope here; the allocation guard indirectly bounds realistic inputs.
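For illustration only, the wraparound behind that int32 finding can be reproduced in pure Python. This is not the code of _build_row_csr_numba: prefix_sum_int32 and prefix_sum_wide are hypothetical helpers, and ctypes.c_int32 is used here only to emulate a signed 32-bit accumulator.

```python
import ctypes


def prefix_sum_int32(counts):
    """Row-pointer prefix sum with the running total wrapped to int32."""
    row_ptr = [0]
    for c in counts:
        # c_int32 performs no overflow check, so the value wraps silently.
        row_ptr.append(ctypes.c_int32(row_ptr[-1] + c).value)
    return row_ptr


def prefix_sum_wide(counts):
    """Same prefix sum with Python's unbounded integers, for comparison."""
    row_ptr = [0]
    for c in counts:
        row_ptr.append(row_ptr[-1] + c)
    return row_ptr


# Three rows with 2**30 cells each: the true total exceeds 2**31 - 1.
counts = [2**30, 2**30, 2**30]
narrow = prefix_sum_int32(counts)
wide = prefix_sum_wide(counts)

print("int32 total:", narrow[-1])
print("true total: ", wide[-1])
```

A negative or truncated total = row_ptr[height] would then size the downstream buffers incorrectly, which is why a wider accumulator is the usual remedy.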