Skip to content

Add dask backends for rasterize()#997

Merged
brendancol merged 13 commits into
masterfrom
dask-rasterize
Mar 10, 2026
Merged

Add dask backends for rasterize()#997
brendancol merged 13 commits into
masterfrom
dask-rasterize

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

@brendancol brendancol commented Mar 10, 2026

Summary

  • New chunks parameter on rasterize() for dask-backed parallel output. Splits the output raster into tiles, rasterizes each tile via dask.delayed, assembles with da.block.
  • Polygons rasterized per-tile (scanline fill checks pixel centers, which are unique per tile). Lines and points pre-rasterized at full-raster scale and distributed to tiles -- avoids Bresenham clipping artifacts at tile boundaries.
  • dask-geopandas input support (materializes eagerly; geometry data is small, parallelism is on the output side).
  • cuspatial optional import stub for future GPU geometry parsing.
  • dask and gpu extras in setup.cfg.

Test plan

  • 34 TestDaskNumpy tests: polygon/line/point/mixed parity vs numpy across chunk sizes (5,5), (3,7), (100,100). Chunk boundary, empty tile, all 6 merge modes, holes, all_touched, dtype, coords, GeoDataFrame, dask-geopandas.
  • 8 TestDaskCupy tests: same structure with GPU backend.
  • All 94 tests pass (2 skipped for missing dask-geopandas).

Tile-based parallel rasterization via dask.delayed. New `chunks`
parameter splits the output raster into tiles, rasterizes each
independently, and assembles with da.block.

Polygons are rasterized per-tile (scanline fill is tile-exact).
Lines and points are pre-rasterized at full-raster scale then
distributed to tiles to avoid Bresenham boundary artifacts.

Also adds dask-geopandas input support (.compute() eagerly) and
cuspatial optional import stub.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Mar 10, 2026
Removes three functions (_bresenham_collect, _precompute_line_pixels,
_pixels_for_tile) that materialized all Bresenham pixels in the
scheduler process before any dask task ran. Line segments and points
are now kept in their compact representation (5 arrays for segments,
3 for points) and filtered per-tile via bbox overlap.

Bresenham runs inside each tile worker using the existing @ngjit
_burn_lines_cpu (numpy) or GPU burn_lines kernel (cupy). Segments
are offset to tile-local coords; since Bresenham is translation-
invariant, the pixel path is exact.

Fixes: pure-Python _bresenham_collect (now removed), O(T*P) per-tile
pixel scanning (now O(T*S) segment filtering where S << P), and
O(total_line_pixels) upfront memory allocation (now O(segments)).
Shapely geometry pickle goes through __reduce__ with WKB encoding per
object. When a polygon intersects many tiles, that serialization cost
multiplies. Serialize to WKB once up front (~20x cheaper to pickle as
raw bytes), pass the bytes into each delayed task, and deserialize
inside the worker via shapely.from_wkb.

Applied to both _run_dask_numpy and _run_dask_cupy orchestrators.
Expand TestDaskCupy from 8 to 30 tests, matching TestDaskNumpy
coverage: line/point parity across chunk sizes, chunk boundary
tests for lines and points, single-chunk, output type/shape/coords,
all 6 merge modes, holes, all_touched, dtype, int chunks shorthand,
and GeoDataFrame input.
The merge parameter now accepts a callable in addition to the built-in
string modes. For CPU backends, pass a @ngjit function. For GPU backends,
pass a @cuda.jit(device=True) function. Signature:

    merge_fn(old_val, new_val, is_first) -> float64

Internally, the integer-dispatch _merge_pixel is replaced by
_apply_merge which calls the merge function directly. Built-in modes
are now @ngjit functions (_merge_last, _merge_first, etc.) looked up
from _MERGE_FUNCTIONS. GPU kernels are compiled per merge function
via a factory/cache pattern in _ensure_gpu_kernels.

Adds 11 new tests (TestCustomMerge, TestCustomMergeDask,
TestCustomMergeGPU) and a rasterize user guide notebook stub.
The merge function signature is now (pixel, props, is_first) -> float64
where props is a 1D float64 array carrying one or more feature
properties. Built-in modes (last, first, max, min, sum, count) use
props[0]. Custom functions can access any column.

New 'columns' parameter on rasterize() extracts multiple GeoDataFrame
columns into the props array. For (geometry, value) pairs, props is
a length-1 array containing the value.

Internally, the pipeline now uses geometry-index + props-table instead
of carrying scalar values. Polygon edges use edge_geom_id (already
existed) to index into a poly_props table. Line segments and points
carry geom_idx arrays that index into line_props / point_props tables.
The edge_value array is removed entirely.

Example:

    @ngjit
    def weighted_density(pixel, props, is_first):
        density = props[0] / props[1]
        if is_first:
            return density
        return pixel + density

    rasterize(gdf, columns=['population', 'area'],
              merge=weighted_density, ...)
…ations

- CSR construction: difference-array counting O(n_edges + height) vs O(n_edges * avg_height)
- Shell sort replaces insertion sort in scanline fill (CPU + GPU): O(n^4/3) vs O(n^2)
- Pre-compute segment bboxes once before dask tile loop
- Pre-transfer shared GPU props tables once in dask+cupy path
- Boundary segment extraction: direct coord extraction, no LineString wrappers
- Edge loop fallback: pre-allocated arrays instead of per-edge list appends
- Vectorized edge extraction: early del of intermediates to cut peak memory
- Shared _geometry_bboxes() call in list-of-pairs input parsing
… dedup _parse_input

- Remove per-element np.int32() wrapping in _extract_points_loop and _extract_lines_loop
- Use object array + boolean indexing for WKB tile filtering (replaces nonzero + list comprehension)
- Hoist geometry.tolist() and total_bounds above column/columns branch in _parse_input
- Add early return in _build_row_csr_numba for empty edge input
- Use np.arange instead of list(range()) for poly_ids in tile workers
@brendancol brendancol merged commit 332ad5e into master Mar 10, 2026
11 checks passed
@brendancol brendancol deleted the dask-rasterize branch May 4, 2026 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant