
Use int64 row_ptr in _build_row_csr_numba (#1388) #1391

Merged
brendancol merged 1 commit into main from issue-1388
Apr 30, 2026

@brendancol
Contributor

Summary

  • Casts row_ptr, diff, running, and offsets in _build_row_csr_numba to int64 so the cumulative edge count cannot wrap on a tall raster with many long polygon edges
  • Keeps col_idx at int32 (it stores edge indices, not cumulative counts)
  • Adds 5 tests in TestBuildRowCsrInt64 covering the dtype contract, the empty-edges short-circuit, int64 representability past the int32 boundary, CSR layout for overlapping edges, and an end-to-end rasterize() call

Fixes #1388. Audit follow-up to #1223.

Test plan

  • pytest xrspatial/tests/test_rasterize.py passes (143 passed, 2 skipped on this CPU host)
  • pytest xrspatial/tests/test_rasterize.py::TestBuildRowCsrInt64 -v passes (5/5)

Cumulative edge counts in _build_row_csr_numba were previously stored in
int32: the row_ptr, diff, and offsets arrays and the running scalar. On a
tall raster (~50k rows) carrying many long polygon edges, the prefix sum
could exceed 2**31 - 1 and wrap. After that, np.empty(total,
dtype=np.int32) either raised on a negative size or allocated an
undersized buffer, and the Pass 2 fill inside the numba kernel then
walked off the end of it.
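
The two failure modes described above can be reproduced with plain NumPy, without touching xrspatial. This is a minimal sketch: the helper `prefix_total` and the per-row counts are hypothetical and chosen only so that the int32 prefix sum wraps once to a negative value and once to a small positive value.

```python
import numpy as np


def prefix_total(counts, dtype):
    """Return the final element of a prefix sum accumulated in `dtype`.

    Hypothetical helper for illustration; it is not part of xrspatial.
    """
    return int(np.cumsum(counts.astype(dtype), dtype=dtype)[-1])


# Hypothetical per-row edge counts (each value fits in int32 on its own).
counts3 = np.full(3, 1_000_000_000, dtype=np.int64)
counts5 = np.full(5, 1_000_000_000, dtype=np.int64)

# int32 accumulation wraps modulo 2**32 without raising.
neg_total = prefix_total(counts3, np.int32)    # wrapped to a negative value
small_total = prefix_total(counts5, np.int32)  # wrapped to a too-small positive value
true_total = prefix_total(counts5, np.int64)   # exact

# A negative total makes the allocation fail outright.
try:
    np.empty(neg_total, dtype=np.int32)
    alloc_failed = False
except ValueError:
    alloc_failed = True
```

The positive-but-too-small case is the more dangerous of the two: the allocation succeeds, so nothing signals a problem until the fill loop indexes past the buffer.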

Cast row_ptr, diff, running, and offsets to int64. col_idx values are
edge indices and stay int32. Downstream consumers (the GPU scanline
kernel and np.diff(row_ptr).max()) accept int64 unchanged.
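
As a rough illustration of the dtype split, here is a pure-NumPy sketch of a two-pass row CSR build. It is not the code from this PR: the function name `build_row_csr_sketch`, its parameters, and the half-open row range `[y_min, y_max)` per edge are assumptions made for the example, and the real kernel is compiled with numba.

```python
import numpy as np


def build_row_csr_sketch(edge_y_min, edge_y_max, height):
    """Hypothetical two-pass CSR build: which edges are active on each row.

    Edge i is assumed active on rows edge_y_min[i] <= r < edge_y_max[i].
    """
    n_edges = len(edge_y_min)
    # Cumulative counts live in int64 so they cannot wrap at 2**31 - 1.
    row_ptr = np.zeros(height + 1, dtype=np.int64)
    if n_edges == 0:
        # Empty-edges short-circuit keeps the same dtype contract.
        return row_ptr, np.empty(0, dtype=np.int32)

    # Pass 1: difference array, then a running sum gives per-row counts.
    diff = np.zeros(height + 1, dtype=np.int64)
    for i in range(n_edges):
        diff[edge_y_min[i]] += 1
        diff[edge_y_max[i]] -= 1

    running = np.int64(0)
    for r in range(height):
        running += diff[r]
        row_ptr[r + 1] = row_ptr[r] + running

    # Pass 2: scatter edge indices; these are small, so int32 is enough.
    col_idx = np.empty(int(row_ptr[height]), dtype=np.int32)
    offsets = row_ptr[:-1].copy()  # int64 write cursors, one per row
    for i in range(n_edges):
        for r in range(edge_y_min[i], edge_y_max[i]):
            col_idx[offsets[r]] = i
            offsets[r] += 1
    return row_ptr, col_idx


# Two overlapping edges on a 4-row raster.
row_ptr, col_idx = build_row_csr_sketch([0, 1], [3, 2], height=4)
max_edges_per_row = int(np.diff(row_ptr).max())
```

In this layout, the edges active on row `r` are `col_idx[row_ptr[r]:row_ptr[r + 1]]`, and `np.diff(row_ptr).max()` gives the largest per-row edge count regardless of whether `row_ptr` is int32 or int64.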

Tests cover the dtype contract (row_ptr is int64 on both the populated
and empty-edge paths), int64 representability past int32 max, CSR
correctness for overlapping edges, and an end-to-end rasterize call.

github-actions bot added the "performance" label (PR touches performance-sensitive code) on Apr 30, 2026
brendancol merged commit 570bdd9 into main on Apr 30, 2026
11 checks passed
brendancol deleted the issue-1388 branch on May 4, 2026 13:04

Development

Successfully merging this pull request may close these issues.

rasterize: _build_row_csr_numba int32 overflow on large/dense edge inputs
