feat: add numpy array support to .intersect() by aacovski · Pull Request #2646 · modflowpy/flopy

aacovski · 2025-11-04T20:37:59Z

Summary

This PR adds numpy array support to the .intersect() Grid subclasses with significant performance improvements for large datasets, while maintaining full backward compatibility. ~~Turned 20 minute intersect() loop with float coords to vectorized intersect() passing arrays down to around 6 seconds.~~ Will change the methods in a future PR.

Key Improvements

StructuredGrid (10,000 cells)

Average speedup: 5.19x
Best: 8.99x (10,000 points)
Speedup improves with more points: 1.55x → 8.99x

VertexGrid (9,975 cells)

Average speedup: 1.75x
Best: 3.54x (10,000 points)
Scalar 10k points took 22 minutes, vectorized 6 minutes

UnstructuredGrid (9,975 cells)

Average speedup: 2.13x
Best: 2.65x (10,000 points)
Scalar 10k points took 12.5 minutes, vectorized 4.7 minutes

benchmark_scalar_vs_vectorized.py

Verification:

✅ All results verified for equivalence with original scalar implementations
✅ All repository tests passing (test_intersection, test_*_xyz_intersect, GridIntersect tests)
✅ Added comprehensive array input tests to test_grid.py

The vectorization provides significant performance improvements while maintaining 100% backward compatibility and result equivalence.

Add vectorized array support to the intersect() method with significant performance improvements for large datasets. Key improvements: - Support numpy arrays for x, y, z coordinates - Use cKDTree spatial indexing for efficient neighbor queries - Pre-compute bounding boxes for faster filtering - Batch point-in-polygon checks using Path.contains_points() - Maintain backward compatibility with scalar inputs Performance: - Small datasets: Same speed as before - Large datasets: ~10-100x faster with cKDTree and batching Testing: - All existing intersection tests pass - Passes ruff check and format 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

…l centers - Move defaultdict import to module level - Use xyzcellcenters cached property instead of direct _xc/_yc access - Ensures coordinate transformations are applied correctly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

codecov · 2025-11-04T22:16:08Z

Codecov Report

❌ Patch coverage is 89.28571% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.6%. Comparing base (556c088) to head (659d576).
⚠️ Report is 85 commits behind head on develop.

Files with missing lines	Patch %	Lines
flopy/discretization/structuredgrid.py	89.0%	7 Missing ⚠️
flopy/discretization/vertexgrid.py	86.5%	7 Missing ⚠️
flopy/discretization/unstructuredgrid.py	92.3%	4 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #2646      +/-   ##
===========================================
+ Coverage     55.5%    72.6%   +17.1%     
===========================================
  Files          644      667      +23     
  Lines       124135   129332    +5197     
===========================================
+ Hits         68947    93965   +25018     
+ Misses       55188    35367   -19821

Files with missing lines	Coverage Δ
flopy/discretization/unstructuredgrid.py	`77.0% <92.3%> (-4.5%)`	⬇️
flopy/discretization/structuredgrid.py	`53.1% <89.0%> (+5.7%)`	⬆️
flopy/discretization/vertexgrid.py	`80.3% <86.5%> (-3.4%)`	⬇️

... and 557 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

wpbonelli

Thanks for this @aacovski. Some thoughts

It would be best to keep the Grid.intersect() API consistent. If one subclass supports both scalars and arrays, they all should IMO?
We have a dedicated GridIntersect utility to whose intersect() method you can pass a shapely.MultiPoint. If Grid.intersect() begins to support arrays, can it delegate to GridIntersect? Does this new intersection method then belong in GridIntersect? Does it have any limitations, e.g. only work for point intersections or can it do arbitrary shapes?
On that note, can you describe your motivation for kdtree over e.g. rtree from shapely? a performance comparison may be worthwile? I am aware kdtree is more expensive to change after building but I guess that isn't relevant as the usage pattern for GridIntersect is currently if your grid changes, make a new GridIntersect. but an intersection benchmark seems in order.
Regardless how we proceed this should probably have tests before merging

Maybe @dbrakenhoff can weigh in. He has written most of the GridIntersect capability AFAIK and is much more knowledgeable here than I

aacovski · 2025-11-07T15:34:47Z

Thank you for your response @wpbonelli. I used a similar intersection method with MINEDW, where we're dealing with 3D unstructured meshes, and I changed my method ad hoc to use a 2D grid approach without thinking about the tree choice. ~~You're right about Rtree - it is much faster for searching 2D candidates then checking z than using a 2D/3D KDTree. I imagine the speedup might be like 2x vs KDTree~~: but the bottleneck is looping through scalars.

Edit: I did a benchmark with my model data against a strtree method using bounding boxes: KDtree is still much faster, to my surprise. The problems with shapely's strtree: geometry (bounding boxes) vs point and you can't query multiple points.

My method is similar to the current approach where we look at grid cell centroids for candidates, then check if grid cells contain points. If I understand correctly, UnstructuredGrid.intersect() handles points-in-cells (point location) whereas GridIntersect handles cells-in-polygons (spatial overlay). I didn't look at GridIntersect prior - my apologies. Definitely some naming confusion here. Might be better to call intersect() locate().

Regarding your first point about API consistency: I don't think there's a workaround for getting true vectorized performance while maintaining a scalar-only API.

wpbonelli · 2025-11-07T19:20:21Z

@aacovski apology unnecessary, this is valuable! Would be nice to see your benchmark numbers.

Might be better to call intersect() locate().

Idk, it's still an intersection, just limited to points, no? GridIntersect can just do a broader set of geometries, points, lines, polygons, etc. So do I understand right that the kdtree method is limited to points? If the speedup is that significant it still seems it may be worth adding, just for point intersections?

Regarding your first point about API consistency: I don't think there's a workaround for getting true vectorized performance while maintaining a scalar-only API.

I didn't mean to suggest the API should stay scalar-only, rather that the API for all Grid subclasses (StructuredGrid, VertexGrid, UnstructuredGrid) should all accept either scalars or arrays if any do.

aacovski · 2025-11-07T21:09:25Z

@wpbonelli:

benchmark_intersect_methods.zip

Attached is the benchmark script. Note that I had to implement UnstructuredGrid support for GridIntersect (which didn't exist), so this comparison isn't perfect, the methods return different data structures. It's a geometric speedup curve (I tried 50k points too).

More broadly, it could be worth replacing STR-tree with KDTree throughout GridIntersect. This could work for all geometry types by converting them to centroids/vertices to find candidates: one algorithm to handle all geometries and grid structures. Just based on my prior experience: checking vertices within convex hulls is the bottleneck: should be parallelized for performance.

If you make a Github issue, I could potentially resolve the API differences that would be caused by this, and I can try to implement it for all of the Grid subclasses (eventually) in this PR.

dbrakenhoff · 2025-11-08T15:08:11Z

Hi @aacovski and @wpbonelli,

Thanks for putting this together! First of all, I think it would be great to support vector based operations on all Grid.intersect() calls.

As for GridIntersect, this was primarily designed to return the actual intersection result between lines and polygons and the grid, and statistics about the length/area etc. It also works for points of course, but I have also found it to be a bit slow and not ideal.

GridIntersect has an intersects() method that only returns the ids of intersected cells. (Sorry about the naming, the actual intersect method should probably have been called intersection(), following shapely). This is much faster, but the downside of this method however is that it doesn't return one result for each point, losing the connection between input points and grid cellids. If you do want that 1-1 relationship, that forces you to loop over points which is much slower of course. Perhaps there is an optimization there to obtain that 1-1 relationship without incurring all the overhead that performing the actual intersection causes.

Because of these down sides, the method I have used to get a faster point -> grid cell intersection uses geopandas. No idea how it compares, but posting it here in case it is useful.

import geopandas as gpd
import flopy

x = ... # some array
y = ... # some array

# only for collecting grid geometries
gi = flopy.utils.GridIntersect(grid)

# spatial join points with grid and add resulting cellid to obs_ds
spatial_join = gpd.GeoDataFrame(geometry=gpd.points_from_xy(x, y)).sjoin(
    gpd.GeoDataFrame({"cellid": gi.cellids}, geometry=gi.geoms),
    how="left",
)
# deal with edge cases, sort by index and cellid, then pick lowest cellid
cellids = (
    spatial_join.reset_index()
    .sort_values(["index", "cellid"])
    .groupby("index")
    .first()["cellid"]
)

Another thing that came to mind is this library https://github.com/Deltares/numba_celltree, which also contains numba optimized unstructured grid operations. Not sure what types of operations it supports exactly, but might be interesting as well.

As you mentioned, GridIntersect doesn't support UnstructuredGrid because it only supports 2D intersection operations. But it should be fairly simple to just implement UnstructuredGrid if it's layered by allowing intersection with one layer at a time, or by adding some basic support for a z-coordinate. But I haven't really gotten around to doing so since I have mostly worked with VertexGrids up to now. So any suggestions or additions there are very welcome.

Anyway, summarizing, I'd very much support the addition of vectorized intersect calls, and I'd be happy to take a look at any addition of UnstructuredGrid support to the GridIntersect class 👍 .

adacovsk · 2025-11-09T01:21:33Z

Using numba (or numba_celltree) automatically gives use parallelization which is an added benefit. I did further benchmarking and it is much faster my single-threaded kdtree.

I'll redo this PR to just handle the Grid subclasses to pass arrays for vector operations and do another PR in the future to change the methods.

Extends array/vectorization support to all three grid types for consistent API: - StructuredGrid: Process arrays of points efficiently - VertexGrid: Accept array inputs matching UnstructuredGrid behavior - UnstructuredGrid: Fix z-coordinate layer indexing bug - Add comprehensive tests in autotest/test_grid.py Performance improvements (scalar loop vs vectorized, ~10k cells): - StructuredGrid: 5.19x average speedup (up to 8.99x) - VertexGrid: 1.75x average speedup (up to 3.54x) - UnstructuredGrid: 2.13x average speedup (up to 2.65x) All implementations verified for equivalence with original methods. Tests pass for all grid types with various point counts (1-10,000). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Split long exception messages across multiple lines to comply with 88-character line limit. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

…ds with z When x,y coordinates are out of bounds and z is provided with forgive=True, the method now correctly returns (None, nan, nan) instead of (None, None, None). This ensures exact equivalence with the original implementation where: - lay (layer) is set to None when x,y are out of bounds - row and col remain as nan (not None) Added test case to verify this edge case behavior. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

dbrakenhoff · 2025-11-10T09:59:44Z

+    """Test StructuredGrid.intersect() edge case: out-of-bounds x,y with z.
+
+    This tests the specific case where x,y are out of bounds and z is provided.
+    The expected behavior is to return (None, nan, nan).


I think it would make more sense to return (nan, nan, nan) in the future (or at least be consistent in output types). Maybe something we could consider changing in a future version?

dbrakenhoff

Looks good! I made some minor comments, more for future PRs I think.

dbrakenhoff · 2025-11-10T10:08:37Z

@aacovski or @adacovsk, could you run ruff format and then commit. There are some linting errors.

The linux openGL test fail is unrelated.

- Run ruff format on modified files - Move Delaunay import to top of test_grid.py - Simplify valid_mask calculation by flipping invalid_mask instead of recalculating with isnan checks 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

wpbonelli

Yeah the failure is unrelated, we have had unending CI issues with that, it's just for pyvista notebooks. I'll look at it separately. No reason to hold this up though if you guys are happy with it, LGTM.

jlarsen-usgs

This looks good to me. The only thing that I see that might warrant some consideration is how we return intersections when arrays are supplied. Should it be a tuple of n numpy arrays ex. (layer_array, row_array, col_array) or should it be zipped into ((lay_0, row_0, col_0), ...., (lay_n, row_n, col_n)).

I think there are advantages and disadvantages of each approach, but it might be worth thinking about before the final merge.

Nice work!

wpbonelli · 2025-11-14T15:48:34Z

@dbrakenhoff comment meant for pastas not here?

dbrakenhoff · 2025-11-14T15:56:06Z

Oops!

wpbonelli · 2025-11-14T16:26:43Z

The only thing that I see that might warrant some consideration is how we return intersections when arrays are supplied. Should it be a tuple of n numpy arrays ex. (layer_array, row_array, col_array) or should it be zipped into ((lay_0, row_0, col_0), ...., (lay_n, row_n, col_n)).

I think there are advantages and disadvantages of each approach, but it might be worth thinking about before the final merge.

I'd lean towards tuple of arrays so

it's consistent with the input parameters
caller can decide whether to zip (easy 1-liner)

Assuming we did zip and returned a materialized list/array not just the zip object, I think it's O(3 * N) where N is the number of points. It's a bit more intuitive to get a collection of coordinate tuples back, at least to me, but someone might not want that, say they want to pass the indices to another routine expecting vector inputs, it'd save some cycles to not have to unzip. For tons of points I reckon that outweighs the intuitiveness argument?

dbrakenhoff · 2025-11-14T16:36:48Z

Agreed, it's also (slightly) more convenient to return 3 indexing arrays for indexing numpy arrays.

wpbonelli · 2025-11-14T17:05:45Z

thanks again for this @aacovski

wpbonelli · 2025-11-14T17:09:25Z

Sorry @aacovski just noticed one last thing, do you mind switching the generic Exceptions to ValueError? Also, let me know if you have an updated summary you'd like to go into the commit message, otherwise I'll just merge it with "Add vectorized array support to the intersect() method with significant performance improvements for large datasets."

Replace generic Exception with more specific ValueError for out-of-bounds coordinates in StructuredGrid and VertexGrid intersect() methods. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

…intersect Complete the refactoring by replacing the remaining generic Exception with ValueError in the UnstructuredGrid intersect() method. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

aacovski · 2025-11-16T01:23:25Z

Sorry @aacovski just noticed one last thing

No problem, I just wanted to remain consistent. I wouldn't say 2x speedup is very significant though. We'll unlock significant once we have kdtree (or numba_celltree if we want to introduce another library, but I don't think it's necessary).

aacovski marked this pull request as draft November 4, 2025 20:38

jdhughes-dev requested a review from wpbonelli November 4, 2025 22:13

wpbonelli reviewed Nov 7, 2025

View reviewed changes

aacovski mentioned this pull request Nov 7, 2025

bug: intersect method of gridintersect class not working #2603

Closed

wpbonelli mentioned this pull request Nov 8, 2025

API changes and renaming modflowpy/pyphoenix-project#268

Open

aacovski changed the title ~~feat: add numpy array support and optimize UnstructuredGrid.intersect()~~ feat: add numpy array support to .intersect() Nov 9, 2025

aacovski marked this pull request as ready for review November 9, 2025 15:11

adacovsk and others added 2 commits November 9, 2025 15:37

style: fix ruff line length errors

9a8d2cd

Split long exception messages across multiple lines to comply with 88-character line limit. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

dbrakenhoff reviewed Nov 10, 2025

View reviewed changes

Comment thread autotest/test_grid.py Outdated

dbrakenhoff reviewed Nov 10, 2025

View reviewed changes

Comment thread flopy/discretization/structuredgrid.py Outdated

dbrakenhoff approved these changes Nov 10, 2025

View reviewed changes

dbrakenhoff mentioned this pull request Nov 10, 2025

feature: support array with shapely geometries in GeospatialUtils #2649

Closed

adacovsk and others added 2 commits November 10, 2025 18:55

add fixture

9cefd91

dbrakenhoff approved these changes Nov 11, 2025

View reviewed changes

wpbonelli approved these changes Nov 13, 2025

View reviewed changes

wpbonelli requested a review from jlarsen-usgs November 13, 2025 17:11

jlarsen-usgs approved these changes Nov 13, 2025

View reviewed changes

adacovsk and others added 2 commits November 15, 2025 18:14

wpbonelli merged commit 2f24768 into modflowpy:develop Nov 17, 2025
20 checks passed

aacovski deleted the feat/vectorized-intersect branch November 18, 2025 02:56

Conversation

aacovski commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key Improvements

Uh oh!

codecov Bot commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

wpbonelli left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aacovski commented Nov 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wpbonelli commented Nov 7, 2025

Uh oh!

aacovski commented Nov 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dbrakenhoff commented Nov 8, 2025

Uh oh!

adacovsk commented Nov 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

dbrakenhoff Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dbrakenhoff left a comment

Choose a reason for hiding this comment

Uh oh!

dbrakenhoff commented Nov 10, 2025

Uh oh!

wpbonelli left a comment

Choose a reason for hiding this comment

Uh oh!

jlarsen-usgs left a comment

Choose a reason for hiding this comment

Uh oh!

wpbonelli commented Nov 14, 2025

Uh oh!

dbrakenhoff commented Nov 14, 2025

Uh oh!

wpbonelli commented Nov 14, 2025

Uh oh!

dbrakenhoff commented Nov 14, 2025

Uh oh!

wpbonelli commented Nov 14, 2025

Uh oh!

wpbonelli commented Nov 14, 2025

Uh oh!

aacovski commented Nov 16, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

aacovski commented Nov 4, 2025 •

edited

Loading

codecov Bot commented Nov 4, 2025 •

edited

Loading

wpbonelli left a comment •

edited

Loading

aacovski commented Nov 7, 2025 •

edited

Loading

aacovski commented Nov 7, 2025 •

edited

Loading

adacovsk commented Nov 9, 2025 •

edited

Loading