Optimized Bounds Construction #1205

philipc2 · 2025-04-09T14:34:21Z

Overview

Optimized the Face Bounds construction
Moved helper functions associated with Bounds construction to uxarray.grid.bounds (originally in uxarray.grid.geometry), with the future intention of having separate modules under a uxarray.geometry module.

Timings

The following timings were taken on a single NCAR Derecho CPU node (256 threads, 256 GB memory).

Resolution	Nodes	Faces	Edges	Latest Implementation in Main (s)	Implementation in this PR (s)
30km	1,310,720	655,362	1,966,080	2.541	0.221
15km	5,242,880	2,621,442	7,864,320	10.191	0.451
7.5km	20,971,520	10,485,762	31,457,280	37.866	1.47
3.75km	83,886,080	41,943,042	125,829,120	155.486	5.736
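The speedup implied by each row is the ratio of the two timing columns. A minimal sketch using only the figures from the table above (the variable names are illustrative):

```python
# Timings in seconds, copied from the table: (main, this PR).
timings = {
    "30km": (2.541, 0.221),
    "15km": (10.191, 0.451),
    "7.5km": (37.866, 1.47),
    "3.75km": (155.486, 5.736),
}

# Speedup = time on main / time in this PR.
speedups = {res: before / after for res, (before, after) in timings.items()}

for res, factor in speedups.items():
    print(f"{res:>7}: {factor:.1f}x faster")
```

The ratio grows with grid size, from roughly 11x at 30km to roughly 27x at 3.75km.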

For the 3.75km grid, below is the CPU vs Wall Time

Before

CPU times: user 20min 15s, sys: 17.1 s, total: 20min 32s
Wall time: 2min 44s

After

CPU times: user 19min 18s, sys: 3.61 s, total: 19min 22s
Wall time: 5.77 s

After the change, total CPU time is about 200x the wall time, i.e. roughly 200 threads are kept busy in parallel.
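The parallelism figure comes from dividing total CPU time by wall time. A small sketch of that arithmetic, assuming the IPython-style duration strings quoted above (`to_seconds` is a hypothetical helper, not part of any library):

```python
import re


def to_seconds(text: str) -> float:
    """Convert a duration such as '19min 22s' or '5.77 s' to seconds."""
    units = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 1e-3}
    total = 0.0
    for value, unit in re.findall(r"([\d.]+)\s*(h|min|ms|s)\b", text):
        total += float(value) * units[unit]
    return total


# Total CPU time divided by wall time, before and after the change.
before = to_seconds("20min 32s") / to_seconds("2min 44s")
after = to_seconds("19min 22s") / to_seconds("5.77 s")
print(f"before: {before:.1f}x, after: {after:.1f}x")
```

This gives about 7.5 before and about 201 after. Note that this ratio measures how many threads were busy on average, not the end-to-end gain: the wall-time speedup for this run is 164 s / 5.77 s, or roughly 28x.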

aaronzedwick · 2025-04-14T16:20:46Z

Wow, that looks like some crazy good performance improvements! Nice work! This is great, will start reviewing it.

Copilot

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

pyproject.toml:43

There is a trailing space in the xarray dependency version specifier; consider removing it for consistency.

  "xarray>=2024.11.0 "

uxarray/grid/grid.py:1989

[nitpick] Review the updated normalization logic to ensure that in-place division of xarray DataArrays behaves as expected, especially with regard to type handling or potential division by zero scenarios.

norm = xr.ufuncs.sqrt(self.node_x**2 + self.node_y**2 + self.node_z**2)
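One way to address the division-by-zero concern is to check the norm before dividing. The sketch below uses plain NumPy arrays as a stand-in for `node_x`, `node_y`, and `node_z`, on the assumption that the DataArray arithmetic is elementwise in the same way; `normalize_xyz` is a hypothetical name and not the function in this PR.

```python
import numpy as np


def normalize_xyz(x, y, z):
    """Project Cartesian points onto the unit sphere (hypothetical helper)."""
    x, y, z = (np.asarray(a, dtype=np.float64) for a in (x, y, z))
    norm = np.sqrt(x**2 + y**2 + z**2)
    # A point at the origin has no direction, so fail loudly instead of
    # silently producing NaN or inf.
    if np.any(norm == 0.0):
        raise ValueError("cannot normalize a point at the origin")
    return x / norm, y / norm, z / norm


x, y, z = normalize_xyz([2.0, 0.0], [0.0, 3.0], [0.0, 4.0])

# Largest deviation of the squared norm from 1 after normalization.
max_dev = np.abs(x**2 + y**2 + z**2 - 1.0).max()
```

Casting to `float64` up front also sidesteps the type-handling question, since integer inputs would otherwise not survive an in-place division.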

Copilot

Pull Request Overview

This PR optimizes the bounds construction for grid faces while reorganizing related helper functions. Key changes include:

Optimized Face Bounds construction with a significant performance boost.
Moved bounds-related helper functions from uxarray/grid/geometry.py to uxarray/grid/bounds.py.
Updated tests and dependency definitions to reflect the refactor and new xarray version requirement.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.


File	Description
uxarray/grid/validation.py	Replaced inline normalization checks with a numba-enabled helper.
uxarray/grid/utils.py	Introduced new numba functions for edge coordinate conversion.
uxarray/grid/grid.py	Refactored bounds function and updated coordinate normalization.
uxarray/grid/coordinates.py	Removed the redundant _normalize_xyz function.
test/test_geometry.py	Updated tests to use the new bounds helper functions in grid/bounds.py.
pyproject.toml	Updated the xarray dependency to a newer version.


erogluorhan

This is a great optimization overall, and I am inclined to approve it soon. I have a few inline comments throughout, in addition to the following question:

CPU times: user 19min 18s, sys: 3.61 s, total: 19min 22s
Wall time: 5.77 s

I'd be curious about the optimization on a personal laptop-like device. Looks like from the above numbers, Numba was able to benefit from a lot of CPU threads on Derecho, right?
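For context on why thread count matters here: with `parallel=True`, Numba splits the iterations of a `prange` loop across threads, so a loop over independent faces scales with the number of cores. The sketch below is a deliberately simplified illustration and not the implementation in this PR: it only takes the min/max of each face's node latitudes and longitudes, and ignores great-circle arc extrema, poles, and the antimeridian. The function name, the `FILL` value, and the array layout are all assumptions.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # plain-Python fallback so the sketch runs without Numba
    prange = range

    def njit(*args, **kwargs):
        def decorator(func):
            return func

        return decorator


FILL = -1  # assumed fill value padding faces that have fewer nodes


@njit(parallel=True)
def face_latlon_bounds(face_node_connectivity, node_lat, node_lon):
    n_face = face_node_connectivity.shape[0]
    n_max = face_node_connectivity.shape[1]
    # bounds[i] = [[lat_min, lat_max], [lon_min, lon_max]]
    bounds = np.empty((n_face, 2, 2), dtype=np.float64)
    for i in prange(n_face):  # faces are independent, so they can run in parallel
        lat_min = np.inf
        lat_max = -np.inf
        lon_min = np.inf
        lon_max = -np.inf
        for j in range(n_max):
            node = face_node_connectivity[i, j]
            if node != FILL:
                lat_min = min(lat_min, node_lat[node])
                lat_max = max(lat_max, node_lat[node])
                lon_min = min(lon_min, node_lon[node])
                lon_max = max(lon_max, node_lon[node])
        bounds[i, 0, 0] = lat_min
        bounds[i, 0, 1] = lat_max
        bounds[i, 1, 0] = lon_min
        bounds[i, 1, 1] = lon_max
    return bounds


# Two faces: a triangle padded with FILL, and a quadrilateral.
conn = np.array([[0, 1, 2, FILL], [1, 2, 3, 4]])
lat = np.array([0.0, 10.0, 5.0, -3.0, 7.0])
lon = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
bounds = face_latlon_bounds(conn, lat, lon)
```

Each iteration writes only to its own row of `bounds`, which is what makes the outer loop safe to parallelize.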


erogluorhan

Great optimization!

philipc2 · 2025-04-21T15:11:04Z

@erogluorhan

For a 7.5km grid on my M1 MacBook Pro (10 cores):

CPU times: user 3min 5s, sys: 732 ms, total: 3min 6s
Wall time: 20.5 s

About a 9x speedup.

github-actions · 2025-04-21T15:37:17Z

ASV Benchmarking

Benchmark Comparison Results

Benchmarks that have improved:

Change	Before [`68d14f3`]	After [`f95a998`]	Ratio	Benchmark (Parameter)
-	768M	440M	0.57	face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc'))
	failed	870M	n/a	face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc'))
-	20.2±0.2ms	17.8±0.09ms	0.88	face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc'))
-	7.59±0.06ms	3.28±0.03ms	0.43	face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc'))
-	44.5±0.1ms	17.4±0.2ms	0.39	face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc'))
-	4.02±1ms	2.01±0.01ms	0.50	face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc'))
-	748±9μs	206±1μs	0.28	mpas_ocean.CheckNorm.time_check_norm('120km')
-	500±10μs	91.5±0.8μs	0.18	mpas_ocean.CheckNorm.time_check_norm('480km')
-	793±8ns	717±1ns	0.90	mpas_ocean.ConstructTreeStructures.time_ball_tree('120km')
-	499M	389M	0.78	mpas_ocean.Integrate.peakmem_integrate('480km')

Benchmarks that have stayed the same:

Before [`68d14f3`]	After [`f95a998`]	Ratio	Benchmark (Parameter)
435M	439M	1.01	face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc'))
465M	469M	1.01	face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc'))
7.99±0.04s	7.86±0.03s	0.98	import.Imports.timeraw_import_uxarray
673±10ms	656±6ms	0.98	mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('120km')
43.3±2ms	42.4±0.5ms	0.98	mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('480km')
5.66±0.06μs	5.61±0.02μs	0.99	mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('120km')
5.57±0.03μs	5.67±0.05μs	1.02	mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('480km')
5.14±0.08ms	5.14±0.07ms	1	mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('120km')
3.88±0.02ms	3.80±0.01ms	0.98	mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('480km')
3.53±0.01s	3.51±0.01s	0.99	mpas_ocean.ConstructFaceLatLon.time_welzl('120km')
225±2ms	225±2ms	1	mpas_ocean.ConstructFaceLatLon.time_welzl('480km')
299±10ns	292±1ns	0.98	mpas_ocean.ConstructTreeStructures.time_ball_tree('480km')
566±10ns	554±2ns	0.98	mpas_ocean.ConstructTreeStructures.time_kd_tree('120km')
306±20ns	298±6ns	0.97	mpas_ocean.ConstructTreeStructures.time_kd_tree('480km')
454±3ms	455±4ms	1	mpas_ocean.CrossSections.time_const_lat('120km', 1)
231±3ms	232±3ms	1	mpas_ocean.CrossSections.time_const_lat('120km', 2)
119±2ms	121±3ms	1.02	mpas_ocean.CrossSections.time_const_lat('120km', 4)
378±3ms	380±6ms	1	mpas_ocean.CrossSections.time_const_lat('480km', 1)
190±1ms	192±0.7ms	1.01	mpas_ocean.CrossSections.time_const_lat('480km', 2)
98.2±0.6ms	99.0±0.8ms	1.01	mpas_ocean.CrossSections.time_const_lat('480km', 4)
126±0.6ms	128±0.3ms	1.02	mpas_ocean.DualMesh.time_dual_mesh_construction('120km')
9.74±0.3ms	9.60±0.1ms	0.99	mpas_ocean.DualMesh.time_dual_mesh_construction('480km')
1.64±0.04s	1.61±0.01s	0.98	mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', False)
1.29±0.03ms	1.27±0.02ms	0.98	mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', True)
128±2ms	126±2ms	0.98	mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', False)
5.40±0.1ms	5.33±0.2ms	0.99	mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', True)
394M	394M	1	mpas_ocean.Gradient.peakmem_gradient('120km')
376M	376M	1	mpas_ocean.Gradient.peakmem_gradient('480km')
2.73±0.02ms	2.74±0.02ms	1	mpas_ocean.Gradient.time_gradient('120km')
315±3μs	311±1μs	0.99	mpas_ocean.Gradient.time_gradient('480km')
211±1μs	212±2μs	1	mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('120km')
119±1μs	116±0.5μs	0.97	mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('480km')
404M	404M	1	mpas_ocean.Integrate.peakmem_integrate('120km')
148±1ms	146±2ms	0.98	mpas_ocean.Integrate.time_integrate('120km')
10.4±0.3ms	10.4±0.3ms	1	mpas_ocean.Integrate.time_integrate('480km')
348±2ms	344±1ms	0.99	mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'exclude')
348±0.8ms	347±2ms	1	mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'include')
348±2ms	347±3ms	1	mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'split')
22.8±0.5ms	22.3±0.2ms	0.98	mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'exclude')
23.1±0.2ms	22.4±0.4ms	0.97	mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'include')
22.8±0.2ms	22.2±0.2ms	0.97	mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'split')
4.51±0.02ms	4.51±0.03ms	1	mpas_ocean.PointInPolygon.time_face_search('120km')
4.64±0.01ms	4.67±0.05ms	1.01	mpas_ocean.PointInPolygon.time_face_search('480km')
55.6±0.4ms	55.3±0.3ms	0.99	mpas_ocean.RemapDownsample.time_inverse_distance_weighted_remapping
45.2±0.3ms	44.7±0.2ms	0.99	mpas_ocean.RemapDownsample.time_nearest_neighbor_remapping
354±2ms	356±1ms	1	mpas_ocean.RemapUpsample.time_inverse_distance_weighted_remapping
261±1ms	260±0.8ms	0.99	mpas_ocean.RemapUpsample.time_nearest_neighbor_remapping
25.3±0.2ms	27.2±0.4ms	1.07	mpas_ocean.ZonalAverage.time_zonal_average('120km')
4.87±0.07ms	4.67±0.01ms	0.96	mpas_ocean.ZonalAverage.time_zonal_average('480km')
375M	376M	1	quad_hexagon.QuadHexagon.peakmem_open_dataset
372M	372M	1	quad_hexagon.QuadHexagon.peakmem_open_grid
7.44±0.07ms	7.40±0.03ms	1	quad_hexagon.QuadHexagon.time_open_dataset
6.40±0.06ms	6.50±0.04ms	1.02	quad_hexagon.QuadHexagon.time_open_grid
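The benchmark names in these tables follow ASV's naming convention: methods prefixed with `time_` are timed, methods prefixed with `peakmem_` record peak memory, and the value in parentheses is the parameter the benchmark ran with. A minimal sketch of a class in that shape, with a placeholder workload rather than the project's actual benchmark code (`FaceBoundsSketch` and its data are hypothetical):

```python
class FaceBoundsSketch:
    # ASV runs each benchmark once per entry in `params`.
    params = [1_000, 10_000]
    param_names = ["n_face"]

    def setup(self, n_face):
        # Placeholder data; a real benchmark would open a grid file here.
        self.latitudes = [(i % 180) - 90.0 for i in range(n_face)]

    def time_bounds(self, n_face):
        # ASV reports how long this call takes.
        self._bounds()

    def peakmem_bounds(self, n_face):
        # ASV reports the peak memory of the process while this runs.
        self._bounds()

    def _bounds(self):
        return min(self.latitudes), max(self.latitudes)


# Calling the methods by hand, the way ASV would for one parameter value.
bench = FaceBoundsSketch()
bench.setup(1_000)
bench.time_bounds(1_000)
```

The Ratio column is the "after" value divided by the "before" value, so 440M / 768M gives 0.57 and anything below 1 is an improvement.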

erogluorhan · 2025-04-21T16:24:44Z

> @erogluorhan
> For a 7.5km grid on my M1 MacBook Pro (10 cores)
> CPU times: user 3min 5s, sys: 732 ms, total: 3min 6s
> Wall time: 20.5 s
> About a 9x speedup.

Looks good! Thanks for running this!

AFAIK, the M1 does not support simultaneous multithreading (i.e. it runs one thread per core), and your device probably has 8 cores, hence the speedup of around 9x, I believe.

Also, such speedups with multithreading, especially on HPC clusters, are very powerful, and it is worth reviewing our documentation to better emphasize this for users.

philipc2 · 2025-04-21T16:31:22Z

> AFAIK, the M1 does not support simultaneous multithreading (i.e. it runs one thread per core), and your device probably has 8 cores, hence the speedup of around 9x, I believe.

The M1 Pro has 8 performance and 2 efficiency cores. I think the regular M1 has only 8 in total.

> Also, such speedups with multithreading, especially on HPC clusters, are very powerful, and it is worth reviewing our documentation to better emphasize this for users.

This is a good idea! I could add a note to the Grid.bounds docstring.
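A docstring note along these lines could also mention how users control the thread count. Numba reads the `NUMBA_NUM_THREADS` environment variable when it is first imported, so one option is to set it before importing any Numba-based library. A rough sketch, where `configure_numba_threads` is a hypothetical helper and capping at the CPU count is a design choice rather than a Numba requirement:

```python
import os


def configure_numba_threads(requested: int) -> int:
    """Cap the requested thread count at the visible CPU count and export it."""
    available = os.cpu_count() or 1
    n_threads = max(1, min(requested, available))
    # This must run before `import numba`, and therefore before importing
    # any library that imports Numba, or it has no effect.
    os.environ["NUMBA_NUM_THREADS"] = str(n_threads)
    return n_threads


n = configure_numba_threads(8)
print(f"Numba will start with {n} thread(s)")
```

After Numba is imported, `numba.set_num_threads` can lower the count at runtime, but it cannot exceed the value Numba started with.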


review-notebook-app · 2025-04-23T02:57:18Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

Copilot

Pull Request Overview

This PR optimizes the face bounds construction and consolidates related helper functions by moving them from the geometry module to a new bounds module while renaming several utility functions. Key changes include the renaming of key helper functions (e.g. _get_cartesian_face_edge_nodes to _get_cartesian_face_edge_nodes_array), the update of the normalization check using xr.ufuncs with .compute(), and updates in grid and test files to reflect these function name changes.

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.


File	Description
uxarray/grid/validation.py	Updated the normalization check to use xr.ufuncs.abs and .compute().
uxarray/grid/utils.py	Renamed and added numba-compiled helper functions for face edge nodes.
uxarray/grid/grid.py	Updated calls to renamed helpers and switched to using _populate_face_bounds.
uxarray/grid/coordinates.py	Removed the standalone _normalize_xyz function.
uxarray/core/zonal.py	Updated invocations to the new face edge helper name.
Test files	Adjusted function references to match the new helper function names.

Comments suppressed due to low confidence (3)

uxarray/grid/grid.py:1431

[nitpick] Double-check that the renaming to '_populate_face_bounds' is fully integrated with the numba function caching mechanism to ensure that the function is recognized and compiled as expected.

if not is_numba_function_cached(_populate_face_bounds):

uxarray/grid/validation.py:120

Using '.compute()' directly here may introduce unnecessary overhead if the inputs are already in-memory arrays; consider verifying whether the inputs are dask arrays and remove or conditionally apply .compute() to optimize performance.

max_dev = xr.ufuncs.abs(x**2 + y**2 + z**2 - 1.0).max().compute()

uxarray/grid/utils.py:325

Ensure that 'njit' is properly imported from 'numba' so that these decorated functions compile correctly at runtime.

@njit(cache=True)


Copilot

Pull Request Overview

This PR optimizes the face bounds construction and cleans up helper functions for improved clarity and performance. Key changes include:

Refactoring and optimizing the grid normalization and bounds-checking logic.
Renaming and migrating helper functions from uxarray.grid.geometry to uxarray.grid.bounds and updating their usage.
Updating test cases accordingly to support the new function names and implementations.

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.


File	Description
uxarray/grid/validation.py	Refactored normalization check using a loop
uxarray/grid/utils.py	Renamed and added new numba-accelerated helper functions
uxarray/grid/grid.py	Updated bounds logic and normalization implementation
uxarray/grid/coordinates.py	Removed legacy normalization function
uxarray/core/zonal.py	Updated references to the renamed helper function
test/*	Revised tests to use updated function names and bounds

Comments suppressed due to low confidence (1)

uxarray/grid/utils.py:141

[nitpick] The new function name '_get_cartesian_face_edge_nodes_array' is more verbose than its previous version; consider consolidating the naming convention, and if the non-suffixed version is deprecated, add a deprecation notice to assist future maintenance.

def _get_cartesian_face_edge_nodes_array(


optimize bounds and face edge node construction

24fe42d

philipc2 self-assigned this Apr 9, 2025

philipc2 added 4 commits April 9, 2025 22:34

add parallel norms

4687b69

use xarray

2447702

update norm

a5ccc44

parallel norm

5adf5b6

philipc2 marked this pull request as draft April 10, 2025 15:23

philipc2 added the scalability Related to scalability & performance efforts label Apr 10, 2025

philipc2 mentioned this pull request Apr 10, 2025

Use n_max_face_nodes instead of n_max_face_edges in Bounds construction #1203

Closed

philipc2 and others added 3 commits April 10, 2025 10:43

coordinate optimization

1f4c11d

remove comment

6c4af23

Merge branch 'main' into bounds-optimization

6398e02

philipc2 mentioned this pull request Apr 10, 2025

DRAFT: Optimized Non-Conservative Zonal Average #1180

Closed

14 tasks

philipc2 and others added 6 commits April 10, 2025 12:36

add array types to norm signature

c1564e0

Merge branch 'main' into bounds-optimization

83e0974

Use xarray ufuncs to normalize coordinates, set min xarray version

6aa9c2c

correct pyproject.toml

926d4a7

docstring cleanup

cd8fc28

re-add numba cache check

5e4c030

philipc2 marked this pull request as ready for review April 14, 2025 16:14

philipc2 requested review from aaronzedwick, erogluorhan and hongyuchen1030 April 14, 2025 16:15

philipc2 changed the title ~~DRAFT: Optimized Bounds Construction~~ Optimized Bounds Construction Apr 14, 2025

philipc2 requested a review from Copilot April 14, 2025 18:02

Copilot AI reviewed Apr 14, 2025


philipc2 and others added 3 commits April 14, 2025 13:03

remove space after version pin

cdcb173

Merge branch 'main' into bounds-optimization

907b8ec

merge main

fb39036

update function names

6823f7b

philipc2 requested a review from Copilot April 17, 2025 22:57

Copilot AI reviewed Apr 17, 2025

uxarray/grid/grid.py (outdated, resolved)

hongyuchen1030 reviewed Apr 17, 2025

uxarray/grid/bounds.py (resolved)

hongyuchen1030 reviewed Apr 17, 2025

uxarray/grid/bounds.py (resolved)

hongyuchen1030 reviewed Apr 17, 2025

uxarray/grid/utils.py (outdated, resolved)

hongyuchen1030 reviewed Apr 17, 2025

uxarray/grid/utils.py (outdated, resolved)

erogluorhan reviewed Apr 17, 2025

pyproject.toml (outdated, resolved)

uxarray/grid/grid.py (outdated, resolved)

uxarray/grid/validation.py (outdated, resolved)

erogluorhan self-requested a review April 18, 2025 18:12

erogluorhan approved these changes Apr 18, 2025

philipc2 added the run-benchmark Run ASV benchmark workflow label Apr 21, 2025

philipc2 removed the run-benchmark Run ASV benchmark workflow label Apr 21, 2025

philipc2 and others added 3 commits April 22, 2025 21:46

Merge branch 'main' into bounds-optimization

21b4cbf

docstrings, update function names, unpin xarray

3f35af6

Merge branch 'bounds-optimization' of https://github.com/UXARRAY/uxarray into bounds-optimization

849625f

philipc2 and others added 2 commits April 22, 2025 22:08

clean notebook, update util name

a608439

Merge branch 'main' into bounds-optimization

9dc18c3

philipc2 requested a review from Copilot April 23, 2025 12:35

Copilot AI reviewed Apr 23, 2025


philipc2 added 2 commits April 23, 2025 07:42

use regular abs, update numba cache check

1405f61

Merge branch 'bounds-optimization' of https://github.com/UXARRAY/uxarray into bounds-optimization

ebd0627

philipc2 requested a review from Copilot April 23, 2025 12:42

Copilot AI reviewed Apr 23, 2025

uxarray/grid/validation.py (resolved)

philipc2 merged commit 94aa3cf into main Apr 23, 2025
20 checks passed

erogluorhan deleted the bounds-optimization branch September 26, 2025 17:49
