@philipc2 philipc2 commented Apr 9, 2025

Closes #1201 #1200

Overview

  • Optimized the Face Bounds construction
  • Moved helper functions associated with Bounds construction to uxarray.grid.bounds (originally in uxarray.grid.geometry), with the future intention of having separate modules under a uxarray.geometry module.
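
For readers new to the term, a face's bounds are its latitude/longitude bounding box. Below is a minimal sketch of the idea, assuming unit-sphere Cartesian node coordinates. It is not the implementation in this PR: it looks only at the nodes, so it ignores great-circle arc extrema, poles, and the antimeridian. The function names and the sample face are hypothetical.

```python
import math

def node_latlon(x, y, z):
    """Convert a unit-sphere Cartesian point to (lat, lon) in radians."""
    lat = math.asin(max(-1.0, min(1.0, z)))  # clamp against round-off
    lon = math.atan2(y, x)
    return lat, lon

def naive_face_bounds(face_nodes_xyz):
    """Return ((lat_min, lat_max), (lon_min, lon_max)) using the nodes only."""
    lats, lons = zip(*(node_latlon(*p) for p in face_nodes_xyz))
    return (min(lats), max(lats)), (min(lons), max(lons))

# Hypothetical quadrilateral face; every node lies on the unit sphere.
s = math.sqrt(0.5)
face = [(1.0, 0.0, 0.0), (s, s, 0.0), (0.5, 0.5, s), (s, 0.0, s)]
lat_bounds, lon_bounds = naive_face_bounds(face)
print(lat_bounds, lon_bounds)
```

Each face is processed independently of the others, which is why this kind of loop is a natural candidate for parallel execution.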

Timings

The following timings were taken on a single NCAR Derecho CPU node (256 threads, 256 GB memory).

| Resolution | Nodes | Faces | Edges | Latest implementation in main (s) | Implementation in this PR (s) |
| --- | --- | --- | --- | --- | --- |
| 30km | 1,310,720 | 655,362 | 1,966,080 | 2.541 | 0.221 |
| 15km | 5,242,880 | 2,621,442 | 7,864,320 | 10.191 | 0.451 |
| 7.5km | 20,971,520 | 10,485,762 | 31,457,280 | 37.866 | 1.47 |
| 3.75km | 83,886,080 | 41,943,042 | 125,829,120 | 155.486 | 5.736 |

For the 3.75km grid, the CPU time vs. wall time is shown below.

Before

CPU times: user 20min 15s, sys: 17.1 s, total: 20min 32s
Wall time: 2min 44s

After

CPU times: user 19min 18s, sys: 3.61 s, total: 19min 22s
Wall time: 5.77 s

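About a 200x parallel speedup: after the change, the total CPU time (19min 22s) is roughly 200 times the wall time (5.77 s). The wall time itself dropped by about 28x (2min 44s to 5.77 s).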
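
The ratios implied by the `%time` output above can be checked with a few lines of arithmetic. This is only a sanity-check sketch; the variable names are illustrative, and the "parallel" ratio is simply total CPU time divided by wall time.

```python
def to_seconds(minutes, seconds):
    """Convert a 'Xmin Ys' reading to seconds."""
    return 60.0 * minutes + seconds

# Values copied from the 3.75km timings above.
cpu_before = to_seconds(20, 32)   # total CPU time before
wall_before = to_seconds(2, 44)   # wall time before
cpu_after = to_seconds(19, 22)    # total CPU time after
wall_after = 5.77                 # wall time after

wall_speedup = wall_before / wall_after      # before vs. after, wall clock
parallel_before = cpu_before / wall_before   # CPU time per wall second, before
parallel_after = cpu_after / wall_after      # CPU time per wall second, after

print(f"wall-clock speedup: {wall_speedup:.1f}x")
print(f"CPU/wall before:    {parallel_before:.1f}x")
print(f"CPU/wall after:     {parallel_after:.1f}x")
```

The total CPU time is nearly unchanged between the two runs, so the gain comes mostly from spreading similar work across many more threads rather than from doing less work.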

@philipc2 philipc2 self-assigned this Apr 9, 2025
@philipc2 philipc2 marked this pull request as draft April 10, 2025 15:23
@philipc2 philipc2 added the scalability Related to scalability & performance efforts label Apr 10, 2025
@philipc2 philipc2 marked this pull request as ready for review April 14, 2025 16:14
@aaronzedwick
Member

Wow, that looks like some crazy good performance improvements! Nice work! This is great, will start reviewing it.

@philipc2 philipc2 changed the title DRAFT: Optimized Bounds Construction Optimized Bounds Construction Apr 14, 2025
@philipc2 philipc2 requested a review from Copilot April 14, 2025 18:02
Contributor

Copilot AI left a comment

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

pyproject.toml:43

  • There is a trailing space in the xarray dependency version specifier; consider removing it for consistency.
  "xarray>=2024.11.0 "

uxarray/grid/grid.py:1989

  • [nitpick] Review the updated normalization logic to ensure that in-place division of xarray DataArrays behaves as expected, especially with regard to type handling or potential division by zero scenarios.
norm = xr.ufuncs.sqrt(self.node_x**2 + self.node_y**2 + self.node_z**2)
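
To make the division-by-zero concern concrete, here is a plain-Python analogue of the normalization step with an explicit guard. It is a sketch under stated assumptions, not the code in `grid.py`: the function name, the `eps` threshold, and the choice to raise an error are all hypothetical.

```python
import math

def normalize_xyz(points, eps=1e-12):
    """Scale each (x, y, z) point onto the unit sphere.

    Raises ValueError for points too close to the origin, where the
    direction is undefined and the division would blow up.
    """
    normalized = []
    for x, y, z in points:
        norm = math.sqrt(x * x + y * y + z * z)
        if norm < eps:  # eps is an assumed threshold, not a library default
            raise ValueError(f"cannot normalize near-zero vector {(x, y, z)}")
        normalized.append((x / norm, y / norm, z / norm))
    return normalized

unit_points = normalize_xyz([(3.0, 0.0, 4.0), (0.0, 2.0, 0.0)])
print(unit_points)
```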

@philipc2 philipc2 requested a review from Copilot April 17, 2025 22:57
Contributor

Copilot AI left a comment

Pull Request Overview

This PR optimizes the bounds construction for grid faces while reorganizing related helper functions. Key changes include:

  • Optimized Face Bounds construction with a significant performance boost.
  • Moved bounds-related helper functions from uxarray/grid/geometry.py to uxarray/grid/bounds.py.
  • Updated tests and dependency definitions to reflect the refactor and new xarray version requirement.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| uxarray/grid/validation.py | Replaced inline normalization checks with a numba-enabled helper. |
| uxarray/grid/utils.py | Introduced new numba functions for edge coordinate conversion. |
| uxarray/grid/grid.py | Refactored bounds function and updated coordinate normalization. |
| uxarray/grid/coordinates.py | Removed the redundant _normalize_xyz function. |
| test/test_geometry.py | Updated tests to use the new bounds helper functions in grid/bounds.py. |
| pyproject.toml | Updated the xarray dependency to a newer version. |

Member

@erogluorhan erogluorhan left a comment

This is a great optimization overall, and I am inclined to approve it soon. Just a few inline comments throughout, in addition to the following question:

> CPU times: user 19min 18s, sys: 3.61 s, total: 19min 22s
> Wall time: 5.77 s

I'd be curious how the optimization performs on a laptop-class device. From the numbers above, it looks like Numba was able to benefit from the many CPU threads on Derecho, right?

@erogluorhan erogluorhan self-requested a review April 18, 2025 18:12
Member

@erogluorhan erogluorhan left a comment

Great optimization!

@philipc2
Member Author

@erogluorhan

For a 7.5km grid on my M1 MacBook Pro (10 cores):

CPU times: user 3min 5s, sys: 732 ms, total: 3min 6s
Wall time: 20.5 s

About a 9x parallel speedup (total CPU time divided by wall time).

@philipc2 philipc2 added the run-benchmark Run ASV benchmark workflow label Apr 21, 2025
@github-actions

ASV Benchmarking

Benchmark Comparison Results

Benchmarks that have improved:

| Change | Before [68d14f3] | After [f95a998] | Ratio | Benchmark (Parameter) |
| --- | --- | --- | --- | --- |
| - | 768M | 440M | 0.57 | face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc')) |
|  | failed | 870M | n/a | face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc')) |
| - | 20.2±0.2ms | 17.8±0.09ms | 0.88 | face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc')) |
| - | 7.59±0.06ms | 3.28±0.03ms | 0.43 | face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc')) |
| - | 44.5±0.1ms | 17.4±0.2ms | 0.39 | face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc')) |
| - | 4.02±1ms | 2.01±0.01ms | 0.50 | face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc')) |
| - | 748±9μs | 206±1μs | 0.28 | mpas_ocean.CheckNorm.time_check_norm('120km') |
| - | 500±10μs | 91.5±0.8μs | 0.18 | mpas_ocean.CheckNorm.time_check_norm('480km') |
| - | 793±8ns | 717±1ns | 0.90 | mpas_ocean.ConstructTreeStructures.time_ball_tree('120km') |
| - | 499M | 389M | 0.78 | mpas_ocean.Integrate.peakmem_integrate('480km') |

Benchmarks that have stayed the same:

| Change | Before [68d14f3] | After [f95a998] | Ratio | Benchmark (Parameter) |
| --- | --- | --- | --- | --- |
|  | 435M | 439M | 1.01 | face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc')) |
|  | 465M | 469M | 1.01 | face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc')) |
|  | 7.99±0.04s | 7.86±0.03s | 0.98 | import.Imports.timeraw_import_uxarray |
|  | 673±10ms | 656±6ms | 0.98 | mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('120km') |
|  | 43.3±2ms | 42.4±0.5ms | 0.98 | mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('480km') |
|  | 5.66±0.06μs | 5.61±0.02μs | 0.99 | mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('120km') |
|  | 5.57±0.03μs | 5.67±0.05μs | 1.02 | mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('480km') |
|  | 5.14±0.08ms | 5.14±0.07ms | 1 | mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('120km') |
|  | 3.88±0.02ms | 3.80±0.01ms | 0.98 | mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('480km') |
|  | 3.53±0.01s | 3.51±0.01s | 0.99 | mpas_ocean.ConstructFaceLatLon.time_welzl('120km') |
|  | 225±2ms | 225±2ms | 1 | mpas_ocean.ConstructFaceLatLon.time_welzl('480km') |
|  | 299±10ns | 292±1ns | 0.98 | mpas_ocean.ConstructTreeStructures.time_ball_tree('480km') |
|  | 566±10ns | 554±2ns | 0.98 | mpas_ocean.ConstructTreeStructures.time_kd_tree('120km') |
|  | 306±20ns | 298±6ns | 0.97 | mpas_ocean.ConstructTreeStructures.time_kd_tree('480km') |
|  | 454±3ms | 455±4ms | 1 | mpas_ocean.CrossSections.time_const_lat('120km', 1) |
|  | 231±3ms | 232±3ms | 1 | mpas_ocean.CrossSections.time_const_lat('120km', 2) |
|  | 119±2ms | 121±3ms | 1.02 | mpas_ocean.CrossSections.time_const_lat('120km', 4) |
|  | 378±3ms | 380±6ms | 1 | mpas_ocean.CrossSections.time_const_lat('480km', 1) |
|  | 190±1ms | 192±0.7ms | 1.01 | mpas_ocean.CrossSections.time_const_lat('480km', 2) |
|  | 98.2±0.6ms | 99.0±0.8ms | 1.01 | mpas_ocean.CrossSections.time_const_lat('480km', 4) |
|  | 126±0.6ms | 128±0.3ms | 1.02 | mpas_ocean.DualMesh.time_dual_mesh_construction('120km') |
|  | 9.74±0.3ms | 9.60±0.1ms | 0.99 | mpas_ocean.DualMesh.time_dual_mesh_construction('480km') |
|  | 1.64±0.04s | 1.61±0.01s | 0.98 | mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', False) |
|  | 1.29±0.03ms | 1.27±0.02ms | 0.98 | mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', True) |
|  | 128±2ms | 126±2ms | 0.98 | mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', False) |
|  | 5.40±0.1ms | 5.33±0.2ms | 0.99 | mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', True) |
|  | 394M | 394M | 1 | mpas_ocean.Gradient.peakmem_gradient('120km') |
|  | 376M | 376M | 1 | mpas_ocean.Gradient.peakmem_gradient('480km') |
|  | 2.73±0.02ms | 2.74±0.02ms | 1 | mpas_ocean.Gradient.time_gradient('120km') |
|  | 315±3μs | 311±1μs | 0.99 | mpas_ocean.Gradient.time_gradient('480km') |
|  | 211±1μs | 212±2μs | 1 | mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('120km') |
|  | 119±1μs | 116±0.5μs | 0.97 | mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('480km') |
|  | 404M | 404M | 1 | mpas_ocean.Integrate.peakmem_integrate('120km') |
|  | 148±1ms | 146±2ms | 0.98 | mpas_ocean.Integrate.time_integrate('120km') |
|  | 10.4±0.3ms | 10.4±0.3ms | 1 | mpas_ocean.Integrate.time_integrate('480km') |
|  | 348±2ms | 344±1ms | 0.99 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'exclude') |
|  | 348±0.8ms | 347±2ms | 1 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'include') |
|  | 348±2ms | 347±3ms | 1 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'split') |
|  | 22.8±0.5ms | 22.3±0.2ms | 0.98 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'exclude') |
|  | 23.1±0.2ms | 22.4±0.4ms | 0.97 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'include') |
|  | 22.8±0.2ms | 22.2±0.2ms | 0.97 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'split') |
|  | 4.51±0.02ms | 4.51±0.03ms | 1 | mpas_ocean.PointInPolygon.time_face_search('120km') |
|  | 4.64±0.01ms | 4.67±0.05ms | 1.01 | mpas_ocean.PointInPolygon.time_face_search('480km') |
|  | 55.6±0.4ms | 55.3±0.3ms | 0.99 | mpas_ocean.RemapDownsample.time_inverse_distance_weighted_remapping |
|  | 45.2±0.3ms | 44.7±0.2ms | 0.99 | mpas_ocean.RemapDownsample.time_nearest_neighbor_remapping |
|  | 354±2ms | 356±1ms | 1 | mpas_ocean.RemapUpsample.time_inverse_distance_weighted_remapping |
|  | 261±1ms | 260±0.8ms | 0.99 | mpas_ocean.RemapUpsample.time_nearest_neighbor_remapping |
|  | 25.3±0.2ms | 27.2±0.4ms | 1.07 | mpas_ocean.ZonalAverage.time_zonal_average('120km') |
|  | 4.87±0.07ms | 4.67±0.01ms | 0.96 | mpas_ocean.ZonalAverage.time_zonal_average('480km') |
|  | 375M | 376M | 1 | quad_hexagon.QuadHexagon.peakmem_open_dataset |
|  | 372M | 372M | 1 | quad_hexagon.QuadHexagon.peakmem_open_grid |
|  | 7.44±0.07ms | 7.40±0.03ms | 1 | quad_hexagon.QuadHexagon.time_open_dataset |
|  | 6.40±0.06ms | 6.50±0.04ms | 1.02 | quad_hexagon.QuadHexagon.time_open_grid |
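
The split between "improved" and "stayed the same" is consistent with a ratio threshold of 1.1. The sketch below reproduces that split for a few rows. The `factor=1.1` default is an assumption about how the benchmark workflow was invoked, and the real comparison may also apply a statistical significance check, which this sketch leaves out. The `classify` helper is hypothetical.

```python
def classify(before, after, factor=1.1):
    """Label a benchmark by its after/before ratio (lower is better).

    `factor` is an assumed threshold: changes within a factor of 1.1 in
    either direction are treated as noise.
    """
    ratio = after / before
    if ratio < 1.0 / factor:
        return ratio, "improved"
    if ratio > factor:
        return ratio, "regressed"
    return ratio, "same"

# Rows taken from the tables above; units cancel in the ratio.
peakmem_geoflow = classify(768, 440)   # peakmem_face_bounds, geoflow-small
ball_tree_120km = classify(793, 717)   # time_ball_tree('120km')
zonal_avg_120km = classify(25.3, 27.2)  # time_zonal_average('120km')

for name, (ratio, label) in [
    ("peakmem_face_bounds (geoflow-small)", peakmem_geoflow),
    ("time_ball_tree('120km')", ball_tree_120km),
    ("time_zonal_average('120km')", zonal_avg_120km),
]:
    print(f"{name}: ratio {ratio:.2f} -> {label}")
```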

@philipc2 philipc2 removed the run-benchmark Run ASV benchmark workflow label Apr 21, 2025
@erogluorhan
Member

erogluorhan commented Apr 21, 2025

> @erogluorhan
>
> For a 7.5km grid on my M1 MacBook Pro (10 cores):
>
> CPU times: user 3min 5s, sys: 732 ms, total: 3min 6s
> Wall time: 20.5 s
>
> About a 9x parallel speedup (total CPU time divided by wall time).

Looks good! Thanks for running this!

AFAIK, the M1 does not have multithreading (i.e., it has one thread per core), and your device probably has 8 cores, hence this speedup of around 9x, I believe.

Also, such speedups with multithreading, especially on HPC clusters, are very powerful, and it is worth reviewing our documentation to better emphasize this for users.

@philipc2
Member Author

philipc2 commented Apr 21, 2025

> > @erogluorhan
> > For a 7.5km grid on my M1 MacBook Pro (10 cores):
> >
> > CPU times: user 3min 5s, sys: 732 ms, total: 3min 6s
> > Wall time: 20.5 s
> >
> > About a 9x parallel speedup (total CPU time divided by wall time).
>
> Looks good! Thanks for running this!
>
> AFAIK, the M1 does not have multithreading (i.e., it has one thread per core), and your device probably has 8 cores, hence this speedup of around 9x, I believe.
>
> Also, such speedups with multithreading, especially on HPC clusters, are very powerful, and it is worth reviewing our documentation to better emphasize this for users.

The M1 Pro has 8 performance and 2 efficiency cores. I think the regular M1 has only 8 total.

> Also, such speedups with multithreading, especially on HPC clusters, are very powerful, and it is worth reviewing our documentation to better emphasize this for users.

This is a good idea! I could add to the Grid.bounds docstring.
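
As a rough illustration of what such a documentation note could convey, the sketch below compares the measured wall times with an idealized lower bound of total CPU time divided by thread count. The 256-thread and 10-core figures come from the comments above. Treating CPU time as perfectly divisible work is an assumption, and `ideal_wall_time` is a hypothetical helper. The Numba import is optional and only reports the size of its thread pool.

```python
import os

def ideal_wall_time(cpu_seconds, n_threads):
    """Lower bound on wall time if the work split perfectly across threads."""
    return cpu_seconds / n_threads

# Total CPU times and thread/core counts quoted in this thread.
derecho = ideal_wall_time(19 * 60 + 22, 256)  # measured wall time: 5.77 s
m1_pro = ideal_wall_time(3 * 60 + 6, 10)      # measured wall time: 20.5 s

print(f"logical CPUs on this machine: {os.cpu_count()}")
print(f"Derecho ideal: {derecho:.2f} s, M1 Pro ideal: {m1_pro:.2f} s")

try:
    import numba  # optional; used only to report the thread pool size
    print(f"Numba threads: {numba.get_num_threads()}")
except ImportError:
    print("Numba not installed; skipping thread-pool report")
```

Numba's thread pool size can be capped with the `NUMBA_NUM_THREADS` environment variable or `numba.set_num_threads`, which may be worth mentioning for users on shared nodes.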

@philipc2 philipc2 requested a review from Copilot April 23, 2025 12:35
Contributor

Copilot AI left a comment

Pull Request Overview

This PR optimizes the face bounds construction and consolidates related helper functions by moving them from the geometry module to a new bounds module while renaming several utility functions. Key changes include the renaming of key helper functions (e.g. _get_cartesian_face_edge_nodes to _get_cartesian_face_edge_nodes_array), the update of the normalization check using xr.ufuncs with .compute(), and updates in grid and test files to reflect these function name changes.

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| uxarray/grid/validation.py | Updated the normalization check to use xr.ufuncs.abs and .compute(). |
| uxarray/grid/utils.py | Renamed and added numba-compiled helper functions for face edge nodes. |
| uxarray/grid/grid.py | Updated calls to renamed helpers and switched to using _populate_face_bounds. |
| uxarray/grid/coordinates.py | Removed the standalone _normalize_xyz function. |
| uxarray/core/zonal.py | Updated invocations to the new face edge helper name. |
| Test files | Adjusted function references to match the new helper function names. |

Comments suppressed due to low confidence (3)

uxarray/grid/grid.py:1431

  • [nitpick] Double-check that the renaming to '_populate_face_bounds' is fully integrated with the numba function caching mechanism to ensure that the function is recognized and compiled as expected.
if not is_numba_function_cached(_populate_face_bounds):

uxarray/grid/validation.py:120

  • Using '.compute()' directly here may introduce unnecessary overhead if the inputs are already in-memory arrays; consider verifying whether the inputs are dask arrays and remove or conditionally apply .compute() to optimize performance.
max_dev = xr.ufuncs.abs(x**2 + y**2 + z**2 - 1.0).max().compute()
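
For reference, the quantity computed on the line above is the largest deviation of x² + y² + z² from 1. A plain-Python analogue is sketched below. It is not the code in `validation.py`: the helper names and the `tol` value are hypothetical, and it operates on in-memory sequences, so nothing needs to be computed lazily.

```python
def max_norm_deviation(xs, ys, zs):
    """Largest |x^2 + y^2 + z^2 - 1| over all points."""
    return max(abs(x * x + y * y + z * z - 1.0) for x, y, z in zip(xs, ys, zs))

def is_normalized(xs, ys, zs, tol=1e-6):
    """True if every point lies on the unit sphere within `tol` (assumed value)."""
    return max_norm_deviation(xs, ys, zs) <= tol

on_sphere = is_normalized([1.0, 0.0], [0.0, 0.6], [0.0, 0.8])
off_sphere = is_normalized([2.0], [0.0], [0.0])
print(on_sphere, off_sphere)
```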

uxarray/grid/utils.py:325

  • Ensure that 'njit' is properly imported from 'numba' so that these decorated functions compile correctly at runtime.
@njit(cache=True)

@philipc2 philipc2 requested a review from Copilot April 23, 2025 12:42
Contributor

Copilot AI left a comment

Pull Request Overview

This PR optimizes the face bounds construction and cleans up helper functions for improved clarity and performance. Key changes include:

  • Refactoring and optimizing the grid normalization and bounds-checking logic.
  • Renaming and migrating helper functions from uxarray.grid.geometry to uxarray.grid.bounds and updating their usage.
  • Updating test cases accordingly to support the new function names and implementations.

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| uxarray/grid/validation.py | Refactored normalization check using a loop |
| uxarray/grid/utils.py | Renamed and added new numba-accelerated helper functions |
| uxarray/grid/grid.py | Updated bounds logic and normalization implementation |
| uxarray/grid/coordinates.py | Removed legacy normalization function |
| uxarray/core/zonal.py | Updated references to the renamed helper function |
| test/* | Revised tests to use updated function names and bounds |

Comments suppressed due to low confidence (1)

uxarray/grid/utils.py:141

  • [nitpick] The new function name '_get_cartesian_face_edge_nodes_array' is more verbose than its previous version; consider consolidating the naming convention, and if the non-suffixed version is deprecated, add a deprecation notice to assist future maintenance.
def _get_cartesian_face_edge_nodes_array(
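
If a deprecation notice were wanted, one common pattern is a thin wrapper under the old name that warns and forwards to the new function. The sketch below shows that pattern only. The two function names come from this PR, but the signature and body are placeholders invented for illustration and do not reflect the real helper.

```python
import warnings

def _get_cartesian_face_edge_nodes_array(face_node_ids, node_xyz):
    """Placeholder body (hypothetical): pair up consecutive face nodes as edges."""
    n = len(face_node_ids)
    return [
        (node_xyz[face_node_ids[i]], node_xyz[face_node_ids[(i + 1) % n]])
        for i in range(n)
    ]

def _get_cartesian_face_edge_nodes(*args, **kwargs):
    """Deprecated alias that forwards to the renamed helper."""
    warnings.warn(
        "_get_cartesian_face_edge_nodes is deprecated; "
        "use _get_cartesian_face_edge_nodes_array instead.",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this wrapper
    )
    return _get_cartesian_face_edge_nodes_array(*args, **kwargs)

# Hypothetical triangle used only to exercise the wrapper.
nodes = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    edges = _get_cartesian_face_edge_nodes([0, 1, 2], nodes)
print(len(edges), caught[0].category.__name__)
```

Since both names are private (leading underscore), a shim like this may be unnecessary unless downstream code is known to import the old name.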

@philipc2 philipc2 merged commit 94aa3cf into main Apr 23, 2025
20 checks passed
@erogluorhan erogluorhan deleted the bounds-optimization branch September 26, 2025 17:49

Labels

scalability Related to scalability & performance efforts

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Further Optimize Bounds Counstriction

6 participants