-
Notifications
You must be signed in to change notification settings - Fork 45
Optimized Bounds Construction #1205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Wow, that looks like some crazy good performance improvements! Nice work! This is great, will start reviewing it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (2)
pyproject.toml:43
- There is a trailing space in the xarray dependency version specifier; consider removing it for consistency.
"xarray>=2024.11.0 "
uxarray/grid/grid.py:1989
- [nitpick] Review the updated normalization logic to ensure that in-place division of xarray DataArrays behaves as expected, especially with regard to type handling or potential division by zero scenarios.
norm = xr.ufuncs.sqrt(self.node_x**2 + self.node_y**2 + self.node_z**2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR optimizes the bounds construction for grid faces while reorganizing related helper functions. Key changes include:
- Optimized Face Bounds construction with a significant performance boost.
- Moved bounds-related helper functions from uxarray/grid/geometry.py to uxarray/grid/bounds.py.
- Updated tests and dependency definitions to reflect the refactor and new xarray version requirement.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| uxarray/grid/validation.py | Replaced inline normalization checks with a numba-enabled helper. |
| uxarray/grid/utils.py | Introduced new numba functions for edge coordinate conversion. |
| uxarray/grid/grid.py | Refactored bounds function and updated coordinate normalization. |
| uxarray/grid/coordinates.py | Removed the redundant _normalize_xyz function. |
| test/test_geometry.py | Updated tests to use the new bounds helper functions in grid/bounds.py. |
| pyproject.toml | Updated the xarray dependency to a newer version. |
erogluorhan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great optimization overall, and I am inclined to approve it soon. Just a few inline comments throughout in addition to the following questions:
CPU times: user 19min 18s, sys: 3.61 s, total: 19min 22s
Wall time: 5.77 s
I'd be curious about the optimization on a personal laptop-like device. Looks like from the above numbers, Numba was able to benefit from a lot of CPU threads on Derecho, right?
erogluorhan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great optimization!
|
For a About a |
ASV BenchmarkingBenchmark Comparison ResultsBenchmarks that have improved:
Benchmarks that have stayed the same:
|
Looks good! Thanks for running this! AFAIK, M1 does not have multithreading (i.e. it has one thread per each core), and your device probably has 8 cores, hence this speedup of around 9x I believe. Also, such speedups with multithreading, especially on HPC clusters, is very powerful, and it is worth to review our documentation to better emphasize this for the user. |
The M1 Pro has 8 perfromance and 2 efficiency cores. I think the regular M1 has only 8 total
This is a good idea! I could add to the |
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR optimizes the face bounds construction and consolidates related helper functions by moving them from the geometry module to a new bounds module while renaming several utility functions. Key changes include the renaming of key helper functions (e.g. _get_cartesian_face_edge_nodes to _get_cartesian_face_edge_nodes_array), the update of the normalization check using xr.ufuncs with .compute(), and updates in grid and test files to reflect these function name changes.
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| uxarray/grid/validation.py | Updated the normalization check to use xr.ufuncs.abs and .compute(). |
| uxarray/grid/utils.py | Renamed and added numba-compiled helper functions for face edge nodes. |
| uxarray/grid/grid.py | Updated calls to renamed helpers and switched to using _populate_face_bounds. |
| uxarray/grid/coordinates.py | Removed the standalone _normalize_xyz function. |
| uxarray/core/zonal.py | Updated invocations to the new face edge helper name. |
| Test files | Adjusted function references to match the new helper function names. |
Comments suppressed due to low confidence (3)
uxarray/grid/grid.py:1431
- [nitpick] Double-check that the renaming to '_populate_face_bounds' is fully integrated with the numba function caching mechanism to ensure that the function is recognized and compiled as expected.
if not is_numba_function_cached(_populate_face_bounds):
uxarray/grid/validation.py:120
- Using '.compute()' directly here may introduce unnecessary overhead if the inputs are already in-memory arrays; consider verifying whether the inputs are dask arrays and remove or conditionally apply .compute() to optimize performance.
max_dev = xr.ufuncs.abs(x**2 + y**2 + z**2 - 1.0).max().compute()
uxarray/grid/utils.py:325
- Ensure that 'njit' is properly imported from 'numba' so that these decorated functions compile correctly at runtime.
@njit(cache=True)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR optimizes the face bounds construction and cleans up helper functions for improved clarity and performance. Key changes include:
- Refactoring and optimizing the grid normalization and bounds-checking logic.
- Renaming and migrating helper functions from uxarray.grid.geometry to uxarray.grid.bounds and updating their usage.
- Updating test cases accordingly to support the new function names and implementations.
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| uxarray/grid/validation.py | Refactored normalization check using a loop |
| uxarray/grid/utils.py | Renamed and added new numba-accelerated helper functions |
| uxarray/grid/grid.py | Updated bounds logic and normalization implementation |
| uxarray/grid/coordinates.py | Removed legacy normalization function |
| uxarray/core/zonal.py | Updated references to the renamed helper function |
| test/* | Revised tests to use updated function names and bounds |
Comments suppressed due to low confidence (1)
uxarray/grid/utils.py:141
- [nitpick] The new function name '_get_cartesian_face_edge_nodes_array' is more verbose than its previous version; consider consolidating the naming convention, and if the non-suffixed version is deprecated, add a deprecation notice to assist future maintenance.
def _get_cartesian_face_edge_nodes_array(
Closes #1201 #1200
Overview
uxarray.grid.bounds(originally inuxarray.grid.geometry), with the future intention of having separate modules under auxarray.geometrymodule.Timings
The following timings were taken on a single NCAR Derecho CPU node (256 threads, 256GB memory)
For the
3.75kmgrid, below is the CPU vs Wall TimeBefore
After
About a 200x parallel speedup.