Autodiff improvements #1803
Merged: zingale merged 15 commits into AMReX-Astro:development from yut23:autodiff-improvements on Jun 12, 2025.
Conversation
Returning arguments that were passed by value appears to interfere with copy elision on CPU, and I see ~18% better performance with sneut5 after this change (2.2s to 1.8s in a benchmark). It doesn't make any difference for CUDA, though.
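To illustrate the copy-elision point, here is a minimal standalone sketch with hypothetical names (not the actual Microphysics code): NRVO never applies when a function returns one of its by-value parameters, whereas a locally constructed return value can be constructed directly in the caller's storage.

```cpp
#include <array>
#include <cstddef>

// hypothetical stand-in for a small value-plus-derivatives type
struct GradArray {
    double value{};
    std::array<double, 3> grad{};
};

// before: the parameter is taken by value and then returned, so the compiler
// cannot elide the copy/move out of the function (NRVO never applies to
// function parameters)
GradArray scale_by_value(GradArray a, double s) {
    a.value *= s;
    for (double& g : a.grad) { g *= s; }
    return a;  // at best an implicit move
}

// after: a fresh local is constructed and returned, which is eligible for
// named return value optimization
GradArray scale_by_ref(const GradArray& a, double s) {
    GradArray r;
    r.value = a.value * s;
    for (std::size_t i = 0; i < a.grad.size(); ++i) { r.grad[i] = a.grad[i] * s; }
    return r;  // NRVO: constructed directly in the caller's storage
}
```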
If an intermediate value only depends on some of the variables in an `autodiff::dual_array`, we can save time and memory by not calculating or storing the derivatives with respect to the other variables. This is done by assigning such values to a smaller `dual_array` with indices that match the full `dual_array`. Any operations between these partial arrays will skip the unused components at compile time. Operations that combine different ranges will produce a promoted type that holds all the components between the minimum and maximum indices present. For example, multiplying a `dual_array<1, 1>` and a `dual_array<2, 3>` will produce a `dual_array<1, 3>`. Full example:

```cpp
using autodiff::dual_array;

// these only store a value and one derivative entry, but can still be
// combined when needed
dual_array<1, 1> x = 1.0_rt;
dual_array<2, 2> y = 2.0_rt;
autodiff::seed(x);
autodiff::seed(y);

// this only computes the derivative terms wrt x
dual_array<1, 1> x_squared = x * x;

// this only computes the derivative terms wrt y
dual_array<2, 2> sin_2y = admath::sin(2.0_rt * y);

// partial arrays are promoted as needed by overloaded operators
dual_array<1, 2> z = x_squared * sin_2y;
```
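For readers curious how the compile-time range promotion could work, here is a self-contained sketch of the idea, using a hypothetical `grad_array` type (the real `dual_array`/`GradArray` in Microphysics is more involved): each partial array carries its index range as template parameters, and binary operators return a type spanning the union of the two ranges, with out-of-range components treated as zero.

```cpp
#include <algorithm>
#include <array>

// hypothetical partial gradient array holding d/dx_i only for i in [Lo, Hi]
template <int Lo, int Hi>
struct grad_array {
    double value{};
    std::array<double, Hi - Lo + 1> grad{};

    // derivative with respect to variable i, or zero if i is outside the range
    double d(int i) const {
        return (i >= Lo && i <= Hi) ? grad[i - Lo] : 0.0;
    }
};

// multiplying two partial arrays yields one spanning both index ranges,
// e.g. grad_array<1, 1> * grad_array<2, 3> -> grad_array<1, 3>
template <int L1, int H1, int L2, int H2>
auto operator*(const grad_array<L1, H1>& a, const grad_array<L2, H2>& b)
{
    constexpr int Lo = std::min(L1, L2);
    constexpr int Hi = std::max(H1, H2);
    grad_array<Lo, Hi> r;
    r.value = a.value * b.value;
    for (int i = Lo; i <= Hi; ++i) {
        // product rule; components outside an operand's range contribute zero
        r.grad[i - Lo] = a.d(i) * b.value + a.value * b.d(i);
    }
    return r;
}
```

Because the range bounds are template parameters, components outside the promoted range never exist in the result, which is what allows the unused derivative work to be skipped at compile time.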
The branch was force-pushed from 550bda4 to b8eb2c2.
yut23 added a commit to yut23/Microphysics that referenced this pull request on Jun 10, 2025:
This decreases the runtime on CPU from 1.81s to 1.45s (-20%), compared to PR AMReX-Astro#1803. It's still 25% slower than the original hand-written derivatives, which I believe is mostly due to the creation of extra temporary GradArrays in the autodiff code.
I originally made this change before adding the GradArray copy constructor, which now handles this case.
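Reading the "copy constructor" here as a converting constructor between partial arrays with different index ranges (an assumption on my part; the actual GradArray constructor may work differently), a minimal sketch could look like the following, reusing the hypothetical `grad_array` skeleton from the sketch above: copying a narrower partial array into a wider one fills only the overlapping derivative slots and leaves the rest zero.

```cpp
#include <array>

template <int Lo, int Hi>
struct grad_array {
    double value{};
    std::array<double, Hi - Lo + 1> grad{};

    grad_array() = default;

    // converting constructor from a partial array with a narrower index range:
    // only the overlapping derivative components are copied, the rest stay zero
    template <int Lo2, int Hi2>
    grad_array(const grad_array<Lo2, Hi2>& other) : value(other.value) {
        static_assert(Lo <= Lo2 && Hi2 <= Hi,
                      "source range must fit inside the destination range");
        for (int i = Lo2; i <= Hi2; ++i) {
            grad[i - Lo] = other.grad[i - Lo2];
        }
    }
};
```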
zingale approved these changes on Jun 12, 2025.
yut23 added a commit to yut23/Microphysics that referenced this pull request on Jun 12, 2025, with the same commit message as above.
zingale pushed a commit that referenced this pull request on Jun 23, 2025, again with the same commit message.
In summary, this pull request:

- adds `admath::powi<N>()`, to avoid compilation errors when passing autodiff expressions into `amrex::Math::powi<N>()` (a rough sketch follows this list)
- improves the performance of `autodiff::dual_array` by explicitly constructing a new `GradArray` in operator overloads (~20% decrease in runtime on CPU)
- adds partial `GradArray`s, to skip calculating extra derivatives for values that only depend on some input variables
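Regarding the first item, a generic integer power that keeps the whole computation inside the autodiff type could look roughly like the sketch below (hypothetical helper name; the actual `admath::powi<N>()` implementation is not shown in this thread).

```cpp
// rough sketch of an integer power that stays inside a user-defined
// arithmetic type T (e.g. an autodiff dual), so derivatives propagate
// through each multiplication; negative exponents are omitted for brevity
template <int N, typename T>
T powi_sketch(const T& x) {
    static_assert(N >= 0, "this sketch only handles non-negative exponents");
    if constexpr (N == 0) {
        return T(1.0);  // assumes T is constructible from a double
    } else {
        T r = x;
        for (int i = 1; i < N; ++i) {
            r = r * x;  // each product goes through T's overloaded operator*
        }
        return r;
    }
}
```

The point of such a helper is simply that every multiplication uses the dual type's overloaded operators, so the expression never decays to a plain floating-point call that the autodiff machinery cannot see.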