
yut23 (Collaborator) commented Jun 6, 2025

  • Add admath::powi<N>() to avoid compilation errors when passing autodiff expressions into amrex::Math::powi<N>()
  • Improve performance of autodiff::dual_array by explicitly constructing a new GradArray in operator overloads (~20% decrease in runtime on CPU)
  • Allow operating on partial ranges of GradArray, to skip calculating unneeded derivatives for values that depend on only some of the input variables
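The description doesn't show how admath::powi<N>() is written. As a hypothetical sketch only (the `Dual` type and this `powi` are illustrative, not the Microphysics or AMReX code), a compile-time integer power can be built by repeated squaring so that it needs nothing but `operator*` from its argument type, and every intermediate is stored as that concrete type:

```cpp
#include <cassert>

// Hypothetical minimal forward-mode dual number (value + one derivative),
// used only to show that powi works on a non-arithmetic type.
struct Dual {
    double val;
    double der;
};

// Product rule for the dual number.
Dual operator*(const Dual& a, const Dual& b) {
    return Dual{a.val * b.val, a.der * b.val + a.val * b.der};
}

// Integer power with the exponent fixed at compile time, using
// exponentiation by squaring. Only operator* is required of T, and each
// partial result is held in a variable of type T.
template <int N, typename T>
T powi(const T& x) {
    static_assert(N >= 1, "this sketch handles positive exponents only");
    if constexpr (N == 1) {
        return x;
    } else if constexpr (N % 2 == 0) {
        T half = powi<N / 2>(x);  // x^(N/2)
        return half * half;
    } else {
        T rest = powi<N - 1>(x);  // x^(N-1), N-1 is even
        return rest * x;
    }
}
```

For a dual with value 2 and derivative 1, `powi<3>(x)` gives value 8 and derivative 3·2² = 12; the same template also works for plain `double`.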

yut23 added 4 commits June 6, 2025 14:19
Returning arguments that were passed by value appears to interfere with
copy elision on CPU; with this change I see ~18% lower runtime with
sneut5 (2.2s to 1.8s in a benchmark). It doesn't make any difference
for CUDA, though.
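The copy-elision point can be made concrete with a small counting type. `Grad` below is a hypothetical stand-in for GradArray, not the real class. Returning a by-value parameter forces a move (for an array of doubles, effectively a copy), because C++ never elides copies out of function parameters, while returning a freshly constructed prvalue is elided under C++17's guaranteed copy elision:

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for a small gradient array. The static counters
// record how many copy and move constructions happen.
struct Grad {
    static inline int copies = 0;
    static inline int moves = 0;

    std::array<double, 3> v{};

    Grad() = default;
    Grad(double a, double b, double c) : v{a, b, c} {}
    Grad(const Grad& o) : v(o.v) { ++copies; }
    Grad(Grad&& o) noexcept : v(o.v) { ++moves; }  // same work as a copy for doubles

    Grad& operator+=(const Grad& o) {
        for (int i = 0; i < 3; ++i) {
            v[i] += o.v[i];
        }
        return *this;
    }
};

// Style 1: take the left operand by value, modify it, and return it.
// Function parameters are not eligible for copy elision, so the return
// statement move-constructs the result from `a`.
Grad add_by_value(Grad a, const Grad& b) {
    a += b;
    return a;
}

// Style 2: construct the result directly in the return statement.
// Returning a prvalue is guaranteed to be elided in C++17.
Grad add_construct(const Grad& a, const Grad& b) {
    return Grad(a.v[0] + b.v[0], a.v[1] + b.v[1], a.v[2] + b.v[2]);
}
```

Calling `add_by_value(x, y)` on two lvalues performs one copy (into the parameter) and one move (out of it); `add_construct(x, y)` performs neither. This only shows where the extra object constructions come from; the timings above are the author's measurements.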
If an intermediate value only depends on some of the variables in an
`autodiff::dual_array`, we can save time and memory by not calculating
or storing the derivatives with respect to the other variables. This is
done by assigning such values to a smaller `dual_array` with indices
that match the full `dual_array`. Any operations between these partial
arrays will skip the unused components at compile time. Operations that
combine different ranges will produce a promoted type that holds all the
components between the minimum and maximum indices present. For example,
multiplying a `dual_array<1, 1>` and a `dual_array<2, 3>` will produce a
`dual_array<1, 3>`.
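A stripped-down version of the range-promotion rule might look like the following. `PartialDual<Lo, Hi>` is a hypothetical simplification, not Microphysics' `dual_array`: it stores derivative entries only for variable indices `Lo` through `Hi`, and the product's result type spans the minimum to the maximum index of both operands, so multiplying a `<1, 1>` by a `<2, 3>` yields a `<1, 3>`:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <type_traits>

// Hypothetical simplified dual number holding a value plus derivatives with
// respect to variables Lo..Hi (1-based, inclusive). Derivatives outside that
// range are implicitly zero and never stored.
template <int Lo, int Hi>
struct PartialDual {
    static_assert(Lo <= Hi, "range must be non-empty");
    double val = 0.0;
    std::array<double, Hi - Lo + 1> grad{};

    // Derivative with respect to variable i; zero if i is not stored here.
    double d(int i) const {
        return (i >= Lo && i <= Hi) ? grad[i - Lo] : 0.0;
    }
};

// Product rule. The result type is promoted to the smallest range that
// contains both operands' ranges: [min(L1, L2), max(H1, H2)].
template <int L1, int H1, int L2, int H2>
auto operator*(const PartialDual<L1, H1>& a, const PartialDual<L2, H2>& b) {
    constexpr int Lo = std::min(L1, L2);
    constexpr int Hi = std::max(H1, H2);
    PartialDual<Lo, Hi> r;
    r.val = a.val * b.val;
    for (int i = Lo; i <= Hi; ++i) {
        r.grad[i - Lo] = a.d(i) * b.val + a.val * b.d(i);
    }
    return r;
}

// Multiplying a <1, 1> by a <2, 3> gives a <1, 3>, as in the commit message.
static_assert(std::is_same_v<decltype(PartialDual<1, 1>{} * PartialDual<2, 3>{}),
                             PartialDual<1, 3>>);
```

With `x = 1` seeded in slot 1 and `y = 2` seeded in slot 2, `x * y` has value 2 and derivatives (2, 1, 0) with respect to variables 1, 2, 3. The PR doesn't show how the real implementation skips unused components; in this sketch the loop simply runs over the promoted range only.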

Full example:
```cpp
using autodiff::dual_array;
using namespace amrex::literals;  // provides the _rt literal suffix

// these only store a value and one derivative entry, but can still be
// combined when needed
dual_array<1, 1> x = 1.0_rt;
dual_array<2, 2> y = 2.0_rt;
autodiff::seed(x);
autodiff::seed(y);

// this only computes the derivative terms wrt x
dual_array<1, 1> x_squared = x * x;

// this only computes the derivative terms wrt y
dual_array<2, 2> sin_2y = admath::sin(2.0_rt * y);

// partial arrays are promoted as needed by overloaded operators
dual_array<1, 2> z = x_squared * sin_2y;
```
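For reference, the values `z` in this example should carry follow from the product and chain rules. This assumes (it isn't stated above) that `seed` sets each variable's derivative with respect to itself to 1, with `x` as variable 1 and `y` as variable 2. `x_squared` holds only $\partial/\partial x = 2x$, `sin_2y` holds only $\partial/\partial y = 2\cos(2y)$, and the product combines them:

```math
\begin{aligned}
z &= x^2 \sin(2y), &
\frac{\partial z}{\partial x} &= 2x\,\sin(2y), &
\frac{\partial z}{\partial y} &= 2x^2 \cos(2y), \\
\text{at } x = 1,\ y = 2: \quad z &= \sin 4, &
\frac{\partial z}{\partial x} &= 2\sin 4, &
\frac{\partial z}{\partial y} &= 2\cos 4.
\end{aligned}
```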
yut23 marked this pull request as draft June 6, 2025 20:58
yut23 force-pushed the autodiff-improvements branch from 550bda4 to b8eb2c2 June 6, 2025 21:00
yut23 marked this pull request as ready for review June 10, 2025 21:20
yut23 added a commit to yut23/Microphysics that referenced this pull request Jun 10, 2025
This decreases the runtime on CPU from 1.81s to 1.45s (-20%), compared
to PR AMReX-Astro#1803. It's still 25% slower than the original hand-written
derivatives, which I believe is mostly due to the creation of extra
temporary GradArrays in the autodiff code.
zingale merged commit f3ca377 into AMReX-Astro:development Jun 12, 2025
31 of 32 checks passed
yut23 added a commit to yut23/Microphysics that referenced this pull request Jun 12, 2025
yut23 deleted the autodiff-improvements branch June 12, 2025 20:07
zingale pushed a commit that referenced this pull request Jun 23, 2025