Autodiff improvements #1803

yut23 · 2025-06-06T20:57:40Z

- Add `admath::powi<N>()`, to avoid compilation errors when passing autodiff expressions into `amrex::Math::powi<N>()`
- Improve performance of `autodiff::dual_array` by explicitly constructing a new `GradArray` in operator overloads (~20% decrease in runtime on CPU)
- Allow operating on partial ranges of `GradArray`, to skip calculating extra derivatives for values that only depend on some input variables

Returning arguments that were passed by value appears to interfere with copy elision on CPU, and I see ~18% better performance with sneut5 after this change (2.2 s to 1.8 s in a benchmark). It doesn't make any difference for CUDA, though.

If an intermediate value only depends on some of the variables in an `autodiff::dual_array`, we can save time and memory by not calculating or storing the derivatives with respect to the other variables. This is done by assigning such values to a smaller `dual_array` whose indices use the same numbering as the full `dual_array`. Operations between these partial arrays skip the unused components at compile time. Operations that combine different ranges produce a promoted type that holds all the components between the minimum and maximum indices present; for example, multiplying a `dual_array<1, 1>` and a `dual_array<2, 3>` produces a `dual_array<1, 3>`. Full example:

```cpp
using namespace amrex::literals;  // provides the _rt literal suffix
using autodiff::dual_array;

// these only store a value and one derivative entry, but can still be
// combined when needed
dual_array<1, 1> x = 1.0_rt;
dual_array<2, 2> y = 2.0_rt;
autodiff::seed(x);
autodiff::seed(y);

// this only computes the derivative terms wrt x
dual_array<1, 1> x_squared = x * x;

// this only computes the derivative terms wrt y
dual_array<2, 2> sin_2y = admath::sin(2.0_rt * y);

// partial arrays are promoted as needed by overloaded operators
dual_array<1, 2> z = x_squared * sin_2y;
```

This decreases the runtime on CPU from 1.81s to 1.45s (-20%), compared to PR AMReX-Astro#1803. It's still 25% slower than the original hand-written derivatives, which I believe is mostly due to the creation of extra temporary GradArrays in the autodiff code.

yut23 added 4 commits June 6, 2025 14:19

Explicitly construct new GradArrays in operator overloads

e83f194

Add admath::powi<N>()

c74b45a

Remove unnecessary operator overloads for GradArrays

08d1288

yut23 marked this pull request as draft June 6, 2025 20:58

yut23 added 2 commits June 6, 2025 17:00

Fix some clang-tidy warnings

b6bfea9

Use admath::powi where applicable

b8eb2c2

yut23 force-pushed the autodiff-improvements branch from 550bda4 to b8eb2c2 June 6, 2025 21:00

yut23 added 4 commits June 10, 2025 16:07

Add autodiff::make_partial_arrays()

05232f0

Add unit tests for vector-mode autodiff

bffe3d8

Add an additional autodiff::narrow_array overload

bbe7a8f

Add docs for the new autodiff features

8f4847e

yut23 marked this pull request as ready for review June 10, 2025 21:20

Update allowed ifdef variables

2c8bf3e

yut23 mentioned this pull request Jun 10, 2025

Update sneut5 to use partial arrays #1808

Merged

yut23 added 4 commits June 10, 2025 18:53

Fix template deduction for autodiff::narrow_array<LO, HI>

0138da7

Revert a change to dual.hpp

ea87310

I originally made this change before adding the GradArray copy constructor, which now handles this case.

Add more autodiff tests

4085428

Merge remote-tracking branch 'origin/development' into autodiff-improvements

ff3667c

zingale approved these changes Jun 12, 2025

zingale merged commit f3ca377 into AMReX-Astro:development Jun 12, 2025
31 of 32 checks passed

yut23 deleted the autodiff-improvements branch June 12, 2025 20:07

2 participants
