
Conversation

@garth-wells
Member

@garth-wells garth-wells commented Aug 17, 2025

Support GPUs by allowing the user to specify the container types for la::Vector and common::Scatterer. For GPUs, the containers can, for example, be thrust::device_vector for on-device storage of data for a multi-GPU Vector.

  • Changes the data return type for la::Vector::array from std::span to container_type&.
  • Changes some common::Scatterer std::span args to plain pointers, which can be device pointers. Using spans was misleading because device entries can't be accessed by operator[].
  • Substantially reduces the number of common::Scatterer member functions. The previous interface was confusing. common::Scatterer is not normally called by a user, so it's fine to keep it simple and low-level.
  • Improves common::Scatterer documentation.
  • Improves type deduction in vector assembly functions, so the user no longer needs to provide <foo> to assembler functions.
  • Communication of on-device data requires a GPU-aware MPI build.

Used with GPU backend in https://github.com/ukri-bench/benchmark-dolfinx.

@garth-wells garth-wells added the enhancement New feature or request label Aug 18, 2025
@chrisrichardson
Contributor

Does the pack and unpack work (efficiently) on GPU... I thought we would need to pass a GPU kernel for that?

@garth-wells
Member Author

garth-wells commented Aug 18, 2025

Does the pack and unpack work (efficiently) on GPU... I thought we would need to pass a GPU kernel for that?

Yes, you need to (and can) pass the pack and unpack kernels.

We could eventually add default GPU pack/unpack kernels, but no point until we have CI for GPUs.

@chrisrichardson
Contributor

OK, just looking at e.g. void scatter_fwd(const T* local_data, T* remote_data, GetPtr get_ptr) - which has built-in pack/unpack, but the docs suggest it works with GPUs. My understanding is that I would need to call the begin/end methods separately with custom kernels?

@jorgensd
Member

In general looks good to me. Good documentation of deprecated functions helps a lot! Looking forward to having GPU CI at some point so one could add some tests.

@garth-wells
Member Author

In general looks good to me. Good documentation of deprecated functions helps a lot! Looking forward to having GPU CI at some point so one could add some tests.

I got GPU CI on Azure or AWS working in https://github.com/ukri-bench/benchmark-dolfinx. Will soon add GPU examples with CI to https://github.com/FEniCS/dolfinx-gpu-solvers.


unpack(_buffer_remote, _scatterer->remote_indices(), x_remote,
[](auto /*a*/, auto b) { return b; });
this->scatter_fwd_end(get_unpack());
Contributor


Remove this-> (not needed, and inconsistent usage) ?

Member Author


I added this for readability, to make it really clear when the function comes from Vector vs when it comes from Scatterer.

/// @brief Compute the squared L2 norm of vector.
/// @note Collective MPI operation
///
Contributor


CPU only?

/// @brief Compute the inner product of two vectors. The two vectors
/// must have the same parallel layout.
///
Contributor


CPU only?

Member Author


Probably, but I don't actually know. Would need testing of how GPU-aware backends behave.

@garth-wells garth-wells enabled auto-merge August 22, 2025 14:46
Co-authored-by: Paul T. Kühner <56360279+schnellerhase@users.noreply.github.com>
@garth-wells garth-wells disabled auto-merge August 22, 2025 15:11
@garth-wells garth-wells enabled auto-merge August 22, 2025 15:14
@garth-wells garth-wells added this pull request to the merge queue Aug 22, 2025
Merged via the queue into main with commit 213bf7c Aug 22, 2025
30 checks passed
@garth-wells garth-wells deleted the garth/generalise-vector-container branch August 22, 2025 15:54
jorgensd added a commit to jorgensd/dolfinx_mpc that referenced this pull request Aug 22, 2025
jorgensd added a commit to jorgensd/dolfinx_mpc that referenced this pull request Aug 23, 2025
* Apply api changes from FEniCS/dolfinx#3855
Currently doesn't work due to:
FEniCS/dolfinx#3868

* Fix communication

Labels

enhancement New feature or request gpu

5 participants