
Conversation

@garth-wells
Member

@garth-wells garth-wells commented Aug 17, 2025

Support GPUs by allowing the user to specify the container types for la::Vector and common::Scatterer. For GPUs, the containers can, for example, be thrust::device_vector for on-device storage of data for a multi-GPU Vector.

  • Changes the data return type for la::Vector::array from std::span to container_type&.
  • Changes some common::Scatterer std::span args to plain pointers, which can be device pointers. Using spans was misleading because device entries can't be accessed by operator[].
  • Substantially reduces the number of common::Scatterer member functions. The previous interface was confusing. common::Scatterer is not normally called by a user, so it's fine to keep it simple and low-level.
  • Improves common::Scatterer documentation.
  • Improves type deduction in vector assembly functions, so the user no longer needs to provide <foo> to assembler functions.
  • Communication of on-device data requires a GPU-aware MPI build.

Used with GPU backend in https://github.com/ukri-bench/benchmark-dolfinx.

@garth-wells garth-wells added the enhancement New feature or request label Aug 18, 2025
@chrisrichardson
Contributor

Does the pack and unpack work (efficiently) on GPU... I thought we would need to pass a GPU kernel for that?

@garth-wells
Member Author

garth-wells commented Aug 18, 2025

Does the pack and unpack work (efficiently) on GPU... I thought we would need to pass a GPU kernel for that?

Yes, you need to (and can) pass the pack and unpack kernels.

We could eventually add default GPU pack/unpack kernels, but no point until we have CI for GPUs.

@chrisrichardson
Contributor

OK, just looking at e.g. void scatter_fwd(const T* local_data, T* remote_data, GetPtr get_ptr) - which has built-in pack/unpack, but the docs suggest it works with GPUs. My understanding is that I would need to call the begin/end methods separately with custom kernels?

@jorgensd
Member

In general looks good to me. Good documentation of deprecated functions helps a lot! Looking forward to having GPU CI at some point so one could add some tests.

@garth-wells
Member Author

In general looks good to me. Good documentation of deprecated functions helps a lot! Looking forward to having GPU CI at some point so one could add some tests.

I got GPU CI on Azure or AWS working in https://github.com/ukri-bench/benchmark-dolfinx. Will soon add GPU examples with CI to https://github.com/FEniCS/dolfinx-gpu-solvers.


unpack(_buffer_remote, _scatterer->remote_indices(), x_remote,
[](auto /*a*/, auto b) { return b; });
this->scatter_fwd_end(get_unpack());
Contributor


Remove this-> (not needed, and inconsistent usage) ?

Member Author


I added this for readability, to make it really clear when the function comes from Vector vs when it comes from Scatterer.

/// @brief Compute the squared L2 norm of vector.
/// @note Collective MPI operation
///
Contributor


CPU only?

/// @brief Compute the inner product of two vectors. The two vectors
/// must have the same parallel layout.
///
Contributor


CPU only?

Member Author


Probably, but I don't actually know. Would need testing of how GPU-aware backends behave.

@garth-wells garth-wells enabled auto-merge August 22, 2025 14:46
Co-authored-by: Paul T. Kühner <56360279+schnellerhase@users.noreply.github.com>
@garth-wells garth-wells disabled auto-merge August 22, 2025 15:11
@garth-wells garth-wells enabled auto-merge August 22, 2025 15:14
@garth-wells garth-wells added this pull request to the merge queue Aug 22, 2025
Merged via the queue into main with commit 213bf7c Aug 22, 2025
30 checks passed
@garth-wells garth-wells deleted the garth/generalise-vector-container branch August 22, 2025 15:54
jorgensd added a commit to jorgensd/dolfinx_mpc that referenced this pull request Aug 22, 2025
jorgensd added a commit to jorgensd/dolfinx_mpc that referenced this pull request Aug 23, 2025
* Apply api changes from FEniCS/dolfinx#3855
Currently doesn't work due to:
FEniCS/dolfinx#3868

* Fix communication

Labels

enhancement New feature or request gpu

5 participants