Generalise la::Vector to support GPUs
#3855
Conversation
Does the pack and unpack work (efficiently) on GPU... I thought we would need to pass a GPU kernel for that?

Yes, you need to (and can) pass the pack and unpack kernels. We could eventually add default GPU pack/unpack kernels, but no point until we have CI for GPUs.
OK, just looking at e.g.
In general looks good to me. Good documentation of deprecated functions helps a lot! Looking forward to having GPU CI at some point so one could add some tests.
I got GPU CI on Azure or AWS working in https://github.com/ukri-bench/benchmark-dolfinx. Will soon add GPU examples with CI to https://github.com/FEniCS/dolfinx-gpu-solvers.
```cpp
unpack(_buffer_remote, _scatterer->remote_indices(), x_remote,
       [](auto /*a*/, auto b) { return b; });
this->scatter_fwd_end(get_unpack());
```
Remove `this->` (not needed, and inconsistent usage)?
I added this for readability, to make it really clear when the function comes from `Vector` vs when it comes from `Scatterer`.
```cpp
/// Compute the squared L2 norm of vector
/// @note Collective MPI operation
/// @brief Compute the squared L2 norm of vector.
///
```
CPU only?
```cpp
/// Compute the inner product of two vectors. The two vectors must have
/// the same parallel layout
/// @brief Compute the inner product of two vectors.
///
```
CPU only?
Probably, but I don't actually know. Would need testing of how GPU-aware backends behave.
Co-authored-by: Paul T. Kühner <56360279+schnellerhase@users.noreply.github.com>
Currently doesn't work due to: FEniCS/dolfinx#3868
* Apply API changes from FEniCS/dolfinx#3855. Currently doesn't work due to: FEniCS/dolfinx#3868
* Fix communication
Support GPUs by allowing the user to specify the container types for `la::Vector` and `common::Scatterer`. For GPUs, the containers can, for example, be `thrust::device_vector` for on-device storage of data for a multi-GPU `Vector`.

* Change `la::Vector::array` from `std::span` to `container_type&`.
* Change `common::Scatterer` `std::span` args to plain pointers, which can be device pointers. Using spans was misleading because device entries can't be accessed by `operator[]`.
* Simplify `common::Scatterer` member functions. It was confusing. `common::Scatterer` is not normally called by a user, so it's fine to keep it simple and low-level.
* Improve `common::Scatterer` documentation.
* `<foo>` to assembler functions.

Used with GPU backend in https://github.com/ukri-bench/benchmark-dolfinx.