Add intrinsic for dynamic group-shared memory on GPUs #146181
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3436,6 +3436,43 @@ pub(crate) const fn miri_promise_symbolic_alignment(ptr: *const (), align: usize | |
| ) | ||
| } | ||
|
|
||
| /// Returns the pointer to dynamic group-shared memory on GPUs. | ||
| /// | ||
| /// Group-shared memory is a memory region that is shared between all threads in | ||
| /// the same work-group. It is faster to access than other memory but pointers do not | ||
| /// work outside the work-group where they were obtained. | ||
| /// Dynamic group-shared memory is part of the group-shared memory region, but its | ||
| /// size is specified late, after compilation, when launching a gpu-kernel. | ||
| /// The size can differ between launches of a gpu-kernel, which is why it is called dynamic. | ||
|
||
| /// However, the alignment is fixed by the kernel itself (at compile-time). | ||
| /// | ||
| /// The returned pointer is the start of the dynamic group-shared memory region. | ||
| /// All calls to `gpu_dynamic_groupshared_mem` in a work-group, independent of the | ||
| /// generic type, return the same address, so they alias the same memory. | ||
| /// The returned pointer is aligned to at least the alignment of `T`. | ||
Member
Is there some prior discussion of the design decision to determine the alignment by giving a type parameter? It could also be a const generic parameter, for instance. I don't have an opinion on the matter since I am an outsider to the GPU world, but as a compiler team member it'd be good to know if this is something you thought about for 5 minutes or whether there's some sort of larger design by a team that has a vision of how all these things will fit together.
Contributor, Author
There is some discussion in #135516. I don’t mind either way; I thought (for 5 minutes ;)) that specifying the type of the returned pointer makes sense. For just a struct, static shared memory would make more sense, though we don’t support that yet (there’s some discussion in the tracking issue, but I think that’s more complicated to design and implement).
||
| /// | ||
| /// # Safety | ||
| /// | ||
| /// The pointer is safe to dereference from the start (the returned pointer) up to the | ||
| /// size of dynamic group-shared memory that was specified when launching the current | ||
| /// gpu-kernel. | ||
| /// | ||
| /// The user must take care of synchronizing access to group-shared memory between | ||
| /// threads in a work-group. The usual data race requirements apply. | ||
| /// | ||
| /// # Other APIs | ||
| /// | ||
| /// CUDA and HIP call this shared memory, shared between threads in a block. | ||
| /// OpenCL and SYCL call this local memory, shared between threads in a work-group. | ||
| /// GLSL calls this shared memory, shared between invocations in a work group. | ||
| /// DirectX calls this groupshared memory, shared between threads in a thread-group. | ||
| #[must_use = "returns a pointer that does nothing unless used"] | ||
| #[rustc_intrinsic] | ||
| #[rustc_nounwind] | ||
| #[unstable(feature = "gpu_dynamic_groupshared_mem", issue = "135513")] | ||
| #[cfg(any(target_arch = "amdgpu", target_arch = "nvptx64"))] | ||
| pub fn gpu_dynamic_groupshared_mem<T>() -> *mut T; | ||
|
|
||
| /// Copies the current location of arglist `src` to the arglist `dst`. | ||
| /// | ||
| /// # Safety | ||
|
|
||
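The doc comment for `gpu_dynamic_groupshared_mem` above says that the returned pointer is the start of a region whose size is only known at launch, and that dereferencing is only valid within that size. A minimal host-side sketch of how a caller might lay out several typed arrays inside such a region is shown below. Everything in it is hypothetical and not part of this PR: `carve`, `FakeRegion`, and `demo` are illustrative names, and the 64-byte buffer merely stands in for the launch-time region (on a GPU target the base pointer would come from the intrinsic instead).

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper (not part of the PR): starting at byte `offset` inside
/// the region, round up to the alignment of `T` and reserve `count` elements.
/// Returns `(start, end)` byte offsets, or `None` if the elements would not
/// fit in `region_size` bytes (or the arithmetic would overflow).
fn carve<T>(offset: usize, count: usize, region_size: usize) -> Option<(usize, usize)> {
    let align = align_of::<T>(); // always a power of two
    let start = offset.checked_add(align - 1)? & !(align - 1);
    let bytes = size_of::<T>().checked_mul(count)?;
    let end = start.checked_add(bytes)?;
    if end <= region_size { Some((start, end)) } else { None }
}

/// Host-side stand-in for the dynamic region. The 8-byte alignment mimics a
/// kernel whose strictest instantiation of the intrinsic requests align 8.
#[repr(align(8))]
struct FakeRegion([u8; 64]);

/// Places three bytes followed by four `i32`s in the region, then writes and
/// reads back the first `i32`.
fn demo() -> i32 {
    let mut region = FakeRegion([0; 64]);
    let region_size = size_of::<FakeRegion>();
    let base: *mut u8 = region.0.as_mut_ptr();

    let (_, bytes_end) = carve::<u8>(0, 3, region_size).expect("bytes fit");
    let (ints_start, _) = carve::<i32>(bytes_end, 4, region_size).expect("ints fit");

    // SAFETY: `ints_start` was bounds-checked against `region_size`, and
    // `base` is 8-byte aligned, so the offset pointer is aligned for `i32`.
    unsafe {
        let ints = base.add(ints_start).cast::<i32>();
        ints.write(7);
        ints.read()
    }
}
```

Calling `demo()` exercises the layout on the host. In a real kernel the region size would have to be passed in by the launcher, since the intrinsic itself returns only a pointer, and accesses from different threads would still need synchronization as the Safety section requires.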
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| // Checks that the GPU dynamic group-shared memory intrinsic works. | ||
|
|
||
| //@ revisions: amdgpu nvptx | ||
| //@ compile-flags: --crate-type=rlib | ||
| // | ||
| //@ [amdgpu] compile-flags: --target amdgcn-amd-amdhsa -Ctarget-cpu=gfx900 | ||
| //@ [amdgpu] needs-llvm-components: amdgpu | ||
| //@ [nvptx] compile-flags: --target nvptx64-nvidia-cuda | ||
| //@ [nvptx] needs-llvm-components: nvptx | ||
| //@ add-minicore | ||
| #![feature(intrinsics, no_core, rustc_attrs)] | ||
| #![no_core] | ||
|
|
||
| extern crate minicore; | ||
|
|
||
| #[rustc_intrinsic] | ||
| #[rustc_nounwind] | ||
| fn gpu_dynamic_groupshared_mem<T>() -> *mut T; | ||
|
|
||
| // CHECK-DAG: @[[SMALL:[^ ]+]] = external addrspace(3) global [0 x i8], align 4 | ||
| // CHECK-DAG: @[[BIG:[^ ]+]] = external addrspace(3) global [0 x i8], align 8 | ||
| // CHECK: ret { ptr, ptr } { ptr addrspacecast (ptr addrspace(3) @[[SMALL]] to ptr), ptr addrspacecast (ptr addrspace(3) @[[BIG]] to ptr) } | ||
| #[unsafe(no_mangle)] | ||
| pub fn fun() -> (*mut i32, *mut f64) { | ||
| let small = gpu_dynamic_groupshared_mem::<i32>(); | ||
| let big = gpu_dynamic_groupshared_mem::<f64>(); // Increase alignment to 8 | ||
| (small, big) | ||
| } |
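The test above requests the region once as `i32` (a global with `align 4`) and once as `f64` (a global with `align 8`), with the comment "Increase alignment to 8". One plausible reading, given that all calls return the same address, is that the kernel's dynamic region ends up aligned to the strictest request. The sketch below models only that reading on the host; `region_align` and `alignment_for_test_kernel` are hypothetical names, and nothing here reflects how the compiler or the GPU runtime actually resolves the two globals.

```rust
use std::mem::align_of;

/// Hypothetical model: the effective alignment of the shared region is the
/// largest alignment requested by any instantiation in the kernel.
/// With no requests, fall back to byte alignment.
fn region_align(requested: &[usize]) -> usize {
    requested.iter().copied().max().unwrap_or(1)
}

/// Mirrors the two instantiations in the test (`i32` and `f64`), using the
/// alignments of the host these types are compiled for.
fn alignment_for_test_kernel() -> usize {
    region_align(&[align_of::<i32>(), align_of::<f64>()])
}
```

Under this model, a pointer obtained through the `i32` instantiation would also satisfy the `f64` alignment, which matches the doc comment's wording that the pointer is aligned to "at least" the alignment of `T`.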