[AMDGPU] Refactor address-space tagging to source visitors + fix block_dim coercion #7
Merged
yaoliu13 merged 2 commits into amd-integration on Apr 22, 2026
jamesETsmith (Collaborator) requested changes on Apr 22, 2026 and left a comment:
Just a couple of questions on this @deepsek. This is great, happy to have these fixed
Comment on lines +393 to +394:
    if (current && current->getType()->isPointerTy() &&
        current->getType()->getPointerAddressSpace() != 1) {
Probably a noob question here, but this seems to assume it will always receive an AS0 or AS1 pointer. Is that guaranteed? Is there any chance this actually casts AS3 -> AS1?
    std::to_string(block_dim);
    // Note: hardcoded wavefront size of 64 matches CDNA3. RDNA in wave32
    // mode would need a runtime query; revisit if Quadrants ever supports
    // RDNA AMDGPU targets.
Upstream already supports RDNA. We should add a warning to this that we're breaking RDNA support for now
Collaborator: benchmarks look good
a class of bugs where base TaskCodeGenLLVM helpers (atomic_op_using_cas, optimized_reduction runtime calls, etc.) leaked into AMDGPU-addrspace-tagged pointers and generated invalid IR or triggered HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION at launch. Now the pointer arrives with the correct addrspace at the moment it's materialized, InferAddressSpaces propagates it downstream, and base visitors emit plain load/store that lowers to the right ISA per addrspace.
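For readers less familiar with the address spaces mentioned above, here is a minimal sketch of the numbering used by the LLVM AMDGPU backend and the instruction family a plain load/store typically lowers to in each. The names `AMDGPUAddrSpace`, `INSTRUCTION_FAMILY`, and `load_store_family` are illustrative only and are not part of this PR; the instruction-family mapping is a simplification and varies by GPU generation.

```python
from enum import IntEnum


class AMDGPUAddrSpace(IntEnum):
    """Address-space numbers as documented for the LLVM AMDGPU backend."""
    FLAT = 0      # generic pointer, resolved at run time via apertures
    GLOBAL = 1    # device/global memory
    REGION = 2    # GDS
    LOCAL = 3     # LDS (shared memory)
    CONSTANT = 4  # read-only global memory
    PRIVATE = 5   # per-lane scratch


# Simplified, illustrative mapping from address space to instruction family.
INSTRUCTION_FAMILY = {
    AMDGPUAddrSpace.FLAT: "flat",
    AMDGPUAddrSpace.GLOBAL: "global",
    AMDGPUAddrSpace.LOCAL: "ds",
    AMDGPUAddrSpace.PRIVATE: "scratch",
}


def load_store_family(addrspace: int) -> str:
    """Return the (assumed) instruction family for a plain load/store."""
    return INSTRUCTION_FAMILY.get(AMDGPUAddrSpace(addrspace), "other")
```

In this model a pointer tagged AS3 at the point where it is materialized selects the `ds` family, whereas the same pointer left as AS0 would go through `flat` and rely on a run-time aperture check.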
block_dim workaround to live at the right layer. 793966a rounded block_dim up to wavefront size (64) on both the LLVM kernel attribute and the HSA dispatch packet, which avoided the aperture fault but silently coerced the user's block_dim, breaking LDS sizing and iteration distribution (e.g., test_shared_array_atomics producing 2× the expected sum because 64 lanes hit a 32-slot LDS array). The fault was caused by the kernel attribute being undersized for backend scratch allocation, NOT by the dispatch packet. Rounding now happens inside mark_function_as_amdgpu_kernel only; HSA dispatch keeps the user's value and uses EXEC masking natively. User's block_dim semantics are fully preserved; the DSL abstraction holds.
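The split described above can be sketched as a toy model. This is not the PR's code: `round_up_to_wavefront`, `launch_params`, the dictionary keys, and `simulate_shared_sum` are hypothetical names. The sketch assumes "rounding up to wavefront size" means rounding to the next multiple of 64, and it assumes each lane adds 1 to slot `lane % slots`.

```python
WAVEFRONT_SIZE = 64  # assumption: wave64, matching the CDNA3 note in the diff


def round_up_to_wavefront(block_dim: int, wavefront_size: int = WAVEFRONT_SIZE) -> int:
    """Round block_dim up to the next multiple of wavefront_size."""
    if block_dim <= 0:
        raise ValueError("block_dim must be positive")
    return -(-block_dim // wavefront_size) * wavefront_size


def launch_params(user_block_dim: int) -> dict:
    """Only the kernel attribute is rounded; the dispatch keeps the user's value."""
    return {
        "kernel_attr_workgroup_size": round_up_to_wavefront(user_block_dim),
        "dispatch_workgroup_size": user_block_dim,
    }


def simulate_shared_sum(active_lanes: int, lds_slots: int) -> int:
    """Toy model: every active lane adds 1 to slot (lane % lds_slots)."""
    lds = [0] * lds_slots
    for lane in range(active_lanes):
        lds[lane % lds_slots] += 1
    return sum(lds)
```

With a user `block_dim` of 32, `launch_params(32)` gives a kernel attribute of 64 and a dispatch size of 32. In the toy model, `simulate_shared_sum(32, 32)` yields 32, while dispatching the coerced 64 lanes yields 64, which mirrors the 2× sum reported for test_shared_array_atomics.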
Extension::bls declaration for AMDGPU. perf: codegen, llvm, host_api (#1) added Extension::bls without the prerequisite Extension::sparse, causing sparse-BLS tests (test_bls.py, test_bls_assume_in_range.py) to RUN and fail with "Pointer SNode is not supported on this backend" at field-builder time instead of cleanly skipping at the @test_utils.test(require=qd.extension.bls) decorator. Reverts to the 0.4.5.amd0 baseline behavior.
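A rough model of why the stray declaration turned a skip into a failure is sketched below. It does not reproduce the real `test_utils.test` decorator; `run_or_skip` and `sparse_bls_test_body` are hypothetical stand-ins, and extensions are modelled as plain strings.

```python
def run_or_skip(required: set, declared: set, body) -> str:
    """Skip when a required extension is not declared; otherwise run the body."""
    missing = set(required) - set(declared)
    if missing:
        return "skipped"
    try:
        body(declared)
    except NotImplementedError as exc:
        return f"failed: {exc}"
    return "passed"


def sparse_bls_test_body(declared: set) -> None:
    """Stand-in for a BLS test that builds a pointer SNode (needs 'sparse')."""
    if "sparse" not in declared:
        raise NotImplementedError("Pointer SNode is not supported on this backend")
```

In this model, a backend that declares only `{"bls"}` passes the gate and then fails inside the test body, while a backend that declares neither extension is skipped cleanly, which is the behavior this change restores.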