Skip to content

[AMDGPU] Refactor address-space tagging to source visitors + fix block_dim coercion#7

Merged
yaoliu13 merged 2 commits intoamd-integrationfrom
refactor/deepsek/amdgpu-addrspace-at-source
Apr 22, 2026
Merged

[AMDGPU] Refactor address-space tagging to source visitors + fix block_dim coercion#7
yaoliu13 merged 2 commits intoamd-integrationfrom
refactor/deepsek/amdgpu-addrspace-at-source

Conversation

@deepsek
Copy link
Copy Markdown
Collaborator

@deepsek deepsek commented Apr 22, 2026

  • Moves LLVM address-space tagging from consumers to pointer-source visitors. Removes
    a class of bugs where base
    TaskCodeGenLLVM helpers (atomic_op_using_cas, optimized_reduction
    runtime calls, etc.) leaked into AMDGPU-addrspace-tagged pointers and
    generated invalid IR or triggered HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION
    at launch. Now the pointer arrives with the correct addrspace at the moment
    it's materialized, InferAddressSpaces propagates it downstream, and base
    visitors emit plain load/store that lowers to the right ISA per addrspace.
  • Fixes the block_dim workaround to live at the right layer. 793966a
    rounded block_dim up to wavefront size (64) on both the LLVM kernel
    attribute and the HSA dispatch packet, which avoided the aperture fault
    but silently coerced the user's block_dim, breaking LDS sizing and
    iteration distribution (e.g., test_shared_array_atomics producing
    the expected sum because 64 lanes hit a 32-slot LDS array). The fault was
    caused by the kernel attribute being undersized for backend scratch
    allocation, NOT by the dispatch packet. Rounding now happens inside
    mark_function_as_amdgpu_kernel only; HSA dispatch keeps the user's
    value and uses EXEC masking natively. User's block_dim semantics are
    fully preserved — DSL abstraction holds.
  • Reverts premature Extension::bls declaration for AMDGPU. perf: codegen, llvm, host_api #1 added
    Extension::bls without the prerequisite Extension::sparse, causing
    sparse-BLS tests (test_bls.py, test_bls_assume_in_range.py) to RUN and
    fail with Pointer SNode is not supported on this backend at field-builder
    time instead of cleanly skipping at the @test_utils.test(require=qd.extension.bls)
    decorator. Reverts to the 0.4.5.amd0 baseline behavior.

Copy link
Copy Markdown
Collaborator

@jamesETsmith jamesETsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of questions on this @deepsek. This is great, happy to have these fixed

Comment on lines +393 to +394
if (current && current->getType()->isPointerTy() &&
current->getType()->getPointerAddressSpace() != 1) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a noob question here, but this seems to assume it will always receive an AS0 or AS1 pointer. Is that guaranteed? Is there any change this actually casts AS3 -> AS1?

std::to_string(block_dim);
// Note: hardcoded wavefront size of 64 matches CDNA3. RDNA in wave32
// mode would need a runtime query; revisit if Quadrants ever supports
// RDNA AMDGPU targets.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upstream already supports RDNA. We should add a warning to this that we're breaking RDNA support for now

@yaoliu13
Copy link
Copy Markdown
Collaborator

yaoliu13 commented Apr 22, 2026

benchmarks look good

@yaoliu13 yaoliu13 merged commit a732b47 into amd-integration Apr 22, 2026
35 of 43 checks passed
@gpinkert gpinkert mentioned this pull request Apr 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants