feat(amdgpu): per-kernel LLVM function attributes via @qd.kernel(fn_attrs=...) #11

Merged
yaoliu13 merged 1 commit into amd-integration from kejoseph/feature/expose_function_attr_to_dsl
Apr 30, 2026
kevinjosephamd commented Apr 23, 2026

Lets users override AMDGPU function attributes per kernel without editing the JIT pipeline. Attributes must be pre-registered in quadrants/program/fn_attrs_registry.h; unknown backend or attribute names raise QuadrantsSyntaxError at decoration time. Currently registered: amdgpu-max-num-workgroups, amdgpu-agpr-alloc, amdgpu-waves-per-eu, amdgpu-flat-work-group-size.

Examples:

@qd.kernel(fn_attrs={"amdgpu": {"amdgpu-max-num-workgroups": "128,1,1"}})
def k(...): ...

@qd.kernel(fn_attrs={"amdgpu": {"amdgpu-waves-per-eu": "1,2"}})
def k(...): ...
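
The decoration-time check described above can be sketched in plain Python. This is only an illustration under stated assumptions: `REGISTERED_FN_ATTRS`, `FnAttrsError`, and `validate_fn_attrs` are hypothetical names, the real registry lives in quadrants/program/fn_attrs_registry.h, and the real error type is QuadrantsSyntaxError.

```python
# Hypothetical Python mirror of the registry. The authoritative list is in
# quadrants/program/fn_attrs_registry.h; the names below are the four
# attributes this PR says are currently registered.
REGISTERED_FN_ATTRS = {
    "amdgpu": frozenset({
        "amdgpu-max-num-workgroups",
        "amdgpu-agpr-alloc",
        "amdgpu-waves-per-eu",
        "amdgpu-flat-work-group-size",
    }),
}


class FnAttrsError(Exception):
    """Stand-in for QuadrantsSyntaxError (assumption, not the real class)."""


def validate_fn_attrs(fn_attrs):
    """Return a normalized copy of fn_attrs, or raise on unknown names."""
    checked = {}
    for backend, attrs in (fn_attrs or {}).items():
        if backend not in REGISTERED_FN_ATTRS:
            raise FnAttrsError(f"unknown fn_attrs backend: {backend!r}")
        for name in attrs:
            if name not in REGISTERED_FN_ATTRS[backend]:
                raise FnAttrsError(
                    f"unknown {backend} function attribute: {name!r}"
                )
        # LLVM string attributes carry string values, so normalize here.
        checked[backend] = {name: str(value) for name, value in attrs.items()}
    return checked
```

Rejecting unknown names when the decorator runs means a typo in an attribute name fails immediately instead of being silently ignored at codegen time.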

Plumbed Python decorator -> Kernel.fn_attrs -> set_fn_attrs pybind -> codegen_llvm.cpp addFnAttr -> jit_amdgpu.cpp (defaults gated by hasFnAttribute so user values win). Included in both fastcache and frontend offline cache keys so changing fn_attrs forces a rebuild.
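
Two behaviors from the paragraph above, user values taking precedence over backend defaults and fn_attrs participating in the cache key, can be modeled roughly as follows. This is a sketch, not the actual implementation: the real gating happens in jit_amdgpu.cpp via hasFnAttribute, and `merge_with_defaults`, `cache_key`, and the default values used in the usage lines are hypothetical.

```python
import hashlib
import json


def merge_with_defaults(user_attrs, backend_defaults):
    """Apply a backend default only when the user did not set that attribute.

    Mirrors the idea of gating defaults on hasFnAttribute: an attribute that
    is already present is left alone.
    """
    merged = dict(user_attrs)
    for name, value in backend_defaults.items():
        merged.setdefault(name, value)
    return merged


def cache_key(kernel_source, fn_attrs):
    """Hash the kernel source together with a canonical form of fn_attrs.

    sort_keys makes the key independent of dict insertion order, so only a
    real change in attributes changes the key and forces a rebuild.
    """
    canonical = json.dumps(fn_attrs or {}, sort_keys=True)
    payload = kernel_source + "\0" + canonical
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Usage with made-up default values (assumptions, not the backend's defaults).
defaults = {"amdgpu-waves-per-eu": "4,8", "amdgpu-flat-work-group-size": "1,256"}
merged = merge_with_defaults({"amdgpu-waves-per-eu": "1,2"}, defaults)
```

In this model `merged` keeps the user's `"1,2"` for `amdgpu-waves-per-eu` and picks up the default only for the attribute the user left unset.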

kevinjosephamd (Author) commented Apr 23, 2026

This change should have no impact on performance; it is purely a nice-to-have feature for developers trying to optimize existing kernels (especially for occupancy-related issues). By exposing these attributes at the Python DSL layer, we can keep the JIT C++ backend relatively clean and avoid hard-coding intricate rules that conditionally apply attributes based on things like the kernel name.

yaoliu13 (Collaborator) commented

/run-ci

jamesETsmith (Collaborator) left a comment

This looks great, thanks @kevinjosephamd!

kevinjosephamd force-pushed the kejoseph/feature/expose_function_attr_to_dsl branch 2 times, most recently from 10c8e42 to 077e8e6, on April 27, 2026 00:26

yaoliu13 (Collaborator) left a comment

pre-submit is 776,886

kevinjosephamd (Author) commented Apr 28, 2026

> pre-submit is 776,886

@yaoliu13 I don't expect this PR in isolation to have any impact on this number; it is a new feature that can be used from Genesis. See: ROCm/Genesis#39

yaoliu13 (Collaborator) commented Apr 29, 2026

> > pre-submit is 776,886
>
> @yaoliu13 I don't expect this PR in isolation to have any impact on this number; it is a new feature that can be used from Genesis. See: ROCm/Genesis#39

Sounds good. Let's make sure this PR doesn't hurt pre-submit throughput.

kevinjosephamd added a commit to ROCm/Genesis that referenced this pull request Apr 29, 2026
Depends on ROCm/quadrants#11 (per-kernel fn_attrs support).
Values picked from a per-kernel sweep of (min,max) occupancy hints:
  kernel_step_1:               3,4
  kernel_step_2:               1,4
  func_solve_init:             2,4
…ttrs=...)

Lets users override AMDGPU codegen attributes per kernel without editing
the JIT pipeline. Attributes must be pre-registered in
quadrants/program/fn_attrs_registry.h; unknown backend or attribute names
raise QuadrantsSyntaxError at decoration time. Currently registered:
amdgpu-max-num-workgroups, amdgpu-agpr-alloc, amdgpu-waves-per-eu,
amdgpu-flat-work-group-size.

Examples:
```python
@qd.kernel(fn_attrs={"amdgpu": {"amdgpu-max-num-workgroups": "128,1,1"}})
def k(...): ...

@qd.kernel(fn_attrs={"amdgpu": {"amdgpu-waves-per-eu": "1,2"}})
def k(...): ...
```

Plumbed Python decorator -> Kernel.fn_attrs -> set_fn_attrs pybind ->
codegen_llvm.cpp addFnAttr -> jit_amdgpu.cpp (defaults gated by
hasFnAttribute so user values win). Included in both fastcache and
frontend offline cache keys so changing fn_attrs forces a rebuild.

kevinjosephamd force-pushed the kejoseph/feature/expose_function_attr_to_dsl branch from 077e8e6 to 033b9c5 on April 30, 2026 05:38

kevinjosephamd (Author) commented

/run-ci

yaoliu13 (Collaborator) left a comment

LGTM

yaoliu13 merged commit f71121d into amd-integration Apr 30, 2026
38 of 46 checks passed
kevinjosephamd added a commit to ROCm/Genesis that referenced this pull request Apr 30, 2026
Depends on ROCm/quadrants#11 (per-kernel fn_attrs support).
Values picked from a per-kernel sweep of (min,max) occupancy hints:
  kernel_step_1:               3,4
  kernel_step_2:               1,4
  func_solve_init:             2,4
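
As a rough illustration of how the sweep values listed above might be turned into decorator arguments, the sketch below builds an fn_attrs dictionary per kernel. It assumes the (min,max) hints are meant for `amdgpu-waves-per-eu`, which the truncated commit message does not state outright; `SWEEP` and `waves_per_eu_attrs` are hypothetical names.

```python
# (min, max) occupancy hints copied from the commit message above.
SWEEP = {
    "kernel_step_1": (3, 4),
    "kernel_step_2": (1, 4),
    "func_solve_init": (2, 4),
}


def waves_per_eu_attrs(kernel_name):
    """Build an fn_attrs dict for one kernel from the sweep table.

    Assumption: the hints map to the amdgpu-waves-per-eu attribute, whose
    value is a "min,max" string as in the PR description's example.
    """
    lo, hi = SWEEP[kernel_name]
    return {"amdgpu": {"amdgpu-waves-per-eu": f"{lo},{hi}"}}
```

A kernel could then presumably be declared as `@qd.kernel(fn_attrs=waves_per_eu_attrs("kernel_step_1"))`, following the decorator form shown in the PR description.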
3 participants