feat(amdgpu): per-kernel LLVM function attributes via @qd.kernel(fn_attrs=...) #11
This change should have no impact on performance; it is purely a nice-to-have feature for developers trying to optimize existing kernels (especially for occupancy-related issues). Exposing these attributes at the Python DSL layer keeps the JIT C++ backend relatively clean, with no need to hard-code intricate rules that conditionally apply attributes based on things like the kernel name.
/run-ci
jamesETsmith
left a comment
This looks great, thanks @kevinjosephamd!
10c8e42 to
077e8e6
@yaoliu13 I don't expect this PR in isolation to have any impact on this number; it is a new feature that can be used from Genesis. See: ROCm/Genesis#39
Sounds good. Let's make sure this PR doesn't hurt pre-submit throughput.
Depends on ROCm/quadrants#11 (per-kernel fn_attrs support). Values picked from a per-kernel sweep of (min,max) occupancy hints:
  kernel_step_1: 3,4
  kernel_step_2: 1,4
  func_solve_init: 2,4
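As a rough illustration of how the swept values could be turned into `fn_attrs` arguments, the sketch below maps each kernel name to a per-backend attribute dict. It assumes the (min,max) occupancy hints are passed through `amdgpu-waves-per-eu`, which the message above does not state explicitly; `OCCUPANCY_HINTS` and `occupancy_fn_attrs` are hypothetical names, not part of Genesis or quadrants.

```python
# Swept (min, max) occupancy hints, copied from the commit message above.
OCCUPANCY_HINTS = {
    "kernel_step_1": (3, 4),
    "kernel_step_2": (1, 4),
    "func_solve_init": (2, 4),
}


def occupancy_fn_attrs(kernel_name):
    """Build an fn_attrs dict for one kernel (hypothetical helper).

    Assumption: the hints map to the "amdgpu-waves-per-eu" attribute,
    whose value is a "min,max" string.
    """
    lo, hi = OCCUPANCY_HINTS[kernel_name]
    return {"amdgpu": {"amdgpu-waves-per-eu": f"{lo},{hi}"}}
```

The resulting dict has the same shape as the `fn_attrs=` argument accepted by `@qd.kernel`, so it could be passed to the decorator directly.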
…ttrs=...)
Lets users override AMDGPU codegen attributes per kernel without editing
the JIT pipeline. Attributes must be pre-registered in
quadrants/program/fn_attrs_registry.h; unknown backend or attribute names
raise QuadrantsSyntaxError at decoration time. Currently registered:
amdgpu-max-num-workgroups, amdgpu-agpr-alloc, amdgpu-waves-per-eu,
amdgpu-flat-work-group-size.
Examples:
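```python
# Cap the number of workgroups per dispatch in the x, y, z dimensions.
@qd.kernel(fn_attrs={"amdgpu": {"amdgpu-max-num-workgroups": "128,1,1"}})
def k(...): ...

# Hint the (min,max) number of waves per execution unit.
@qd.kernel(fn_attrs={"amdgpu": {"amdgpu-waves-per-eu": "1,2"}})
def k(...): ...
```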
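The decoration-time check described above might look roughly like the following. This is a minimal sketch, not the code in this PR: `REGISTERED_FN_ATTRS` is a hypothetical Python mirror of the list in quadrants/program/fn_attrs_registry.h, `validate_fn_attrs` is a made-up helper name, and `QuadrantsSyntaxError` is redefined locally only so the snippet runs on its own.

```python
# Hypothetical Python-side mirror of the registry; the real list lives in
# quadrants/program/fn_attrs_registry.h.
REGISTERED_FN_ATTRS = {
    "amdgpu": frozenset({
        "amdgpu-max-num-workgroups",
        "amdgpu-agpr-alloc",
        "amdgpu-waves-per-eu",
        "amdgpu-flat-work-group-size",
    }),
}


class QuadrantsSyntaxError(Exception):
    """Local stand-in for the real exception class."""


def validate_fn_attrs(fn_attrs):
    """Reject unknown backends or attribute names; return a normalized copy."""
    validated = {}
    for backend, attrs in (fn_attrs or {}).items():
        if backend not in REGISTERED_FN_ATTRS:
            raise QuadrantsSyntaxError(f"unknown fn_attrs backend: {backend!r}")
        for name in attrs:
            if name not in REGISTERED_FN_ATTRS[backend]:
                raise QuadrantsSyntaxError(
                    f"unknown fn_attrs attribute for {backend!r}: {name!r}"
                )
        # Attribute values are kept as strings, matching the examples above.
        validated[backend] = {name: str(value) for name, value in attrs.items()}
    return validated
```

Failing at decoration time rather than at launch means a typo in an attribute name surfaces as soon as the module is imported.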
Plumbed Python decorator -> Kernel.fn_attrs -> set_fn_attrs pybind ->
codegen_llvm.cpp addFnAttr -> jit_amdgpu.cpp (defaults gated by
hasFnAttribute so user values win). Included in both fastcache and
frontend offline cache keys so changing fn_attrs forces a rebuild.
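Two ideas in this paragraph can be modeled in a few lines of Python: backend defaults are applied only where the user has not already set an attribute, and fn_attrs feed into the cache key. The sketch below is an illustration under stated assumptions, not the actual C++ in jit_amdgpu.cpp or the real cache-key code; `BACKEND_DEFAULTS`, its value, `resolve_attrs`, and `cache_key` are all hypothetical.

```python
import hashlib
import json

# Hypothetical backend default, for illustration only.
BACKEND_DEFAULTS = {"amdgpu-flat-work-group-size": "1,1024"}


def resolve_attrs(user_attrs, defaults):
    """Apply defaults only for attributes the user has not set.

    Mirrors the idea of gating defaults on hasFnAttribute so user values win.
    """
    resolved = dict(user_attrs)
    for name, value in defaults.items():
        if name not in resolved:  # analogous to !F.hasFnAttribute(name)
            resolved[name] = value
    return resolved


def cache_key(kernel_source, fn_attrs):
    """Hash the kernel source together with fn_attrs.

    sort_keys makes the key independent of dict insertion order, so only a
    real change in fn_attrs produces a different key.
    """
    payload = json.dumps({"src": kernel_source, "fn_attrs": fn_attrs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With this shape, two kernels that differ only in fn_attrs hash to different keys, which is what forces a rebuild instead of a stale cache hit.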
077e8e6 to
033b9c5
/run-ci