perf(amdgpu): amdgpu perf + force_inline #15
deepsek
commented
Apr 25, 2026
- Kernel launch improvements
- Add force_inline support to the AST
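For readers unfamiliar with the idea, here is a rough illustration of what forcing an inline at the AST level means. It is a toy sketch built on Python's standard `ast` module, not the code in this PR: the names `scale`, `kernel`, `ForceInline`, `Substitute`, and `inline_module` are all hypothetical, and it only handles functions whose body is a single `return` expression with positional, side-effect-free arguments.

```python
import ast
import copy

# Hypothetical input: `scale` is the helper we want force-inlined into `kernel`.
SOURCE = """
def scale(x):
    return x * 2 + 1

def kernel(a, b):
    return scale(a) + scale(b)
"""


class Substitute(ast.NodeTransformer):
    """Replace parameter names with copies of the caller's argument expressions."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        if node.id in self.mapping:
            return copy.deepcopy(self.mapping[node.id])
        return node


class ForceInline(ast.NodeTransformer):
    """Replace calls to the selected functions with their return expression."""

    def __init__(self, inlinable):
        # function name -> (parameter names, return expression)
        self.inlinable = inlinable

    def visit_Call(self, node):
        self.generic_visit(node)  # handle nested calls in the arguments first
        if isinstance(node.func, ast.Name) and node.func.id in self.inlinable:
            params, body = self.inlinable[node.func.id]
            mapping = dict(zip(params, node.args))
            # Note: an argument is duplicated if its parameter is used twice,
            # so this toy version is only safe for side-effect-free arguments.
            return Substitute(mapping).visit(copy.deepcopy(body))
        return node


def inline_module(source, names):
    """Inline every call to the functions in `names`, then drop their definitions."""
    tree = ast.parse(source)
    inlinable = {}
    for fn in tree.body:
        if isinstance(fn, ast.FunctionDef) and fn.name in names:
            if len(fn.body) != 1 or not isinstance(fn.body[0], ast.Return):
                raise ValueError(f"{fn.name} is not a single-return function")
            params = [arg.arg for arg in fn.args.args]
            inlinable[fn.name] = (params, fn.body[0].value)
    tree = ForceInline(inlinable).visit(tree)
    tree.body = [
        n for n in tree.body
        if not (isinstance(n, ast.FunctionDef) and n.name in names)
    ]
    ast.fix_missing_locations(tree)
    return tree


tree = inline_module(SOURCE, {"scale"})
print(ast.unparse(tree))  # requires Python 3.9+
```

After the transform, `kernel` no longer contains a call node, which is the property of interest here: with no call there is no callee boundary for a backend to wrap in register save and restore code.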
7c8ba7e to 789d90d
789d90d to 95b5708
/run-ci
/run-ci
/run-ci
Strong evidence to land. AMDGCN dumps on gfx942 from the Genesis hot kernels show the launcher kernels are thin (≤74 VGPR, 0 scratch) but each calls an outlined
Prologue in each is ~60 contiguous
Confirmed via paired AMDGCN dumps that source-level changes don't clear this floor: e.g. a loop-fusion candidate that cut -690 asm lines from
Good eye @lohiaj! That's the main reason I'm exposing this as a variable. Thanks for validating it too!
/run-ci
9d2a6cb to caf2c1f
/run-ci
1370057 and 4968
Need one more approval
lohiaj
left a comment
Reviewed and approved. force_inline removes the outlined callee save/restore boundary, which I validated on the Genesis hot kernels, and the launcher hot-path cleanup looks clean.
/run-ci
/run-ci