@k50112113

This PR includes:

  1. Triton RoPE API refactoring
  2. Triton RoPE kernel refactoring
  3. Triton RoPE kernel optimization
  4. Triton RoPE backward kernels

Note: the "_gqa" API will be deprecated and merged into the "_2c" API.
TODO: further optimization and 2D RoPE development.
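
For readers unfamiliar with why a RoPE backward kernel can share most of its structure with the forward kernel, here is a minimal pure-Python sketch. It is not the PR's Triton code: the function names (`rope_angles`, `rope_fwd_neox`, `rope_bwd_neox`), the `base=10000.0` default, and the single-vector layout are assumptions made for illustration. It relies only on the standard fact that RoPE is a per-pair rotation, so the gradient with respect to the input is the same rotation applied with negated angles.

```python
import math


def rope_angles(position, rotary_dim, base=10000.0):
    """Angles theta_i = position / base**(2i / rotary_dim), i in [0, rotary_dim / 2).

    `base` is an assumed default, not taken from the PR.
    """
    half = rotary_dim // 2
    return [position / (base ** (2 * i / rotary_dim)) for i in range(half)]


def rope_fwd_neox(x, angles):
    """Rotate each pair (x[i], x[i + half]) by angles[i] (rotate-half pairing)."""
    half = len(x) // 2
    out = [0.0] * len(x)
    for i, theta in enumerate(angles):
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out


def rope_bwd_neox(grad_out, angles):
    """Gradient w.r.t. x: the transpose of a rotation is the rotation by -theta."""
    return rope_fwd_neox(grad_out, [-theta for theta in angles])


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    g = [0.5, -1.0, 0.25, 2.0]  # stand-in for an upstream gradient
    angles = rope_angles(position=3, rotary_dim=4)
    y = rope_fwd_neox(x, angles)
    gx = rope_bwd_neox(g, angles)
    # Adjoint identity: <fwd(x), g> should equal <x, bwd(g)> up to rounding.
    print(sum(a * b for a, b in zip(y, g)), sum(a * b for a, b in zip(x, gx)))
```

The adjoint identity printed at the end is the property a test of a backward kernel against its forward kernel could check, alongside a comparison with autograd.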

@k50112113 k50112113 requested a review from rahulbatra85 July 2, 2025 19:07
@rahulbatra85 rahulbatra85 merged commit f492799 into main Jul 8, 2025
13 checks passed
@rahulbatra85 rahulbatra85 deleted the shaoclee/triton_rope_dev branch July 8, 2025 19:45
fsx950223 pushed a commit that referenced this pull request Jul 11, 2025
* rename test_rope_triton.py to test_rope.py

* enable all API in bench, apply generate_rope_inputs to all test func in test except 2d cases

* (rebase) re-order

* (rebase) re-order

* new _rope_fwd_kernel_neox kernel

* merge _rope_fwd_kernel_gptj and _rope_fwd_kernel_neox into _rope_fwd_kernel

* (rebase) get rid of nope kernel

* (rebase) merge all _thd kernels into one kernel

* change test_rope_fwd to test_rope_fwd_sbhd

* (rebase) merge all cached/positions/offsets kernels into one kernel

* resolve failing test cases for cached kernels

* clean

* clean

* (rebase) merge cached_2c kernels

* add nope into kernels

* rebase clean up

* add rope bwd for sbhd

* add rope bwd for thd

* add rope bwd for cached thd and two_input cached thd

* merge gqa into two_input wrapper and add rope bwd for two_input gqa

* update bench rope and clean up

* clean up

* add backward hooks

* black reformatting
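
The commits above fold the GPT-J and NeoX forward kernels into one and add "nope" handling to the kernels. A rough pure-Python sketch of what such a unified routine might look like follows. It assumes that the two styles differ only in which elements are paired (NeoX pairs element `i` with `i + half`, GPT-J pairs adjacent elements `2i` and `2i + 1`) and that "nope" refers to head dimensions that receive no rotation and are copied through. The function name and the `is_neox` and `nope_first` parameters are hypothetical and are not the PR's actual signature.

```python
import math


def rope_fwd(x, angles, is_neox=True, nope_first=False):
    """Apply RoPE to a slice of one head vector and pass the rest through.

    The rotated slice has 2 * len(angles) elements. If `nope_first` is True,
    the un-rotated elements come first and the rotated slice is at the end.
    All parameter names here are illustrative assumptions.
    """
    rotary_dim = 2 * len(angles)
    assert rotary_dim <= len(x), "rotary slice cannot exceed the head dimension"
    start = len(x) - rotary_dim if nope_first else 0
    half = rotary_dim // 2
    out = list(x)  # un-rotated ("nope") elements are copied unchanged
    for i, theta in enumerate(angles):
        if is_neox:
            lo, hi = start + i, start + i + half      # rotate-half pairing
        else:
            lo, hi = start + 2 * i, start + 2 * i + 1  # adjacent pairing
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[lo], x[hi]
        out[lo] = a * c - b * s
        out[hi] = a * s + b * c
    return out


if __name__ == "__main__":
    x = [1.0, 0.0, 0.0, 0.0, 5.0, 6.0]
    quarter_turn = [math.pi / 2, math.pi / 2]
    print(rope_fwd(x, quarter_turn, is_neox=True))
    print(rope_fwd(x, quarter_turn, is_neox=False))
    print(rope_fwd(x, quarter_turn, is_neox=True, nope_first=True))
```

Only the index computation depends on the style flag, which is one plausible reason the two kernels could be merged behind a compile-time switch.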
cagrikymk pushed a commit that referenced this pull request Jul 30, 2025