gfx908 optimizations #8082
IMbackK wants to merge 1 commit into ggml-org:master from IMbackK:gfx908_small_v2
Which specific GPU are you using?

gfx908 (aka MI100) and gfx90a (aka the MI200 family) should have completely identical performance characteristics.

If this gets merged, I'm going to have to fire up my server with the 2x MI100s and give it another try. I never understood why they were pretty much the same speed as my W6800 Pros.
Well, this of course doesn't help at all with token generation, as gemv is the most time-consuming kernel there. In general, looking at omniperf, there is still quite some distance to go before performance is decent.

Closed as obsolete for now; I will reopen with a rebased version at some point in the future.
This minor optimization work increases CDNA performance by around 10x.
Current master:
This PR:
As most of the remaining time is now spent in attention kernels, merging #7011 further increases performance by 2x.