gfx1010 optimizations #8085
force-pushed from 1d1754f to 46923c6
|
|
Thanks for the information. I'll test it more thoroughly in a couple of hours, but for now, using a value of 64 instead of the master branch's default of 128, I get 275 t/s for prompt processing. |
|
The performance boost is consistent at 275 t/s and the output is correct; however, I'm having some trouble adding a check in. @JohannesGaessler, any idea how to solve this? |
|
Does this PR affect RDNA3? I could really use some optimizations. |
Not at all, this PR just tunes some parameters on Navi 10 that are already tuned in the 7000 series. |
|
I would patch it like this: |
|
@JohannesGaessler I've applied the change. I still think this isn't the best way to do it: if different cards need different values, this can get messy. A normal if statement might keep the line from getting too long, but I haven't been able to make it work that way. EDIT: Apparently all the issues I've been having are caused by the check on int8_mma_available not working as intended. Just removing it from the if condition makes everything work again. |
Yes,
It is working as intended, you are just not using it as intended. The lowercase |
|
I see. In this case would it be better to keep it like this: Or go like this: ? |
|
You cannot do the second one. |
|
Okay, I actually wanted to check specifically for RDNA1 because I wasn't sure what effect it could have on PR #8082. |
force-pushed from c4005a9 to e4accb8
|
Sorry, actually it has to be done the way you had it with an RDNA1 check. On AMD you cannot do a simple check against a number because there is no sensible value for |
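For readers following along, the explicit-architecture approach discussed above might look roughly like the sketch below. This is only an illustration under stated assumptions: `GpuArch` and `tuned_value_for` are hypothetical names, not identifiers from the PR or the codebase, and the only facts taken from this thread are the two values (64 on RDNA1, 128 as the master default). It uses plain `if` statements so that further per-card values could be added without one long line.

```cpp
// Hypothetical architecture tags for illustration only;
// these are NOT the identifiers used in the PR.
enum class GpuArch { RDNA1, RDNA2, RDNA3, Other };

// Hypothetical helper: choose the tuned parameter with an explicit
// per-architecture check rather than comparing against a numeric threshold.
// Requires C++14 or later (multiple statements in a constexpr function).
constexpr int tuned_value_for(GpuArch arch) {
    if (arch == GpuArch::RDNA1) {
        return 64;   // value reported in this thread as faster on Navi 10
    }
    return 128;      // default on the master branch, per this thread
}

// Compile-time checks that the selection behaves as described.
static_assert(tuned_value_for(GpuArch::RDNA1) == 64, "RDNA1 uses the smaller value");
static_assert(tuned_value_for(GpuArch::RDNA3) == 128, "other architectures keep the default");
```

Because the helper is `constexpr`, the same function could in principle feed both a compile-time constant and a runtime decision, though how the real code splits host and device checks is not shown in this thread.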
force-pushed from e4accb8 to 68b57ed
|
I think this may be ready to merge, once all the checks are completed. Thanks for the tips on how to improve it. |
While reading @IMbackK's PR #8082, I noticed that RDNA1 cards can also get a small performance gain just by adjusting the same values as that PR.
This is still far from the performance before #7716 (RDNA1 cards suffered a 50% performance drop with that change), but it's still a good improvement.
Thanks again to @IMbackK for his PR, as I wouldn't have noticed this without it.