vulkan: Use spec constants for conv2d s/d/p and kernel W/H #16978
jeffbolznv merged 2 commits into ggml-org:master

What's with that auroralabs-loci bot repeatedly mirroring all our PRs?

That whole account seems to be managed by bots, so I guess it's malfunctioning? Here are the performance numbers on my AMD GPUs:
8267cc2 to ca455a3

Changed the outer loop to

I was curious how this affects compilation and run time, so I compared some vision models (RTX 4070, CM2): Before:
After:
The compile time probably fluctuates quite a bit, and caching works well anyway. The mixed Transformer+Conv2D models didn't really improve (for various reasons, I suspect). ESRGAN is pure Conv2D, and it shows. My takeaway is that spec-const-all-the-things is probably good, or at least not bad :)

    vk_pipeline pipeline = nullptr;

    auto it = pipelines->find(conv2d_pipeline_state);
This will need to be rebased on #17024 and hold a lock when searching the map.
I merged that, you can finish this PR and merge when you're done.
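
For readers following the review point about holding a lock when searching the map, here is a minimal, self-contained sketch of the pattern. It is not the code from this PR: `Conv2dState`, `Pipeline`, and `PipelineCache` are hypothetical names invented for illustration, and the stand-in `Pipeline` struct takes the place of a real compiled Vulkan pipeline. The idea is that each distinct combination of stride, padding, dilation, and kernel size gets its own cached pipeline, and both the `find` and the insert happen under one mutex so concurrent callers cannot race on the map.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <tuple>

// Hypothetical key: the conv2d parameters that would be baked into a
// pipeline as specialization constants.
struct Conv2dState {
    uint32_t s0, s1;  // stride
    uint32_t p0, p1;  // padding
    uint32_t d0, d1;  // dilation
    uint32_t KW, KH;  // kernel width / height

    // Strict weak ordering so the struct can be used as a std::map key.
    bool operator<(const Conv2dState & o) const {
        return std::tie(s0, s1, p0, p1, d0, d1, KW, KH) <
               std::tie(o.s0, o.s1, o.p0, o.p1, o.d0, o.d1, o.KW, o.KH);
    }
};

// Stand-in for a compiled pipeline object (assumption, not a Vulkan type).
struct Pipeline {
    Conv2dState state;
};

class PipelineCache {
public:
    // Return the pipeline for `state`, creating it on first use.
    // The lock covers both the lookup and the insertion.
    std::shared_ptr<Pipeline> get(const Conv2dState & state) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = pipelines_.find(state);
        if (it != pipelines_.end()) {
            return it->second;
        }
        auto pipeline = std::make_shared<Pipeline>(Pipeline{state});
        pipelines_.emplace(state, pipeline);
        return pipeline;
    }

    // Number of cached pipelines, read under the same lock.
    std::size_t size() {
        std::lock_guard<std::mutex> guard(mutex_);
        return pipelines_.size();
    }

private:
    std::mutex mutex_;
    std::map<Conv2dState, std::shared_ptr<Pipeline>> pipelines_;
};
```

In this sketch, two calls to `get` with the same state return the same object, and a different kernel size or stride produces a separate entry. A real backend would likely compile the shader in the cache-miss branch, which is where the compile-time cost discussed above would be paid.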

Here's how it looks on my RX 470: PR: Master:

Looks good, improvements all around. Performance results
Also add some additional unroll hints, which seems to help.
ca455a3 to 777bc35
…16978)
* vulkan: Use spec constants for conv2d s/d/p and kernel W/H
  Also add some additional unroll hints, which seems to help.
* lock around map lookup
Also add some additional unrolling, which seems to help.