@masahi masahi commented Apr 17, 2021

I found that #7833 introduced a severe perf regression for vk + dGPU. Running cuda_gemm_square.py on an R9 Nano, the vk result is extremely slow:

Device opencl
average time cost of 10 runs = 3.57431 ms, 4806.48 GFLOPS.

Device vulkan
average time cost of 10 runs = 39.8278 ms, 431.354 GFLOPS.
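
As a sanity check on the figures above, here is a minimal sketch of how the GFLOPS values relate to the average times. It assumes a square matmul counted as 2 * N^3 floating-point operations; N = 2048 is inferred from the reported numbers, not stated in this comment, and `gflops` is a hypothetical helper, not part of the benchmark script.

```python
# Assumption: square matmul of size N, counted as 2 * N**3 floating-point ops.
# N = 2048 is inferred from the reported figures, not taken from the script.
N = 2048


def gflops(avg_ms: float, n: int = N) -> float:
    """Convert an average runtime in milliseconds to GFLOPS."""
    flops = 2 * n ** 3
    return flops / (avg_ms * 1e-3) / 1e9


opencl = gflops(3.57431)        # OpenCL average time reported above
vulkan = gflops(39.8278)        # Vulkan average time reported above
slowdown = 39.8278 / 3.57431    # how much slower Vulkan is than OpenCL

print(f"opencl {opencl:.2f} GFLOPS, vulkan {vulkan:.2f} GFLOPS, "
      f"vulkan is {slowdown:.1f}x slower")
```

Under that assumption the computed values agree with the reported ones up to rounding of the printed times, and the Vulkan run is roughly an order of magnitude slower.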

After bisecting, I found a typo in #7833 that caused the wrong memory type to be used for device buffers 🤦‍♂️ Interestingly, there is no regression on APU, so I didn't notice it during development. This is now fixed and vk performance is as good as before:

Device opencl
average time cost of 10 runs = 3.56969 ms, 4812.7 GFLOPS.

Device vulkan
average time cost of 10 runs = 3.04882 ms, 5634.93 GFLOPS.
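
To illustrate why a memory-type mix-up can hurt a dGPU but go unnoticed on an APU, here is a rough sketch of Vulkan-style memory type selection. It is not the TVM runtime code. The comment does not say which memory type was picked by mistake, so the sketch assumes a host-visible request where a device-local one was intended. The memory layouts and the `find_memory_type` helper are hypothetical; only the flag values mirror `VkMemoryPropertyFlagBits`.

```python
# Flag values mirror VkMemoryPropertyFlagBits from the Vulkan specification.
DEVICE_LOCAL = 0x1
HOST_VISIBLE = 0x2
HOST_COHERENT = 0x4


def find_memory_type(type_flags, type_bits, required):
    """Return the first index allowed by type_bits whose flags include `required`."""
    for i, flags in enumerate(type_flags):
        if (type_bits >> i) & 1 and (flags & required) == required:
            return i
    raise RuntimeError("no suitable memory type")


# Hypothetical memory layouts, for illustration only.
# dGPU: separate VRAM (device-local) and system RAM (host-visible) types.
DGPU_TYPES = [DEVICE_LOCAL, HOST_VISIBLE | HOST_COHERENT]
# APU: one unified type that is both device-local and host-visible.
APU_TYPES = [DEVICE_LOCAL | HOST_VISIBLE | HOST_COHERENT]

ALL_TYPES = 0xFFFFFFFF  # pretend the buffer accepts every memory type

# Intended request for a device buffer vs. the assumed mistaken request.
dgpu_ok = find_memory_type(DGPU_TYPES, ALL_TYPES, DEVICE_LOCAL)
dgpu_wrong = find_memory_type(DGPU_TYPES, ALL_TYPES, HOST_VISIBLE)
apu_ok = find_memory_type(APU_TYPES, ALL_TYPES, DEVICE_LOCAL)
apu_wrong = find_memory_type(APU_TYPES, ALL_TYPES, HOST_VISIBLE)

print("dGPU:", dgpu_ok, "vs", dgpu_wrong)  # different types on the dGPU layout
print("APU: ", apu_ok, "vs", apu_wrong)    # same type on the unified layout
```

In this hypothetical layout the mistaken request lands in system RAM on the dGPU, while on the APU both requests resolve to the same unified memory type, which would be consistent with the regression showing up only on the dGPU.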

cc @tqchen

@masahi masahi changed the title from "[Hotfix] Typo in Vulkan runtime change causing severe perf regression" to "[Hotfix] Typo in Vulkan runtime change causing severe perf regression for dGPU" on Apr 17, 2021
@tqchen tqchen merged commit 6d0386b into apache:main Apr 17, 2021
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Apr 22, 2021
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request May 6, 2021
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request May 11, 2021