-
Notifications
You must be signed in to change notification settings - Fork 3.8k
[OpenCL] Fix OpenCL get_valid_counts errors due to intrinsic atomic_add #5857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
93c2d33 to
dc6b48e
Compare
wpan11nv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any unit test?
We would need to add Currently the CI doesn't test anything for opencl which is why we don't find out about these errors until much later. Do we know why we don't test opencl? |
9a19371 to
8f657a0
Compare
|
@wpan11nv Any more comments? |
wpan11nv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
|
@kazum can you take a look and manage the PR? Thanks. |
| if target == 'cuda': | ||
| # get_valid_count for cuda, opencl doesn't do data rearrangement | ||
| if target in ['cuda', 'opencl']: | ||
| return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning here looks wrong to me. The test in the below link doesn't work for OpenCL too because we don't do data rearrangement for GPU nms implementation.
https://discuss.tvm.ai/t/nms-compile-fails-for-cuda-target-but-works-fine-for-llvm-target/7045/2
Probably, we should fix non_max_suppression for GPU first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenCL uses the same implementation as CUDA. The CUDA implementation of get_valid_counts was changed to no longer rearrange the output of get_valid_counts because it will be rearranged by NMS later anyway. This gives the correct output for NMS. See #5339
That issue with NMS looks to be a separate issue where the CUDA implementation wasn't fully updated to match changes to CPU implementation by #4312
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your explanation. Actually, I've successfully build NMS if I revert the change in #4312.
8f657a0 to
fa183ce
Compare
kazum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I'll merge this after CI is passed.
Thanks! |
|
@trevor-m please rebase against the master |
fa183ce to
1db9f1a
Compare
…dd (apache#5857) * [OpenCL] Fix atomic add used by get_valid_counts * Rename l -> load, add flag to enable atomics * Opencl doesn't do data rearrangement
…dd (apache#5857) * [OpenCL] Fix atomic add used by get_valid_counts * Rename l -> load, add flag to enable atomics * Opencl doesn't do data rearrangement
Some fixes a few months ago to the
get_valid_countsCUDA implementation broke OpenCL because of the atomic add intrinsic which was added.This PR fixes
get_valid_countsfor OpenCL with the following changes:intrinsic::tvm_address_ofto include storage scope (e.g.__global).cl_khr_global_int32_base_atomics. This isn't required for OpenCL 1.1+ because atomic_add became a core feature. I'm happy to remove this if we don't care about OpenCL 1.0. Alternatively we can overrideop->call_type == CallNode::PureExternand set a flag to enable this only whenatomic_addis actually used.Original error messages before this fix: