[TVM][CUDA] NVIDIA GPU Int8 Support #1503
Conversation
@vinx13 since you are working on related stuff, can you take a look?
src/codegen/codegen_cuda.cc
Outdated
// directly store 4 8-bit ints in one integer.
os << "int"; return;
enable_int8_ = true;
os << "char4"; return;
@tqchen do we need to support other lane sizes? e.g. int4 if lanes == 16
Yes, this would actually be very helpful. @vinx13 can you elaborate on what we need to support and the value translation rules here?
The rule here will be:
lanes() == 8 => int2 (aligned to 8 bytes)
lanes() == 16 => int4 (aligned to 16 bytes)
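The mapping above can be sketched as a small helper. This is a hypothetical illustration (the name `Int8VecType` and the standalone-function shape are not from the PR; in TVM this logic lives inside `CodeGenCUDA::PrintType`): each int8 vector width maps to the narrowest native CUDA type of the same total size, so a vector load or store compiles to a single 32/64/128-bit transaction.

```cpp
#include <string>

// Hypothetical sketch of the lane-width -> CUDA type mapping discussed above.
// Not the actual TVM implementation, just the translation rule.
std::string Int8VecType(int lanes) {
  switch (lanes) {
    case 1:  return "char";   // scalar int8
    case 4:  return "char4";  // 4 x int8 packed into 32 bits
    case 8:  return "int2";   // 8 x int8 packed into 2 x 32 bits (8-byte aligned)
    case 16: return "int4";   // 16 x int8 packed into 4 x 32 bits (16-byte aligned)
    default: return "";       // unsupported vector width
  }
}
```

The payoff of this mapping is that the generated CUDA uses only built-in vector types, so no custom struct with alignment attributes needs to be emitted.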
@nishi-t can you act on @vinx13's comment? Being able to perform full vector loads will be very helpful for getting the full perf. Specifically, we want to be able to load 4 words from memory at a time; unfortunately there is no char16 struct, so we have two solutions:
int2 and int4 might be an easier path for now if we only need save/load and get NVIDIA's native support
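The size and alignment reasoning behind this choice can be checked on the host. The `*_mimic` structs below are assumptions introduced only for this check; CUDA's real `char4`, `int2`, and `int4` are defined in `vector_types.h` with the same sizes and `__align__` attributes. The point is that although no `char16` type exists, 16 int8 lanes occupy exactly one `int4`, so a single 128-bit load moves all of them.

```cpp
#include <cstdint>

// Host-side mimics of CUDA's built-in vector types (hypothetical names),
// used only to verify the size/alignment claims in the discussion above.
struct alignas(4)  char4_mimic { int8_t  x, y, z, w; };  // 4 x int8
struct alignas(8)  int2_mimic  { int32_t x, y; };        // holds 8 x int8
struct alignas(16) int4_mimic  { int32_t x, y, z, w; };  // holds 16 x int8

// One 32-bit word per char4, one 64-bit load per int2,
// one 128-bit load per int4 -- the widest transaction CUDA supports.
static_assert(sizeof(char4_mimic) == 4,  "4 lanes fit in 32 bits");
static_assert(sizeof(int2_mimic)  == 8,  "8 lanes fit in 64 bits");
static_assert(sizeof(int4_mimic)  == 16, "16 lanes fit in 128 bits");
static_assert(alignof(int4_mimic) == 16, "128-bit loads need 16-byte alignment");
```

Reinterpreting an int8 buffer through these wider types is only safe when the buffer itself meets the wider alignment requirement, which is why the alignment annotations in the rule above matter.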
Sounds good. Can you add a test case to cover the load, possibly via a vectorized shared memory load?
Looks good from my side.
@tqchen I addressed the comments. Please review again.
This PR adds int8 support for NVIDIA GPUs.