
Conversation

nishi-t (Contributor) commented Jul 28, 2018

This PR adds int8 support for NVIDIA GPUs.

tqchen (Member) commented Jul 28, 2018

@vinx13 since you are working on related stuff, can you take a look?

// directly handle 4 8-bit ints as one 32-bit integer.
os << "int"; return;
enable_int8_ = true;
os << "char4"; return;
vinx13 (Member) commented:

@tqchen do we need to support other lane sizes? e.g. int4 if lanes == 16

tqchen (Member) commented:

Yes, this would actually be very helpful. @vinx13, can you elaborate on what we may need to support, and the value translation rules here?

vinx13 (Member) commented:

The rule here will be:

lanes() == 8 => int2 (8-byte aligned)
lanes() == 16 => int4 (16-byte aligned)
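The mapping above, together with the char4 case from the diff, can be sketched as a small lookup. This is a hypothetical illustration (the function name `Int8VecType` is made up, not TVM's actual `PrintType` codegen), showing how an int8 vector width might be translated to the CUDA type used for vectorized load/store:

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch, not TVM code: map an int8 vector width (lanes)
// to the CUDA type used for a single vectorized load/store.
//   lanes == 1  -> char   (scalar)
//   lanes == 4  -> char4  (4 bytes, native CUDA vector type)
//   lanes == 8  -> int2   (8 bytes, 8-byte aligned; bytes reinterpreted)
//   lanes == 16 -> int4   (16 bytes, 16-byte aligned; bytes reinterpreted)
// For int2/int4 the payload is still 8/16 packed int8 values; the wide
// type is used only for memory movement, not for int8 arithmetic.
std::string Int8VecType(int lanes) {
  switch (lanes) {
    case 1:  return "char";
    case 4:  return "char4";
    case 8:  return "int2";
    case 16: return "int4";
    default: assert(false && "unsupported int8 vector width");
  }
  return "";
}
```

The key design point is that CUDA has no `char16` built-in, so widths beyond 4 reuse the integer vector types purely as 8- and 16-byte containers.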

@tqchen tqchen added status: need update need update based on feedbacks and removed status: need review labels Jul 29, 2018
tqchen (Member) commented Jul 30, 2018

@nishi-t can you act on @vinx13's comment? Being able to perform a full vector load will be very helpful for getting the full performance.

Specifically, we want to be able to load 4 words from memory at a time. Unfortunately, there is no char16 struct, so we have two solutions:

  • Use int2 and int4 for these vectors, while not supporting arithmetic on them, as @vinx13 suggested.
  • Try to define a char16 type and support arithmetic on it.

int2 and int4 might be the easier path for now, if we only need store/load, since they get NVIDIA's native support.
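The first solution can be illustrated on the host side. This is a hypothetical sketch (the `Int4Like` struct and the function names are made up, not TVM or CUDA API), showing that 16 int8 values can be moved as one 16-byte chunk, the CPU analogue of a single CUDA `int4` load, while any arithmetic still happens element-wise after unpacking:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for CUDA's int4: four 32-bit words, 16 bytes total.
struct Int4Like {
  int32_t x, y, z, w;
};

// Move 16 packed int8 values with one wide copy instead of 16 byte copies.
// On a GPU this would be a single vectorized load through an int4* pointer
// (with the source 16-byte aligned); memcpy is the portable host analogue.
Int4Like LoadAsInt4(const int8_t* src) {
  Int4Like v;
  std::memcpy(&v, src, sizeof(v));
  return v;
}

// Store the 16 packed int8 values back out as one 16-byte chunk.
void StoreFromInt4(const Int4Like& v, int8_t* dst) {
  std::memcpy(dst, &v, sizeof(v));
}
```

Note the trade-off this thread settles on: the wide type is a container only, so no `+`/`*` is defined on the packed form; that is what the second (char16) solution would have added.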

nishi-t (Contributor, author) commented Jul 30, 2018

@vinx13 Thank you for the suggestion. Could you please review again?
@tqchen For now, I addressed the comment with the first solution. I'll try the second solution separately in another PR, because it may take a little while. Is this okay?

tqchen (Member) commented Jul 30, 2018

Sounds good. Can you add a test case to cover the load, possibly via a vectorized shared-memory load?

vinx13 (Member) commented Jul 30, 2018

Looks good from my side.

nishi-t (Contributor, author) commented Jul 31, 2018

@tqchen Yes, I'll address it.
@vinx13 Thank you.

nishi-t (Contributor, author) commented Aug 1, 2018

@tqchen I addressed it. Please review again.

@tqchen tqchen merged commit 40bf5e6 into apache:master Aug 1, 2018
tqchen (Member) commented Aug 1, 2018

Thanks @nishi-t @vinx13, this is now merged.

@tqchen tqchen added status: accepted and removed status: need update need update based on feedbacks labels Aug 1, 2018
tqchen pushed a commit to tqchen/tvm that referenced this pull request Aug 4, 2018
sergei-mironov pushed a commit to sergei-mironov/tvm that referenced this pull request Aug 8, 2018