[TOPI] Add dp4a intrinsic to CUDA #1707

vinx13 · 2018-09-12T08:21:52Z

This PR adds a dp4a intrinsic to TOPI and refactors the gemm_int8 recipe. I will send an int8 conv2d implementation that uses dp4a in a follow-up PR.

cc @tqchen @merrymercy
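
For readers unfamiliar with the instruction, here is a minimal pure-Python sketch of what a dp4a-style operation computes: a dot product of four int8 pairs accumulated into a 32-bit integer. The function name `dp4a_reference` is hypothetical and not part of this PR, and the wrap-around to signed 32 bits is an assumption about accumulator behavior, so treat this as a reference model rather than the TOPI implementation.

```python
def dp4a_reference(a, b, c=0):
    """Reference model: c + sum(a[i] * b[i] for i in 0..3), with int8 inputs.

    Hypothetical helper for illustration only; not the TOPI intrinsic.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("dp4a operates on exactly four int8 lanes per operand")
    for v in list(a) + list(b):
        if not -128 <= v <= 127:
            raise ValueError("lane value %d is outside the int8 range" % v)
    acc = c + sum(x * y for x, y in zip(a, b))
    # Assumption: the accumulator behaves like a signed 32-bit integer,
    # so wrap the Python int into that range.
    return (acc + 2**31) % 2**32 - 2**31


print(dp4a_reference([1, 2, 3, 4], [5, 6, 7, 8], 10))  # 5 + 12 + 21 + 32 + 10 = 80
```

A tensorized int8 GEMM or conv2d reduces over the inner axis four elements at a time, so each group of four multiply-adds maps to one such operation.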

tqchen · 2018-09-12T16:14:32Z

topi/python/topi/cuda/int8_intrinsics.py

+import tvm
+
+
+def _intrin_dp4a_reduce(x_scope, y_scope, z_scope):


Let's put it under cuda/tensor_intrin.py and rename it to dp4a.

vinx13 · 2018-09-13T03:20:26Z

@tqchen I have renamed the file and renamed the function to dp4a. Please review.

tqchen · 2018-09-13T17:04:25Z

topi/python/topi/cuda/tensor_intrin.py

+
+    Parameters
+    ----------
+    x_scope: The storage scope of buffer for lhs


Please add the data type for each parameter; see https://docs.tvm.ai/contribute/document.html#document-python
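
As a rough illustration of the requested format, the sketch below puts the type after each parameter name in a numpydoc-style `Parameters` section. Only the `x_scope` description comes from the diff above; the function name `dp4a_doc_example`, the wording for `y_scope` and `z_scope`, and the placeholder return value are assumptions made for the example.

```python
def dp4a_doc_example(x_scope, y_scope, z_scope):
    """Illustrative docstring layout only; not the code under review.

    Parameters
    ----------
    x_scope : str
        The storage scope of buffer for lhs
    y_scope : str
        (assumed wording) The storage scope of buffer for rhs
    z_scope : str
        (assumed wording) The storage scope of buffer for result

    Returns
    -------
    scopes : tuple of str
        The three scopes, returned unchanged (placeholder behavior).
    """
    # Placeholder body so the example runs; the real function would
    # build and return a tensor intrinsic instead.
    return (x_scope, y_scope, z_scope)
```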

tqchen · 2018-09-13T17:05:44Z

topi/python/topi/cuda/tensor_intrin.py

+
+    Parameters
+    ----------
+    x_scope: The storage scope of buffer for lhs


It may make sense to default every scope to local.
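
A small sketch of what that suggestion could look like at the signature level, assuming the scope names are plain strings. `dp4a_scopes` is a hypothetical stand-in that only returns the chosen scopes; it does not construct an intrinsic.

```python
def dp4a_scopes(x_scope="local", y_scope="local", z_scope="local"):
    # Hypothetical stand-in: every scope defaults to "local", so the
    # common call site needs no arguments at all.
    return {"x": x_scope, "y": y_scope, "z": z_scope}


# Common case: rely on the defaults.
print(dp4a_scopes())
# Override only the scope that differs.
print(dp4a_scopes(y_scope="shared"))
```

With defaults in place, callers spell out a scope only when an operand lives somewhere other than local storage.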

tqchen · 2018-09-14T16:19:45Z

Thanks @vinx13, this is now merged!

vinx13 force-pushed the topi/dp4a_intrin branch from a4fa7f6 to 9eb96b9 Compare September 12, 2018 08:23

[TOPI] Add dp4a intrinsic to CUDA

a17e5c7

vinx13 force-pushed the topi/dp4a_intrin branch from 9eb96b9 to a17e5c7 Compare September 12, 2018 08:59

tqchen requested changes Sep 12, 2018

vinx13 added 2 commits September 13, 2018 10:06

Rename int8_intrinsics.py -> tensor_intrin.py

603afa3

Rename variable to fix lint

77f6560

tqchen requested changes Sep 13, 2018

tqchen self-assigned this Sep 13, 2018

tqchen added the status: need update label Sep 13, 2018

Improve doc

23c65be

tqchen approved these changes Sep 14, 2018

tqchen merged commit edf0967 into apache:master Sep 14, 2018

tqchen added the status: accepted label and removed the status: need update label Sep 14, 2018

tqchen mentioned this pull request Sep 17, 2018

INT8 conv operator implementation with NCHWc data layout for Intel machines #1680

Merged

FrozenGene pushed a commit to FrozenGene/tvm that referenced this pull request Dec 27, 2018

[TOPI] Add dp4a intrinsic to CUDA (apache#1707)

25c216b
