
Conversation

@Jiawei-Shao (Contributor) commented May 8, 2024

This patch implements tir.dp4a with the WGSL built-in function dot4I8Packed() on the WebGPU backend.

Here is an example of using tir.dp4a with the WebGPU target:

```
n = te.var("n")
A = te.placeholder((n,), "int8x4", name="A")
B = te.placeholder((n,), "int8x4", name="B")
C = te.compute(A.shape, lambda i: tvm.tir.dp4a(A[i], B[i]), name="C")
s = te.create_schedule(C.op)

bx, tx = s[C].split(C.op.axis[0], factor=64)
s[C].bind(bx, te.thread_axis("blockIdx.x"))
s[C].bind(tx, te.thread_axis("threadIdx.x"))
mod = tvm.build(s, [A, B, C], tgt, name="dp4aTest")
```
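
For context, `tgt` above is assumed to be a WebGPU target. A hypothetical way to build and confirm that the dp4a call was lowered to dot4I8Packed (assuming the imported device module exposes its WGSL through `get_source()`):

```
# Hypothetical follow-up: build for WebGPU and check the generated WGSL.
tgt = tvm.target.Target("webgpu", host="llvm")   # assumed WebGPU target string
mod = tvm.build(s, [A, B, C], tgt, name="dp4aTest")
wgsl_src = mod.imported_modules[0].get_source()  # device-side (WGSL) module
assert "dot4I8Packed" in wgsl_src                # the dp4a call should lower to this built-in
```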

Issue: #16627

This patch adds support for `__dp4a(int8x4, int8x4)` as a pure extern
method of the WebGPU target. In the generated WGSL shader,
`int8x4` is translated into `u32`, and `__dp4a(int8x4, int8x4)`
is translated into the WGSL built-in function
`dot4I8Packed(u32, u32)`.

Here is an example of using `__dp4a` with the WebGPU target:

```
n = te.var("n")
A = te.placeholder((n,), "int8x4", name="A")
B = te.placeholder((n,), "int8x4", name="B")
C = te.compute(A.shape, lambda i: tvm.tir.call_pure_extern("int32", "__dp4a", A[i], B[i]), name="C")
s = te.create_schedule(C.op)
bx, tx = s[C].split(C.op.axis[0], factor=64)
s[C].bind(bx, te.thread_axis("blockIdx.x"))
s[C].bind(tx, te.thread_axis("threadIdx.x"))
mod = tvm.build(s, [A, B, C], tgt, name="dp4aTest")
```
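
For reference, `dot4I8Packed` interprets each `u32` operand as four packed signed 8-bit lanes and returns their dot product as a signed 32-bit integer. A hypothetical pure-Python model of that semantics (numpy and the helper name `dot4_i8_packed` are only for illustration):

```
import numpy as np

def dot4_i8_packed(a: int, b: int) -> int:
    # Reinterpret each 32-bit word as four signed 8-bit lanes (low byte = lane 0).
    a_lanes = np.frombuffer(np.uint32(a).tobytes(), dtype=np.int8).astype(np.int32)
    b_lanes = np.frombuffer(np.uint32(b).tobytes(), dtype=np.int8).astype(np.int32)
    return int(np.dot(a_lanes, b_lanes))

# 1*5 + 2*6 + 3*7 + 4*8 == 70
assert dot4_i8_packed(0x04030201, 0x08070605) == 70
```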

Issue: apache#16627
@Jiawei-Shao changed the title from "[WebGPU] Support __dp4a(int8x4, int8x4) as a pure extern method" to "[WebGPU] Support dot4I8Packed(int8x4, int8x4) as a pure extern method" on May 9, 2024

```
// extra dispatch
TVM_REGISTER_OP("tir.erf").set_attr<FLowerIntrinsic>("webgpu.FLowerIntrinsic", DispatchFastErf);

TVM_REGISTER_OP("tir.dot4I8Packed").set_attr<FLowerIntrinsic>("webgpu.FLowerIntrinsic", DispatchPureExtern<Direct>);
```
@tqchen (Member) commented May 9, 2024

Sorry, I was not being clear: for TIR it is better to use a common name, dp4a, since this intrinsic is shared across backends.

@Jiawei-Shao (Contributor, Author)

I find there is no tir.dp4a in TVM right now; dp4a is currently always invoked through call_pure_extern() (see the Vulkan and CUDA code paths).

Do you mean we should add tir.dp4a to TVM, or keep supporting dp4a as a pure extern call, the way dp4a is handled in codegen_spirv.cc?
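
For reference, the two options under discussion look roughly like this at the TE level (a sketch only; `tir.dp4a` did not exist in TVM when this comment was written):

```
# Option 1: lower dp4a as a pure extern call, as the CUDA/Vulkan backends currently do.
C = te.compute(A.shape,
               lambda i: tvm.tir.call_pure_extern("int32", "__dp4a", A[i], B[i]),
               name="C")

# Option 2: a dedicated tir.dp4a intrinsic that each backend lowers itself.
C = te.compute(A.shape, lambda i: tvm.tir.dp4a(A[i], B[i]), name="C")
```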

@tqchen (Member) commented May 10, 2024

We can add a tir.dp4a intrinsic and use it to lower to the various backends.

@tqchen (Member)

@Jiawei-Shao do you mind following up?

@Jiawei-Shao (Contributor, Author)

Sorry, I have been busy with some other urgent work these days. I will get back to this next week and will start by adding tir.dp4a.

@Jiawei-Shao (Contributor, Author)

Hi @tqchen,

Sorry for my late response. I've updated this PR. PTAL, thanks!

@Jiawei-Shao changed the title from "[WebGPU] Support dot4I8Packed(int8x4, int8x4) as a pure extern method" to "[WebGPU] Implement builtin::dp4a with WGSL built-in function dot4I8Packed" on Jul 2, 2024
@Jiawei-Shao changed the title from "[WebGPU] Implement builtin::dp4a with WGSL built-in function dot4I8Packed" to "[WebGPU] Implement tir.dp4a with WGSL built-in function dot4I8Packed" on Jul 2, 2024
@Jiawei-Shao (Contributor, Author)

@tqchen Now the PR has passed all the tests. PTAL, thanks!
