[webgpu] Add Matmul8bits Support #24546

Merged
sushraja-msft merged 22 commits into main from matmul8bits on May 6, 2025

qjia7 (Contributor) commented Apr 25, 2025

Description

This PR adds support for 8-bit quantization to the MatMulNBits operation in WebGPU.

It makes the following changes:

  1. Use MatMulNBitsProgram as the unified fallback path. It was originally the generation path for block size = 32; it now supports any block size without limitations. The original, more complicated programs are removed.
  2. Enable MatMulNBitsWideTileProgram on all platforms.
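
For readers unfamiliar with the operation, here is a minimal CPU reference sketch of what a blockwise 8-bit quantized matmul computes for an arbitrary block size. It is not the WGSL code from this PR. The function name, the `[N][K]` weight layout, the per-block scale layout, and the default zero point of 128 are assumptions made for illustration, and the tiny block size is chosen only to keep the example readable.

```python
# Reference sketch of blockwise 8-bit dequantize-then-matmul (not the WebGPU shader).
# Assumptions (hypothetical layout, for illustration only):
#   - weights are uint8 values stored as b_quant[N][K]
#   - scales[N][blocks_per_row] holds one float scale per (row, block)
#   - a fixed zero point of 128 is used when none is supplied

def matmul_8bits_reference(a, b_quant, scales, block_size, zero_point=128):
    """Compute Y = A @ dequant(B).T for A of shape [M][K] and B of shape [N][K]."""
    m, k, n = len(a), len(a[0]), len(b_quant)
    blocks_per_row = (k + block_size - 1) // block_size  # last block may be partial
    y = [[0.0] * n for _ in range(m)]
    for row in range(m):
        for col in range(n):
            acc = 0.0
            for blk in range(blocks_per_row):
                start = blk * block_size
                end = min(start + block_size, k)
                # Accumulate the un-scaled products for this block first,
                # then apply the block's scale once.
                block_sum = 0.0
                for i in range(start, end):
                    block_sum += a[row][i] * (b_quant[col][i] - zero_point)
                acc += scales[col][blk] * block_sum
            y[row][col] = acc
    return y


a = [[1.0, 2.0, 3.0, 4.0]]
b_quant = [[129, 130, 127, 128], [128, 128, 132, 126]]
scales = [[0.5, 2.0], [1.0, 0.25]]
result = matmul_8bits_reference(a, b_quant, scales, block_size=2)
print(result)
```

Because the block size is only a loop bound here, the same routine works for any value, which mirrors the idea of one generic fallback path rather than one program per block size.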

qjia7 marked this pull request as ready for review April 28, 2025 14:39
qjia7 requested review from guschmue and sushraja-msft April 28, 2025 14:39
guschmue added the ep:WebGPU ort-web webgpu provider label Apr 28, 2025
qjia7 marked this pull request as draft April 29, 2025 08:58
qjia7 (Contributor, Author) commented Apr 29, 2025

@sushanthr As we discussed offline, I split the int8 support for dp4/subgroupMatrix into a separate PR, #24590, to simplify the review. The corresponding comments have been resolved in that PR. Thanks.

guschmue pushed a commit that referenced this pull request Apr 30, 2025
This PR enables matmul8bits for the dp4/subgroupMatrix path in webgpu.

This PR is separated from #24546 for easier review.
qjia7 marked this pull request as ready for review April 30, 2025 12:38
sushraja-msft previously approved these changes May 2, 2025
sushraja-msft merged commit 5160c67 into main May 6, 2025
91 of 98 checks passed
sushraja-msft deleted the matmul8bits branch May 6, 2025 16:38
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request May 12, 2025
This PR enables matmul8bits for the dp4/subgroupMatrix path in webgpu.

This PR is separated from microsoft#24546 for easier review.
baijumeswani pushed a commit that referenced this pull request May 14, 2025
This PR enables matmul8bits for the dp4/subgroupMatrix path in webgpu.

This PR is separated from #24546 for easier review.
Labels

ep:WebGPU ort-web webgpu provider
