
[WebNN] Support MatMulNBits op #24142

Merged
fdwr merged 3 commits into microsoft:main from Honry:support-matmulnbits
Apr 9, 2025

Conversation

Honry (Contributor) commented Mar 24, 2025

The MatMulNBits op can be emulated simply by DequantizeLinear + Transpose + MatMul; currently only 4-bit quantization is supported.

Thus the B and zero_points (if present) inputs must be known as initializers with data type 'uint8', and we need to register them as 'uint4' WebNN constants.
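(For illustration: the initializer's storage type is 'uint8' while its logical element type is 'uint4' because each byte packs two 4-bit values. A minimal sketch — the low-nibble-first layout here is an assumption for illustration, not taken from this PR:)

```python
byte = 0x98                 # one packed uint8 from a hypothetical B initializer
low = byte & 0x0F           # lower nibble: even-indexed uint4 element -> 8
high = (byte >> 4) & 0x0F   # upper nibble: odd-indexed uint4 element  -> 9
assert (low, high) == (8, 9)
```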

Typically, all initializers are registered as WebNN constants in one step via ModelBuilder::RegisterInitializers before constructing the WebNN graph. However, because WebNN doesn't support casting to 'uint4', we need to defer the registration of these two inputs until MatMulNBitsBuilder::AddToModelBuilderImpl is invoked.
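(For intuition, the emulation described above can be sketched in NumPy: unpack the 4-bit weights from their uint8 packing, dequantize block-wise as (q - zero_point) * scale — i.e. DequantizeLinear — then transpose and matmul. This is a simplified sketch under assumed conventions — low nibble first, per-block scales, default zero point 8 when zero_points is absent — not the EP's actual implementation:)

```python
import numpy as np

def unpack_uint4(packed):
    # Each uint8 byte holds two 4-bit values, low nibble first (assumed layout).
    low = packed & 0x0F
    high = (packed >> 4) & 0x0F
    return np.stack([low, high], axis=-1).reshape(*packed.shape[:-1], -1)

def matmul_nbits_emulated(a, b_packed, scales, n, k, block_size, zero_points=None):
    blocks = k // block_size
    # Unpack the 4-bit weights into [n, blocks, block_size].
    q = unpack_uint4(b_packed).reshape(n, blocks, block_size).astype(np.float32)
    if zero_points is None:
        zp = np.full((n, blocks), 8.0, dtype=np.float32)  # assumed default for uint4
    else:
        zp = unpack_uint4(zero_points).reshape(n, blocks).astype(np.float32)
    s = scales.reshape(n, blocks).astype(np.float32)
    # DequantizeLinear, block-wise along K.
    b = ((q - zp[..., None]) * s[..., None]).reshape(n, k)
    # Transpose + MatMul: B is [N, K], so the product is A @ B^T.
    return a @ b.T

# Demo: rows of B are the uint4 values [8, 9, 10, 11] and [8, 7, 6, 5].
a = np.array([[1., 1., 1., 1.]], dtype=np.float32)
b_packed = np.array([[0x98, 0xBA], [0x78, 0x56]], dtype=np.uint8)
scales = np.array([0.5, 2.0], dtype=np.float32)
out = matmul_nbits_emulated(a, b_packed, scales, n=2, k=4, block_size=4)
assert np.allclose(out, [[3.0, -12.0]])
```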

Honry (Contributor, Author) commented Mar 24, 2025

@fdwr, @guschmue, PTAL, thanks!

guschmue added the ep:WebNN (WebNN execution provider) label Mar 24, 2025
fdwr (Contributor) commented Apr 7, 2025

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline,Windows GPU WebGPU CI Pipeline,Windows OpenVINO CI Pipeline

fdwr (Contributor) commented Apr 7, 2025

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

fdwr (Contributor) commented Apr 7, 2025

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI

fdwr (Contributor) commented Apr 7, 2025

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

fdwr (Contributor) left a comment


Few comments, else LGTM.

Honry added 3 commits April 7, 2025 14:07
Honry force-pushed the support-matmulnbits branch from 7693135 to f372f79 (April 7, 2025 06:42)
Honry (Contributor, Author) commented Apr 7, 2025

@fdwr, thanks for your comments, I've fixed them in the new commit, PTAL again.

fdwr (Contributor) left a comment


👍

fdwr (Contributor) commented Apr 8, 2025

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline,Windows GPU WebGPU CI Pipeline,Windows OpenVINO CI Pipeline

fdwr (Contributor) commented Apr 8, 2025

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

fdwr (Contributor) commented Apr 8, 2025

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI

fdwr (Contributor) commented Apr 8, 2025

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@fdwr fdwr merged commit 5612ce5 into microsoft:main Apr 9, 2025
70 of 76 checks passed
ashrit-ms pushed a commit that referenced this pull request Apr 24, 2025

Labels

ep:WebNN WebNN execution provider

3 participants