Conversation

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline,Windows GPU WebGPU CI Pipeline,Windows OpenVINO CI Pipeline

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

Azure Pipelines successfully started running 1 pipeline(s).

Azure Pipelines successfully started running 2 pipeline(s).

Azure Pipelines successfully started running 3 pipeline(s).

Azure Pipelines successfully started running 4 pipeline(s).
onnxruntime/core/providers/webnn/builders/impl/matMulNBits_op_builder.cc
The MatMulNBits op can be emulated by DequantizeLinear + Transpose + MatMul, and currently only 4-bit quantization is supported. Thus the B and zero_points (if present) inputs must be constant initializers with data type 'uint8', and we need to register them as 'uint4' WebNN constants. Typically, all initializers are registered as WebNN constants in one step via `ModelBuilder::RegisterInitializers` before constructing the WebNN graph. However, because WebNN doesn't support casting to 'uint4', we need to defer the registration of these two inputs until `MatMulNBitsBuilder::AddToModelBuilderImpl` is invoked.
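For reference, the emulation described above (DequantizeLinear → Transpose → MatMul) can be sketched in NumPy. This is only an illustrative sketch: the shapes are made up, a single per-row scale stands in for the per-block scales of the real op, and the 4-bit values are shown already unpacked (the actual MatMulNBits B input packs two uint4 values per uint8 byte).

```python
import numpy as np

# Illustrative shapes only. MatMulNBits computes A @ B where B is stored
# 4-bit-quantized; here we use already-unpacked uint4 values (0..15) and
# one scale per output row as a simplification of the per-block layout.
K, N = 8, 4

rng = np.random.default_rng(0)
A = rng.standard_normal((2, K)).astype(np.float32)

b_q = rng.integers(0, 16, size=(N, K), dtype=np.uint8)  # unpacked uint4 values
scales = rng.standard_normal((N,)).astype(np.float32)
zero_point = np.float32(8.0)  # default zero point for 4-bit quantization

# DequantizeLinear: (q - zero_point) * scale, per row in this sketch.
b_deq = (b_q.astype(np.float32) - zero_point) * scales[:, None]

# Transpose + MatMul: B is stored as (N, K), so transpose before multiplying.
out = A @ b_deq.T
```

The transpose is needed because MatMulNBits stores B row-major as (N, K), whereas the MatMul expects a (K, N) right-hand operand.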
7693135 to f372f79
@fdwr, thanks for your comments, I've fixed them in the new commit, PTAL again.

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline,Windows GPU WebGPU CI Pipeline,Windows OpenVINO CI Pipeline

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

Azure Pipelines successfully started running 1 pipeline(s).

Azure Pipelines successfully started running 2 pipeline(s).

Azure Pipelines successfully started running 3 pipeline(s).

Azure Pipelines successfully started running 4 pipeline(s).