Integrate high-performance x64 gemm library to MLAS #17669
yufenglee merged 115 commits into microsoft:main
Conversation
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Thanks Louyu!
(microsoft#19015) Allow the MatMulNBits `accuracy_level` attribute (added in microsoft#17669) to be set to a particular value when the model is quantized.
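For background, `accuracy_level` selects the arithmetic precision used inside the quantized MatMul: higher accuracy keeps the multiply in full float, while lower levels trade precision for speed by computing on reduced-precision values. A minimal numpy sketch of that trade-off, assuming a simple per-row int8 activation scheme (illustrative only, not ORT's actual kernel):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 64)).astype(np.float32)   # activations
W = rng.standard_normal((64, 8)).astype(np.float32)   # weights, already dequantized to float

# Higher accuracy: multiply in full fp32.
ref = A @ W

# Lower accuracy / higher speed: quantize activations per row to int8,
# multiply on the quantized values, then rescale the result.
a_scale = np.abs(A).max(axis=1, keepdims=True) / 127.0
A_q = np.clip(np.round(A / a_scale), -127, 127).astype(np.int8)
approx = (A_q.astype(np.float32) @ W) * a_scale
```

The reduced-precision path stays close to the fp32 reference because the per-element activation quantization error is bounded by half a quantization step.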
### Description
Revert PR #19016 (microsoft/onnxruntime#19016) and PR #17669 (microsoft/onnxruntime#17669).
### Description
Improve MLAS to support high-performance x64 INT4 kernels.
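The INT4 kernels operate on blockwise-quantized 4-bit weights: each column of the weight matrix is split into fixed-size blocks along the reduction dimension, and each block stores 4-bit codes plus per-block float parameters. A rough illustration of a generic asymmetric scheme in numpy (the function names are hypothetical and MLAS's actual packing and layout differ):

```python
import numpy as np

def quantize_int4_blockwise(W, block_size=32):
    # Split each column of the (K, N) weight matrix into blocks of
    # `block_size` along K; store a 4-bit code (0..15) plus a float
    # scale and minimum per block (asymmetric quantization).
    K, N = W.shape
    assert K % block_size == 0
    Wb = W.reshape(K // block_size, block_size, N)
    wmin = Wb.min(axis=1, keepdims=True)
    scale = (Wb.max(axis=1, keepdims=True) - wmin) / 15.0
    scale = np.where(scale == 0.0, 1.0, scale)        # guard constant blocks
    q = np.clip(np.round((Wb - wmin) / scale), 0, 15).astype(np.uint8)
    return q, scale.astype(np.float32), wmin.astype(np.float32)

def dequantize_int4_blockwise(q, scale, wmin):
    Wb = q.astype(np.float32) * scale + wmin
    return Wb.reshape(-1, Wb.shape[-1])

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 8)).astype(np.float32)
q, scale, wmin = quantize_int4_blockwise(W)
W_hat = dequantize_int4_blockwise(q, scale, wmin)
err = np.abs(W - W_hat).max()   # bounded by half a quantization step per block
```

Smaller blocks give tighter scales (lower error) at the cost of more per-block metadata; the real kernels additionally pack two 4-bit codes per byte.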
### Motivation and Context

### Tasks

### Benchmark
Ubuntu 20.22 + Intel(R) Xeon(R) Platinum 8480+, 56 cores
Reference:
Win11 + 12900K, 8 cores: