Support mixed precision in quantization for RTN#24401

Merged
jiafatom merged 1 commit into main from mixed on Apr 14, 2025
Conversation

@jiafatom (Contributor) commented Apr 12, 2025

Description

Support mixed precision in quantization for RTN

Motivation and Context

Makes quantization more flexible.
Usage:

```python
from onnxruntime.quantization import matmul_4bits_quantizer
from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer

customized_weight_config = {}

# Keep the excluded layers at 8 bits; all other MatMul nodes get the 4-bit default.
for i in layers_to_exclude:
    customized_weight_config[f"/model/layers.{i}/MatMul"] = {"bits": 8}

algo_config = matmul_4bits_quantizer.RTNWeightOnlyQuantConfig(
    customized_weight_config=customized_weight_config
)
quant = MatMul4BitsQuantizer(
    model=onnx_model,
    block_size=32,
    is_symmetric=False,
    accuracy_level=4,
    nodes_to_exclude=nodes_to_exclude,
    algo_config=algo_config,
)
```
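The override map in the usage above is just a plain dict keyed by ONNX node name. A minimal, dependency-free sketch of that step (`layers_to_exclude` and the layer indices here are hypothetical):

```python
# Per-node override map: ONNX node name -> quantization settings.
# Listed nodes keep 8-bit weights; every other MatMul falls back to
# the quantizer's 4-bit default.
layers_to_exclude = [0, 1, 31]  # hypothetical layer indices to keep at 8 bits

customized_weight_config = {}
for i in layers_to_exclude:
    # Node names follow the graph-path convention from the PR description.
    customized_weight_config[f"/model/layers.{i}/MatMul"] = {"bits": 8}

for name, cfg in sorted(customized_weight_config.items()):
    print(name, cfg)
```

Only the listed nodes are overridden; any node absent from the dict uses the quantizer's defaults.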

@github-actions (bot) left a comment:

You can commit the suggested changes from lintrunner.


@fajin-corp (Contributor) left a comment:

how do you plan to pass the custom configs from command line?

@jiafatom (Contributor, Author) replied:

> how do you plan to pass the custom configs from command line?

Since we need to pass a dictionary, it is not easy to pass a custom config from the command line. I was thinking of this as an internal tool for now. Do we have a requirement to pass the custom config from the command line? Any similar examples?
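One way such a dictionary could be accepted on the command line (not part of this PR; the flag name is an assumption) is to take it as a JSON string and parse it with `json.loads`:

```python
import argparse
import json

parser = argparse.ArgumentParser()
# Hypothetical flag: accepts the per-node override map as a JSON string, e.g.
#   --customized_weight_config '{"/model/layers.0/MatMul": {"bits": 8}}'
parser.add_argument("--customized_weight_config", type=json.loads, default={})

args = parser.parse_args(
    ["--customized_weight_config", '{"/model/layers.0/MatMul": {"bits": 8}}']
)
print(args.customized_weight_config)
```

The parsed value is a regular dict, so it could be passed straight through as the `customized_weight_config` argument.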


@jiafatom force-pushed the mixed branch 2 times, most recently from 65f8bd4 to 9f0f408, April 14, 2025 18:31

@fajin-corp (Contributor) approved:

:shipit:

@jiafatom jiafatom merged commit d205bb7 into main Apr 14, 2025
85 of 89 checks passed
@jiafatom jiafatom deleted the mixed branch April 14, 2025 23:44
ashrit-ms pushed a commit that referenced this pull request Apr 24, 2025