Support mixed precision in quantization for RTN#24401

Merged
jiafatom merged 1 commit into main from mixed on Apr 14, 2025
Conversation

@jiafatom (Contributor) commented Apr 12, 2025

Description

Support mixed precision in quantization for RTN

Motivation and Context

Makes quantization more flexible.
Usage:

```python
from onnxruntime.quantization import matmul_4bits_quantizer
from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer

customized_weight_config = {}

# Keep the excluded layers at 8 bits; all other MatMul nodes get the 4-bit default.
for i in layers_to_exclude:
    customized_weight_config[f"/model/layers.{i}/MatMul"] = {"bits": 8}

algo_config = matmul_4bits_quantizer.RTNWeightOnlyQuantConfig(
    customized_weight_config=customized_weight_config
)
quant = MatMul4BitsQuantizer(
    model=onnx_model,
    block_size=32,
    is_symmetric=False,
    accuracy_level=4,
    nodes_to_exclude=nodes_to_exclude,
    algo_config=algo_config,
)
```
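The override map in the usage above is just a plain dict keyed by ONNX node name. A minimal, dependency-free sketch of that step (`layers_to_exclude` and the layer indices here are hypothetical):

```python
# Per-node override map: ONNX node name -> quantization settings.
# Listed nodes keep 8-bit weights; every other MatMul falls back to
# the quantizer's 4-bit default.
layers_to_exclude = [0, 1, 31]  # hypothetical layer indices to keep at 8 bits

customized_weight_config = {}
for i in layers_to_exclude:
    # Node names follow the graph-path convention from the PR description.
    customized_weight_config[f"/model/layers.{i}/MatMul"] = {"bits": 8}

for name, cfg in sorted(customized_weight_config.items()):
    print(name, cfg)
```

Only the listed nodes are overridden; any node absent from the dict uses the quantizer's defaults.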

@github-actions (bot) left a comment:

You can commit the suggested changes from lintrunner.


@fajin-corp (Contributor) left a comment:

how do you plan to pass the custom configs from command line?

@jiafatom (Contributor, Author) replied:

> how do you plan to pass the custom configs from command line?

Since we need to pass a dictionary, it is not easy to pass a custom config from the command line. I was thinking of this as an internal tool for now. Do we have a requirement to pass the custom config from the command line? Any similar examples?
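One way such a dictionary could be accepted on the command line (not part of this PR; the flag name is an assumption) is to take it as a JSON string and parse it with `json.loads`:

```python
import argparse
import json

parser = argparse.ArgumentParser()
# Hypothetical flag: accepts the per-node override map as a JSON string, e.g.
#   --customized_weight_config '{"/model/layers.0/MatMul": {"bits": 8}}'
parser.add_argument("--customized_weight_config", type=json.loads, default={})

args = parser.parse_args(
    ["--customized_weight_config", '{"/model/layers.0/MatMul": {"bits": 8}}']
)
print(args.customized_weight_config)
```

The parsed value is a regular dict, so it could be passed straight through as the `customized_weight_config` argument.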


@jiafatom force-pushed the mixed branch 2 times, most recently from 65f8bd4 to 9f0f408, April 14, 2025 18:31

@fajin-corp (Contributor) approved:

:shipit:

@jiafatom jiafatom merged commit d205bb7 into main Apr 14, 2025
85 of 89 checks passed
@jiafatom jiafatom deleted the mixed branch April 14, 2025 23:44
ashrit-ms pushed a commit that referenced this pull request Apr 24, 2025