Add MatMulNBits accuracy_level parameter to quantization utilities. by edgchen1 · Pull Request #19015 · microsoft/onnxruntime

edgchen1 · 2024-01-05T01:59:33Z

Description

Add MatMulNBits accuracy_level parameter to quantization utilities.

Motivation and Context

Allow the MatMulNBits accuracy_level attribute (added in #17669) to be set to a specific value when the model is quantized.
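For reference, here is a minimal sketch of the values the attribute accepts. It is based on the MatMulNBits entry in the ONNX Runtime contrib operator docs, which describe `accuracy_level` as the minimum accuracy level of input A: 0 (unset), 1 (fp32), 2 (fp16), 3 (bf16), 4 (int8). The helpers `validate_accuracy_level` and `matmulnbits_attrs` are hypothetical and are not the quantization utilities changed in this PR. Treating `None` as "do not emit the attribute" is an assumption. Other required node attributes (`K`, `N`, `bits`) are omitted.

```python
from __future__ import annotations

# Documented values of the MatMulNBits `accuracy_level` attribute
# (minimum accuracy level of input A), per the contrib op docs.
ACCURACY_LEVELS: dict[int, str] = {
    0: "unset",
    1: "fp32",
    2: "fp16",
    3: "bf16",
    4: "int8",
}


def validate_accuracy_level(level: int | None) -> int | None:
    """Hypothetical helper: check a user-supplied accuracy_level.

    None is assumed to mean "leave the attribute off the node", so the
    operator default applies.
    """
    if level is None:
        return None
    if level not in ACCURACY_LEVELS:
        raise ValueError(
            f"accuracy_level must be one of {sorted(ACCURACY_LEVELS)}, got {level}"
        )
    return level


def matmulnbits_attrs(block_size: int, accuracy_level: int | None) -> dict[str, int]:
    """Hypothetical: build a partial attribute dict for one MatMulNBits node.

    Only block_size and accuracy_level are shown; K, N and bits are omitted.
    """
    attrs = {"block_size": block_size}
    level = validate_accuracy_level(accuracy_level)
    if level is not None:
        # Emit the attribute only when the caller asked for a specific level.
        attrs["accuracy_level"] = level
    return attrs
```

For example, `matmulnbits_attrs(32, 4)` yields `{"block_size": 32, "accuracy_level": 4}`, while `matmulnbits_attrs(32, None)` leaves the attribute out entirely.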

onnxruntime/python/tools/transformers/models/llama/convert_to_onnx.py

edgchen1 · 2024-01-05T19:09:16Z

I verified that an int4 model can be produced with the expected MatMulNBits accuracy_level attributes present.
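A check like the one described here can be sketched as a short Python function that walks the graph nodes and reports every `MatMulNBits` node whose `accuracy_level` differs from the expected value. It assumes the ONNX protobuf field names (`NodeProto.op_type`, `NodeProto.attribute`, `AttributeProto.name` and `.i`). The stand-in dataclasses exist only so the snippet runs without `onnx` installed. With `onnx`, one would presumably pass `onnx.load(path).graph.node` instead. The function name `find_accuracy_level_mismatches` is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Minimal stand-ins mirroring the ONNX protobuf fields used below.
@dataclass
class Attribute:
    name: str
    i: int = 0  # integer payload, like AttributeProto.i


@dataclass
class Node:
    op_type: str
    attribute: list[Attribute] = field(default_factory=list)


def find_accuracy_level_mismatches(nodes, expected: int) -> list[int]:
    """Return indices of MatMulNBits nodes whose accuracy_level != expected.

    A MatMulNBits node that lacks the attribute also counts as a mismatch.
    """
    mismatches = []
    for idx, node in enumerate(nodes):
        if node.op_type != "MatMulNBits":
            continue  # only quantized MatMul nodes carry the attribute
        levels = [a.i for a in node.attribute if a.name == "accuracy_level"]
        if levels != [expected]:
            mismatches.append(idx)
    return mismatches
```

Usage: for the nodes `[Node("MatMulNBits", [Attribute("accuracy_level", 4)]), Node("Add"), Node("MatMulNBits")]`, calling `find_accuracy_level_mismatches(nodes, expected=4)` returns `[2]`, flagging the node that is missing the attribute.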


edgchen1 added 2 commits January 4, 2024 16:05

Add int4 accuracy_level configuration to quantization utilities. (71c609d)

use opset 14 for export (f78e280)

edgchen1 requested review from kunal-vaishnavi and yufenglee January 5, 2024 01:59

edgchen1 commented Jan 5, 2024


onnxruntime/python/tools/transformers/models/llama/convert_to_onnx.py (conversation resolved)

update other llama scripts to use 'from __future__ import annotations' (729f0b8)

kunal-vaishnavi reviewed Jan 5, 2024


onnxruntime/python/tools/transformers/models/llama/convert_to_onnx.py (conversation resolved)

yufenglee approved these changes Jan 5, 2024


kunal-vaishnavi approved these changes Jan 5, 2024


edgchen1 merged commit 4190c29 into main Jan 5, 2024

edgchen1 deleted the edgchen1/int4_quantization_accuracy_level branch January 5, 2024 22:51

kunal-vaishnavi mentioned this pull request Jan 8, 2024

[Documentation] LLaMa2 tutorial runs into ONNX opset version error #19051

Closed

kunal-vaishnavi mentioned this pull request Jan 26, 2024

llama_v2_7b_16h stopped working with torch.jit.trace pytorch/pytorch#117752

Closed

edgchen1 merged 3 commits into main from edgchen1/int4_quantization_accuracy_level
3 participants