
Conversation

@KJlaccHoeUM9l
Contributor

This PR adds support for QAttention, a quantized version of the Attention operator from the Microsoft onnxruntime contrib opset.
An explanation and illustration of how this layer works can be found, for example, in @lena-voita's NLP course.
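For reference, a minimal, illustrative sketch (not taken from this PR) of how a QAttention node from the com.microsoft contrib domain can be built with the onnx helper API. The input order follows the onnxruntime contrib-op description; the shapes in the comments are the conventional ones and are assumptions here, not quotes from the PR:

from onnx import helper

qattention_node = helper.make_node(
    "QAttention",
    inputs=[
        "input",              # (batch, seq_len, input_hidden) quantized activations
        "weight",             # (input_hidden, 3 * hidden) quantized QKV weights
        "bias",               # (3 * hidden,) float bias
        "input_scale",        # scalar -> per-tensor quantization
        "weight_scale",       # scalar (or 1-D for per-channel)
        "mask_index",         # attention mask, e.g. (batch, past_seq_len + seq_len)
        "input_zero_point",
        "weight_zero_point",
    ],
    outputs=["output"],
    domain="com.microsoft",   # contrib domain, not the standard ONNX opset
    num_heads=12,             # required attribute
)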

@tvm-bot
Collaborator

tvm-bot commented Dec 23, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@KJlaccHoeUM9l
Contributor Author

Hello @vvchernov, @echuraev, @AndrewZhaoLuo!
Could you review this PR?

Contributor

@echuraev echuraev left a comment


LGTM, but I don't have much knowledge of this codebase. @jwfromm, @AndrewZhaoLuo, could you please take a look at this PR?

# Currently only (batch, past_seq_length + seq_length) shape is supported.
mask_index = inputs[5]

# Scalar, which means a per-tensor/layer quantization
Contributor


nit: you have exactly the same comment for inputs[3] and inputs[7].

Contributor Author


done
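(For context on the "per-tensor/layer quantization" wording in those comments: per-tensor quantization shares one scalar scale and one scalar zero point across the whole tensor, as opposed to per-channel quantization, which keeps one scale per channel. A small numpy sketch, illustrative only and not from the PR:)

import numpy as np

x = np.random.randn(2, 4).astype("float32")
scale = np.float32(0.05)    # single scalar scale -> per-tensor quantization
zero_point = np.int32(0)    # single scalar zero point

x_q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype("int8")
x_dq = (x_q.astype("float32") - zero_point) * scale   # approximate reconstruction
print(np.max(np.abs(x - x_dq)))  # roughly bounded by scale / 2 inside the clip range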

result,
_op.multiply(lhs_scale, rhs_scale),
zero_point_zero,
axis=-1, # TODO(agladyshev): what is 'axis' parameter for?
Contributor


Do you still need this TODO comment?

Contributor Author


done
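(For context on the resolved TODO: in Relay's QNN ops the axis argument names the channel axis used when the scale and zero point are 1-D per-channel tensors; with scalar, per-tensor parameters it has no effect. A minimal sketch, assuming the arguments above feed a qnn.op.dequantize call, which is not shown in full in the excerpt:)

import numpy as np
import tvm
from tvm import relay

data = relay.var("data", shape=(2, 4), dtype="int8")
scale = relay.const(np.float32(0.05))   # scalar -> per-tensor quantization
zero_point = relay.const(np.int32(0))

out = relay.qnn.op.dequantize(data, scale, zero_point, axis=-1)  # axis unused for scalar scale
mod = tvm.IRModule.from_expr(relay.Function([data], out))
print(mod)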

@AndrewZhaoLuo
Contributor

Apologies, I've been quite sick. I'll try to look at this Thursday.

Contributor

@AndrewZhaoLuo AndrewZhaoLuo left a comment


Thanks! Sorry for the late review. LGTM

@AndrewZhaoLuo AndrewZhaoLuo merged commit e24d4fb into apache:main Jan 3, 2023
@KJlaccHoeUM9l KJlaccHoeUM9l deleted the agladyshev/dev/qattention branch January 10, 2023 12:02
fzi-peccia pushed a commit to fzi-peccia/tvm that referenced this pull request Mar 27, 2023
…b opset (apache#13654)

* init QAttention converter

* add type and shape checking

* add test for QAttention

* add tests for optional parameters

* change mask_index shape

* add support for 'past' input

* add support for 'unidirectional' attribute

* expand test coverage

* fix lint

* fix pylint

* fix batch dimension for topi/cuda/batch_matmul_tensorcore.py::batch_matmul_tensorcore_cuda

* code review fix
