Skip to content

Conversation

@masahi
Copy link
Member

@masahi masahi commented Sep 22, 2022

The logic in

# How to convert data to int8
# Original --> C = A (conv) B
# A and B are int8
# C = (A + 128 - 128) (conv) B
# C = (A' conv B) - 128 (conv) B
# where A' = A + 128
# and 128 (conv) B is basically a reduce on CRS axis for weights.
#
# How to convert data to uint8
# C = (A - 128 + 128) (conv) B
# C = (A' conv B) + 128 (conv) B
# where A' = A - 128
if data_dtype == "int8":
# shift data to int8
before_shift = relay.add
after_shift = relay.subtract
else:
# shift data to uint8
before_shift = relay.subtract
after_shift = relay.add
is supposed to legalize the input dtype to be able to apply target-specific intrinsics that only support one of int8 or uint8. For example, the x86 VNNI instruction only supports uint8 activation.

But the logic is incorrect (two cases are flipped) and leads to incorrect result in the following case:

  • The input activation is int8, and we want to use the x86 VNNI intrinsic which only supports uint8 activations.
  • The input activation is uint8, and we want to use the ARM sdot intrinsic which only supports int8 activations.

The first case also applies to the Hexagon vrmpy intrinsic. I found this bug while testing vrmpy conv2d on int8 input.

To test this on CI, we need to be running on a cascadelake or ARM v8.2 (with dot product support) instance. I cannot find a way to detect such cpu feature from a python script. try / catch doesn't work because the error is raised from LLVM (LLVM ERROR: Do not know how to split the result of this operator) that I don't know how to catch. So for now the test is skipped. Any suggestion on this? @areusch @driazati

cc @tkonolige @mbrookhart

Copy link
Contributor

@tkonolige tkonolige left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @masahi for fixing my mistakes!

@tkonolige tkonolige merged commit 195ae72 into apache:main Sep 22, 2022
@driazati
Copy link
Member

To test this on CI

Could we have a short test that triggers the LLVM error that we run in a subprocess? That way we could just read the stdout of the process to get the C++ exception and detect the feature.

xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
…he#12865)

The logic in `python/tvm/topi/generic/conv2d.py#L480-L499` is supposed to legalize the input dtype to be able to apply target-specific intrinsics that only support one of int8 or uint8. For example, the x86 VNNI instruction only supports uint8 activation. 

But the logic is incorrect (two cases are flipped) and leads to incorrect result in the following case:

* The input activation is int8, and we want to use the x86 VNNI intrinsic which only supports uint8 activations.
* The input activation is uint8, and we want to use the ARM `sdot` intrinsic which only supports int8 activations.

The first case also applies to the Hexagon `vrmpy` intrinsic. I found this bug while testing `vrmpy` conv2d on int8 input.

To test this on CI, we need to be running on a cascadelake or ARM v8.2 (with dot product support) instance. I cannot find a way to detect such cpu feature from a python script. `try / catch` doesn't work because the error is raised from LLVM (`LLVM ERROR: Do not know how to split the result of this operator`) that I don't know how to catch. So for now the test is skipped.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants