
Conversation

@yuanfz98 (Contributor)

Hello,

This PR fixes #9908. Since `_qnn.op.dense` can already accept 1-D vectors ("and optionally the kernel_zero_point will be 1-D vectors instead of scalars"), we need to enable this feature in QLinearMatMul.

In #9908, the expected behaviour is tested by the following code:

```python
import onnx
import onnxruntime
import torch
import tvm
from tvm import relay
from tvm.testing import assert_allclose

onnx_model = onnx.load("dense_h32_w32_c3_sNone_pNone_kNone.pt_quant.onnx")
onnx.checker.check_model(onnx_model)

ort_session = onnxruntime.InferenceSession("dense_h32_w32_c3_sNone_pNone_kNone.pt_quant.onnx")
x = torch.randn(1, 3, 32, 32, requires_grad=True)

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

# Compute the ONNX Runtime output prediction.
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
ort_outs = ort_session.run(None, ort_inputs)

# Import the model into Relay and compile it with the graph executor.
model, params = relay.frontend.from_onnx(onnx_model)

with tvm.transform.PassContext(opt_level=1):
    executor = relay.build_module.create_executor(
        "graph", model, tvm.cpu(0), "llvm", params=params
    ).evaluate()

tvm_output = executor(tvm.nd.array(to_numpy(x))).numpy()

# The TVM and ONNX Runtime results should match.
assert_allclose(tvm_output, ort_outs[0])
print(tvm_output, ort_outs[0])
```

Output:

```
[[-0.539178   -0.3405335  -0.04540446  0.488098   -0.5505291  -0.54485357
  -0.56188023  0.22134677  0.09648448  0.19864453 -0.27810234 -0.13621339
  -0.3235068   0.18161786 -0.5278269   0.46539575 -0.5278269  -0.4483691
  -0.09648448  0.01702667]] [[-0.539178   -0.3405335  -0.04540446  0.488098   -0.5505291  -0.54485357
  -0.56188023  0.22134677  0.09648448  0.19864453 -0.27810234 -0.13621339
  -0.3235068   0.18161786 -0.5278269   0.46539575 -0.5278269  -0.4483691
  -0.09648448  0.01702667]]
```
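
For context, here is a minimal NumPy sketch of what QLinearMatMul computes, with a 1-D (per-column) `b_scale`/`b_zero_point`, which is exactly the case this PR enables; the shapes and values below are made up for illustration:

```python
import numpy as np

a = np.array([[2, 4], [6, 8]], dtype=np.uint8)
b = np.array([[1, 3], [5, 7]], dtype=np.uint8)
a_scale, a_zp = np.float32(0.5), np.uint8(2)
b_scale = np.array([0.25, 0.125], dtype=np.float32)  # per-column scales
b_zp = np.array([1, 2], dtype=np.uint8)              # 1-D zero point
y_scale, y_zp = np.float32(0.1), np.uint8(128)

# Dequantize both operands, matmul in float, then requantize the result.
a_f = (a.astype(np.int32) - int(a_zp)) * a_scale
b_f = (b.astype(np.int32) - b_zp.astype(np.int32)) * b_scale
y_f = a_f @ b_f
y_q = np.clip(np.round(y_f / y_scale) + int(y_zp), 0, 255).astype(np.uint8)
print(y_q)
```

The scalar case is just the special case where `b_scale` and `b_zero_point` each have a single element.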

Thanks to the reviewers for your time, and have a nice day!

cc @alnah005 @cconvey

@AndrewZhaoLuo (Contributor) left a comment:


Thanks for the contribution, LGTM mostly; just one set of comments.

```python
    return _op.squeeze(x)
else:
    if force_assert:
        assert num_elem == 1, "Cannot squeeze tensor shape {} to scalar form.".format(x_shape)
```

In this codepath, you will return `None`.

```python
if force_assert:
    assert num_elem == 1, "Cannot squeeze tensor shape {} to scalar form.".format(x_shape)
else:
    return x
```

This changes the behavior of `ensure_scalar_shape` -- now it can return non-scalar elements. Instead, in QLinearMatMul's `try_resolve_to_const`, I would simply wrap the `ensure_scalar_shape` calls with a check on whether the number of elements is 1.

```diff
-def try_resolve_to_const_scalar(x, dtype_override=None):
+def try_resolve_to_const(x, dtype_override=None, allow1D=False):
     x2 = try_resolve_var_to_const(x, params)
     x3 = ensure_scalar_shape(x2)
```

Echoing the above: instead of adding a new flag to `ensure_scalar_shape` that breaks its invariants, take the check you wrote and push it out here.
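
A minimal, self-contained sketch of that suggestion, assuming the helper names from the snippets above (the real code in the ONNX frontend may differ):

```python
import numpy as np

def resolve_quant_param(x, allow_1d=False):
    """Squeeze a quantization parameter to scalar form only when it truly has
    one element; with allow_1d, a multi-element 1-D tensor (per-channel
    zero point / scale) is passed through unchanged."""
    x = np.asarray(x)
    if allow_1d and x.size > 1:
        return x  # keep the per-channel 1-D tensor as-is
    assert x.size == 1, "Cannot squeeze tensor shape {} to scalar form.".format(x.shape)
    return x.reshape(())  # stand-in for ensure_scalar_shape

print(resolve_quant_param([3]))                       # scalar: 3
print(resolve_quant_param([1, 2, 3], allow_1d=True))  # 1-D tensor, unchanged
```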

@AndrewZhaoLuo (Contributor):

Also, we use `black` as our Python formatter. Make sure to run `black` on all Python files, or else you will fail our linter.

zotanika and others added 15 commits January 24, 2022 21:21
* [Caffe Frontend] supporting group > 1 cases for Deconv op

- Handling group > 1 cases, assuming group == output channels
- Simply decomposed into Relay split, conv2d_transposed, and multi-leveled concatenate ops
- Added some test cases

Signed-off-by: zotanika <zotanika@gmail.com>

* [Caffe Frontend] amending a test case for Deconv op

Signed-off-by: zotanika <zotanika@gmail.com>

* explicitly importing tvm.testing

* changing split axis to 0, according to PR apache#9336
…8136)

* [Caffe Frontend] adding Reduction op

* reformatting Reduction op test script

* reformatting Reduction test script

* [Caffe frontend] Reduction op
- adding more test cases; handling '0 < axis < num_axes - 1' case to give the result equivalent to Caffe framework
- skipping Relay multiplication if coeff is 1

Signed-off-by: zotanika <zotanika@gmail.com>

* linting test script

* linting

* [Caffe Frontend] Supporting multiple grouped(channel-wise) Deconv op

* Handling group > 1 cases, assuming group == output channels
* Decomposed into Relay split, transposed conv, and multi-leveled concatenation.
* Added some test cases.

Signed-off-by: zotanika <zotanika@gmail.com>

* [Caffe Frontend] supporting variable number of inputs for Eltwise

* extra handling of rest inputs for PROD, SUM, MAX operations
* extra testcases

Signed-off-by: zotanika <zotanika@gmail.com>

* formatting fix

* [Caffe Frontend] reverting codes related Reduction for splitting PR

* Revert "[Caffe Frontend] Supporting multiple grouped(channel-wise) Deconv op"

This reverts commit 43e25e5.

* instant fix against docker format error

* instant fix against docker format error

* instant fix against docker format error
Co-authored-by: Junru Shao <junrushao1994@gmail.com>
Co-authored-by: Xiyou Zhou <xiyou@octoml.ai>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>

* [microNPU] Remove remaining UnsupportedLayout checks

In apache#9508 the decision was made to remove the UnsupportedLayout exception and the checks that throw it; this PR cleans up some that remained.

Change-Id: I83bfe233381b83af886343c9569db753e33f9059

* fix lint

Change-Id: I67c1a5371f0b2e51b6cd39435ef4073d8d17af51
* [microNPU][2c] Initial Performance Model

* Added the pre-computed performance modelling per block.
* Added the aggregation of cycles given a stripe config.
* Implemented the op-specific performance code for conv2d.
* Created a DeviceConfig class to hold constant performance-related data that depends on the accelerator configuration.
* Added generation of all valid block configs. This is pre-computed and
given as an argument when constructing EthosuParts.
* Implemented selection of the block config that gives the least amount
of data read given a StripeConfig.

* Add test guards

* Extended block config testing
Co-authored-by: Junru Shao <junrushao1994@gmail.com>
Co-authored-by: Xiyou Zhou <xiyou@octoml.ai>
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Hongyi Jin <3231950289@qq.com>
Co-authored-by: Wuwei Lin <wuwei@apache.org>

…_shape_from_cudnn` (apache#9948)

* Introduce SetConvdescriptors to refactor cudnn/conv_forward.cc

* more refactor

* remove cudnn get output

* cpplint
* [microNPU] Add support for scalar values

PR apache#9515 enabled support for scalar constants, but didn't consider the
case of a scalar value where the underlying constant data does not have
a shape, i.e. `constant.shape == []`. See the test case for a visual
difference when the scalar value is 1.

Change-Id: Id7a238cb5bf999dd5a8428c097202f9fb940a5f0
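
As a plain-NumPy illustration of the distinction (not the microNPU code itself): a true scalar constant has an empty shape, which is different from a 1-element tensor:

```python
import numpy as np

scalar = np.array(1, dtype="uint8")      # shape () -- no dimensions at all
one_elem = np.array([1], dtype="uint8")  # shape (1,) -- a 1-D, 1-element tensor
assert scalar.shape == () and one_elem.shape == (1,)
```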

* Fix failing test by removing constant

Before this PR, scalar constants were handled differently, so this test
was able to pass. Now that scalar constants are handled in the same
manner as tensor constants, the test fails since unexpected TIR is
produced in the compilation pipeline. Since the Relay used in this test
case is not expected to be produced by higher levels of the compiler,
removing this constant for now.

Change-Id: I4ea5155778809041339e6faac05af3f72c3e3ea5

* clean up finding tensor from inputs

Change-Id: Ideccf84f8c9149148ff23e2406229cf637c982a3
These are required for running the demos under ci_qemu in combination with Zephyr
… modules parameters map. (apache#9846)

* [Runtime][Pipeline executor] Global parameters group name and runtime
modules parameters map.

Solution:
To support on-the-fly parameter setting for each runtime module in the
pipeline executor, we create a feature that uses a global parameters
group name to map to the runtime module parameters; once this mapping
is created, the user can set parameters on the fly by using the
parameters group name.

trigger build.

fix ut issue.

polish comments.

Update python/tvm/contrib/pipeline_executor.py

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

Update python/tvm/contrib/pipeline_executor.py

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

Update python/tvm/contrib/pipeline_executor.py

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

Update python/tvm/contrib/pipeline_executor.py

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

Update src/runtime/pipeline/pipeline_executor.h

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

Update src/runtime/pipeline/pipeline_struct.h

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

Update python/tvm/contrib/pipeline_executor.py

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

address review comments.

* Update python/tvm/contrib/pipeline_executor.py

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

* fix pylint issue.

Co-authored-by: Cody Yu <comaniac0422@gmail.com>
* jenkinsfile and one test

* formatting

* switch to proper repo for docker

* fix missing - with _

* jostle

* upgrade to latest images

* jostle ci

* update with official images

* jostle ci
Appended fused operations in conv2d for int8 were computed in a separate
loop from the main conv2d computation:
```
for i in ... parallel
  for j in ...
    accumulator = 0
    for k in ..
      vectorized_multiply_add(accumulator, data, kernel)
    out = accumulator
  for k in ..
    out = out + fused subsequent ops
```
This patch moves the fused ops one more loop nesting inwards to get
```
for i in ... parallel
  for j in ...
    accumulator = 0
    for k in ..
      vectorized_multiply_add(accumulator, data, kernel)
    out = accumulator + fused subsequent ops
```
On quantized mobilenetv2, this results in approximately a 30% speedup.
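
As a loose analogy in TVM's TE scheduling language (a toy sketch, not the actual int8 conv2d schedule): `compute_at` is the kind of primitive that moves a computation into its consumer's loop nest, so both run inside a single nest:

```python
import tvm
from tvm import te

n = 1024
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] * 2, name="B")  # "main" computation
C = te.compute((n,), lambda i: B[i] + 1, name="C")  # fused subsequent op

s = te.create_schedule(C.op)
s[B].compute_at(s[C], C.op.axis[0])  # compute B inside C's loop: one nest total
print(tvm.lower(s, [A, C], simple_mode=True))
```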
Type annotations don't do anything; the type conversion needs to be
explicit.
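
A generic Python illustration of that point (the actual commit may touch different code): an annotation alone converts nothing, only an explicit cast does:

```python
n = 5
x: float = n   # annotation only; at runtime x is still the int 5
y = float(n)   # explicit conversion actually produces a float

assert isinstance(x, int)
assert isinstance(y, float)
```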

Development

Successfully merging this pull request may close these issues:

[Bug] [ONNX] QLinearMatMul works with binary output only