I am getting the following error while compiling a quantized ONNX model:
File "python/tvm/relay/frontend/onnx.py" -
in _impl_v10
assert (a_rank == 2) and (b_rank == 2), (AssertionError: QLinearMatMul importer currently requires both 'a' and 'b' tensors to be 2D, but ran(a)=3, rank(b)=3
It looks like this is occurring because the quantized matmul (QLinearMatMul) operation is not supported for inputs with rank greater than 2.
The TODO comments in the importer mention that this limitation would be removed in a future update.
Is there currently any plan to enable this feature?
We need this to deploy a model. If it's not a priority, is there a workaround for this problem, or something we can do on our side to resolve it?
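
One workaround I am considering is rewriting each QLinearMatMul node into DequantizeLinear -> MatMul -> QuantizeLinear before handing the model to TVM, since I believe the plain MatMul importer handles N-D inputs. Below is only a rough sketch under my own assumptions (the file paths are placeholders, and it assumes the standard 8-input QLinearMatMul form); it also gives up the fused int8 matmul in exchange for importability:

```python
# Sketch: rewrite QLinearMatMul -> DequantizeLinear/MatMul/QuantizeLinear
# so that the matmul itself is a regular N-D float MatMul.
# "model.onnx" / "model_rewritten.onnx" are placeholder paths.
import onnx
from onnx import helper

model = onnx.load("model.onnx")
graph = model.graph

new_nodes = []
for node in graph.node:
    if node.op_type != "QLinearMatMul":
        new_nodes.append(node)
        continue

    # QLinearMatMul inputs: a, a_scale, a_zero_point, b, b_scale,
    # b_zero_point, y_scale, y_zero_point; single output y.
    a, a_scale, a_zp, b, b_scale, b_zp, y_scale, y_zp = node.input
    y = node.output[0]

    # Dequantize both operands, run a regular MatMul, then re-quantize
    # the result with the original output scale/zero-point.
    new_nodes.append(helper.make_node(
        "DequantizeLinear", [a, a_scale, a_zp], [y + "_a_deq"]))
    new_nodes.append(helper.make_node(
        "DequantizeLinear", [b, b_scale, b_zp], [y + "_b_deq"]))
    new_nodes.append(helper.make_node(
        "MatMul", [y + "_a_deq", y + "_b_deq"], [y + "_mm"]))
    new_nodes.append(helper.make_node(
        "QuantizeLinear", [y + "_mm", y_scale, y_zp], [y]))

del graph.node[:]
graph.node.extend(new_nodes)
onnx.checker.check_model(model)
onnx.save(model, "model_rewritten.onnx")
```

This keeps the original quantization parameters but executes the matmul in float, so there may be an accuracy/performance difference versus a true int8 kernel. The other option I can think of is patching _impl_v10 in onnx.py to reshape the inputs to 2D and back, but I have not tried that yet.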