Conversation

@sunshinemyson


workflow:
    1. Define the special patterns that can be fused by the NPU in Python.
    2. Generate the compiled model with the BYOC implementation.
    3. Execute the compiled model with a simple runtime implementation.

Signed-off-by: xiang.zhang <xiang.zhang@verisilicon.com>
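Step 1 of this workflow is what TVM's BYOC pattern tables express. As a rough, self-contained sketch of the idea (the pattern names and the matcher below are hypothetical illustrations, not the actual TVM pattern-language API), a fusable pattern is essentially an op sequence that the NPU backend claims for offload:

```python
# Illustrative sketch of a BYOC-style pattern table: each entry maps a
# composite-function name to the op chain the NPU can fuse and execute.
# A "graph" here is simplified to a list of op names in dataflow order.
FUSABLE_PATTERNS = {
    # hypothetical names, modeled after BYOC pattern tables such as "vsi_npu.qnn_conv2d"
    "vsi_npu.qnn_conv2d": ["qnn.conv2d", "nn.bias_add", "qnn.requantize"],
    "vsi_npu.qnn_dense": ["qnn.dense", "qnn.requantize"],
}


def match_pattern(ops):
    """Return the name of the first pattern whose op sequence prefixes `ops`,
    or None when no fusable pattern applies."""
    for name, seq in FUSABLE_PATTERNS.items():
        if ops[: len(seq)] == seq:
            return name
    return None
```

In the real flow, matched subgraphs are partitioned out and handed to the VSI NPU codegen (step 2), while unmatched ops stay on the default TVM backend.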
@sunshinemyson
Author

Hi @mbaret ,

Could you help review our PR for new NPU support? Thanks

@areusch
Contributor

areusch commented Sep 21, 2021

@comaniac would you like to help review this too?

Contributor

@areusch areusch left a comment

@sunshinemyson thanks for the PR. did an initial review. could you post up an RFC (eg. open a PR to github.com/apache/tvm-rfcs) or pre-RFC to discuss.tvm.ai giving the general approach and some background on your accelerator so we can better understand this PR? it's quite large and some context would be helpful. you might also consider committing just a couple ops initially so we can see/validate the approach, then accept the rest as follow-on PRs.

@@ -0,0 +1,30 @@
if(NOT USE_VSI_NPU STREQUAL "OFF")
Contributor

Can you update cmake/config.cmake to contain the default and an example docstring? Also, can you make this work for the case where USE_VSI_NPU is not set? I think STREQUAL "OFF" would fail in that case (but I am not a CMake expert).

Author

I'll fix it.
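For reference, TVM backends conventionally declare the default in cmake/config.cmake with a short docstring comment; a sketch along those lines (exact wording illustrative) would be:

```cmake
# Whether to enable the VeriSilicon NPU BYOC codegen and runtime.
# Possible values:
# - ON: enable the VSI NPU backend
# - OFF: disable the VSI NPU backend (default)
set(USE_VSI_NPU OFF)
```

With the default declared, the guard can simply be `if(USE_VSI_NPU)`, which also behaves sensibly when the variable is empty or unset (an unset variable name in `STREQUAL` is compared as the literal string, which is why the reviewer expects `NOT USE_VSI_NPU STREQUAL "OFF"` to misfire).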

endif()

set(OVXLIB_API_ATTR "__attribute__\(\(visibility\(\"default\"\)\)\)")
add_definitions(-DOVXLIB_API=${OVXLIB_API_ATTR})
Contributor

can you add this to a specific target? same for the below definition

Author

this is not required. Will remove it.


import tvm._ffi

tvm._ffi._init_api("relay.vsi_npu.support", __name__) No newline at end of file
Contributor

nit: add a newline at the end of the file. here and elsewhere.

Author

will fix it.

@@ -0,0 +1,97 @@
# VeriSilicon NPU solution on TVM
Contributor

would you like to contribute some of this content as a tutorial in tutorials/ somewhere?

Author

I'll try to use the GitHub wiki if possible.

cd target_runtime_build
cp ../cmake/config.cmake ./
# add set(USE_VSI_NPU ON) to config.cmake, you can do it with cmake command option too
cmake -DCMAKE_BUILD_TYPE=Debug -DTIM_VX_INSTALL_DIR=<full_path_to_tim_vx_target_build_install_dir> \
Contributor

just wondering how come you do a Debug build to deploy?

Author

We don't use Debug builds in deployment. I'll refine it.
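Concretely, the README command above would presumably switch to a Release build for deployment; a sketch of the adjusted invocation (the install-dir placeholder is kept from the README):

```shell
# Use an optimized Release build for deployment; Debug is for development only
cmake -DCMAKE_BUILD_TYPE=Release \
      -DTIM_VX_INSTALL_DIR=<full_path_to_tim_vx_target_build_install_dir> \
      ..
```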

// TODO
};

inline int32_t ConvertAxis(int32_t axisIn, uint32_t dimNum) {
Contributor

can you add comments about what this is converting from and to?

Author

Will fix.
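For readers of the thread: such a comment would state the two axis conventions involved. Relay indexes axes outermost-first (NCHW-style), while TIM-VX describes tensors with the innermost dimension first, so a likely reading of ConvertAxis (an assumption based on the name and signature, not confirmed by the diff) is a mirror of the axis index, sketched here in Python:

```python
def convert_axis(axis_in: int, dim_num: int) -> int:
    """Hypothetical sketch: map a Relay axis (outermost-first, possibly
    negative) onto a reversed, innermost-first layout such as TIM-VX's."""
    if axis_in < 0:
        axis_in += dim_num          # normalize negatives, e.g. -1 -> dim_num - 1
    return dim_num - axis_in - 1    # mirror the index across the rank
```

For a rank-4 tensor this sends axis 0 to 3 and axis 3 (or -1) to 0, which is the behavior the requested comment should pin down.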

});
};

std::shared_ptr<tvx::Tensor> createVxOPerand(TensorInfoTable tensor_info,
Contributor

Can you follow the C++ style guide here https://google.github.io/styleguide/cppguide.html#Function_Names? e.g. CreateVxOperand

Author

Will fix.

return specs;
}

using namespace backend;
Contributor

Author

Will fix.


return relay.testing.init.create_workload(net)

print("Testing {0: <50}".format("ADD"), end="")
Contributor

Suggest using the logging package for compatibility with pytest.

Author

Actually, we didn't use pytest in the test scripts. Do we need to refactor all the tests with pytest?

verify_vsi_result(inputs, quantize, [], data_shape, data_shape, output_dtype)

if __name__ == "__main__":
#test_qnn_add()
Contributor

replace all of this with:
sys.exit(pytest.main([__file__] + sys.argv[1:]))
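Putting the two review suggestions together, a pytest-compatible test file would look roughly like this sketch (the test name is modeled on the thread's test_qnn_add; the body is a placeholder, since the real test calls verify_vsi_result):

```python
# Sketch of a pytest-style test: logging replaces bare print() so pytest
# can capture and route the output.
import logging

logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger("test_vsi_npu")


def test_qnn_add():
    LOG.info("Testing %-50s", "ADD")
    # the real test would call verify_vsi_result(inputs, quantize, ...) here
    assert 1 + 1 == 2  # placeholder assertion standing in for result checks
```

The reviewer's `if __name__ == "__main__": sys.exit(pytest.main([__file__] + sys.argv[1:]))` stanza then replaces the hand-written main block, so running the file directly is equivalent to invoking pytest on it.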

@comaniac
Contributor

Echoing @areusch: it's impossible to review a PR with 5k lines of code changes.

@masahi
Member

masahi commented Sep 22, 2021

@areusch @comaniac I'm happy to help review this PR and the RFC (I'm familiar with the BYOC flow)

@masahi masahi self-assigned this Sep 22, 2021
@comaniac
Contributor

Thanks @masahi. There is no doubt that you are very familiar with BYOC, and I would be comfortable merging the PR if you approve.

Meanwhile, following the process of filing an RFC with an upstream plan that gradually adds features is always the better practice. Small PRs easily get second or even third reviews, which makes the final framework more robust and bug-free. I would rather spend one hour every day reviewing and merging one small or medium-size PR than spend days or even months iterating on a single huge PR.

@sunshinemyson
Author

> @sunshinemyson thanks for the PR. did an initial review. could you post up an RFC (eg. open a PR to github.com/apache/tvm-rfcs) or pre-RFC to discuss.tvm.ai giving the general approach and some background on your accelerator so we can better understand this PR? it's quite large and some context would be helpful. you might also consider committing just a couple ops initially so we can see/validate the approach, then accept the rest as follow-on PRs.

https://discuss.tvm.apache.org/t/byoc-add-verisilicons-npu-support/11097?u=sven

zhouheng.zheng and others added 4 commits November 26, 2021 22:45
* Add readme for VSI-NPU

Signed-off-by: xiang.zhang <xiang.zhang@verisilicon.com>

* Update README.VSI.md

Fix wrong env variable name

* pytorch model support

* add dropout op

* add type constraint for RemoveClipAfterRequantize

* fix Conv2D op for depthwise conv2d

Co-authored-by: xiang.zhang <xiang.zhang@verisilicon.com>
Co-authored-by: zhouheng.zheng <zhouheng.zheng@ouotlook.com>
1. fix per-axis quantized conv2d/dense export issue on vsi_npu

model: pytorch quantized googlenet
@areusch
Contributor

areusch commented Mar 24, 2022

hey @sunshinemyson what's the status of this PR? is it something you want to contribute still? we have a weekly TVM Community Meeting now which could be a great forum to present your design in a high-bandwidth setting.

@areusch
Contributor

areusch commented Apr 8, 2022

closing due to inactivity, but feel free to reopen!

@areusch areusch closed this Apr 8, 2022
@ChingHan0921

Hello, we are using the ai-benchmark model (mobilenet_v3_quant.tflite) for inference testing, but we are unable to run the inference successfully. May I ask if there are any plans to support inference with MobileNetV3?
