This repository was archived by the owner on Nov 17, 2023. It is now read-only.

@Zha0q1 Zha0q1 commented Jan 12, 2021

This PR reproduces the NaN output issue seen when running inference on MXNet's BERT ONNX export with TensorRT in FP16 mode (#19747).

Steps to export BERT to ONNX:

  1. Build MXNet and run pip install -e python/
  2. Install the dependencies with pip: onnx==1.6.0, onnx-simplifier, onnxruntime==1.6.0, and gluonnlp.
  3. cd export_bert
  4. Edit export_bert.py to toggle pretrained=True/False and to specify batch and seq_length.
  5. Run python export_bert.py --layer=3 to export the model.
  6. Look in the bert_model folder and use the *_sim.onnx export file. This is the final output graph, which has been optimized by onnx-simplifier.
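
For step 6, a small helper can list the simplified exports instead of browsing the folder by hand. This is only a sketch: the function name `find_simplified_exports` is hypothetical and not part of this PR, and it assumes the exports land directly in `bert_model` relative to the current directory.

```python
from pathlib import Path


def find_simplified_exports(model_dir="bert_model"):
    """Return the sorted paths of all *_sim.onnx files directly inside model_dir.

    Hypothetical helper, not part of export_bert.py. The default folder
    name follows step 6 above.
    """
    return sorted(Path(model_dir).glob("*_sim.onnx"))


if __name__ == "__main__":
    for path in find_simplified_exports():
        print(path)
```

Files without the `_sim` suffix are skipped, since only the onnx-simplifier output is meant to be fed to TensorRT.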

Steps to build the TensorRT engine from the ONNX file and run inference:

  1. Use this image: https://ngc.nvidia.com/catalog/containers/nvidia:tensorrt
  2. pip install onnxruntime
  3. Copy build_trt.py, infer_trt.py, and the ONNX files into the container.
  4. Edit build_trt.py to specify the ONNX input path and the TensorRT output path, then run python build_trt.py
  5. Edit infer_trt.py to specify the ONNX and TensorRT paths, batch, and seq_length, then run python infer_trt.py
  6. Step 5 runs both TensorRT and onnxruntime and prints the results from each.
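
To make the comparison in step 6 easier to read, the two result sets can be summarized by counting NaN values and taking the largest absolute difference. The sketch below is not taken from infer_trt.py; `compare_outputs` is a hypothetical name, and it assumes each output has already been flattened into a plain sequence of Python floats.

```python
import math


def compare_outputs(trt_out, ort_out):
    """Summarize how two flat sequences of floats differ.

    Hypothetical helper: trt_out and ort_out are assumed to be the
    flattened TensorRT and onnxruntime outputs for the same input.
    """
    if len(trt_out) != len(ort_out):
        raise ValueError("outputs must have the same number of elements")

    nan_in_trt = sum(1 for v in trt_out if math.isnan(v))
    nan_in_ort = sum(1 for v in ort_out if math.isnan(v))

    # Only compare positions where both values are real numbers.
    diffs = [
        abs(a - b)
        for a, b in zip(trt_out, ort_out)
        if not (math.isnan(a) or math.isnan(b))
    ]

    return {
        "nan_in_trt": nan_in_trt,
        "nan_in_ort": nan_in_ort,
        "max_abs_diff": max(diffs) if diffs else None,
    }
```

A non-zero `nan_in_trt` alongside a zero `nan_in_ort` would match the symptom described in this PR: NaN in the TensorRT FP16 output while onnxruntime returns finite values.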

Provided trt files
I have also included two TensorRT engine files that I generated by following the steps above, so you can skip straight to step 5 of the inference steps.
mx_bert_layer3_simp_1_16_pre.trt means batch size 1, sequence length 16, pretrained=True
mx_bert_layer3_simp_1_17_pre.trt means batch size 1, sequence length 17, pretrained=True
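
The naming scheme of the two provided files can be decoded programmatically. This is a sketch under assumptions: `parse_trt_name` is a hypothetical helper, the number after `layer` is assumed to match the `--layer` value used at export time, and a name without the `_pre` suffix is assumed to mean pretrained=False (only the `_pre` form appears in this PR).

```python
import re

# Pattern inferred from the two file names listed above.
_TRT_NAME = re.compile(
    r"^mx_bert_layer(?P<layers>\d+)_simp_"
    r"(?P<batch>\d+)_(?P<seq_len>\d+)(?P<pre>_pre)?\.trt$"
)


def parse_trt_name(filename):
    """Decode the layer count, batch size, sequence length and pretrained flag."""
    match = _TRT_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized engine file name: {filename!r}")
    return {
        "layers": int(match.group("layers")),
        "batch_size": int(match.group("batch")),
        "seq_len": int(match.group("seq_len")),
        # Assumption: a missing "_pre" suffix means pretrained=False.
        "pretrained": match.group("pre") is not None,
    }
```

For example, `parse_trt_name("mx_bert_layer3_simp_1_17_pre.trt")` yields batch size 1 and sequence length 17, which are the values to set in infer_trt.py before running it against that engine.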

@Zha0q1 Zha0q1 requested a review from szha as a code owner January 12, 2021 23:49
@mxnet-bot

Hey @Zha0q1, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [centos-gpu, unix-cpu, sanity, clang, windows-cpu, centos-cpu, website, windows-gpu, miscellaneous, edge, unix-gpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@lanking520 lanking520 added the pr-awaiting-testing and pr-work-in-progress labels and removed the pr-awaiting-testing label Jan 12, 2021
@lanking520 lanking520 added the pr-awaiting-testing and pr-work-in-progress labels and removed the pr-work-in-progress and pr-awaiting-testing labels Jan 13, 2021
@Zha0q1 Zha0q1 closed this Feb 3, 2021

Labels

pr-work-in-progress PR is still work in progress
