[MKLDNN] Support quantized rnn #18001
Conversation
Hey @zixuanweeei, thanks for submitting the PR.
CI supported jobs: [windows-cpu, website, sanity, miscellaneous, unix-gpu, centos-gpu, clang, unix-cpu, edge, centos-cpu, windows-gpu]
eric-haibin-lin
left a comment
what's the performance?
We have verified the accuracy and performance using a pre-trained language model provided by gluon-nlp (a link).
Accuracy (PPL, lower is better): the accuracy results of INT8 are very close to those of FP32.
Performance: comparing the profiler dumps of the FP32 and INT8 end-to-end runs, end-to-end latency got a ~1.1x speedup (22735.04 vs 20201.01), which is not that good. Besides, the quantization flow of LSTM only takes some gemm operations into INT8 calculation; others, such as gate additions, bias additions, and element-wise activations, remain in FP32. So the end-to-end speedup is limited.
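The split described above (INT8 only for the gemm operations, FP32 for gate additions, bias additions, and element-wise activations) can be sketched in plain NumPy. This is an illustrative model of the scheme under symmetric per-tensor quantization, not the PR's actual DNNL code; all names here are hypothetical:

```python
import numpy as np

def quantize_symmetric(x, num_bits=8):
    """Map an FP32 tensor to INT8 with a symmetric per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for int8
    scale = float(np.abs(x).max()) / qmax
    if scale == 0.0:
        scale = 1.0                             # avoid division by zero
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def int8_lstm_cell(x, h, c, Wx, Wh, b):
    """One LSTM step: INT8 GEMMs, everything else in FP32."""
    qx, sx = quantize_symmetric(x)
    qWx, sWx = quantize_symmetric(Wx)
    qh, sh = quantize_symmetric(h)
    qWh, sWh = quantize_symmetric(Wh)
    # INT8 x INT8 with INT32 accumulation, dequantized back to FP32;
    # the bias addition stays in FP32, as in the flow described above.
    gates = (qx.astype(np.int32) @ qWx.astype(np.int32)) * (sx * sWx) \
          + (qh.astype(np.int32) @ qWh.astype(np.int32)) * (sh * sWh) \
          + b
    i, f, g, o = np.split(gates, 4, axis=-1)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    # Gate combinations and activations remain FP32 element-wise ops.
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new
```

Since only the two GEMMs run in INT8, the fraction of total work they represent bounds the achievable speedup, which is consistent with the modest ~1.1x observed end to end.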
eric-haibin-lin
left a comment
thanks for sharing!
Is there a plan to improve the performance of log_softmax?
@eric-haibin-lin we'll enable the DNNL primitive for log_softmax to improve its performance on CPU, but not in this PR :)
@zixuanweeei could you rebase and resolve the conflict?
Currently, we are focusing on adding the feature on the v1.6.x branch, as well as the quantized LSTMP operator. I will port the changes there to this PR soon. Thanks for the reminder.
@mxnet-bot run ci [all]
Jenkins CI successfully triggered : [unix-cpu, windows-gpu, centos-cpu, sanity, miscellaneous, website, clang, windows-cpu, centos-gpu, unix-gpu, edge]
@mxnet-bot run ci [windows-gpu]
Jenkins CI successfully triggered : [windows-gpu]
* Add _contrib_quantized_rnn op
* Add asymmetric quantization - _contrib_quantized_asym op
* Add MXNET_USE_WEIGHT_CACHE to control rnn init behavior
* Support data layout in NDArrayIter
* Move MKLDNNRnnMgr to individual layer
Closing since we need to refactor quantization flow in master |
Description
In this PR, we add support for the quantization flow of the RNN operator. Currently, only the LSTM mode supports INT8 inference.
Checklist
Essentials
Changes
NDArrayIter assumes the NCHW layout by default, and there was no way to support other layouts, like the sequential TNC layout. This PR makes some changes to NDArrayIter to support specifying the data layout (assuming that N represents the batch axis). @ciyongch @TaoLv @pengzhao-intel
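What layout-aware batching means in practice can be sketched as follows. `iter_batches` is a hypothetical stand-in for NDArrayIter's behavior, not the PR's actual implementation: the iterator locates the `N` axis from a layout string instead of always slicing along axis 0.

```python
import numpy as np

def iter_batches(data, layout, batch_size):
    """Yield mini-batches by slicing along the axis labeled 'N'."""
    n_axis = layout.index('N')          # find the batch axis from the layout
    n = data.shape[n_axis]
    for start in range(0, n, batch_size):
        idx = [slice(None)] * data.ndim
        idx[n_axis] = slice(start, start + batch_size)
        yield data[tuple(idx)]

# A sequential TNC tensor: 5 time steps, 6 samples, 3 features.
# Each batch keeps T and C intact and carves N into chunks of 2.
tnc = np.zeros((5, 6, 3))
batches = list(iter_batches(tnc, 'TNC', batch_size=2))
```

With the default NCHW/NTC assumption, the same iterator would slice axis 0, so a TNC tensor would wrongly be cut along the time dimension; reading the batch axis from the layout string avoids that.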