This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Conversation

@Zha0q1
Contributor

@Zha0q1 Zha0q1 commented May 6, 2021

Add model-specific (BERT) logic to un-interleave the self-attention matmul. This can potentially speed up inference with TRT 8.0, whose compiler can recognize the new pattern.

Default usage (model_specific_logics='gluonnlp_bert_uninterleaved'):

converted_model_path = mx.onnx.export_model(sym_file, params_file, input_shapes,
                                            input_types, onnx_file,
                                            model_specific_logics='gluonnlp_bert_uninterleaved')

When the model is not BERT base (i.e. hidden size != 768 or num_heads != 12), e.g. BERT large, the usage is:

cheat_sheet = {
    'qkv_hidden': 1024,
    'num_heads': 16,
    'head_dim': 64
}
converted_model_path = mx.onnx.export_model(sym_file, params_file, input_shapes,
                                            input_types, onnx_file,
                                            model_specific_logics='gluonnlp_bert_uninterleaved',
                                            cheat_sheet=cheat_sheet)
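
For reference, head_dim is qkv_hidden divided by num_heads (1024 / 16 = 64 for BERT large), and the defaults assumed by the converter correspond to BERT base, i.e. roughly equivalent to passing:

cheat_sheet = {
    'qkv_hidden': 768,
    'num_heads': 12,
    'head_dim': 768 // 12  # = 64
}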

This option to un-interleave self-attention also works with BERT variants such as RoBERTa, DistilBERT, and ERNIE.

The first screenshot is the old graph, the second is the new graph. Note that running onnx-simplifier (onnx-sim) on the exported model is required.
[Screenshots: original (interleaved) graph, followed by the new (un-interleaved) graph]
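
A minimal sketch of that simplification step, assuming the exported file is named bert.onnx (file names here are illustrative, not taken from this PR):

import onnx
from onnxsim import simplify  # onnx-simplifier package

model = onnx.load('bert.onnx')
# fold constants and strip redundant nodes so the un-interleaved pattern is exposed
model_simplified, check = simplify(model)
assert check, 'simplified ONNX model failed validation'
onnx.save(model_simplified, 'bert_simplified.onnx')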

@TristonC @MoisesHer @waytrue17 @josephevans

@Zha0q1 Zha0q1 requested a review from szha as a code owner May 6, 2021 01:46
@mxnet-bot

Hey @Zha0q1, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [clang, miscellaneous, unix-gpu, edge, centos-cpu, unix-cpu, centos-gpu, windows-cpu, website, windows-gpu, sanity]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels May 6, 2021
# coding: utf-8
"""ONNX export op translation"""

from . import _gluonnlp_bert
Contributor

Would this logic benefit TRT specifically? If so, shall we consider naming it something like '_gluonnlp_bert_trt'?

@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels May 6, 2021
@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-work-in-progress PR is still work in progress labels May 6, 2021
@szha
Member

szha commented May 7, 2021

It probably makes more sense to have GluonNLP export a BERT graph that doesn't involve interleaving than to do it in MXNet. The framework shouldn't have knowledge about, or rely on, the implementation of its ecosystem packages.

Comment on lines +86 to +90
model_specific_logics : str
    Specifies which model-specific conversion logic, if any, should be used. Refer to ./_op_translations/
cheat_sheet : dict of str to str
    A dict that stores hyperparameter values or additional info about the model that
    will be used by the model-specific conversion functions
Member

These options are semantically unclear and hard to maintain.

@Zha0q1
Contributor Author

Zha0q1 commented May 7, 2021

It probably makes more sense to have GluonNLP export a BERT graph that doesn't involve interleaving than to do it in MXNet. The framework shouldn't have knowledge about, or rely on, the implementation of its ecosystem packages.

I agree. Alternatively, we might be able to make mx2onnx support loading custom conversion functions dynamically and put these functions in GluonNLP. Or, if even that is too hacky, this PR can serve as a PoC to evaluate the performance benefit with TRT 8.0 of un-interleaving the matrix multiplication.
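
As a purely hypothetical sketch of the "load custom conversion functions dynamically" idea (none of these names exist in mx2onnx or GluonNLP; they only illustrate the shape of such a hook):

# Hypothetical registry; the exporter would call apply_model_specific_logic after
# the generic op translation. Nothing below is existing mx2onnx or GluonNLP API.
_MODEL_SPECIFIC_LOGICS = {}

def register_model_specific_logic(name, func):
    """Let ecosystem packages (e.g. GluonNLP) plug in their own conversion passes."""
    _MODEL_SPECIFIC_LOGICS[name] = func

def apply_model_specific_logic(name, onnx_graph, cheat_sheet=None):
    """Run a registered pass, if any, on the exported graph."""
    if name in _MODEL_SPECIFIC_LOGICS:
        return _MODEL_SPECIFIC_LOGICS[name](onnx_graph, cheat_sheet)
    return onnx_graph

# GluonNLP side: register the BERT-specific pass so MXNet itself never needs to know about BERT.
def uninterleave_bert_attention(onnx_graph, cheat_sheet):
    # ... graph surgery that splits the interleaved QKV matmul would live here ...
    return onnx_graph

register_model_specific_logic('gluonnlp_bert_uninterleaved', uninterleave_bert_attention)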

@Zha0q1 Zha0q1 changed the title [v1.x] ONNX Add an option to un-interleave BERT [poc][v1.x] ONNX Add an option to un-interleave BERT May 7, 2021
@Zha0q1
Contributor Author

Zha0q1 commented May 7, 2021

@szha @ptrendx for insights

@mseth10 mseth10 added pr-awaiting-review PR is waiting for code review and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels May 7, 2021
@ptrendx
Member

ptrendx commented May 12, 2021

@Zha0q1 What is the difficulty in doing this transformation after the export? You should be able to modify the ONNX graph itself, right?
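
For context, a post-export pass along these lines could operate directly on the ONNX protobuf with the standard onnx Python package. This is only an outline, since the actual surgery (splitting the fused QKV weight and rewiring the matmuls) depends on the node and initializer names the exporter produces:

import onnx
from onnx import numpy_helper

model = onnx.load('bert.onnx')  # file name is a placeholder
graph = model.graph

# Map initializer names to numpy arrays, e.g. to locate the fused QKV weight
weights = {init.name: numpy_helper.to_array(init) for init in graph.initializer}

for node in graph.node:
    if node.op_type == 'MatMul':
        # Match the interleaved QKV pattern here, then replace the node and its
        # weight with separate, un-interleaved Q/K/V equivalents
        pass

onnx.checker.check_model(model)
onnx.save(model, 'bert_uninterleaved.onnx')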


Labels

pr-awaiting-review PR is waiting for code review


6 participants