I am trying to enable the path MXNet/GluonNLP --> ONNX --> TensorRT.
There is a bug where, if I use a pretrained BERT model, running inference with TensorRT in fp16 mode produces NaNs.
Using pretrained weights:
import mxnet as mx
import gluonnlp as nlp

# model_name, dataset, and ctx are set as in the repro script linked at the bottom
bert, _ = nlp.model.get_model(
    name=model_name,
    ctx=ctx,
    dataset_name=dataset,
    pretrained=True,   # <-- the only difference between the two snippets
    use_pooler=True,
    use_decoder=False,
    num_layers=3,      # hardcode this as 3 layers since this is what the customer uses
    use_classifier=False,
    hparam_allow_override=True)
model = bert
Not using pretrained weights:
bert, _ = nlp.model.get_model(
    name=model_name,
    ctx=ctx,
    dataset_name=dataset,
    pretrained=False,  # <-- the only difference between the two snippets
    use_pooler=True,
    use_decoder=False,
    num_layers=3,      # hardcode this as 3 layers since this is what the customer uses
    use_classifier=False,
    hparam_allow_override=True)
model = bert
model.initialize(ctx=ctx)  # random initialization is needed when pretrained=False
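For context, the MXNet --> ONNX step looks roughly like the sketch below. This is a minimal sketch, not the exact repro: the input shapes, the `bert_3l*` file names, and the `mx.onnx.export_model` call (MXNet >= 1.9; older releases expose it as `mx.contrib.onnx.export_model`) are illustrative, and the exact commands are in the repro linked at the bottom.

```python
import mxnet as mx
import numpy as np

batch_size, seq_length = 1, 32  # placeholder values; seq_length > 16 is where fp16 breaks

# Dummy inputs matching the BERTModel forward signature (inputs, token_types, valid_length)
words = mx.nd.ones((batch_size, seq_length), ctx=ctx)
token_types = mx.nd.zeros((batch_size, seq_length), ctx=ctx)
valid_length = mx.nd.full((batch_size,), seq_length, ctx=ctx)

model.hybridize(static_alloc=True)
model(words, token_types, valid_length)  # one forward pass so the graph can be exported
model.export('bert_3l')                  # writes bert_3l-symbol.json / bert_3l-0000.params

mx.onnx.export_model(
    'bert_3l-symbol.json',
    'bert_3l-0000.params',
    in_shapes=[(batch_size, seq_length), (batch_size, seq_length), (batch_size,)],
    in_types=[np.float32, np.float32, np.float32],
    onnx_file_path='bert_3l.onnx')
```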
More specifically, WITHOUT pretrained weights, TensorRT produces reasonable outputs in both fp16 mode and regular fp32 mode. However, WITH pretrained weights, TensorRT produces NaN outputs in fp16 mode, while fp32 mode seems to work fine. Furthermore, the NaN issue seems to be triggered by the size of seq_length: when seq_length <= 16, even fp16 mode produces reasonable outputs; when seq_length > 17, fp16 mode starts to produce NaNs. Batch size does not seem to affect the NaN behavior.
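For clarity, the fp16/fp32 comparison above can be reproduced with the plain ONNX-parser path in the TensorRT Python API (7.x/8.x), sketched below. This is illustrative only, not the exact repro; the `bert_3l.onnx` file name is a placeholder, and the only difference between the two builds is the FP16 flag.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, use_fp16):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, 'rb') as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None
    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30
    if use_fp16:
        config.set_flag(trt.BuilderFlag.FP16)  # fp16 build -> NaNs with pretrained weights
    return builder.build_engine(network, config)

fp32_engine = build_engine('bert_3l.onnx', use_fp16=False)  # finite outputs
fp16_engine = build_engine('bert_3l.onnx', use_fp16=True)   # NaNs once seq_length > 16
```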
Reproducible code and steps can be found in #19746. Because we have a customer requesting this feature, it would be great if our friends at Nvidia could help look into this issue. Please let me know how I can provide further info or help.
@sandeep-krishnamurthy @MoisesHer @Kh4L @chinakook