Export TensorFlow models to ONNX with dynamic input shapes #19255
sgugger merged 8 commits into huggingface:main
Conversation
The documentation is not available anymore as the PR was closed or merged.
lewtun left a comment
Thanks a lot for enabling dynamic input shapes and ensuring our tests actually test the TF exports @dwyatte !
Can you please confirm that the slow tests pass by running:
```
RUN_SLOW=1 pytest tests/onnx/test_onnx_v2.py
```
It would also be interesting to know how much slower the TF exports are compared to the PyTorch ones, e.g. can you share some timings for a few models?
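One way to collect such timings is to wrap each framework's export call in a small timing context manager. This is a hedged sketch, not the benchmark actually used in this PR; the export calls are replaced with stand-in work:

```python
# Hedged sketch: time each framework's ONNX export and compare.
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    # record the wall-clock duration of the enclosed block under `label`
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

results = {}
with timed("pt_export", results):
    sum(range(10_000))  # stand-in for the PyTorch ONNX export call
with timed("tf_export", results):
    sum(range(10_000))  # stand-in for the TensorFlow ONNX export call

# ratio > 1 means the TF export is slower than the PyTorch export
slowdown = results["tf_export"] / results["pt_export"]
```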
```diff
- reference_model_inputs = config.generate_dummy_inputs(preprocessor, framework=TensorType.PYTORCH)
+ reference_model_inputs = config.generate_dummy_inputs(
+     preprocessor,
+     batch_size=config.default_fixed_batch + 1,
```
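The reason for bumping the batch size in the change above can be illustrated with a toy sketch: a model exported with static shapes only accepts the batch size it was traced with, so validating at that same size cannot distinguish it from a genuinely dynamic export. Everything here is illustrative; `EXPORT_BATCH` is an assumed stand-in for `config.default_fixed_batch`:

```python
# Hedged sketch of why validation inputs use a different batch size than export.
EXPORT_BATCH = 2  # assumed stand-in for config.default_fixed_batch

def run_static_export(batch_size):
    # stand-in for an ONNX model exported with *fixed* input shapes:
    # it only accepts the batch size it was traced with
    if batch_size != EXPORT_BATCH:
        raise RuntimeError("Got invalid dimensions for input")
    return [0.0] * batch_size

def run_dynamic_export(batch_size):
    # stand-in for an ONNX model exported with dynamic axes
    return [0.0] * batch_size

# Validating at EXPORT_BATCH cannot tell the two exports apart:
assert run_static_export(EXPORT_BATCH) == run_dynamic_export(EXPORT_BATCH)

# Validating at EXPORT_BATCH + 1 exposes the static export:
static_failed = False
try:
    run_static_export(EXPORT_BATCH + 1)
except RuntimeError:
    static_failed = True
```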
There were a few failures here.
A couple of options:
What do you think / any other ideas?
So TF is around 2-8x slower on my machine.
@lewtun any further thoughts on this PR with the goal of supporting dynamic input shapes in ONNX models exported from TensorFlow? It's not clear to me how to handle the check in `tests/onnx/test_onnx_v2.py` (lines 300 to 303 at 6b36673). I suppose part of the answer is whether we want users to experience export failures related to dynamic shapes (which the current code in this PR would do) vs. removing explicit dynamic shape validation from the user experience and limiting it to tests.
Hey @dwyatte, thanks for sharing the timings! I'm currently working on dramatically shrinking all the ONNX models we use for internal testing, so a 2-8x slowdown for some models is probably OK. Regarding how to handle the model validation:
I am in favour of option (1) and creating a separate issue to figure out what's wrong in the ONNX export of these 3 models. You can skip these tests by following the same logic you linked to above :)
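The skip logic might look roughly like the following. This is a hedged sketch mirroring the failures discussed above; the set name, helper name, and structure are illustrative, not the actual test-suite code:

```python
# Hedged sketch: skip the (model, feature) pairs whose TF ONNX export is known
# to fail with dynamic input shapes, pending the follow-up issue.
UNSUPPORTED_DYNAMIC_SHAPES = {
    ("bert", "multiple-choice"),
    ("camembert", "multiple-choice"),
    ("roberta", "multiple-choice"),
}

def should_skip(model_type, feature):
    # bail out before running the (slow) export and validation for known failures
    return (model_type, feature) in UNSUPPORTED_DYNAMIC_SHAPES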
Created #19357 to track this.
lewtun left a comment
Thanks a lot for opening an issue to track the problematic ONNX models @dwyatte 🔥 !
This PR LGTM, so gently pinging @sgugger for final approval.
For context in the review: @dwyatte uncovered some edge cases that our ONNX tests didn't cover. This PR currently skips the problematic model heads, and we decided to tackle them in a separate issue, since this one is focused on enabling dynamic shapes for TF models.
sgugger left a comment
LGTM, thanks a lot for working on this!
There was some problem with CircleCI which only ran part of the test suite (and I can't manually re-run it). Could you push an empty commit on your branch?
I think I was having the same problem described in #18351 (comment). 9496836 ran the CI under the huggingface org, so this should be good to go now.
Thanks!
What does this PR do?
This PR exports TensorFlow models to ONNX with dynamic input shapes. Previously, they were exported with static input shapes (a batch size of 2 and a sequence length of 8). This should bring the TensorFlow to ONNX export mostly into parity with the PyTorch one.
Fixes #19238
- `FeaturesManager.get_model_class_for_feature` returns a PyTorch model class by default. I've exposed a `framework` argument on these tests so that `FeaturesManager.get_model_class_for_feature` can return TensorFlow models. NOTE: Exporting TensorFlow to ONNX seems to be much slower than exporting PyTorch to ONNX, so CI duration will increase.
- Updated `validate_model_outputs` to check with a batch size/sequence length different than used during export (now 3 and 9 respectively). There was a TODO about this, but it surfaced an error for BERT, CamemBERT, and RoBERTa multiple-choice tasks:

  ```
  onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Add node. Name:'tf_bert_for_multiple_choice/bert/encoder/layer_._0/attention/self/add_1' Status Message: /Users/runner/work/1/s/onnxruntime/core/providers/cpu/math/element_wise_ops.h:503 void onnxruntime::BroadcastIterator::Init(ptrdiff_t, ptrdiff_t) axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1
  ```

  I suspect this is due to the way these models are defined (tracing fails to properly infer shape somewhere). IMO this is still a net improvement, since the ONNX models exported under TensorFlow were previously non-functional except with their static input shapes. I'm skipping these specific configurations during testing for now, but someone should look into this.
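To make "dynamic input shapes" concrete: the export needs to declare the batch and sequence dimensions of each input as symbolic names rather than baking in the fixed 2x8 shapes. A hedged illustration (the helper and axis names below are illustrative, not actual `transformers` API):

```python
# Hedged sketch: declare axis 0 (batch) and axis 1 (sequence) of every model
# input as symbolic dimensions, so the exported ONNX graph accepts any size.
def dynamic_axes_for(input_names):
    # map each input name to {axis index: symbolic dimension name}
    return {name: {0: "batch", 1: "sequence"} for name in input_names}

axes = dynamic_axes_for(["input_ids", "attention_mask"])
```

This mirrors the shape metadata the exporter attaches to the graph inputs; with static shapes, those axes would instead carry the literal values 2 and 8.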
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
If you know how to use `git blame`, that is the easiest way; otherwise, here is a rough guide of who to tag.
Please tag fewer than 3 people.
@Rocketknight1, @LysandreJik, @lewtun