This is a feature request.
Can we make this kind of integration easy to achieve for end users?
https://tvm.ai/2018/03/23/nmt-transformer-optimize.html
whereby TVM is used to create a kernel that is then exported into the TF runtime. After searching, I was unable to find open-source code for this kind of integration. For someone who knows both systems at the appropriate layers this is probably easy, but for someone who doesn't, it's significant friction to figure out all the details.
There are two main scenarios where this is beneficial for TF users:
- When it is not practical to adopt TVM wholesale for model execution, because of unsupported ops, or because of preexisting investments in TF infra.
- For initial exploration and assessment of TVM, it would be a lighter-weight, lower-risk on-ramp to be able to move ops à la carte to TVM.
To achieve this, there are two forms of support that would be useful:
- Tutorials / example templates demonstrating the necessary glue, which can simply be copied and lightly modified (e.g. changing paths and the TF op definition/registration code)
- A fully automated system that generates all the necessary C++ wrappers and support functions from your TVM function specification
Support level 1 would be sufficient for folks like me who are very interested in the results reported by Alibaba, but for whom there is too much friction, or who lack the experience, to easily try this out and see if it's worth investing in further.
Support level 2 would be useful for the median TF model developer: folks who don't want to touch or be aware of the C++ level but do want to try out optimizations.
Further background:
Currently there is no way to create TF ops other than C++ programming. The closest contender, XLA, only allows compiling existing ops; it does not allow creating new ops from novel combinations of the XLA primitives. Furthermore, XLA is highly restricted in which primitives it supports; for example, convolution is supported only for floating point.
Eventually it will be possible to create TF ops in Swift and other MLIR-targeting systems, but this will likely take years, whereas the TVM infrastructure is ready to go today. TVM is therefore uniquely positioned to fill a significant gap in the TF ecosystem.
Thank you for your consideration.