[ETHOSN] Inline non-compute-intensive partitions #13092
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot
/*! \brief Whether or not the partitioned function is considered compute intensive. */
bool is_compute_intensive;
/*! \brief A set of operators considered compute intensive. */
const std::unordered_set<std::string> compute_intensive_operators{
The pass looks extremely useful. To make it more generic, would it make sense to accept this list as an input to the pass? In the case of the NPU, it could be passed from the Python partitioner.
me overthinking 🤔 : maybe in future, this list can be accepted as a tvmc command line argument to make the pass even more generic?
I was a bit hesitant to make this pass more generic, as different hardware might require a different heuristic altogether (rather than just a different list of operators). That said, it does seem useful for the user to be able to customise/tune the pass from TVMC if necessary, especially since the heuristic is not optimal. But yep, I think this would be good as a follow-up.
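To make the suggestion above concrete, here is a minimal standalone sketch of a check that takes the operator list as an argument rather than hard-coding it. It does not use TVM's API: `OpNames`, `OpSet` and `IsComputeIntensive` are hypothetical names, and a partitioned function is modelled as nothing more than the list of operator names it calls.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a partitioned function: only the names of the
// operators it calls. A real Relay function carries far more information.
using OpNames = std::vector<std::string>;
using OpSet = std::unordered_set<std::string>;

// Returns true if any operator in `ops` appears in the caller-supplied set.
// Passing the set in (instead of using a fixed member) is the assumption
// being illustrated; it is not how the pass in this PR is written.
bool IsComputeIntensive(const OpNames& ops, const OpSet& compute_intensive_operators) {
  for (const std::string& op_name : ops) {
    if (op_name.empty()) continue;  // skip calls that have no operator name
    if (compute_intensive_operators.find(op_name) != compute_intensive_operators.end()) {
      return true;
    }
  }
  return false;
}
```

A caller targeting different hardware could then supply its own set, for example `IsComputeIntensive(ops, {"qnn.conv2d", "qnn.dense"})`. The operator names here are illustrative only and are not taken from the list in the PR.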
}

if (op_name != "") {
  if (compute_intensive_operators.find(op_name) != compute_intensive_operators.end()) {
Question: with this mechanism, a partitioned function containing a lot of non-compute-intensive ops could be inlined too. Is that intentional?
Yes, currently that's possible. It's difficult to come up with a sensible limit here without being able to estimate the performance. Perhaps, if needed, we can expose this option to the user in the future?
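For illustration only, the behaviour discussed in this thread can be modelled as below. The sketch is standalone and is not TVM code. `InlineOptions`, `max_inlined_ops` and `ShouldInline` are hypothetical names; in particular, the size limit is an assumption about what a future user-facing option might look like and does not exist in this PR.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical options; neither field is exposed by the pass in this PR.
struct InlineOptions {
  std::unordered_set<std::string> compute_intensive_operators;
  // 0 means "no limit", which models the behaviour described in the thread:
  // a partition of any size is inlined if it has no compute-intensive op.
  std::size_t max_inlined_ops = 0;
};

// Decides whether a partition (modelled as a list of operator names)
// would be inlined back into the main function.
bool ShouldInline(const std::vector<std::string>& ops, const InlineOptions& options) {
  for (const std::string& op_name : ops) {
    if (options.compute_intensive_operators.count(op_name) != 0) {
      return false;  // at least one compute-intensive op: keep the partition
    }
  }
  // Optional, assumed size cap for partitions with many cheap operators.
  if (options.max_inlined_ops != 0 && ops.size() > options.max_inlined_ops) {
    return false;
  }
  return true;
}
```

With the default `max_inlined_ops = 0`, a partition made up of many cheap operators is still inlined, which is the case raised in the question. Choosing a non-zero limit sensibly would need the performance estimate mentioned in the reply.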
Adds a pass that analyzes functions partitioned for the NPU and inlines those that are deemed "non-compute-intensive" back to the main function so that they can be considered for other backends.

The current heuristic for deciding whether a function is non-compute-intensive is to check that, collectively, the operations in the function include no multiply-accumulate operations. This heuristic is not optimal; optimization is left for future exploration.

This pass is inspired by the "IsComputeIntensiveGraph" pass in the TensorRT integration.

Change-Id: I20c197702f5252f102cfc1e4b4635ab836aa7835
4fa075e to 50eed63
* 'inline_non_compute_intensive_partitions' -> 'is_inline_non_compute_intensive_partitions_enabled'.
* remove no MAC operations.
* fix network test.

Change-Id: Ie1015b27f37e47544bed6f0aff819ee4649de579
50eed63 to 3defac8
Fix failing unit tests due to optimization
Change-Id: I0ee0af071dc77c91e0ef0f6753506cb40d1d1859

Add future exploration suggestions
Change-Id: Ie918d7f1059f032282f1f5eeffda38f4febcd59c
LGTM! Thanks @lhutton1 for making it as generic as possible. It can be used by many other targets with little modification.
leandron left a comment
LGTM, thanks @lhutton1 @ashutosh-arm
* [ETHOSN] Inline non-compute-intensive partitions

  Adds a pass that analyzes functions partitioned for the NPU and inlines those that are deemed "non-compute-intensive" back to the main function so that they can be considered for other backends.

  The current heuristic for deciding whether a function is non-compute-intensive is to check that, collectively, the operations in the function include no multiply-accumulate operations. This heuristic is not optimal; optimization is left for future exploration.

  This pass is inspired by the "IsComputeIntensiveGraph" pass in the TensorRT integration.

  Change-Id: I20c197702f5252f102cfc1e4b4635ab836aa7835

* Address comments

  * 'inline_non_compute_intensive_partitions' -> 'is_inline_non_compute_intensive_partitions_enabled'.
  * remove no MAC operations.
  * fix network test.

  Change-Id: Ie1015b27f37e47544bed6f0aff819ee4649de579

* Fix failing unit tests due to optimization

  Change-Id: I0ee0af071dc77c91e0ef0f6753506cb40d1d1859

* Add future exploration suggestions

  Change-Id: Ie918d7f1059f032282f1f5eeffda38f4febcd59c
Adds a pass that analyzes functions partitioned for the NPU and inlines those that are deemed "non-compute-intensive" back to the main function so that they can be considered for other backends. The current heuristic for deciding whether a function is non-compute-intensive is to check that, collectively, the operations in the function include no multiply-accumulate operations. This heuristic is not optimal; optimization is left for future exploration.
This pass is inspired by the "IsComputeIntensiveGraph" pass in the TensorRT integration.
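As a rough, standalone model of what the description above amounts to at module level (not the actual TVM implementation), the sketch below walks over a set of partitions and collects those that contain no multiply-accumulate operator. `PartitionedModule`, `PartitionsToInline`, the partition names and the operator names are all hypothetical and chosen only for illustration.

```cpp
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

// Toy model of a partitioned module: partition name -> operator names in its
// body. This is an assumption for illustration, not TVM's IRModule.
using PartitionedModule = std::map<std::string, std::vector<std::string>>;

// Returns the names of partitions that contain no operator from
// `mac_operators`, i.e. the ones the heuristic would inline back into main.
std::vector<std::string> PartitionsToInline(
    const PartitionedModule& module,
    const std::unordered_set<std::string>& mac_operators) {
  std::vector<std::string> to_inline;
  for (const auto& entry : module) {
    bool has_mac = false;
    for (const std::string& op_name : entry.second) {
      if (mac_operators.count(op_name) != 0) {
        has_mac = true;
        break;
      }
    }
    if (!has_mac) to_inline.push_back(entry.first);
  }
  return to_inline;  // ordered by partition name, since std::map is sorted
}
```

Partitions left out of the returned list would stay offloaded to the NPU, while the returned ones become candidates for other backends once inlined.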
cc @ashutosh-arm @leandron