Skip to content

Conversation

@lhutton1
Copy link
Contributor

Adds a pass that analyzes functions partitioned for the NPU and inlines those that are deemed "non-compute-intensive" back to the main function so that they can be considered for other backends. The current heuristic for deciding a non-compute-intensive function is to collectively check all of the operations in the function have no multiply accumulate operations. This heuristic is not optimal; optimization is left for future exploration.

This pass is inspired by the "IsComputeIntensiveGraph" pass in the TensorRT integration.

cc @ashutosh-arm @leandron

@tvm-bot
Copy link
Collaborator

tvm-bot commented Oct 17, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

  • cc @ashutosh-arm See #10317 for details
  • Built docs for commit bd0bf6c can be found here.

Generated by tvm-bot

@github-actions github-actions bot requested a review from leandron October 17, 2022 08:41
@areusch areusch added needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it and removed needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it labels Oct 19, 2022
/*! \brief Whether or not the partitioned function is consdiered compute intensive. */
bool is_compute_intensive;
/*! \brief A set of operators considered compute intensive. */
const std::unordered_set<std::string> compute_intensive_operators{
Copy link
Contributor

@asparkhi asparkhi Oct 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pass looks extremely useful. In order to make this pass more generic, would it make sense to accept this list as an input to the pass? In case of npu, this can be passed from the python partitioner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

me overthinking 🤔 : maybe in future, this list can be accepted as a tvmc command line argument to make the pass even more generic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a bit hesitant to make this pass more generic as different hardware might require a different heuristic all together (rather than just a different list of operators). That said, it does seem useful for the user to customise/tune the pass from TVMC if necessary, especially since the heuristic is not optimal. But yep, think this would be good as a followup

}

if (op_name != "") {
if (compute_intensive_operators.find(op_name) != compute_intensive_operators.end()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: in this mechanism, a partitioned function containing lot of non compute intensive ops could be inlined too. is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes currently that's possible. It's difficult to come up with a sensible limit here without being able to estimate the performance, perhaps if needed we can expose this option to the user in the future?

Adds a pass that analyzes functions partitioned for the NPU and inlines
those that are deemed "non-compute-intensive" back to the main function
so that they can be considered for other backends. The current heurisic
for deciding a non-compute-intensive function is to collectively check
all of the operations in the function have no multiply accumulate
operations. This heuristic is not optimial; optimization is left for
future exploration.

This pass is inspired by the "IsComputeIntensiveGraph" pass in the
TensorRT integration.

Change-Id: I20c197702f5252f102cfc1e4b4635ab836aa7835
@lhutton1 lhutton1 force-pushed the unpartition-subgraphs branch from 4fa075e to 50eed63 Compare October 20, 2022 11:15
* 'inline_non_compute_intensive_partitions' -> 'is_inline_non_compute
_intensive_partitions_enabled'.
* remove no MAC operations.
* fix network test.

Change-Id: Ie1015b27f37e47544bed6f0aff819ee4649de579
@lhutton1 lhutton1 force-pushed the unpartition-subgraphs branch from 50eed63 to 3defac8 Compare October 20, 2022 11:58
Change-Id: I0ee0af071dc77c91e0ef0f6753506cb40d1d1859
Change-Id: Ie918d7f1059f032282f1f5eeffda38f4febcd59c
Copy link
Contributor

@asparkhi asparkhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @lhutton1 for making it as generic as possible. It can be used by many other targets with little modifications.

Copy link
Contributor

@leandron leandron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @lhutton1 @ashutosh-arm

@leandron leandron merged commit 75921fb into apache:main Nov 3, 2022
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 10, 2022
* [ETHOSN] Inline non-compute-intensive partitions

Adds a pass that analyzes functions partitioned for the NPU and inlines
those that are deemed "non-compute-intensive" back to the main function
so that they can be considered for other backends. The current heurisic
for deciding a non-compute-intensive function is to collectively check
all of the operations in the function have no multiply accumulate
operations. This heuristic is not optimial; optimization is left for
future exploration.

This pass is inspired by the "IsComputeIntensiveGraph" pass in the
TensorRT integration.

Change-Id: I20c197702f5252f102cfc1e4b4635ab836aa7835

* Address comments

* 'inline_non_compute_intensive_partitions' -> 'is_inline_non_compute
_intensive_partitions_enabled'.
* remove no MAC operations.
* fix network test.

Change-Id: Ie1015b27f37e47544bed6f0aff819ee4649de579

* Fix failing unit tests due to optimization

Change-Id: I0ee0af071dc77c91e0ef0f6753506cb40d1d1859

* Add future exploration suggestions

Change-Id: Ie918d7f1059f032282f1f5eeffda38f4febcd59c
@lhutton1 lhutton1 deleted the unpartition-subgraphs branch November 15, 2022 15:40
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
* [ETHOSN] Inline non-compute-intensive partitions

Adds a pass that analyzes functions partitioned for the NPU and inlines
those that are deemed "non-compute-intensive" back to the main function
so that they can be considered for other backends. The current heurisic
for deciding a non-compute-intensive function is to collectively check
all of the operations in the function have no multiply accumulate
operations. This heuristic is not optimial; optimization is left for
future exploration.

This pass is inspired by the "IsComputeIntensiveGraph" pass in the
TensorRT integration.

Change-Id: I20c197702f5252f102cfc1e4b4635ab836aa7835

* Address comments

* 'inline_non_compute_intensive_partitions' -> 'is_inline_non_compute
_intensive_partitions_enabled'.
* remove no MAC operations.
* fix network test.

Change-Id: Ie1015b27f37e47544bed6f0aff819ee4649de579

* Fix failing unit tests due to optimization

Change-Id: I0ee0af071dc77c91e0ef0f6753506cb40d1d1859

* Add future exploration suggestions

Change-Id: Ie918d7f1059f032282f1f5eeffda38f4febcd59c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants