Qualcomm AI Engine Direct - multi-method support#10584
Qualcomm AI Engine Direct - multi-method support#10584facebook-github-bot merged 2 commits intopytorch:mainfrom
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10584
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 0abe0cc with merge base ed718a8 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
@pytorchbot label "release notes: qualcomm" |
0dd1e57 to
8468fa7
Compare
exir/backend/backend_api.py
Outdated
| owning_graph_module = call_submodule_node.graph.owning_module | ||
| # call delegate args should only use user_inputs | ||
| call_delegate_args = [] | ||
| # handle getitem node in multi-method scenario |
There was a problem hiding this comment.
Can you share what issues you run into?
There was a problem hiding this comment.
The scenario happens when there are common nodes shared by multiple delegated subgraphs (e.g. frequency sin/cos in sharded llama):
The example graph looks like below:
When replacing submodule fused_qnn_1, the original name finding mechanism are trying to match between:
- submodule_program.graph_signature.user_inputs:
('aten_mean_dim', 'aten_select_int', 'aten_select_int_1') - call_submodule_node.all_input_nodes:
[aten_mean_dim, getitem_1, getitem_2]
Which makes getitem node dangling as following, since the names could not be correctly mapped:
The patch here is trying to find the original graph with real output names and use index of getitem to have them in correct order.
And I think another issue is about validation in _unsafe_adjust_original_program. Since the partitioned sub graphs in multi-method scenario have already been turned into submodules (like first diagram). That behavior will make original_program._validate() fail in _unsafe_adjust_original_program.
This does not happen in single method lowering because sub graphs are turned into submodules one by one and replaced into executorch_call_delegate.
But I cannot find an appropriate way to pass official CI, could you give me some hint? thank you!
There was a problem hiding this comment.
@haowhsu-quic thanks for finding this edge case here. Let me add a test for this internally, so that it is captured by CI. I'm surprised that the exported_program's call signature doesn't have the updated get_items. This suggests to me that they aren't be created in topological order. I'll take a look and see if there is a fix that needs to be applied higher up to enforce this.
Thanks for the debugging and finding the root cause!
There was a problem hiding this comment.
Thank you! There is another issue mentioned in last section, could you help check it as well?
And I think another issue is about validation in
_unsafe_adjust_original_program. Since the partitioned sub graphs in multi-method scenario have already been turned into submodules (like first diagram). That behavior will makeoriginal_program._validate()fail in_unsafe_adjust_original_program. This does not happen in single method lowering because sub graphs are turned into submodules one by one and replaced intoexecutorch_call_delegate. But I cannot find an appropriate way to pass official CI, could you give me some hint? thank you!
There was a problem hiding this comment.
Thank you for helping resolve the issues.
There was a problem hiding this comment.
Which makes getitem node dangling as following, since the names could not be correctly mapped:
I'm curious if DCE can prune the dangling getitem nodes. Have you tried to run the DCE pass right after the mapping is done?
There was a problem hiding this comment.
I think the graph with dangling getitem nodes is not the correct behavior, so it's not about removing them.
30883fd to
c2e6f5a
Compare
|
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
It needs to rebase.. |
Summary - refactor to adopt multi-method change - framwork change to meet use case
8fc3fc5 to
6761ffb
Compare
|
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: Forward fix for pytorch#10584 Reviewed By: kirklandsign Differential Revision: D75231046
Summary: Forward fix for pytorch#10584 Reviewed By: digantdesai, kirklandsign Differential Revision: D75231046
Summary
Test plan
python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedUtils.test_qnn_backend_multi_graphs -s $device_sn -b build-android -m SM8750