Qualcomm AI Engine Direct - [DO NOT MERGE] PTE size and Inference Speed Verification #7569
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7569
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures
As of commit f228d74 with merge base e00eaea:
NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
Hi @cccclai, for the runner, I have commented out the EOT condition so that it generates all the tokens, which makes it easier to keep track of the inference speed. I have also sent you the PTE I used via email.
Please let me know if you cannot reproduce it or run into any other issues.
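Why disabling the EOT check helps: if decoding stops whenever the model emits an end-of-text token, different runs can generate different numbers of tokens, so tokens-per-second figures are not directly comparable. The Python sketch below illustrates the idea only. It is not the ExecuTorch Qualcomm runner (which is C++), and every name in it (`generate`, `next_token`, `eot_id`, `stop_at_eot`, `toy_next_token`) is hypothetical. With `stop_at_eot=False`, every run decodes exactly `max_new_tokens` tokens.

```python
import time
from typing import Callable, List, Tuple


def generate(
    next_token: Callable[[List[int]], int],
    prompt: List[int],
    max_new_tokens: int,
    eot_id: int,
    stop_at_eot: bool = True,
) -> Tuple[List[int], float]:
    """Hypothetical decode loop returning (generated_tokens, tokens_per_second).

    When stop_at_eot is False the EOT check is skipped, so the loop always
    produces max_new_tokens tokens and throughput numbers stay comparable.
    """
    context = list(prompt)
    generated: List[int] = []
    start = time.perf_counter()
    for _ in range(max_new_tokens):
        tok = next_token(context)  # one decode step of the (stand-in) model
        context.append(tok)
        generated.append(tok)
        if stop_at_eot and tok == eot_id:
            break  # normal behavior: stop at end-of-text
    elapsed = time.perf_counter() - start
    tps = len(generated) / elapsed if elapsed > 0 else float("inf")
    return generated, tps


def toy_next_token(context: List[int]) -> int:
    """Hypothetical stand-in model: emits token 7, then EOT (id 2) once the context has 4 tokens."""
    return 2 if len(context) >= 4 else 7


if __name__ == "__main__":
    early, _ = generate(toy_next_token, [1, 5], max_new_tokens=8, eot_id=2)
    full, tps = generate(toy_next_token, [1, 5], 8, 2, stop_at_eot=False)
    print(len(early), len(full), f"{tps:.1f} tok/s")
```

With the EOT check enabled the toy run stops after 3 tokens; with it disabled it always produces all 8, so the measured rate reflects a fixed amount of work.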
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
I'm getting the following perf numbers with this commit and the .pte you shared...
ffd7e8b to a6aee94
Summary
This is a draft to verify that hybrid mode models: