
Multi-partition support for context binary cache feature #18865

Merged
HectorSVC merged 32 commits into main from qnn_ctx_multi_partition_support
Feb 1, 2024

Conversation

@HectorSVC (Contributor)

Description

Multi-partition support for context binary cache feature

  1. Add an EP-level API, GetEpContextNodes, to get the list of EPContext node pointers
  2. Move some provider options to session options: ep_context_enable, ep_context_file_path, ep_context_embed_mode
  3. In QNN EP, create the list of EPContext nodes if ep_context_enable is enabled, so that the model can be dumped with multiple partitions
  4. Extend the context loading path to support multiple EPContext nodes
  5. Enhance context cache validation to support multiple EPContext nodes
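A minimal sketch of what step 3 could look like: one EPContext record per compiled partition, with embed_mode deciding whether the QNN context binary is stored inline or referenced as an external file. The names `EpContextNode` and `dump_ep_context_model` are illustrative stand-ins, not the actual ONNX Runtime API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EpContextNode:
    # Illustrative stand-in for an ONNX "EPContext" node; not the real class.
    name: str
    embed_mode: int        # 1 = binary embedded in the model, 0 = external file
    ep_cache_context: str  # inline binary (hex here) or a path, per embed_mode

def dump_ep_context_model(partitions: List[bytes], embed_mode: int,
                          file_path: str) -> List[EpContextNode]:
    """Create one EPContext node per compiled partition (hypothetical helper)."""
    nodes = []
    for i, binary in enumerate(partitions):
        if embed_mode == 1:
            payload = binary.hex()            # embed the context binary itself
        else:
            payload = f"{file_path}_{i}.bin"  # reference an external binary file
        nodes.append(EpContextNode(name=f"QNN_{i}", embed_mode=embed_mode,
                                   ep_cache_context=payload))
    return nodes

nodes = dump_ep_context_model([b"\x01", b"\x02"], embed_mode=1, file_path="model_ctx")
print(len(nodes), nodes[0].ep_cache_context)  # prints: 2 01
```

Before this change, a model compiled into several QNN partitions could not be dumped this way because only a single EPContext node was produced.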

Motivation and Context

Only a single partition was supported before this change. After this change, there is no longer a graph partitioning limitation for the context cache feature.

-- Validate the QNN context model against the graph partition result from the QDQ model
-- In Compile(), load the QNN context model, get all the EPContext nodes, create a QNN context from each context binary, create a QNN graph from each binary, and execute
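The load path above can be sketched as iterating over every EPContext node instead of assuming a single one. This is a hypothetical stdlib-only model; `load_context_model` and the dict-shaped nodes are illustrative, not ONNX Runtime or QNN SDK APIs.

```python
# Hypothetical sketch of the multi-node load path described above.
from typing import Dict, List

def load_context_model(ep_context_nodes: List[dict]) -> Dict[str, str]:
    """For each EPContext node: decode its context binary and build a QNN graph."""
    graphs = {}
    for node in ep_context_nodes:  # previously, only a single node was handled
        binary = bytes.fromhex(node["ep_cache_context"])
        # Stand-in for creating a QNN context + graph from the binary:
        graphs[node["name"]] = f"qnn_graph({len(binary)} bytes)"
    return graphs

graphs = load_context_model([
    {"name": "QNN_0", "ep_cache_context": "0102"},
    {"name": "QNN_1", "ep_cache_context": "03"},
])
print(graphs["QNN_1"])  # prints: qnn_graph(1 bytes)
```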
…th, qnn_context_embed_mode. Add session option accordingly.
@HectorSVC HectorSVC added the ep:QNN issues related to QNN execution provider label Dec 18, 2023
HectorSVC added a commit that referenced this pull request Dec 20, 2023
Move QNN EP provider options to session options

### Description
Need to use session options to support multi-partition for the context cache feature. To smooth the transition, move the provider options to session options first.

This is the first step for PR:
PR #18865
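From the user's side, the move looks roughly like the sketch below. The "ep."-prefixed session-option key names are assumptions derived from the option names in this PR (verify them against the ONNX Runtime session-option config headers); the dicts stand in for the real SessionOptions/provider-options objects.

```python
# Old, provider-scoped configuration (key names assumed from this PR's context):
provider_options = {
    "qnn_context_cache_enable": "1",
    "qnn_context_cache_path": "model_ctx.onnx",
    "qnn_context_embed_mode": "1",
}

# New, session-scoped equivalents. In ONNX Runtime these would be set via
# SessionOptions.add_session_config_entry rather than a plain dict.
session_config = {
    "ep.context_enable": provider_options["qnn_context_cache_enable"],
    "ep.context_file_path": provider_options["qnn_context_cache_path"],
    "ep.context_embed_mode": provider_options["qnn_context_embed_mode"],
}
print(session_config["ep.context_enable"])  # prints: 1
```

Session-scoped options make sense here because the dumped context model can contain partitions from more than one provider-level compilation.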
HectorSVC added a commit that referenced this pull request Jan 19, 2024
… is not guaranteed (#19195)

Fix issue that the generated context cache model inputs/outputs order is not guaranteed

### Description
Currently, QNN EP generates the context cache model in the Compile() method, which only has access to the partitioned graph. The inputs/outputs order for the partitioned graph is not guaranteed, and the EP doesn't have a view of the user's input model. The context cache model generation therefore has to move up to GraphPartitioner, which has a view of the partitioned model.
This is also a breakdown of the PR for multi-partition support.
#18865
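The ordering fix boils down to sorting the partitioned graph's inputs/outputs by their position in the original user model, which only a component with a whole-model view (such as GraphPartitioner) can compute. A minimal sketch, with an assumed helper name:

```python
from typing import List

def order_like_user_model(partition_io: List[str],
                          user_model_io: List[str]) -> List[str]:
    """Reorder partition inputs/outputs to match the user model's order
    (illustrative helper; not an ONNX Runtime function)."""
    rank = {name: i for i, name in enumerate(user_model_io)}
    return sorted(partition_io, key=lambda n: rank[n])

print(order_like_user_model(["b", "c", "a"], ["a", "b", "c"]))  # prints: ['a', 'b', 'c']
```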
YUNQIUGUO pushed a commit that referenced this pull request Jan 23, 2024
… is not guaranteed (#19195)

@HectorSVC HectorSVC merged commit 0fa88bc into main Feb 1, 2024
@HectorSVC HectorSVC deleted the qnn_ctx_multi_partition_support branch February 1, 2024 23:04
rohan11235813 pushed a commit to quadric-io/onnxruntime that referenced this pull request Aug 19, 2025
… is not guaranteed (#19195)

Fix issue that the generated context cache model inputs/outputs order is not guaranteed

### Description
Currently, QNN EP generate the context cache model in Compile() method which only get access to the partitioned graph. And the inputs/outputs order for the partitioned graph is not guaranteed. And EP doesn't have the view of the input user model. Have to move the context cache model generation to a higher level in GraphPartitioner which has the view of the partitioned model.
This is also a break down of PR for multi-partition support.
microsoft/onnxruntime#18865
rohan11235813 pushed a commit to quadric-io/onnxruntime that referenced this pull request Sep 15, 2025
… is not guaranteed (#19195)

Fix issue that the generated context cache model inputs/outputs order is not guaranteed

### Description
Currently, QNN EP generate the context cache model in Compile() method which only get access to the partitioned graph. And the inputs/outputs order for the partitioned graph is not guaranteed. And EP doesn't have the view of the input user model. Have to move the context cache model generation to a higher level in GraphPartitioner which has the view of the partitioned model.
This is also a break down of PR for multi-partition support.
microsoft/onnxruntime#18865