Set shared memory type based on options during the compilation phase #24196

Merged
HectorSVC merged 1 commit into microsoft:main from CodeLinaro:dev/ashigarg/qmem
Apr 26, 2025

@quic-ashigarg
Contributor

@quic-ashigarg quic-ashigarg commented Mar 26, 2025

Description

During inference, setting the QNN EP option enable_htp_shared_memory_allocator hints that RPC-allocated buffers will be used, which avoids copying buffers between the CPU and the NPU.

This PR adds a corresponding hint in the compilation phase: if RPC memory is going to be used, any additional allocations on the CPU can be avoided.
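
For context, a minimal sketch of how the option discussed here might be passed when creating a session from Python. The helper name `qnn_provider_options` and the `backend_path` value are illustrative assumptions and are not part of this PR; only `enable_htp_shared_memory_allocator` comes from the description above.

```python
def qnn_provider_options(backend_path, enable_shared_memory):
    """Build a QNN EP provider-options dict (hypothetical helper)."""
    options = {"backend_path": backend_path}
    if enable_shared_memory:
        # Provider option values are passed as strings.
        options["enable_htp_shared_memory_allocator"] = "1"
    return options


# "QnnHtp.dll" is an assumed backend library name for a Windows setup.
options = qnn_provider_options("QnnHtp.dll", enable_shared_memory=True)
print(options)

# Assumed usage with onnxruntime (requires a QNN-enabled build; not run here):
#   import onnxruntime as ort
#   session = ort.InferenceSession(
#       "model.onnx",  # placeholder path
#       providers=[("QNNExecutionProvider", options)],
#   )
```

Keeping the option in a plain dict means the same settings can be reused for the compilation step and for inference, which is where the hint described in this PR would apply.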

Motivation and Context

This should help reduce peak CPU memory consumption when running AI workloads that use shared memory.

Related PR: #23136

@HectorSVC
Contributor

/azp run Linux QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows ARM64 QNN CI Pipeline,Linux Android Emulator QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

Contributor

@edgchen1 edgchen1 left a comment

During inference, using the QNN EP option to set enable_htp_shared_memory_allocator ensures that we use RPC allocated buffers to avoid buffer copy between CPU and NPU.

Technically, it does not ensure shared buffer usage. enable_htp_shared_memory_allocator only makes the allocator available. It is up to the user to use the allocator or not for graph inputs and outputs.
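
The distinction can be illustrated with a toy model (not ONNX Runtime code; every name below is hypothetical): the option only controls whether the shared allocator exists, and a graph input or output uses a shared buffer only if the caller actually allocates it from that allocator.

```python
from dataclasses import dataclass


@dataclass
class TensorPlan:
    name: str
    from_shared_allocator: bool  # the caller's choice, per input/output


def tensors_still_copied(allocator_enabled, plans):
    """Names of tensors that do not use shared buffers in this toy model."""
    return [
        p.name
        for p in plans
        if not (allocator_enabled and p.from_shared_allocator)
    ]


plans = [TensorPlan("input_0", True), TensorPlan("output_0", False)]
print(tensors_still_copied(True, plans))   # allocator available
print(tensors_still_copied(False, plans))  # allocator not available
```

In this sketch, enabling the allocator changes nothing for `output_0`, because the caller never requested a shared buffer for it.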

@HectorSVC HectorSVC added the ep:QNN (issues related to QNN execution provider) label Mar 31, 2025

@yuslepukhin
Member

Please, be advised of this PR: #24371

@HectorSVC
Contributor

/azp run Linux QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows ARM64 QNN CI Pipeline,Linux Android Emulator QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@HectorSVC
Contributor

/azp run Win_TRT_Minimal_CUDA_Test_C, Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@HectorSVC
Contributor

/azp run Win_TRT_Minimal_CUDA_Test_CI

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@HectorSVC HectorSVC merged commit 138a3a3 into microsoft:main Apr 26, 2025
68 of 71 checks passed
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request May 12, 2025
…icrosoft#24196)

Co-authored-by: Ashish Garg (AISW) <ashigarg@qti.qualcomm.com>

4 participants