Set shared memory type based on options during the compilation phase #24196
HectorSVC merged 1 commit into microsoft:main
/azp run Linux QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows ARM64 QNN CI Pipeline,Linux Android Emulator QNN CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
edgchen1
left a comment
> During inference, using the QNN EP option to set enable_htp_shared_memory_allocator ensures that we use RPC allocated buffers to avoid buffer copy between CPU and NPU.

Technically, it does not ensure shared buffer usage. enable_htp_shared_memory_allocator only makes the allocator available. It is up to the user to use the allocator or not for graph inputs and outputs.
Please be advised of this PR: #24371
5d5db8a to 1faa0d4
1faa0d4 to d21cec8
/azp run Linux QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows ARM64 QNN CI Pipeline,Linux Android Emulator QNN CI Pipeline
Azure Pipelines successfully started running 4 pipeline(s).
/azp run Win_TRT_Minimal_CUDA_Test_C, Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
/azp run Win_TRT_Minimal_CUDA_Test_CI
Azure Pipelines successfully started running 1 pipeline(s).
…icrosoft#24196)

### Description
During inference, using the QNN EP option to set enable_htp_shared_memory_allocator gives a hint that we use RPC allocated buffers to avoid buffer copy between CPU and NPU.
With the current PR, we add hints in the compilation phase that if RPC memory is going to be used, any additional allocations done on the CPU can be avoided.

### Motivation and Context
This should help reduce the peak CPU memory consumption while running AI work loads using shared memory.
Related PR: microsoft#23136

Co-authored-by: Ashish Garg (AISW) <ashigarg@qti.qualcomm.com>
Description
During inference, setting the QNN EP option enable_htp_shared_memory_allocator signals that RPC-allocated buffers will be used, which avoids buffer copies between the CPU and the NPU.
This PR passes the same hint to the compilation phase: if RPC memory is going to be used, additional allocations on the CPU can be avoided.
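For readers unfamiliar with how the option is supplied, the following is a minimal sketch of passing enable_htp_shared_memory_allocator to the QNN EP from Python. The helper names `qnn_provider_options` and `create_session` are hypothetical, the `backend_path` value is a placeholder that depends on the platform, and the sketch does not show how this PR consumes the option internally. As noted in review, the option only makes the allocator available; using it for graph inputs and outputs is still up to the caller.

```python
def qnn_provider_options(use_shared_memory: bool,
                         backend_path: str = "QnnHtp.dll") -> dict:
    """Build a provider-options dict for the QNN EP (hypothetical helper).

    Provider option values are passed as strings, so the flag is "1" or "0".
    backend_path is a placeholder; adjust it for the target platform.
    """
    return {
        "backend_path": backend_path,
        "enable_htp_shared_memory_allocator": "1" if use_shared_memory else "0",
    }


def create_session(model_path: str, use_shared_memory: bool = True):
    """Create a session on the QNN EP with the option set (hypothetical helper)."""
    # Imported lazily so the options helper works without onnxruntime installed.
    import onnxruntime as ort

    return ort.InferenceSession(
        model_path,
        providers=["QNNExecutionProvider"],
        provider_options=[qnn_provider_options(use_shared_memory)],
    )


# Only the options dict is built here; creating a session needs a QNN-enabled
# onnxruntime build and a model file.
options = qnn_provider_options(True)
print(options)
```

Calling `create_session("model.onnx")` would then request the QNN EP with the shared memory allocator enabled, assuming a QNN-enabled build and a model at that (illustrative) path.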
Motivation and Context
This should help reduce peak CPU memory consumption when running AI workloads that use shared memory.
Related PR: #23136