Add cuda graph implementation for NV TRT RTX EP #25787
chilo-ms merged 28 commits into microsoft:main from umangb-09:umang/cuda_graph_msr_main
Conversation
…there is no impact on perf
Pull Request Overview
This PR adds comprehensive CUDA Graph support to the NV TensorRT RTX Execution Provider to improve inference performance by reducing per-kernel launch overhead and enabling better GPU throughput for repeated inference runs.
- Implements graph annotation ID-based CUDA graph management for multi-graph support
- Adds automatic detection and disabling of CUDA graphs for unsupported scenarios (shape tensors)
- Refactors stream management to support both user-provided and internally created streams
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| nv_execution_provider.h | Adds CUDA graph method declarations and per-thread context data structures |
| nv_execution_provider.cc | Implements core CUDA graph logic with capture/replay functionality and stream management |
| cuda_graph.h | Adds overloaded Replay method signature for sync flag support |
| cuda_graph.cc | Implements sync flag support in CUDA graph replay functionality |
| nv_provider_options.h | Updates CUDA graph enable option name for consistency |
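The capture/replay lifecycle and the sync-flag replay described in the table can be sketched with the CUDA runtime API roughly as follows. This is a hedged illustration, not the EP's actual code; `RunWithCudaGraph` is a hypothetical helper, and the snippet requires a CUDA-capable GPU and the CUDA runtime to execute:

```cpp
#include <cuda_runtime.h>

// Hypothetical helper: capture on the first run, replay afterwards.
void RunWithCudaGraph(cudaStream_t stream, bool sync) {
  static cudaGraphExec_t graph_exec = nullptr;
  if (graph_exec == nullptr) {
    // First run: record the enqueued work into a CUDA graph.
    cudaGraph_t graph;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    // ... enqueue TensorRT inference kernels on `stream` here ...
    cudaStreamEndCapture(stream, &graph);
    cudaGraphInstantiate(&graph_exec, graph, nullptr, nullptr, 0);
    cudaGraphDestroy(graph);
  }
  // Subsequent runs: replay the instantiated graph with a single launch,
  // avoiding per-kernel launch overhead.
  cudaGraphLaunch(graph_exec, stream);
  if (sync) {  // mirrors the sync-flag Replay overload described above
    cudaStreamSynchronize(stream);
  }
}
```

Whether the stream is user-provided or internally created, the same capture/replay pattern applies; only the ownership of `stream` changes.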
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 5 pipeline(s).

@jywu-msft can you mark this for the 1.23 release?

please fix ClangFormat errors and also resolve conflicts

Fixed

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 5 pipeline(s).

Please help resolve the conflicts

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 5 pipeline(s).

open and reopen for the specific CI to run

@microsoft-github-policy-service agree company="NVIDIA"
### Description

This change adds CUDA Graph support to the NV TensorRT RTX Execution Provider (EP).

### Motivation and Context

Integrating CUDA Graphs into the NV TRT RTX EP provides:

- Lower latency by minimizing per-kernel launch overhead.
- Better throughput for repeated inference runs.
- Improved efficiency on GPUs that are sensitive to kernel launch overhead.

---------

Co-authored-by: Maximilian Mueller <maximilianm@nvidia.com>
Co-authored-by: Gaurav Garg <gaugarg@nvidia.com>
- **Relax WeightBiasQuantization constraint for larger QDQ node group (#25673)** - **Add cuda graph implementation for NV TRT RTX EP (#25787)** - **python GPU IO Bindings for NVIDIA (#25776)** - **Fixes for DynamicQuantizeMatMul and Attention3D tests (#25814)** - **Fix a long standing bug on file memory mapping on windows. (#25833)** - **Add API for precompiled model compatibility check using just the compat info (#25841)** - **Enable ABSL_FLAGS flag registration for onnxruntime_perf_test for mobile build (#25849)** - **Add default constructor to Ort::Status. (#25860)** - #25871 - #25878 - #25884 - #25886 - #25866
The change has been added to the release branch.