
Add cuda graph implementation for NV TRT RTX EP #25787

Merged
chilo-ms merged 28 commits into microsoft:main from
umangb-09:umang/cuda_graph_msr_main
Aug 27, 2025

@umangb-09 (Contributor)

Description

This change adds CUDA Graph support to the NV TensorRT RTX Execution Provider (EP).

Motivation and Context

Integrating CUDA Graphs into the NV TRT RTX EP provides:
  • Lower latency, by minimizing per-kernel launch overhead.
  • Better throughput for repeated inference runs.
  • Improved efficiency on GPUs that are sensitive to kernel launch overhead.

@jywu-msft requested a review from Copilot August 19, 2025 14:46
@jywu-msft added the ep:NvRTX NV RTX execution provider label Aug 19, 2025
@jywu-msft requested a review from chilo-ms August 19, 2025 14:47

Copilot AI left a comment

Pull Request Overview

This PR adds comprehensive CUDA Graph support to the NV TensorRT RTX Execution Provider to improve inference performance by reducing per-kernel launch overhead and enabling better GPU throughput for repeated inference runs.

  • Implements graph annotation ID-based CUDA graph management for multi-graph support
  • Adds automatic detection and disabling of CUDA graphs for unsupported scenarios (shape tensors)
  • Refactors stream management to support both user-provided and internally created streams
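The annotation-ID bookkeeping and the shape-tensor fallback listed above can be sketched as a small state machine. This is a hypothetical C++ model, not the EP's code:

- `GraphReplayManager`, `RunMode`, and `NextRun` are illustrative names.
- The number of warm-up runs is a constructor parameter.
- Treating annotation ID `-1` as "do not capture" is an assumption, borrowed from the convention ONNX Runtime documents for the CUDA EP's `gpu_graph_id` run option.

No CUDA calls are made here. In a real implementation, the capture branch would wrap execution in `cudaStreamBeginCapture`/`cudaStreamEndCapture`, and the replay branch would call `cudaGraphLaunch`.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical model of per-annotation-ID CUDA graph bookkeeping.
// "Capture" and "replay" are only returned as decisions, so the control
// flow can be followed (and tested) on a machine without a GPU.
enum class RunMode { kRegular, kCapture, kReplay };

class GraphReplayManager {
 public:
  // Assumed convention: a run tagged with -1 is never captured.
  static constexpr int kSkipAnnotation = -1;

  explicit GraphReplayManager(int warmup_runs) : warmup_runs_(warmup_runs) {
    assert(warmup_runs >= 0);
  }

  // Models the automatic fallback: once the engine is known to use shape
  // tensors, every subsequent run executes without a graph.
  void DisableForShapeTensors() { enabled_ = false; }

  // Decides how the run tagged with `annotation_id` should execute.
  RunMode NextRun(int annotation_id) {
    if (!enabled_ || annotation_id == kSkipAnnotation) {
      return RunMode::kRegular;
    }
    if (captured_.count(annotation_id) != 0) {
      return RunMode::kReplay;  // a graph already exists for this ID
    }
    int& runs = regular_runs_[annotation_id];
    if (runs < warmup_runs_) {
      ++runs;  // regular warm-up run, e.g. to let allocations settle first
      return RunMode::kRegular;
    }
    captured_.insert(annotation_id);  // this run is captured into a graph
    return RunMode::kCapture;
  }

 private:
  bool enabled_ = true;
  int warmup_runs_;
  std::map<int, int> regular_runs_;  // warm-up count per annotation ID
  std::set<int> captured_;           // IDs that already own a graph
};
```

With one warm-up run, three consecutive runs tagged with ID 0 yield regular, then capture, then replay. A run tagged with a new ID starts its own warm-up. Keeping the state per ID is what lets several graphs coexist in one session (for instance, one per distinct input configuration).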

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `nv_execution_provider.h` | Adds CUDA graph method declarations and per-thread context data structures |
| `nv_execution_provider.cc` | Implements core CUDA graph logic with capture/replay functionality and stream management |
| `cuda_graph.h` | Adds overloaded Replay method signature for sync flag support |
| `cuda_graph.cc` | Implements sync flag support in CUDA graph replay functionality |
| `nv_provider_options.h` | Updates CUDA graph enable option name for consistency |

@chilo-ms (Contributor)

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@umangb-09 (Contributor, Author)

@jywu-msft can you mark this for the 1.23 release?

@jywu-msft (Member)

please fix ClangFormat errors and also resolve conflicts

@umangb-09 (Contributor, Author)

> please fix ClangFormat errors and also resolve conflicts

Fixed.

@chilo-ms (Contributor)

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@chilo-ms (Contributor)

Please help resolve the conflicts

@chilo-ms (Contributor)

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@chilo-ms closed this Aug 26, 2025
@chilo-ms reopened this Aug 26, 2025
@chilo-ms (Contributor)

Closed and reopened the PR so that the specific CI pipelines would run.

@umangb-09 (Contributor, Author) commented Aug 27, 2025

@microsoft-github-policy-service agree company="NVIDIA"

@chilo-ms merged commit 16ae99e into microsoft:main Aug 27, 2025
157 checks passed
snnn pushed a commit that referenced this pull request Aug 28, 2025
### Description
This change adds CUDA Graph support to the NV TensorRT RTX Execution
Provider (EP).

### Motivation and Context
Integrating CUDA Graphs into the NV TRT RTX EP provides:
Lower latency by minimizing per-kernel launch overhead.
Better throughput for repeated inference runs.
Improved efficiency on GPUs with high kernel launches overhead
sensitivity.

---------

Co-authored-by: Maximilian Mueller <maximilianm@nvidia.com>
Co-authored-by: Gaurav Garg <gaugarg@nvidia.com>
snnn pushed a commit that referenced this pull request Aug 29, 2025
- **Relax WeightBiasQuantization constraint for larger QDQ node group
(#25673)**
- **Add cuda graph implementation for NV TRT RTX EP (#25787)**
- **python GPU IO Bindings for NVIDIA  (#25776)**
- **Fixes for DynamicQuantizeMatMul and Attention3D tests (#25814)**
- **Fix a long standing bug on file memory mapping on windows.
(#25833)**
- **Add API for precompiled model compatibility check using just the
compat info (#25841)**
- **Enable ABSL_FLAGS flag registration for onnxruntime_perf_test for
mobile build (#25849)**
- **Add default constructor to Ort::Status. (#25860)**
- #25871
- #25878
- #25884
- #25886
- #25866
@snnn (Contributor) commented Aug 30, 2025

The change has been added to the release branch.

gedoensmax added a commit to gedoensmax/onnxruntime that referenced this pull request Sep 2, 2025

Labels

ep:NvRTX NV RTX execution provider

7 participants