Add cuda graph implementation for NV TRT RTX EP #25787
chilo-ms merged 28 commits into microsoft:main from umangb-09:umang/cuda_graph_msr_main
Conversation
…there is no impact on perf
Pull Request Overview
This PR adds comprehensive CUDA Graph support to the NV TensorRT RTX Execution Provider to improve inference performance by reducing per-kernel launch overhead and enabling better GPU throughput for repeated inference runs.
- Implements graph annotation ID-based CUDA graph management for multi-graph support
- Adds automatic detection and disabling of CUDA graphs for unsupported scenarios (shape tensors)
- Refactors stream management to support both user-provided and internally created streams
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| nv_execution_provider.h | Adds CUDA graph method declarations and per-thread context data structures |
| nv_execution_provider.cc | Implements core CUDA graph logic with capture/replay functionality and stream management |
| cuda_graph.h | Adds overloaded Replay method signature for sync flag support |
| cuda_graph.cc | Implements sync flag support in CUDA graph replay functionality |
| nv_provider_options.h | Updates CUDA graph enable option name for consistency |
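The capture/replay lifecycle and the sync-flag replay described in the table can be sketched with the CUDA runtime API roughly as follows. This is a hedged illustration, not the EP's actual code; `RunWithCudaGraph` is a hypothetical helper, and the snippet requires a CUDA-capable GPU and the CUDA runtime to execute:

```cpp
#include <cuda_runtime.h>

// Hypothetical helper: capture on the first run, replay afterwards.
void RunWithCudaGraph(cudaStream_t stream, bool sync) {
  static cudaGraphExec_t graph_exec = nullptr;
  if (graph_exec == nullptr) {
    // First run: record the enqueued work into a CUDA graph.
    cudaGraph_t graph;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    // ... enqueue TensorRT inference kernels on `stream` here ...
    cudaStreamEndCapture(stream, &graph);
    cudaGraphInstantiate(&graph_exec, graph, nullptr, nullptr, 0);
    cudaGraphDestroy(graph);
  }
  // Subsequent runs: replay the instantiated graph with a single launch,
  // avoiding per-kernel launch overhead.
  cudaGraphLaunch(graph_exec, stream);
  if (sync) {  // mirrors the sync-flag Replay overload described above
    cudaStreamSynchronize(stream);
  }
}
```

Whether the stream is user-provided or internally created, the same capture/replay pattern applies; only the ownership of `stream` changes.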
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 5 pipeline(s).

@jywu-msft can you mark this for the 1.23 release?

please fix ClangFormat errors and also resolve conflicts

Fixed

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 5 pipeline(s).

Please help resolve the conflicts

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 5 pipeline(s).

open and reopen for the specific CI to run

@microsoft-github-policy-service agree company="NVIDIA"
### Description

This change adds CUDA Graph support to the NV TensorRT RTX Execution Provider (EP).

### Motivation and Context

Integrating CUDA Graphs into the NV TRT RTX EP provides:

- Lower latency by minimizing per-kernel launch overhead.
- Better throughput for repeated inference runs.
- Improved efficiency on GPUs that are sensitive to kernel launch overhead.

---------

Co-authored-by: Maximilian Mueller <maximilianm@nvidia.com>
Co-authored-by: Gaurav Garg <gaugarg@nvidia.com>
- **Relax WeightBiasQuantization constraint for larger QDQ node group (#25673)** - **Add cuda graph implementation for NV TRT RTX EP (#25787)** - **python GPU IO Bindings for NVIDIA (#25776)** - **Fixes for DynamicQuantizeMatMul and Attention3D tests (#25814)** - **Fix a long standing bug on file memory mapping on windows. (#25833)** - **Add API for precompiled model compatibility check using just the compat info (#25841)** - **Enable ABSL_FLAGS flag registration for onnxruntime_perf_test for mobile build (#25849)** - **Add default constructor to Ort::Status. (#25860)** - #25871 - #25878 - #25884 - #25886 - #25866
The change has been added to the release branch.