[CudaGraph] [SOT] Support splitting static graph into piecewise graph with cuda_graph #3478
Conversation
gongshaotian
left a comment
The split interface will need to be exposed later, so that FD can register the Attention Layer.
```python
            f"[CUDA GRAPH] CUDAGraph capture list {self.cudagraph_capture_sizes}, " "Created all real shape entry."
        )

    def run_static_model(self, entry: ConcreteSizeEntry, **kwargs):
```
Is the subgraph now managed directly inside Paddle?
Yes. Exposing it to Python would have a relatively high implementation cost, but there is no functional difference.
gongshaotian
left a comment
Please add a unit test for the static-graph path; the coverage check did not pass.
Pull Request Overview
This PR adds support for splitting static computation graphs into piecewise subgraphs for CUDA Graph capture and execution in SOT (Symbolic Opcode Translator) mode. It enables CUDA Graph optimization at the subgraph level when graph optimization is enabled.
Key changes:
- Introduces a new `Dy2StCudaGraphManager` class to manage CUDA Graph state transitions for static graph execution
- Adds a new execution path `run_static_model` for handling static model execution with CUDA Graph capture and replay
- Integrates the CUDA Graph manager into the existing `CudaGraphPiecewiseBackend` class
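The capture/replay bookkeeping that the overview describes can be sketched as a small state machine. This is an illustrative toy only: the names `Dy2StCudaGraphManager` and `CUDAGraphState` follow the PR description, but the bodies below are simplified stand-ins, not FastDeploy's actual implementation.

```python
from enum import Enum


class CUDAGraphState(Enum):
    DISABLE = 0   # run eagerly, no CUDA Graph
    CAPTURE = 1   # record the subgraph into a CUDA Graph
    REPLAY = 2    # replay a previously captured graph


class Dy2StCudaGraphManager:
    """Toy sketch: tracks which batch sizes already have a captured graph."""

    def __init__(self):
        self.state = CUDAGraphState.DISABLE
        self.captured_batch_size = set()
        self.batch_size = -1

    def effective_state(self):
        # Replay is only valid for a batch size that was captured before;
        # otherwise fall back to eager (DISABLE) execution for this call.
        if (self.state == CUDAGraphState.REPLAY
                and self.batch_size not in self.captured_batch_size):
            return CUDAGraphState.DISABLE
        if self.state == CUDAGraphState.CAPTURE:
            self.captured_batch_size.add(self.batch_size)
        return self.state
```

The same fall-back-then-capture pattern appears in the `run_impl` snippet quoted later in this thread.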
```python
        self.captrued_batch_size = set()
        self.batch_size = -1

    def run_impl(self, original_run_impl, inputs, parameters, attrs):
        run_state = self.state
        prog_attrs, cuda_graph_attrs = attrs
        if run_state == CUDAGraphState.REPLAY:
            if self.batch_size not in self.captrued_batch_size:
                run_state = CUDAGraphState.DISABLE
        elif run_state == CUDAGraphState.CAPTURE:
            self.captrued_batch_size.add(self.batch_size)
```
Copilot
AI
Aug 26, 2025
There's a typo in the variable name. 'captrued_batch_size' should be 'captured_batch_size'.
Suggested change:

```diff
-        self.captrued_batch_size = set()
+        self.captured_batch_size = set()
         self.batch_size = -1

     def run_impl(self, original_run_impl, inputs, parameters, attrs):
         run_state = self.state
         prog_attrs, cuda_graph_attrs = attrs
         if run_state == CUDAGraphState.REPLAY:
-            if self.batch_size not in self.captrued_batch_size:
+            if self.batch_size not in self.captured_batch_size:
                 run_state = CUDAGraphState.DISABLE
         elif run_state == CUDAGraphState.CAPTURE:
-            self.captrued_batch_size.add(self.batch_size)
+            self.captured_batch_size.add(self.batch_size)
```
fastdeploy/model_executor/graph_optimization/cudagraph_piecewise_backend.py
```python
    def run_static_model(self, entry: ConcreteSizeEntry, **kwargs):
        if not entry.captured:
            # Warmup the model
            for n in range(entry.num_finished_warmup, self.warm_up_size):
                entry.num_finished_warmup += 1
                entry.runnable(**kwargs)
```
Copilot
AI
Aug 26, 2025
The entry.captured flag is never set to True after capturing is complete. This will cause the warmup and capture logic to run repeatedly on every call instead of transitioning to replay mode.
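A minimal sketch of the fix this comment suggests: mark the entry as captured once warmup and capture finish, so later calls skip straight to replay. `ConcreteSizeEntry` and the capture step below are simplified stand-ins for illustration, not the real backend code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConcreteSizeEntry:
    """Toy stand-in for the entry class in cudagraph_piecewise_backend.py."""
    runnable: Callable
    num_finished_warmup: int = 0
    captured: bool = False


def run_static_model(entry: ConcreteSizeEntry, warm_up_size: int = 2, **kwargs):
    if not entry.captured:
        # Warm up the model before capturing.
        for _ in range(entry.num_finished_warmup, warm_up_size):
            entry.num_finished_warmup += 1
            entry.runnable(**kwargs)
        # The actual CUDA Graph capture would happen here; afterwards,
        # record that capture is done so warmup/capture does not repeat
        # on every call -- the missing step the review points out.
        entry.captured = True
    return entry.runnable(**kwargs)
```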
…se_backend.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Codecov Report

✅ All modified and coverable lines are covered by tests.

```text
@@          Coverage Diff           @@
##           develop    #3478   +/- ##
=======================================
  Coverage        ?    93.02%
=======================================
  Files           ?         1
  Lines           ?        43
  Branches        ?         7
=======================================
  Hits            ?        40
  Misses          ?         1
  Partials        ?         2
```
gongshaotian
left a comment
LGTM
Supports splitting the computation graph into subgraphs for CUDA Graph capture and execution in SOT mode.
If you need to split subgraphs around attention, you can configure:
FLAGS_cuda_graph_blacklist="custom_op.static_op_append_attention_"
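A hedged example of applying the flag: Paddle-style `FLAGS_*` options are commonly passed as environment variables before launching the process. The flag value is the one quoted above; the launch command is hypothetical.

```shell
# Blacklist the attention custom op so the static graph is split at its
# boundaries for piecewise CUDA Graph capture (value quoted in this thread).
export FLAGS_cuda_graph_blacklist="custom_op.static_op_append_attention_"

# Then launch inference as usual, e.g. (script name is illustrative):
# python infer.py
```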