
Conversation

@zyfncg (Collaborator) commented Aug 19, 2025

Support splitting the computation graph into subgraphs for CUDA Graph capture and execution in SOT mode.

To split the graph into subgraphs at attention ops, set FLAGS_cuda_graph_blacklist="custom_op.static_op_append_attention_".
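Below is a minimal sketch of how the flag could be supplied at startup, assuming it is read from the process environment; only the flag name and the attention op name come from this description, the surrounding launch code is purely illustrative.

import os

# Assumption: setting the environment variable before paddle is imported is
# enough for the runtime to pick it up. The blacklisted op is the attention
# custom op named in this PR description, which forces a subgraph split at
# every attention call.
os.environ["FLAGS_cuda_graph_blacklist"] = "custom_op.static_op_append_attention_"

import paddle  # imported after the flag is set so the setting takes effect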

@gongshaotian (Collaborator) left a comment

The split interface will need to be exposed later so that FD can register the Attention Layer more easily.

f"[CUDA GRAPH] CUDAGraph capture list {self.cudagraph_capture_sizes}, " "Created all real shape entry."
)

def run_static_model(self, entry: ConcreteSizeEntry, **kwargs):
Collaborator

Are the subgraphs managed directly inside Paddle?

Collaborator Author

Yes. Exposing this to Python would be considerably more costly to implement, but there is no functional difference.

@gongshaotian (Collaborator) left a comment

Please add a unit test for the static graph; the coverage check did not pass.

Copilot AI review requested due to automatic review settings August 26, 2025 04:38
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds support for splitting static computation graphs into piecewise subgraphs for CUDA Graph capture and execution in SOT mode. It enables CUDA Graph optimization at the subgraph level when graph optimization is enabled.

Key changes:

  • Introduces a new Dy2StCudaGraphManager class to manage CUDA Graph state transitions for static graph execution
  • Adds a new execution path run_static_model for handling static model execution with CUDA Graph capture and replay
  • Integrates the CUDA Graph manager into the existing CudaGraphPiecewiseBackend class
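To make the first item concrete, here is a minimal, self-contained sketch of the state handling such a manager performs. The enum values and the attrs layout are assumptions inferred from the review excerpts below; this is not the PR's exact code.

from enum import Enum


class CUDAGraphState(Enum):
    DISABLE = 0
    CAPTURE = 1
    REPLAY = 2


class Dy2StCudaGraphManager:
    def __init__(self):
        # No CUDA Graph work until the backend switches the state explicitly.
        self.state = CUDAGraphState.DISABLE
        self.captured_batch_size = set()
        self.batch_size = -1

    def run_impl(self, original_run_impl, inputs, parameters, attrs):
        run_state = self.state
        prog_attrs, cuda_graph_attrs = attrs
        if run_state == CUDAGraphState.REPLAY:
            # Replaying a batch size that was never captured is invalid, so
            # fall back to normal execution for this call.
            if self.batch_size not in self.captured_batch_size:
                run_state = CUDAGraphState.DISABLE
        elif run_state == CUDAGraphState.CAPTURE:
            self.captured_batch_size.add(self.batch_size)
        # The resolved state would be forwarded through cuda_graph_attrs so
        # the static-graph runner inside Paddle knows whether to capture,
        # replay, or run eagerly for this call.
        return original_run_impl(inputs, parameters, (prog_attrs, cuda_graph_attrs))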


Comment on lines 56 to 66
        self.captrued_batch_size = set()
        self.batch_size = -1

    def run_impl(self, original_run_impl, inputs, parameters, attrs):
        run_state = self.state
        prog_attrs, cuda_graph_attrs = attrs
        if run_state == CUDAGraphState.REPLAY:
            if self.batch_size not in self.captrued_batch_size:
                run_state = CUDAGraphState.DISABLE
        elif run_state == CUDAGraphState.CAPTURE:
            self.captrued_batch_size.add(self.batch_size)
Copilot AI commented Aug 26, 2025

There's a typo in the variable name. 'captrued_batch_size' should be 'captured_batch_size'.

Suggested change
-        self.captrued_batch_size = set()
+        self.captured_batch_size = set()
         self.batch_size = -1

     def run_impl(self, original_run_impl, inputs, parameters, attrs):
         run_state = self.state
         prog_attrs, cuda_graph_attrs = attrs
         if run_state == CUDAGraphState.REPLAY:
-            if self.batch_size not in self.captrued_batch_size:
+            if self.batch_size not in self.captured_batch_size:
                 run_state = CUDAGraphState.DISABLE
         elif run_state == CUDAGraphState.CAPTURE:
-            self.captrued_batch_size.add(self.batch_size)
+            self.captured_batch_size.add(self.batch_size)

Comment on lines +102 to +107
    def run_static_model(self, entry: ConcreteSizeEntry, **kwargs):
        if not entry.captured:
            # Warmup the model
            for n in range(entry.num_finished_warmup, self.warm_up_size):
                entry.num_finished_warmup += 1
                entry.runnable(**kwargs)
Copilot AI commented Aug 26, 2025

The entry.captured flag is never set to True after capturing is complete. This will cause the warmup and capture logic to run repeatedly on every call instead of transitioning to replay mode.
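One way to address this, shown only as a hedged sketch rather than the PR's actual fix: mark the entry once the capture run has been issued so later calls skip warmup and capture. Names such as cuda_graph_manager, real_shape, and CUDAGraphState are taken from the surrounding excerpts or assumed, and the capture itself is assumed to happen inside Paddle while the manager is in the CAPTURE state.

def run_static_model(self, entry: ConcreteSizeEntry, **kwargs):
    if not entry.captured:
        # Warmup the model eagerly before any capture.
        for n in range(entry.num_finished_warmup, self.warm_up_size):
            entry.num_finished_warmup += 1
            entry.runnable(**kwargs)
        # Run once with the manager in CAPTURE state so the CUDA Graph for
        # this batch size gets recorded, then remember that it happened.
        self.cuda_graph_manager.state = CUDAGraphState.CAPTURE
        self.cuda_graph_manager.batch_size = entry.real_shape
        output = entry.runnable(**kwargs)
        self.cuda_graph_manager.state = CUDAGraphState.REPLAY
        entry.captured = True
        return output
    # Replay path: the graph for this batch size was captured above.
    self.cuda_graph_manager.batch_size = entry.real_shape
    return entry.runnable(**kwargs)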

zyfncg and others added 2 commits August 26, 2025 12:51
…se_backend.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@codecov-commenter commented Aug 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@17b414c). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #3478   +/-   ##
==========================================
  Coverage           ?   93.02%           
==========================================
  Files              ?        1           
  Lines              ?       43           
  Branches           ?        7           
==========================================
  Hits               ?       40           
  Misses             ?        1           
  Partials           ?        2           
Flag Coverage Δ
diff 93.02% <ø> (?)

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@gongshaotian (Collaborator) left a comment

LGTM

@gongshaotian gongshaotian merged commit f677c03 into PaddlePaddle:develop Aug 29, 2025
15 of 17 checks passed
@zyfncg zyfncg deleted the cuda_graph branch August 29, 2025 08:30