
Conversation

@DrRyanHuang (Collaborator) commented Aug 28, 2025

This PR depends on two PRs in the main Paddle framework:


#3302 added `append_attention_with_output`, but enabling it introduces a graph break; this PR eliminates that break when `full_cuda_graph=false`.

The cpp_extensions that run in dynamic-graph mode never need `key_cache_out` / `value_cache_out`,
so this PR removes `key_cache_out` / `value_cache_out` from the custom-op registration, aligning it with the dynamic graph.
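The alignment can be sketched in plain Python (a hypothetical stand-in using NumPy arrays, not the real Paddle custom op; all names below are illustrative): the KV caches are mutated in place through the input arguments, so declaring `key_cache_out` / `value_cache_out` as separate outputs is redundant.

```python
import numpy as np

# Hypothetical stand-in for the attention op: the KV caches are updated
# in place through the input arguments, so the op only returns the
# attention output (fmha_out), mirroring the dynamic-graph cpp_extensions
# signature that has no key_cache_out / value_cache_out.
def append_attention_sketch(qkv, key_cache, value_cache, seq_len):
    # Write the new key/value slices into the caches in place.
    key_cache[:, seq_len] = qkv[:, 1]
    value_cache[:, seq_len] = qkv[:, 2]
    # Only the attention output is returned; the updated caches are
    # visible to the caller through the mutated input buffers.
    fmha_out = qkv[:, 0] * 0.5  # placeholder for the real attention math
    return fmha_out

batch, heads = 2, 4
qkv = np.ones((batch, 3, heads))
key_cache = np.zeros((batch, 8, heads))
value_cache = np.zeros((batch, 8, heads))

out = append_attention_sketch(qkv, key_cache, value_cache, seq_len=0)
assert key_cache[:, 0].sum() == batch * heads  # cache mutated in place
```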

Additionally, static-graph tensors have no `place`, so SOT to-static conversion breaks at this point:

[screenshot: SOT break-graph log]

Hence `.to(qkv.place)` is removed.
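A minimal sketch of the failure mode, with dummy classes standing in for Paddle's dynamic-graph Tensor and static-graph Variable (all names here are illustrative, not real Paddle types): only dynamic-graph tensors carry a `place`, so any `.to(qkv.place)` call fails under to-static tracing, while dropping the device transfer sidesteps the attribute access entirely.

```python
# Dummy stand-ins for Paddle's dynamic-graph Tensor and static-graph
# Variable (hypothetical names, for illustration only).
class EagerTensor:
    def __init__(self, place="gpu:0"):
        self.place = place  # dynamic-graph tensors know their device

class StaticVariable:
    pass                    # static-graph Variables have no .place

def project_with_transfer(qkv, out):
    # Old code path: touching qkv.place raises AttributeError for
    # static-graph Variables, which is what forced the SOT graph break.
    return out, qkv.place

def project(qkv, out):
    # New code path: no device transfer, so no .place access and no break.
    return out

# Works in dynamic graph, fails under to-static tracing:
project_with_transfer(EagerTensor(), out="o")
try:
    project_with_transfer(StaticVariable(), out="o")
except AttributeError:
    pass  # this is the failure the PR removes by dropping .to(qkv.place)
assert project(StaticVariable(), out="o") == "o"
```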

PS: the current script for running CUDA Graph + subgraph splitting:

```shell
source /workspace/env/.py3.10/bin/activate
export MODEL=/ssd1/EB_MODELS/ERNIE-4.5-0.3B-Paddle
rm -rf log/*

export http_proxy=xxx.com:xx
export https_proxy=xxx.com:xx
export no_proxy=xxxx

export FLAGS_cuda_graph_blacklist="custom_op.static_op_append_attention_with_output_"
export PYTHONPATH=/workspace/FastDeploy:/workspace/Paddle/build/python:$PYTHONPATH
export CUDA_VISIBLE_DEVICES=3
export PYTHON_EXCUTOR=/workspace/env/.py3.10/bin/python
export PORT=9905

python -m fastdeploy.entrypoints.openai.api_server \
  --model $MODEL \
  --metrics-port 9717 \
  --port 9718 \
  --engine-worker-queue-port 9719 \
  --tensor-parallel-size 1 \
  --max-model-len 32768 \
  --max-num-seqs 128 \
  --quantization wint4 \
  --graph-optimization-config '{"graph_opt_level": 1, "use_cudagraph": true, "full_cuda_graph": false}'
```

cc @SigureMo @zyfncg @gongshaotian

@paddle-bot (bot) commented Aug 28, 2025

Thanks for your contribution!

@gongshaotian (Collaborator) previously approved these changes Aug 28, 2025


LGTM

@codecov-commenter commented

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@808b548). Learn more about missing BASE report.

Additional details and impacted files:

```
@@            Coverage Diff            @@
##             develop   #3694   +/-   ##
=========================================
  Coverage           ?   0.00%
=========================================
  Files              ?       1
  Lines              ?       3
  Branches           ?       0
=========================================
  Hits               ?       0
  Misses             ?       3
  Partials           ?       0
```

| Flag | Coverage Δ |
| ---- | ---------- |
| diff | 0.00% <ø> (?) |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@DrRyanHuang DrRyanHuang changed the title [SOT] Eliminate BreakGraph caused by #3302 [SOT] Eliminate BreakGraph caused by #3302 && update CustomOp Sep 4, 2025
@DrRyanHuang DrRyanHuang changed the title [SOT] Eliminate BreakGraph caused by #3302 && update CustomOp [SOT][Cudagraph] Remove BreakGraph of #3302 && update CustomOp Sep 4, 2025
```cpp
paddle::Optional("q_norm_weight"),
paddle::Optional("k_norm_weight")})
.Outputs({"fmha_out", "key_cache_out", "value_cache_out"})
.SetInplaceMap({{"key_cache", "key_cache_out"},
```
A Member commented on this snippet:

Is removing the inplace map here expected?

@DrRyanHuang (Collaborator, Author) replied Sep 15, 2025:

`key_cache_out` / `value_cache_out` are unused outputs, so they can be removed.

Also, this is the op registration for `append_attention`, not `append_attention_with_output`; `append_attention`'s outputs are not inplace, they are created internally.
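The distinction can be illustrated with plain Python lists standing in for tensors (function names are hypothetical, for illustration only): an inplace output aliases the caller's buffer and must appear in the inplace map, whereas an internally created output is a fresh buffer with no such constraint, so its entry can simply be dropped.

```python
# Inplace-style op: the "output" is the same buffer as the input, so it
# must be declared in an inplace map (key_cache -> key_cache_out).
def op_inplace(key_cache):
    key_cache.append(1.0)
    return key_cache           # aliases the caller's buffer

# append_attention-style op: the cache output is created inside the op,
# so there is nothing to map inplace and the output can be removed.
def op_internal(key_cache):
    out = list(key_cache)      # fresh buffer, not an alias
    out.append(1.0)
    return out

buf = [0.0]
assert op_inplace(buf) is buf          # same object: genuine inplace output
buf2 = [0.0]
assert op_internal(buf2) is not buf2   # internally created output
```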

@SigureMo (Member) left a comment:

LGTMeow 🐾

@DrRyanHuang (Collaborator, Author) commented Oct 13, 2025

Unit tests have been added.


These lines in fastdeploy/model_executor/layers/attention/append_attn_backend.py are currently not covered
because `append_attention_with_output` is not exercised by the tests yet; running with `full_cuda_graph=false` covers them, and that will be added to CI later:

  --graph-optimization-config '{"graph_opt_level": 1, "use_cudagraph": true, "full_cuda_graph": false}'

cc @gongshaotian @SigureMo

@gongshaotian (Collaborator) previously approved these changes Oct 13, 2025


LGTM

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 49cea8f into PaddlePaddle:develop Oct 17, 2025
32 of 38 checks passed
@DrRyanHuang DrRyanHuang deleted the append_attention branch October 17, 2025 02:58