Add with_output version AppendAttention #3302

Lmywl · 2025-08-11T03:20:55Z

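Background: tensor address management during cudagraph capture.
Goal: allocate the attention module's output up front, so that tensor addresses are easier to handle during cudagraph capture.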
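
A minimal sketch of the idea, in plain Python rather than Paddle. The function names and the use of `array.array` as a stand-in for a GPU tensor are assumptions for illustration only. The hypothetical `attention_with_output` writes into a caller-supplied buffer instead of returning a fresh one, so the address a captured graph would see stays the same from call to call.

```python
from array import array


def attention_returning(values):
    """Stand-in for an op that allocates its own output on every call."""
    return array("f", (v * 2.0 for v in values))


def attention_with_output(values, out):
    """Stand-in for the with_output variant: write into a caller-owned buffer."""
    if len(out) != len(values):
        raise ValueError("output buffer has the wrong length")
    for i, v in enumerate(values):
        out[i] = v * 2.0  # in-place write, the buffer is never reallocated


def address(buf):
    """Memory address of the array's underlying storage."""
    return buf.buffer_info()[0]


# Allocate the output once, before any capture would begin.
out = array("f", [0.0] * 4)
addr_before = address(out)

attention_with_output([1.0, 2.0, 3.0, 4.0], out)
attention_with_output([5.0, 6.0, 7.0, 8.0], out)

# The caller-owned buffer keeps a single address across calls.
assert address(out) == addr_before
```

With the returning variant, each call hands back a new buffer, so the caller has no fixed address to rely on.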

paddle-bot · 2025-08-11T03:21:01Z

Thanks for your contribution!

custom_ops/gpu_ops/append_attention.cu

fastdeploy/model_executor/layers/attention/append_attn_backend.py

fastdeploy/model_executor/layers/attention/ops/append_attention.py

fastdeploy/model_executor/layers/attention/append_attn_backend.py

gongshaotian · 2025-08-11T07:16:21Z

Please flesh out the PR description a bit more and explain the background and goals of this change.

test/layers/test_append_attention.py

custom_ops/gpu_ops/append_attention.cu

fastdeploy/model_executor/layers/attention/append_attn_backend.py

fastdeploy/model_executor/layers/attention/ops/append_attention.py

fastdeploy/model_executor/layers/attention/append_attn_backend.py

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

DrRyanHuang

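Besides the two suggestions below, which can be applied directly, the part that declares append_attention as a custom op also needs its Outputs updated.
Since append_attention no longer needs to output qkv_out, delete it.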

PD_BUILD_STATIC_OP(append_attention)
    .Inputs({"qkv",
    ......
    .Outputs({"fmha_out", "qkv_out", "key_cache_out", "value_cache_out"})  // <--- this line
    .SetInplaceMap({{"key_cache", "key_cache_out"},

Change it to:

    .Outputs({"fmha_out", "key_cache_out", "value_cache_out"})

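PS: When custom op registration goes wrong, it always throws this kind of exception directly. The main framework should add more context information later on.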

terminate called after throwing an instance of 'std::bad_array_new_length'
  what():  std::bad_array_new_length

cc @zyfncg @SigureMo
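
To illustrate the kind of extra context the PS asks for, here is a small sketch in plain Python. It does not use Paddle's registration machinery: `OpSpec` and `run_op` are hypothetical names, and the check itself (comparing the declared outputs against what the kernel returns) is only an assumption about one way such a mismatch could be reported.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class OpSpec:
    """Hypothetical description of a registered op."""
    name: str
    outputs: Tuple[str, ...]
    kernel: Callable[..., Sequence[object]]


def run_op(spec: OpSpec, *args):
    """Run the kernel and check its results against the declared outputs."""
    results = spec.kernel(*args)
    if len(results) != len(spec.outputs):
        raise RuntimeError(
            f"op '{spec.name}' declares {len(spec.outputs)} outputs "
            f"{list(spec.outputs)} but its kernel returned {len(results)}"
        )
    return dict(zip(spec.outputs, results))


def kernel(qkv, key_cache, value_cache):
    # Stand-in kernel that no longer returns qkv_out.
    return ["fmha", key_cache, value_cache]


# Declaration that still lists qkv_out, and the corrected one.
stale = OpSpec("append_attention",
               ("fmha_out", "qkv_out", "key_cache_out", "value_cache_out"),
               kernel)
fixed = OpSpec("append_attention",
               ("fmha_out", "key_cache_out", "value_cache_out"),
               kernel)
```

Calling `run_op(stale, ...)` raises an error that names the op and both counts, while `run_op(fixed, ...)` returns the results keyed by output name.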

DrRyanHuang · 2025-08-28T04:58:47Z

custom_ops/gpu_ops/cpp_extensions.cc

+    const paddle::Tensor &decoder_tile_ids_per_batch,
+    const paddle::Tensor &decoder_num_blocks,
+    const paddle::Tensor &set_max_lengths, const paddle::Tensor &max_len_kv,
+    paddle::Tensor &res,


Suggested change

-    paddle::Tensor &res,
+    paddle::Tensor &fmha_out,

DrRyanHuang · 2025-08-28T05:02:21Z

custom_ops/gpu_ops/append_attention.cu

+    .Attrs({"compute_type: std::string",
+            "cache_quant_type: std::string",
+            "use_neox_rotary_style: bool",
+            "rope_3d: bool",
+            "max_input_length: int",
+            "quant_max_bound: float",
+            "quant_min_bound: float",
+            "out_linear_in_scale: float",
+            "encoder_block_shape_q: int",
+            "decoder_block_shape_q: int",
+            "max_partition_size: int",
+            "encoder_max_partition_size: int",
+            "speculate_max_draft_token_num: int",
+            "causal: bool",
+            "speculate_decoder: bool",
+            "rms_norm_eps: float"})


Move rms_norm_eps earlier in the order here.

Suggested change

-    .Attrs({"compute_type: std::string",
-            "cache_quant_type: std::string",
-            "use_neox_rotary_style: bool",
-            "rope_3d: bool",
-            "max_input_length: int",
-            "quant_max_bound: float",
-            "quant_min_bound: float",
-            "out_linear_in_scale: float",
-            "encoder_block_shape_q: int",
-            "decoder_block_shape_q: int",
-            "max_partition_size: int",
-            "encoder_max_partition_size: int",
-            "speculate_max_draft_token_num: int",
-            "causal: bool",
-            "speculate_decoder: bool",
-            "rms_norm_eps: float"})
+    .Attrs({"rms_norm_eps: float",
+            "compute_type: std::string",
+            "cache_quant_type: std::string",
+            "use_neox_rotary_style: bool",
+            "rope_3d: bool",
+            "max_input_length: int",
+            "quant_max_bound: float",
+            "quant_min_bound: float",
+            "out_linear_in_scale: float",
+            "encoder_block_shape_q: int",
+            "decoder_block_shape_q: int",
+            "max_partition_size: int",
+            "encoder_max_partition_size: int",
+            "speculate_max_draft_token_num: int",
+            "causal: bool",
+            "speculate_decoder: bool",
+    })
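
The comment above does not say why the order matters. One plausible reading, assumed here, is that attributes are matched to the kernel's parameters by position, so the declaration list has to follow the signature. The sketch below is plain Python with hypothetical helper names (`parse_attrs`, `check_order`) and a shortened attribute list. It only shows how such an ordering mismatch could be detected.

```python
def parse_attrs(decls):
    """Split 'name: type' declarations into (name, type) pairs, keeping order."""
    pairs = []
    for decl in decls:
        # Split on the first colon only, so 'std::string' stays intact.
        name, _, type_name = decl.partition(":")
        pairs.append((name.strip(), type_name.strip()))
    return pairs


def check_order(decls, signature_names):
    """Return the first index where declaration and signature disagree, else None."""
    declared = [name for name, _ in parse_attrs(decls)]
    for index, (got, expected) in enumerate(zip(declared, signature_names)):
        if got != expected:
            return index
    if len(declared) != len(signature_names):
        return min(len(declared), len(signature_names))
    return None


# Shortened attribute lists; the real op declares more.
old_decls = ["compute_type: std::string", "causal: bool", "rms_norm_eps: float"]
new_decls = ["rms_norm_eps: float", "compute_type: std::string", "causal: bool"]

# Hypothetical kernel signature that takes rms_norm_eps first.
signature = ["rms_norm_eps", "compute_type", "causal"]
```

Under that assumption, `check_order(old_decls, signature)` flags position 0, and `check_order(new_decls, signature)` reports no mismatch.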

gongshaotian

LGTM

* rm inplace info && to(gpu)
* update append_attention
* unpin paddle version
* add full_cuda_graph=False
* add blank line

---------

Co-authored-by: SigureMo <sigure.qaq@gmail.com>

gongshaotian requested review from yuanlehome and zhoutianzi666 August 11, 2025 06:58

gongshaotian reviewed Aug 11, 2025

custom_ops/gpu_ops/append_attention.cu (resolved)

yuanlehome reviewed Aug 11, 2025

fastdeploy/model_executor/layers/attention/append_attn_backend.py (resolved)

gongshaotian reviewed Aug 11, 2025

fastdeploy/model_executor/layers/attention/ops/append_attention.py (outdated, resolved)

gongshaotian reviewed Aug 11, 2025

fastdeploy/model_executor/layers/attention/append_attn_backend.py (resolved)

fastdeploy/model_executor/layers/attention/append_attn_backend.py (resolved)

Lmywl force-pushed the append_attn_pr branch from 68d80b1 to ecbc5fb Compare August 11, 2025 07:37

gongshaotian reviewed Aug 11, 2025

test/layers/test_append_attention.py (resolved)

Lmywl force-pushed the append_attn_pr branch from ecbc5fb to b977c0e Compare August 11, 2025 10:54

lizhenyun01 reviewed Aug 12, 2025

custom_ops/gpu_ops/append_attention.cu (resolved)

gongshaotian reviewed Aug 12, 2025

fastdeploy/model_executor/layers/attention/append_attn_backend.py (outdated, resolved)

gongshaotian reviewed Aug 12, 2025

fastdeploy/model_executor/layers/attention/ops/append_attention.py (resolved)

get use_output from fd_config

8572b8a

Lmywl force-pushed the append_attn_pr branch from 01a0957 to 8572b8a Compare August 12, 2025 07:31

Lmywl added 2 commits August 12, 2025 15:39

add clear TODO description

19109e4

resolve conflict

735299f

Lmywl force-pushed the append_attn_pr branch from 69a7c45 to 735299f Compare August 14, 2025 12:49

Lmywl added 2 commits August 14, 2025 22:13

add mask_offset para to align with develop

233d133

fix bug

892963d

gongshaotian reviewed Aug 18, 2025

fastdeploy/model_executor/layers/attention/append_attn_backend.py (outdated, resolved)

Lmywl added 2 commits August 18, 2025 10:58

fix use_output logic

4e82af8

resolve conficts

971f81e

gongshaotian approved these changes Aug 21, 2025

YuanRisheng previously approved these changes Aug 21, 2025

YuanRisheng added the skip-ci: coverage label Aug 26, 2025

Copilot AI review requested due to automatic review settings August 26, 2025 08:02

Copilot AI reviewed Aug 26, 2025

gongshaotian previously approved these changes Aug 26, 2025

Merge branch 'develop' of https://github.com/PaddlePaddle/FastDeploy into append_attn_pr

22038e6

DrRyanHuang requested changes Aug 28, 2025

fix sot bug

440a44d

Lmywl dismissed stale reviews from gongshaotian and YuanRisheng via 440a44d August 28, 2025 06:21

Lmywl force-pushed the append_attn_pr branch from ab70c78 to 440a44d Compare August 28, 2025 06:21

DrRyanHuang approved these changes Aug 28, 2025

gongshaotian approved these changes Aug 28, 2025

gongshaotian merged commit e93d4cf into PaddlePaddle:develop Aug 28, 2025
13 of 16 checks passed

DrRyanHuang mentioned this pull request Aug 28, 2025

[SOT][Cudagraph] Remove BreakGraph of #3302 && update CustomOp #3694

Merged

DrRyanHuang mentioned this pull request Oct 14, 2025

[SOT][CUDAGraph] Add support for custom all-reduce operators under SOT mode #4386

Merged
