[SOT][Cudagraph] Remove BreakGraph of #3302 && update CustomOp #3694
Conversation
Thanks for your contribution!
gongshaotian left a comment
LGTM
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             develop    #3694   +/-   ##
==========================================
  Coverage           ?    0.00%
==========================================
  Files              ?        1
  Lines              ?        3
  Branches           ?        0
==========================================
  Hits               ?        0
  Misses             ?        3
  Partials           ?        0
==========================================
```
```cpp
             paddle::Optional("q_norm_weight"),
             paddle::Optional("k_norm_weight")})
    .Outputs({"fmha_out", "key_cache_out", "value_cache_out"})
    .SetInplaceMap({{"key_cache", "key_cache_out"},
```
Is removing the inplace map here expected?
`key_cache_out` and `value_cache_out` are unused outputs, so they can be removed.
Also note this is the op registration for `append_attention`, not for `append_attention_with_output`; `append_attention`'s outputs are not inplace outputs, they are created internally.
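To make the in-place semantics concrete, here is a minimal Python stand-in for the post-PR behavior. `append_attention_like` and its arguments are hypothetical, not the real custom-op signature: the point is that the caches are updated in place through the input tensors the caller already holds, so the declared `key_cache_out`/`value_cache_out` aliases carry no extra information and can be dropped from the registration.

```python
import paddle

# Hypothetical stand-in for the custom op after this PR (not the real
# signature): the KV caches are mutated in place through the inputs the
# caller already holds, and only fmha_out is returned.
def append_attention_like(qkv, key_cache, value_cache):
    key_cache.add_(qkv.mean())    # in-place write into the caller's tensor
    value_cache.add_(qkv.mean())
    return qkv * 2.0              # the single declared output, "fmha_out"

key_cache = paddle.zeros([4])
value_cache = paddle.zeros([4])
fmha_out = append_attention_like(paddle.ones([4]), key_cache, value_cache)
print(key_cache.numpy())  # caches updated without any *_out alias output
```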
SigureMo left a comment
Unit tests have been added. Currently …
gongshaotian left a comment
LGTM
This PR depends on two PRs in the main Paddle framework:

- memcpy && Add CUDAGraph unitest Paddle#75078

Changes in this PR:

- #3302 added `append_attention_with_output`, but enabling it still hit graph breaks; this PR removes the breaks in the `full_cuda_graph=false` case.
- The cpp_extensions run in dynamic graph mode never need `key_cache_out` and `value_cache_out`.
- This PR removes `key_cache_out` and `value_cache_out` from the custom-op registration, aligning it with dynamic graph mode.
- Static graph values have no `place`, so SOT dynamic-to-static conversion breaks there; the explicit `.to(qkv.place)` is therefore removed (a sketch follows).
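A minimal sketch of the `.to(qkv.place)` issue; the function names and shapes are illustrative, not FastDeploy code. In dynamic graph mode every `paddle.Tensor` carries a concrete `place`, but the symbolic values produced while SOT captures a static graph do not, so reading `qkv.place` forces a graph break.

```python
import paddle

# Illustrative only: the real code lives in FastDeploy's attention path.
def before(qkv, x):
    # Works in dynamic graph mode, but `place` does not exist on the
    # static-graph value SOT traces, so this line causes a graph break.
    return x.to(qkv.place)

def after(qkv, x):
    # Both tensors are assumed to already be on the same device, so the
    # explicit device move is unnecessary and is simply dropped.
    return x

qkv, x = paddle.randn([2, 3]), paddle.randn([2, 3])
print(paddle.allclose(before(qkv, x), after(qkv, x)).item())  # True in eager mode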
PS: the current script for CUDAGraph + subgraph partitioning:
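The script itself is not included above. As a stand-in, here is a hedged sketch of what enabling CUDAGraph with subgraph partitioning might look like, assuming FastDeploy's offline `LLM` entry point and a `graph_optimization_config` that accepts the `use_cudagraph`/`full_cuda_graph` settings named in this PR; treat every name here as an assumption rather than the author's script.

```python
# Hypothetical launch sketch, NOT the PR author's script: enables CUDAGraph
# while leaving full-graph capture off (full_cuda_graph=False), i.e. the
# subgraph-partitioning mode whose graph breaks this PR removes.
from fastdeploy import LLM, SamplingParams  # assumed entry points

llm = LLM(
    model="./your-model-path",  # placeholder
    graph_optimization_config={
        "use_cudagraph": True,     # capture decode steps with CUDAGraph
        "full_cuda_graph": False,  # capture per subgraph, not the whole forward
    },
)
outputs = llm.generate(["hello"], SamplingParams(max_tokens=16))
print(outputs)
```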
cc @SigureMo @zyfncg @gongshaotian