-
Notifications
You must be signed in to change notification settings - Fork 693
[BugFix][Speculative Decoding](Spend many dyas to solve)Fix write qknorm cache bug in speculative decoding #5491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for your contribution! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
这个PR修复了speculative decoding中写入qknorm缓存的一个bug。主要变更是添加了seq_lens_encoder参数,用于区分encoder和decoder阶段的处理。
主要变更
- 在函数签名中添加
seq_lens_encoder参数,以便在speculative decoding的cache写入过程中正确判断是否处于encoder阶段 - 将原来的条件
seq_lens_decoder[ori_bi] == 0替换为seq_lens_encoder[ori_bi] > 0,以更准确地判断encoder阶段 - 这个变更在多个CUDA kernel中保持一致
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| custom_ops/gpu_ops/append_attn/speculate_write_cache_with_rope_kernel.cu | 在三个函数调用中添加seq_lens_encoder参数,将新参数正确传递给底层kernel实现 |
| custom_ops/gpu_ops/append_attn/speculate_write_cache_with_rope_impl.cuh | 在三个kernel函数中添加seq_lens_encoder参数定义,并更新条件判断逻辑来正确处理encoder阶段;但在第一个kernel中存在使用return而非continue的逻辑错误 |
Note regarding PR metadata: 标题中包含拼写错误("dyas"应为"days"),且PR描述不够详细,未能充分说明此bug的具体表现、根本原因以及修复方案。建议补充更详细的说明以便于代码审查和后续维护。
| const int ori_bi = batch_id_per_token[token_id]; | ||
| if (ori_bi == -1) continue; // NOTE(gongshaotian): For CUDAGraph padding | ||
| if (seq_lens_decoder[ori_bi] == 0) continue; | ||
| if (seq_lens_encoder[ori_bi] > 0) return; |
Copilot
AI
Dec 10, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic inconsistency detected: This function uses return when seq_lens_encoder[ori_bi] > 0, but the same condition on lines 385 and 515 in similar kernels uses continue.
The return statement will exit the entire kernel for the thread, not just skip the current iteration. In a CUDA kernel loop, this is likely incorrect behavior - you probably want continue here to skip to the next iteration, consistent with the other two locations.
This inconsistency could cause incorrect behavior where threads terminate prematurely instead of continuing to process subsequent elements.
| if (seq_lens_encoder[ori_bi] > 0) return; | |
| if (seq_lens_encoder[ori_bi] > 0) continue; |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #5491 +/- ##
==========================================
Coverage ? 60.31%
==========================================
Files ? 329
Lines ? 41058
Branches ? 6244
==========================================
Hits ? 24766
Misses ? 14406
Partials ? 1886
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| const int* block_table_now = block_tables + ori_bi * max_blocks_per_seq; | ||
| const int block_idx = block_table_now[write_seq_id / block_size]; | ||
| if (block_idx < 0) { | ||
| return; // NOTE(gongshaotian): For CUDAGraph padding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
有压测过吗,这里的continue会导致graph 挂掉吗
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里用continue,为啥会导致挂掉?
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.