[XPU] add speculate_get_logits by RuohengMa · Pull Request #5497 · PaddlePaddle/FastDeploy

RuohengMa · 2025-12-11T01:48:08Z

Motivation

Add the speculate_get_logits op for XPU.

Modifications

Add the speculate_get_logits op for XPU.

Usage or Command

No

Accuracy Tests

No

Checklist

Add at least one tag to the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code and run pre-commit before committing.
Add unit tests. If the PR has no unit tests, state the reason in this PR.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2025-12-11T01:48:14Z

Thanks for your contribution!

hong19860320 · 2025-12-11T01:51:40Z

custom_ops/xpu_ops/src/ops/mtp/speculate_get_logits.cc

+  baidu::xpu::api::Context* ctx =
+      static_cast<const phi::XPUContext*>(dev_ctx)->x_context();
+  if (draft_logits.is_cpu()) {
+    ctx = new baidu::xpu::api::Context(baidu::xpu::api::kCPU);


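When is this context destroyed? Could it cause a memory leak?

The CPU path is generally used only for unit-test verification. Other operators besides this one probably do not release this CPU ctx either, so a unified audit may be needed later.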
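One way to rule out the leak raised above is to let a smart pointer own the temporary CPU context. The following is a minimal sketch under stated assumptions, not the code in this PR: `FakeContext` and `RunOp` are hypothetical stand-ins (the real type is `baidu::xpu::api::Context`, whose constructor and destructor behavior are not shown in the diff), and `live_count` exists only so the example can observe that nothing leaks.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a device context; NOT the real
// baidu::xpu::api::Context. live_count tracks how many instances exist.
struct FakeContext {
  static int live_count;
  bool is_cpu;
  explicit FakeContext(bool cpu) : is_cpu(cpu) { ++live_count; }
  ~FakeContext() { --live_count; }
  FakeContext(const FakeContext&) = delete;
  FakeContext& operator=(const FakeContext&) = delete;
};
int FakeContext::live_count = 0;

// Hypothetical op body. device_ctx is borrowed from the caller and never
// freed here; the optional CPU context is owned by cpu_ctx and released
// automatically on every return path.
bool RunOp(FakeContext* device_ctx, bool input_is_cpu) {
  std::unique_ptr<FakeContext> cpu_ctx;  // empty unless the CPU path is taken
  FakeContext* ctx = device_ctx;
  if (input_is_cpu) {
    cpu_ctx = std::make_unique<FakeContext>(/*cpu=*/true);
    ctx = cpu_ctx.get();
  }
  assert(ctx != nullptr);
  // ... the kernel wrapper would be called with ctx here ...
  return ctx->is_cpu;
}
```

Because ownership lives in `cpu_ctx`, an early return or an exception between allocation and the end of the function cannot skip the cleanup, which a manual `delete` at the end of the function could.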

hong19860320

LGTM

cmcamdy

LGTM

codecov-commenter · 2025-12-11T03:50:24Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@9f4512c).

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #5497   +/-   ##
==========================================
  Coverage           ?   60.27%           
==========================================
  Files              ?      329           
  Lines              ?    41114           
  Branches           ?     6261           
==========================================
  Hits               ?    24782           
  Misses             ?    14443           
  Partials           ?     1889

Flag	Coverage Δ
GPU	`60.27% <ø> (?)`

mayang002 · 2025-12-11T05:58:37Z

custom_ops/xpu_ops/src/plugin/src/wrapper/mtp_wrapper/speculate_get_logits.cpp

+                      seq_lens_encoder,
+                      real_bsz,
+                      vocab_size);
+  WRAPPER_DUMP(ctx);


Could you add checks on the input and output XPU pointers?
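The kind of argument validation requested here can be sketched as follows. This is an illustration under assumptions: `Status` and `CheckArgs` are hypothetical names, the parameter list only borrows identifiers visible in the kernel excerpt, and the real wrapper's signature and checking macros are not shown in this conversation.

```cpp
#include <cassert>

// Hypothetical status codes; the real wrapper's return codes are not shown.
enum class Status { kSuccess, kInvalidParam };

// Hypothetical pre-launch validation: reject an empty batch and any null
// input or output buffer before a kernel would be launched.
Status CheckArgs(const int* seq_lens_this_time,
                 const int* seq_lens_encoder,
                 int* batch_token_num,
                 int* cu_batch_token_offset,
                 int real_bsz) {
  if (real_bsz <= 0) return Status::kInvalidParam;
  if (seq_lens_this_time == nullptr || seq_lens_encoder == nullptr) {
    return Status::kInvalidParam;  // missing input buffer
  }
  if (batch_token_num == nullptr || cu_batch_token_offset == nullptr) {
    return Status::kInvalidParam;  // missing output buffer
  }
  return Status::kSuccess;
}

// Usage example: valid buffers pass, a null input is rejected.
inline void ExampleUsage() {
  int in_a[2] = {1, 2}, in_b[2] = {0, 0}, out_a[2] = {0, 0}, out_b[2] = {0, 0};
  assert(CheckArgs(in_a, in_b, out_a, out_b, 2) == Status::kSuccess);
  assert(CheckArgs(nullptr, in_b, out_a, out_b, 2) == Status::kInvalidParam);
}
```

A null check alone cannot confirm that a pointer refers to device memory of sufficient size; that part would need whatever facility the XPU runtime provides.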

mayang002 · 2025-12-11T06:06:30Z

custom_ops/xpu_ops/src/plugin/src/kernel/kunlun3cpp/mtp_kernel/speculate_get_logits.xpu

+  if (clus_id < real_bsz && cid == 0) {
+    GM2SM_ASYNC(seq_lens_encoder, sm_seq_lens_encoder, real_bsz * sizeof(int));
+    GM2SM(seq_lens_this_time, sm_seq_lens_this_time, real_bsz * sizeof(int));
+    int next_token_num_previous = 0;
+    for (int bid = 0; bid < real_bsz; bid++) {
+      sm_batch_token_num[bid] =
+          sm_seq_lens_encoder[bid] > 0 ? 2 : sm_seq_lens_this_time[bid];
+      if (bid == 0) {
+        sm_cu_batch_token_offset[bid] = 0;
+        sm_cu_next_token_offset[bid] = 0;
+      } else {
+        sm_cu_batch_token_offset[bid] =
+            sm_cu_batch_token_offset[bid - 1] + sm_batch_token_num[bid - 1];
+        sm_cu_next_token_offset[bid] =
+            sm_cu_next_token_offset[bid - 1] + next_token_num_previous;
+      }
+      next_token_num_previous =
+          sm_seq_lens_encoder[bid] > 0 ? 1 : sm_seq_lens_this_time[bid];
+    }
+    mfence_sm();
+    if (clus_id == 0) {
+      SM2GM_ASYNC(sm_batch_token_num, batch_token_num, real_bsz * sizeof(int));
+      SM2GM_ASYNC(sm_cu_batch_token_offset,
+                  cu_batch_token_offset,
+                  real_bsz * sizeof(int));
+    }
+  }


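In this part of the code, lines 21-40 are executed by every cluster, while lines 41-46 are executed only by cluster 0. Doesn't that mean that, for lines 21-46, only the execution by clus_id == 0 is actually useful?

Every cluster that takes part in the computation needs the prefix sum, so each cluster with clus_id < real_bsz has to compute its own copy in lines 21-40. Writing the result back to GM, however, only needs to be done by one cluster, so lines 41-46 run only on cluster 0.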

OK, understood.
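The bookkeeping discussed in this thread can be followed more easily in a host-side reference. The sketch below mirrors the loop in the kernel excerpt using ordinary vectors; `TokenOffsets` and `ComputeTokenOffsets` are hypothetical names introduced for illustration, and the code is not part of the PR. On the device, each participating cluster computes these arrays in shared memory, and per the excerpt only cluster 0 copies `batch_token_num` and `cu_batch_token_offset` back to global memory.

```cpp
#include <cassert>
#include <vector>

// Hypothetical container for the three arrays built by the kernel loop.
struct TokenOffsets {
  std::vector<int> batch_token_num;        // tokens counted per request
  std::vector<int> cu_batch_token_offset;  // exclusive prefix sum of the above
  std::vector<int> cu_next_token_offset;   // exclusive prefix sum of next-token counts
};

// Host-side reference mirroring the loop in the kernel excerpt.
TokenOffsets ComputeTokenOffsets(const std::vector<int>& seq_lens_encoder,
                                 const std::vector<int>& seq_lens_this_time) {
  assert(seq_lens_encoder.size() == seq_lens_this_time.size());
  const int real_bsz = static_cast<int>(seq_lens_encoder.size());
  TokenOffsets out;
  out.batch_token_num.resize(real_bsz);
  out.cu_batch_token_offset.resize(real_bsz);
  out.cu_next_token_offset.resize(real_bsz);

  int next_token_num_previous = 0;
  for (int bid = 0; bid < real_bsz; ++bid) {
    const bool has_encoder_tokens = seq_lens_encoder[bid] > 0;
    // A request with encoder tokens counts as 2 tokens here, otherwise
    // as many tokens as it processes in this step.
    out.batch_token_num[bid] = has_encoder_tokens ? 2 : seq_lens_this_time[bid];
    if (bid == 0) {
      out.cu_batch_token_offset[bid] = 0;
      out.cu_next_token_offset[bid] = 0;
    } else {
      out.cu_batch_token_offset[bid] =
          out.cu_batch_token_offset[bid - 1] + out.batch_token_num[bid - 1];
      out.cu_next_token_offset[bid] =
          out.cu_next_token_offset[bid - 1] + next_token_num_previous;
    }
    // For the next-token offsets the same request contributes 1 instead of 2.
    next_token_num_previous = has_encoder_tokens ? 1 : seq_lens_this_time[bid];
  }
  return out;
}
```

For example, with `seq_lens_encoder = {5, 0, 0}` and `seq_lens_this_time = {5, 3, 1}`, the function returns `batch_token_num = {2, 3, 1}`, `cu_batch_token_offset = {0, 2, 5}`, and `cu_next_token_offset = {0, 1, 4}`. The loop is sequential and `real_bsz` is typically small, which is consistent with the reply above that each cluster simply recomputes its own copy.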

RuohengMa and others added 6 commits December 5, 2025 09:00

[XPU] add speculate_step_system_cache

8f06f7e

Merge branch 'develop' into develop

ef359ae

[XPU] add speculate_step_system_cache

e0bec56

Merge branch 'develop' into develop

902beb2

Merge branch 'PaddlePaddle:develop' into develop

9f40fc6

[XPU] add speculate_get_logits

d5d8c7c

paddle-bot bot added the XPU label Dec 11, 2025

hong19860320 requested changes Dec 11, 2025

cmcamdy previously approved these changes Dec 11, 2025

delete context

c8fccdc

RuohengMa dismissed cmcamdy’s stale review via c8fccdc December 11, 2025 02:12

hong19860320 previously approved these changes Dec 11, 2025

cmcamdy previously approved these changes Dec 11, 2025

mayang002 suggested changes Dec 11, 2025

EmmonsCurse and others added 2 commits December 11, 2025 14:22

Merge branch 'develop' into sgl

635126f

add ptr check

8ad13e7

RuohengMa dismissed stale reviews from hong19860320 and cmcamdy via 8ad13e7 December 11, 2025 06:47

mayang002 approved these changes Dec 11, 2025

Jiang-Jia-Jun merged commit 12c76f8 into PaddlePaddle:develop Dec 12, 2025
13 of 17 checks passed

Jiang-Jia-Jun merged 9 commits into PaddlePaddle:develop from RuohengMa:sgl

7 participants
