[PD Disaggregation] Add device_id to distinguish the pipelines of sending kv signals in different services. #5508

juncaipeng wants to merge 1 commit into PaddlePaddle:develop from
Conversation

Thanks for your contribution!
```cpp
const paddle::Tensor &seq_lens_this_time_tensor,
const paddle::Tensor &seq_lens_decoder_tensor,
const int rank,
const int device_id,
```

Only `device_id` was added; the other changes are automatic formatting.
```cpp
#define MAX_BSZ 512
void GetOutputKVSignal(const paddle::Tensor& x,
                       int64_t rank_id,
                       int64_t device_id,
```

Only `device_id` was added; the other changes are automatic formatting.
Pull request overview

This PR adds a device_id parameter to distinguish KV signal communication pipelines for different services in the PD (Prefill-Decode) disaggregation feature. Multiple P services can now operate independently on different devices without pipeline conflicts.
Key Changes:
- Thread the `device_id` parameter through all attention backend layers (Flash, MLA, XPU, Append)
- Update the IPC message queue ID calculation to incorporate device_id: `msg_id = 1024 + 1000 * device_id + rank`
- Add the `device_id` parameter to C++ custom ops for both GPU and XPU platforms
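The queue-ID arithmetic above can be sketched in Python (the helper name `kv_signal_msg_id` is hypothetical; the PR implements this arithmetic in the C++ custom ops):

```python
def kv_signal_msg_id(device_id: int, rank: int) -> int:
    """Derive the IPC message-queue id for KV signals.

    Mirrors the arithmetic this PR introduces: services on different
    devices get disjoint queue ids as long as rank stays below 1000.
    """
    return 1024 + 1000 * device_id + rank

# Same rank, different devices -> different queues.
print(kv_signal_msg_id(0, 3))  # 1027
print(kv_signal_msg_id(1, 3))  # 2027
```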
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| fastdeploy/model_executor/layers/attention/*.py | Add device_id parameter to init_kv_signal_per_query calls in all attention backends |
| fastdeploy/model_executor/layers/attention/ops/init_kv_signal_per_query.py | Update function signature to accept device_id parameter |
| fastdeploy/cache_manager/cache_messager.py | Pass gpu_id to get_output_kv_signal and add debug logging |
| custom_ops/xpu_ops/src/ops/remote_cache_kv_ipc.h | Update msg_id calculation to include device_id |
| custom_ops/xpu_ops/src/ops/*.cc | Add device_id parameter to function signatures and pybind declarations |
| custom_ops/gpu_ops/remote_cache_kv_ipc.h | Update msg_id calculation to include device_id |
| custom_ops/gpu_ops/*.cc | Add device_id parameter, update pybind, and apply code formatting improvements |
```diff
 #include "paddle/extension.h"
-#include "remote_cache_kv_ipc.h"
 #include "paddle/phi/core/allocator.h"
 #include "paddle/phi/core/dense_tensor.h"
+#include "remote_cache_kv_ipc.h"
```

The include order has been changed: "remote_cache_kv_ipc.h" was moved after the paddle includes. While this follows the more standard convention of placing library headers before local headers, ensure it does not introduce compilation issues due to dependencies between headers.
```diff
 if (!inited) {
   // just init once
-  const int msg_id = 1024 + rank;
+  const int msg_id = 1024 + 1000 * device_id + rank;
```
The message queue ID calculation uses 1024 + 1000 * device_id + rank. This formula could lead to collisions if rank >= 1000 or if there are many devices. For example:
- device_id=0, rank=1024 would give msg_id=2048
- device_id=1, rank=24 would also give msg_id=2048
Consider using a safer calculation formula such as 1024 + device_id * 10000 + rank or documenting the valid ranges for device_id and rank to prevent collisions.
Suggested change:

```diff
-const int msg_id = 1024 + 1000 * device_id + rank;
+// Use a larger multiplier to avoid collisions between device_id and rank
+const int msg_id = 1024 + device_id * 10000 + rank;
```
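The collision is easy to demonstrate; a minimal Python sketch comparing the merged formula with the reviewer's suggestion (function names are illustrative only):

```python
def msg_id_merged(device_id: int, rank: int) -> int:
    # Formula as written in this PR
    return 1024 + 1000 * device_id + rank

def msg_id_suggested(device_id: int, rank: int) -> int:
    # Reviewer-suggested larger multiplier
    return 1024 + device_id * 10000 + rank

# Collision once rank crosses the multiplier boundary:
assert msg_id_merged(0, 1024) == msg_id_merged(1, 24) == 2048

# The larger multiplier keeps these apart (valid while rank < 10000):
assert msg_id_suggested(0, 1024) != msg_id_suggested(1, 24)
```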
```diff
 if (!inited) {
   // just init once
-  const int msg_id = 1024 + rank;
+  const int msg_id = 1024 + 1000 * device_id + rank;
```
The message queue ID calculation uses 1024 + 1000 * device_id + rank. This formula could lead to collisions if rank >= 1000 or if there are many devices. For example:
- device_id=0, rank=1024 would give msg_id=2048
- device_id=1, rank=24 would also give msg_id=2048
Consider using a safer calculation formula such as 1024 + device_id * 10000 + rank or documenting the valid ranges for device_id and rank to prevent collisions.
Suggested change:

```diff
-const int msg_id = 1024 + 1000 * device_id + rank;
+const int msg_id = 1024 + device_id * 10000 + rank;
```
```diff
 int64_t device_id,
 bool wait_flag) {
-  int msg_queue_id = 1024 + rank_id;
+  int msg_queue_id = 1024 + 1000 * device_id + rank_id;
```
The message queue ID calculation uses 1024 + 1000 * device_id + rank_id. This formula could lead to collisions if rank_id >= 1000 or if there are many devices. For example:
- device_id=0, rank_id=1024 would give msg_queue_id=2048
- device_id=1, rank_id=24 would also give msg_queue_id=2048
Consider using a safer calculation formula such as 1024 + device_id * 10000 + rank_id or documenting the valid ranges for device_id and rank_id to prevent collisions.
Suggested change:

```diff
-int msg_queue_id = 1024 + 1000 * device_id + rank_id;
+int msg_queue_id = 1024 + device_id * 10000 + rank_id;
```
```cpp
static struct msgdatakv msg_rcv;
static key_t key = ftok("/opt/", msg_queue_id);
static int msgid = msgget(key, IPC_CREAT | 0666);
int msg_queue_id = 1024 + 1000 * device_id + rank_id;
```
The message queue ID calculation uses 1024 + 1000 * device_id + rank_id. This formula could lead to collisions if rank_id >= 1000 or if there are many devices. For example:
- device_id=0, rank_id=1024 would give msg_queue_id=2048
- device_id=1, rank_id=24 would also give msg_queue_id=2048
Consider using a safer calculation formula such as 1024 + device_id * 10000 + rank_id or documenting the valid ranges for device_id and rank_id to prevent collisions.
Suggested change:

```diff
-int msg_queue_id = 1024 + 1000 * device_id + rank_id;
+int msg_queue_id = 1024 + device_id * 10000 + rank_id;
```
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@           Coverage Diff            @@
##           develop   #5508   +/-   ##
==========================================
  Coverage         ?   60.27%
==========================================
  Files            ?      329
  Lines            ?    41117
  Branches         ?     6261
==========================================
  Hits             ?    24782
  Misses           ?    14445
  Partials         ?     1890
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
```diff
 while True:
     try:
-        get_output_kv_signal(kv_signal_data, self.rank_id, 0)  # wait_flag
+        get_output_kv_signal(kv_signal_data, self.rank_id, self.gpu_id, 0)  # wait_flag
```

A fairly niche issue: if gpu_id is passed here, it seems impossible to distinguish different P instances running on the same card.
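The limitation follows directly from the formula: two P instances sharing one GPU (same device_id, same rank) derive the same queue id. A sketch, with a hypothetical `instance_id` term showing one way they could be separated:

```python
def msg_id(device_id: int, rank: int) -> int:
    # Arithmetic as introduced by this PR
    return 1024 + 1000 * device_id + rank

# Two P instances on the same GPU with the same rank map to the
# same message queue, so their KV signals cannot be told apart:
assert msg_id(0, 0) == msg_id(0, 0)

# A hypothetical per-instance offset would disambiguate them:
def msg_id_with_instance(device_id: int, rank: int, instance_id: int) -> int:
    return 1024 + 100000 * instance_id + 1000 * device_id + rank

assert msg_id_with_instance(0, 0, 0) != msg_id_with_instance(0, 0, 1)
```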
refer to #5514
Motivation

Multiple P services need to use device_id to distinguish the pipelines for sending KV signals.

Modifications

Take device_id into account in the pipeline identifier.

Usage or Command

Unchanged.

Accuracy Tests

Covered by unit tests.
Checklist

- Add at least one tag in the PR title, chosen from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- If the PR targets the release branch, make sure it has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.