
Conversation

@zeroRains
Contributor

@zeroRains zeroRains commented Oct 17, 2025

Motivation

In a PD-disaggregated deployment, when the decode (D) node runs with ChunkedPrefill + CUDAGraph enabled and `max_num_batched_tokens` differs from `max_model_len`, GPU memory keeps growing during stress testing until an OOM is triggered.

Modifications

In the `postprocess` method of `fastdeploy/config.py`, check whether the decode (D) node has ChunkedPrefill enabled; if so, reset the chunked_prefill-related settings.
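The guard can be sketched roughly as below. The class and field names here (`SchedulerConfig`, `splitwise_role`, `max_num_batched_tokens`, `max_model_len`) are simplified stand-ins for illustration; the actual fields and the exact reset logic live in FastDeploy's `fastdeploy/config.py` and may differ.

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-ins for FastDeploy's config objects.
@dataclass
class SchedulerConfig:
    splitwise_role: str = "mixed"  # "prefill", "decode", or "mixed"
    max_num_batched_tokens: int = 8192

@dataclass
class ModelConfig:
    max_model_len: int = 32768

def postprocess(sched: SchedulerConfig, model: ModelConfig,
                enable_chunked_prefill: bool) -> bool:
    """Sketch of the check: a decode node in a PD-disaggregated setup
    does not run long prefills itself, so chunked prefill is turned off
    and the batched-token budget is realigned with max_model_len,
    avoiding the memory growth described in the Motivation section."""
    if sched.splitwise_role == "decode" and enable_chunked_prefill:
        enable_chunked_prefill = False
        sched.max_num_batched_tokens = model.max_model_len
    return enable_chunked_prefill
```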

Usage or Command

Launch script for PD disaggregation

# run.sh
ulimit -c 0

for name in `env | grep -E 'PADDLE|ENDPOINT' | awk -F'=' '{print $1}'`; do
  unset ${name}
done

model_path="Models/ERNIE-4.5-21B-A3B-Paddle" # lite model
prefill_yaml="benchmarks/yaml/eb45-32k-wint4-tp4_prefill.yaml" # also change tp to 1 in this yaml
decode_yaml="benchmarks/yaml/eb45-32k-wint4-tp4_decode.yaml"
export PYTHONPATH=/workspace/FastDeploy:$PYTHONPATH # FastDeploy path
rm -rf log_prefill
rm -rf log_decode
export ENABLE_V1_KVCACHE_SCHEDULER=0
export CUDA_VISIBLE_DEVICES=4 # GPU used by the P (prefill) node
export FD_LOG_DIR="log_prefill"
export KVCACHE_RDMA_NICS="mlx5_2,mlx5_3,mlx5_4,mlx5_5"
export KVCACHE_VERBS_CONNECT=1
export KVCACHE_RDMA_GID_INDEX=0
python -m fastdeploy.entrypoints.openai.api_server --model ${model_path} --port 9189 --metrics-port 9399 --config ${prefill_yaml} --scheduler-name "splitwise" --scheduler-host "10.178.32.226" --scheduler-port 6379 --scheduler-ttl 9000 --scheduler-password "scheduler2025" --scheduler-topic pd_test >prefill.log 2>&1 & prefill_process=$!
export CUDA_VISIBLE_DEVICES=5 # GPU used by the D (decode) node
export FD_LOG_DIR="log_decode"
python -m fastdeploy.entrypoints.openai.api_server --model ${model_path} --port 9199 --metrics-port 9499 --config ${decode_yaml} --scheduler-name "splitwise" --scheduler-host "10.178.32.226" --scheduler-port 6379 --scheduler-ttl 9000 --scheduler-password "scheduler2025" --scheduler-topic pd_test >decode.log 2>&1 & decode_process=$!

Stress-test (benchmark) script

# benchmark.sh
ulimit -c 0

for name in `env | grep -E 'PADDLE|ENDPOINT' | awk -F'=' '{print $1}'`; do
  unset ${name}
done

cd benchmarks

cmd="/miniconda3/envs/fd/bin/python"
text_path="0419_api9_yiyan_spv5_forqianfan_4872_fd" # dataset path
request_config="benchmarks/yaml/request_yaml/eb45-32k.yaml" # benchmark request yaml
export PYTHONPATH=/workspace/FastDeploy:$PYTHONPATH # FastDeploy path

$cmd benchmark_serving.py \
  --backend openai-chat \
  --model EB45T \
  --endpoint /v1/chat/completions \
  --host 0.0.0.0 \
  --port 9189 \
  --dataset-name EBChat \
  --dataset-path $text_path \
  --hyperparameter-path $request_config \
  --percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
  --metric-percentiles 80,95,99,99.9,99.95,99.99 \
  --num-prompts 2000  \
  --max-concurrency 100 \
  --save-result 2>&1  | tee infer_log_4872.log

Accuracy Tests

Local stress testing passed.

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot bot commented Oct 17, 2025

Thanks for your contribution!

@zeroRains zeroRains changed the title [Benchmark][PD Disaggregation] close the chunked_prefill in Decode Node [PD Disaggregation] close the chunked_prefill in Decode Node Oct 17, 2025
@zeroRains zeroRains closed this Oct 17, 2025
@zeroRains
Contributor Author

After PR #4420 landed, no GPU memory growth was observed on the D node regardless of whether chunked_prefill is enabled, so this PR is closed for now.
