
Conversation

@zeroRains
Contributor

@zeroRains zeroRains commented Oct 17, 2025

Motivation

In a PD-disaggregated deployment, when the decode (D) node runs with ChunkedPrefill + CUDAGraph enabled and `max_num_batched_tokens` differs from `max_model_len`, GPU memory keeps growing during stress testing until an OOM is triggered.

Modifications

In the `postprocess` method of `fastdeploy/config.py`, check whether the decode (D) node has ChunkedPrefill enabled; if so, reset the chunked_prefill-related settings.
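The guard can be sketched roughly as below. The class and field names here (`SchedulerConfig`, `splitwise_role`, `max_num_batched_tokens`, `max_model_len`) are simplified stand-ins for illustration; the actual fields and the exact reset logic live in FastDeploy's `fastdeploy/config.py` and may differ.

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-ins for FastDeploy's config objects.
@dataclass
class SchedulerConfig:
    splitwise_role: str = "mixed"  # "prefill", "decode", or "mixed"
    max_num_batched_tokens: int = 8192

@dataclass
class ModelConfig:
    max_model_len: int = 32768

def postprocess(sched: SchedulerConfig, model: ModelConfig,
                enable_chunked_prefill: bool) -> bool:
    """Sketch of the check: a decode node in a PD-disaggregated setup
    does not run long prefills itself, so chunked prefill is turned off
    and the batched-token budget is realigned with max_model_len,
    avoiding the memory growth described in the Motivation section."""
    if sched.splitwise_role == "decode" and enable_chunked_prefill:
        enable_chunked_prefill = False
        sched.max_num_batched_tokens = model.max_model_len
    return enable_chunked_prefill
```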

Usage or Command

Launch script for PD disaggregation

# run.sh
ulimit -c 0

for name in `env | grep -E 'PADDLE|ENDPOINT' | awk -F'=' '{print $1}'`; do
  unset ${name}
done

model_path="Models/ERNIE-4.5-21B-A3B-Paddle" # lite model
prefill_yaml="benchmarks/yaml/eb45-32k-wint4-tp4_prefill.yaml" # also change tp to 1 in this yaml
decode_yaml="benchmarks/yaml/eb45-32k-wint4-tp4_decode.yaml"
export PYTHONPATH=/workspace/FastDeploy:$PYTHONPATH # FastDeploy path
rm -rf log_prefill
rm -rf log_decode
export ENABLE_V1_KVCACHE_SCHEDULER=0
export CUDA_VISIBLE_DEVICES=4 # GPU used by the P (prefill) node
export FD_LOG_DIR="log_prefill"
export KVCACHE_RDMA_NICS="mlx5_2,mlx5_3,mlx5_4,mlx5_5"
export KVCACHE_VERBS_CONNECT=1
export KVCACHE_RDMA_GID_INDEX=0
python -m fastdeploy.entrypoints.openai.api_server --model ${model_path} --port 9189 --metrics-port 9399 --config ${prefill_yaml} --scheduler-name "splitwise" --scheduler-host "10.178.32.226" --scheduler-port 6379 --scheduler-ttl 9000 --scheduler-password "scheduler2025" --scheduler-topic pd_test >prefill.log 2>&1 & prefill_process=$!
export CUDA_VISIBLE_DEVICES=5 # GPU used by the D (decode) node
export FD_LOG_DIR="log_decode"
python -m fastdeploy.entrypoints.openai.api_server --model ${model_path} --port 9199 --metrics-port 9499 --config ${decode_yaml} --scheduler-name "splitwise" --scheduler-host "10.178.32.226" --scheduler-port 6379 --scheduler-ttl 9000 --scheduler-password "scheduler2025" --scheduler-topic pd_test >decode.log 2>&1 & decode_process=$!

Stress-test (benchmark) script

# benchmark.sh
ulimit -c 0

for name in `env | grep -E 'PADDLE|ENDPOINT' | awk -F'=' '{print $1}'`; do
  unset ${name}
done

cd benchmarks

cmd="/miniconda3/envs/fd/bin/python"
text_path="0419_api9_yiyan_spv5_forqianfan_4872_fd" # dataset path
request_config="benchmarks/yaml/request_yaml/eb45-32k.yaml" # benchmark request yaml
export PYTHONPATH=/workspace/FastDeploy:$PYTHONPATH # FastDeploy path

$cmd benchmark_serving.py \
  --backend openai-chat \
  --model EB45T \
  --endpoint /v1/chat/completions \
  --host 0.0.0.0 \
  --port 9189 \
  --dataset-name EBChat \
  --dataset-path $text_path \
  --hyperparameter-path $request_config \
  --percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
  --metric-percentiles 80,95,99,99.9,99.95,99.99 \
  --num-prompts 2000  \
  --max-concurrency 100 \
  --save-result 2>&1  | tee infer_log_4872.log

Accuracy Tests

Local stress testing passed.

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot bot commented Oct 17, 2025

Thanks for your contribution!

@zeroRains zeroRains changed the title [Benchmark][PD Disaggregation] close the chunked_prefill in Decode Node [PD Disaggregation] close the chunked_prefill in Decode Node Oct 17, 2025
@zeroRains zeroRains closed this Oct 17, 2025
@zeroRains
Contributor Author

After PR #4420 landed, no GPU memory growth was observed on the D node regardless of whether chunked_prefill is enabled, so this PR is closed for now.
