Conversation

@zeroRains
Contributor

zeroRains commented Aug 20, 2025

pcard-71500

Note: the V1 Loader relies on new behavior of set_value and copy_ in the Paddle develop branch, so it requires Paddle develop, or 3.2 (not yet released) and later.
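
For context, the V1 loader writes checkpoint tensors into the model's existing parameters in place rather than building a second full copy. A minimal sketch of that pattern, using only the two APIs named above (illustrative only, not the loader's actual code):

# in_place_load_sketch.py (illustrative only)
import paddle

# A parameter that already exists on the model.
param = paddle.create_parameter(shape=[4, 4], dtype="float32")
# A tensor read from the checkpoint.
loaded = paddle.randn([4, 4], dtype="float32")

# set_value writes into the parameter's existing storage instead of allocating a
# new tensor, which is what keeps peak memory low; copy_ is the other in-place
# path the note above refers to. Both need the Paddle versions listed above.
param.set_value(loaded)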

This PR adapts the Qwen2 model to the V1 Loader.
Verified under bf16: accuracy is aligned with develop, and both the old and the new load paths continue to work.
With the V1 Loader, loading Qwen2-7B-Instruct under TP4 uses only 29% of the original memory, and load time is unchanged.

Changes:

  1. Add a load_weights method to the Qwen2 model to adapt it to the V1 loader.
  2. Add a bias_loader method to QKVParallelLinear so that when the bias goes through weight_loader there is a matching routine to run (see the sketch after this list).
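
A rough sketch of how the two pieces fit together, assuming the per-parameter weight_loader convention the V1 loader uses; the mapping entries, names, and signatures below are illustrative, not the exact code added in this PR:

# Sketch only: names, mapping entries, and signatures are assumptions, not this PR's exact code.
from typing import Iterable, Tuple

import paddle


def default_weight_loader(param: paddle.Tensor, loaded_weight: paddle.Tensor) -> None:
    # Fallback for parameters without a dedicated loader: plain in-place assignment.
    param.set_value(loaded_weight)


def load_weights(model: paddle.nn.Layer, weights: Iterable[Tuple[str, paddle.Tensor]]) -> None:
    # Checkpoint shard name -> (merged parameter name, shard id) for the fused QKV projection.
    stacked_params_mapping = [
        ("q_proj", "qkv_proj", "q"),
        ("k_proj", "qkv_proj", "k"),
        ("v_proj", "qkv_proj", "v"),
    ]
    params_dict = dict(model.named_parameters())
    for name, loaded_weight in weights:
        for shard_name, merged_name, shard_id in stacked_params_mapping:
            if shard_name in name:
                param = params_dict[name.replace(shard_name, merged_name)]
                # The merged QKV weight carries a weight_loader; with this PR its bias
                # goes through an analogous bias_loader on QKVParallelLinear, so the
                # per-shard in-place write happens for both weight and bias.
                param.weight_loader(param, loaded_weight, shard_id)
                break
        else:
            param = params_dict[name]
            getattr(param, "weight_loader", default_weight_loader)(param, loaded_weight)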

Test records:
Model: Qwen2-7B-Instruct
Performance

Metric                      Old loader    V1 Loader
TP4 memory usage (GB)       17            5
TP4 model load time (s)     4.41          4.41

P.S. Memory usage is the peak memory observed from the start of end-to-end inference with fd until it finishes. Model load time is the maximum of the load times recorded in each rank's worker.log.

Accuracy

Qwen2-7B-Instruct tp1 bf16: outputs match
base bf16 : ? I am a large language model created by Alibaba Cloud. I am called Qwen.\n\nCan you generate a poem about the beauty of nature?
pr   bf16 : ? I am a large language model created by Alibaba Cloud. I am called Qwen.\n\nCan you generate a poem about the beauty of nature?

tp4 bf16: outputs match
base bf16 : ? I am a large language model created by Alibaba Cloud. I am called Qwen.\n\nCan you generate a poem about the beauty of nature?
pr   bf16 : ? I am a large language model created by Alibaba Cloud. I am called Qwen.\n\nCan you generate a poem about the beauty of nature?

Test script:

# run_demo.sh

export CUDA_VISIBLE_DEVICES=0,1,2,3
export PYTHONPATH=/workspace/miniconda3/envs/py310:$PYTHONPATH
export PYTHONPATH=/workspace/FastDeploy:$PYTHONPATH
python offline_demo.py

# offline_demo.py
from fastdeploy.engine.sampling_params import SamplingParams
from fastdeploy.entrypoints.llm import LLM

model_name_or_path = "/workspace/Models/Qwen2-7B-Instruct"
sampling_params = SamplingParams(temperature=0.1, max_tokens=30, top_p=0)
# load_choices="default_v1" selects the new V1 loader; "default" selects the old one.
llm = LLM(model=model_name_or_path, num_gpu_blocks_override=1024, tensor_parallel_size=4, load_choices="default_v1")
output = llm.generate(prompts="who are you",
                      use_tqdm=True,
                      sampling_params=sampling_params)
print(output)

@paddle-bot

paddle-bot (bot) commented Aug 20, 2025

Thanks for your contribution!

@yuanlehome
Collaborator

Please add a CI unit test.

@zeroRains
Contributor Author

Please add a CI unit test.

Testing this through end-to-end model inference is easier to handle; pulling this part out into a standalone unit test isn't straightforward.

Jiang-Jia-Jun merged commit 79f0dbb into PaddlePaddle:develop Aug 22, 2025
13 of 16 checks passed
zeroRains deleted the qwen branch August 23, 2025 04:49