
Conversation


@lizexu123 lizexu123 commented Aug 17, 2025

FastDeploy supports Hugging Face Qwen2-series and Qwen3 dense models.

Usage example:
Hugging Face models are currently only supported when load_choices is "default_v1". We decide whether a checkpoint is a Hugging Face model or a Paddle model by checking whether torch_dtype is present in the model's config.json.
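The detection rule described above can be sketched as a small helper (a minimal sketch; the function name is hypothetical, only the config.json/torch_dtype convention comes from this PR):

```python
import json
import os

def detect_model_format(model_dir: str) -> str:
    """Guess the checkpoint format from config.json: Hugging Face
    configs carry a torch_dtype field, Paddle configs do not."""
    with open(os.path.join(model_dir, "config.json")) as f:
        config = json.load(f)
    return "hugging_face" if "torch_dtype" in config else "paddle"
```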
Serving:

Start the service:
python -m fastdeploy.entrypoints.openai.api_server --model ${model_path} \
    --max-num-seqs 256 --max-model-len 32768 \
    --port 9032 --engine-worker-queue-port 7102 \
    --metrics-port 7203 --tensor-parallel-size 2 \
    --gpu-memory-utilization 0.9 \
    --load_choices "default_v1"

Request:
import openai

ip = "0.0.0.0"
service_http_port = "9032"  # the port the service was configured with

client = openai.Client(base_url=f"http://{ip}:{service_http_port}/v1", api_key="EMPTY_API_KEY")
response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "user", "content": "北京天安门在哪里?"},
    ],
    temperature=1,
    seed=1,
    stream=False,
    max_tokens=30,
)

print(response.choices[0].message.content)
print("\n")


Offline usage:

offline_demo.py
from fastdeploy.engine.sampling_params import SamplingParams
from fastdeploy.entrypoints.llm import LLM

model_name_or_path = "/workspace/Models/Qwen2.5-7B-Instruct"
sampling_params = SamplingParams(temperature=0.1, max_tokens=30, top_p=0)
# load_choices="default_v1" selects the new loader; "default" selects the legacy one
llm = LLM(model=model_name_or_path, num_gpu_blocks_override=1024, tensor_parallel_size=2, load_choices="default_v1")
output = llm.generate(prompts="who are you",
                      use_tqdm=True,
                      sampling_params=sampling_params)
print(output)

@paddle-bot

paddle-bot bot commented Aug 17, 2025

Thanks for your contribution!

@lizexu123 lizexu123 changed the title [Features] support hugging_face [Features] support hugging_face qwen3 dense model Aug 20, 2025
@lizexu123 lizexu123 changed the title [Features] support hugging_face qwen3 dense model [Features] support hugging face qwen3 dense model Aug 20, 2025
@lizexu123 lizexu123 changed the title [Features] support hugging face qwen3 dense model [Features] support hugging face qwen3 dense and qwen2 model Aug 21, 2025
@@ -0,0 +1,222 @@
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
Collaborator

Delete these two unit-test files and use Binhan's unit-test style instead. You copied the entire V0 file here and modified it, which leaves a lot of duplicated, redundant code; when new unit tests are later added for V0, V1 won't pick them up.

Collaborator Author

done

)
param.copy_(loaded_weight, False)

def bias_loader(self, param, loaded_weight, loaded_shard_id: Optional[str] = None):
Collaborator

Can this bias loader be merged with the weight loader above? The only difference is that one is 2-D and the other is 1-D.

Collaborator Author

Merged.
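The merge suggested above can be sketched like so, using NumPy as a stand-in for Paddle tensors (names, shapes, and the sharding axis are illustrative, not the PR's actual code): a single loader covers both cases because a 2-D weight and a 1-D bias differ only in rank.

```python
import numpy as np

def load_weight_or_bias(param: np.ndarray, loaded: np.ndarray,
                        shard_offset: int) -> None:
    """Unified loader sketch: 2-D weights and 1-D biases are both
    sliced along axis 0 and copied into the destination parameter."""
    shard = loaded[shard_offset:shard_offset + param.shape[0]]
    np.copyto(param, shard)
```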

if self.nranks != 1:
hugging_face_format = self.fd_config.load_config.hugging_face_format
if hugging_face_format:
loaded_weight = loaded_weight.transpose([1, 0])
Collaborator

Isn't the bias 1-dimensional?

Collaborator Author

Removed this check.
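For context on the transpose being discussed: Hugging Face checkpoints store linear weights as [out_features, in_features], while Paddle's linear layers expect [in_features, out_features], so only 2-D weights need the transpose([1, 0]); a 1-D bias has no layout to convert. A minimal NumPy sketch (helper name hypothetical):

```python
import numpy as np

def to_paddle_layout(hf_tensor: np.ndarray) -> np.ndarray:
    """Convert a Hugging Face [out, in] linear weight to Paddle's
    [in, out] layout; 1-D biases are layout-free and pass through."""
    if hf_tensor.ndim == 2:
        return hf_tensor.transpose(1, 0)
    return hf_tensor  # bias: nothing to transpose
```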

if self.nranks > 1:
set_weight_attrs(self.linear.weight, {"output_dim": False})

def weight_loader(self, param, loaded_weight, loaded_shard_id: Optional[str] = None):
Collaborator

Can this be deleted?

Collaborator Author

done

return fn


def slice_fn(weight_or_paramter, output_dim, start, end, step=1):
Collaborator

Newly added functions need type annotations for every parameter; output_dim, for example, is hard to understand.

Collaborator Author

done
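A possible annotated form of slice_fn, following the review comment above (a sketch only; the semantics of output_dim are inferred, not confirmed by the PR):

```python
import numpy as np

def slice_fn(weight_or_parameter: np.ndarray, output_dim: bool,
             start: int, end: int, step: int = 1) -> np.ndarray:
    """Slice a tensor along its sharding axis.

    Args:
        weight_or_parameter: tensor to slice.
        output_dim: if True, shard along the last axis (output
            features); if False, shard along the first axis.
        start: start index along the chosen axis.
        end: end index (exclusive) along the chosen axis.
        step: slice stride along the chosen axis.
    """
    axis = weight_or_parameter.ndim - 1 if output_dim else 0
    index = [slice(None)] * weight_or_parameter.ndim
    index[axis] = slice(start, end, step)
    return weight_or_parameter[tuple(index)]
```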

self.is_unified_ckpt = check_unified_ckpt(self.model)

self.override_name_from_config()
self.read_model_config()
Collaborator

Does model_format now rely entirely on auto-detection here? If a model's config.json has no torch_dtype, can it not be identified correctly? And is there no way to force-specify that a model is a torch model?

@lizexu123 lizexu123 closed this Aug 24, 2025