
Conversation


@lizexu123 lizexu123 commented Aug 17, 2025

FastDeploy supports Hugging Face Qwen2-series and Qwen3 dense models.

Usage example:
Hugging Face models are currently only supported when load_choices is "default_v1". We decide whether a checkpoint is a Hugging Face model or a Paddle model by checking whether torch_dtype is present in the model's config.json.
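The detection rule described above can be sketched as a small helper (a minimal sketch; the function name is hypothetical, only the config.json/torch_dtype convention comes from this PR):

```python
import json
import os

def detect_model_format(model_dir: str) -> str:
    """Guess the checkpoint format from config.json: Hugging Face
    configs carry a torch_dtype field, Paddle configs do not."""
    with open(os.path.join(model_dir, "config.json")) as f:
        config = json.load(f)
    return "hugging_face" if "torch_dtype" in config else "paddle"
```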
Serving:

Start the service:
python -m fastdeploy.entrypoints.openai.api_server --model ${model_path} \
    --max-num-seqs 256 --max-model-len 32768 \
    --port 9032 --engine-worker-queue-port 7102 \
    --metrics-port 7203 --tensor-parallel-size 2 \
    --gpu-memory-utilization 0.9 \
    --load_choices "default_v1"

Request:
import openai

ip = "0.0.0.0"
service_http_port = "9032"  # the port the service was configured with

client = openai.Client(base_url=f"http://{ip}:{service_http_port}/v1", api_key="EMPTY_API_KEY")
response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "user", "content": "北京天安门在哪里?"},
    ],
    temperature=1,
    seed=1,
    stream=False,
    max_tokens=30,
)

print(response.choices[0].message.content)
print("\n")


Offline usage:

offline_demo.py
from fastdeploy.engine.sampling_params import SamplingParams
from fastdeploy.entrypoints.llm import LLM

model_name_or_path = "/workspace/Models/Qwen2.5-7B-Instruct"
sampling_params = SamplingParams(temperature=0.1, max_tokens=30, top_p=0)
# load_choices="default_v1" selects the new loader; "default" selects the legacy one
llm = LLM(model=model_name_or_path, num_gpu_blocks_override=1024, tensor_parallel_size=2, load_choices="default_v1")
output = llm.generate(prompts="who are you",
                      use_tqdm=True,
                      sampling_params=sampling_params)
print(output)

@paddle-bot

paddle-bot bot commented Aug 17, 2025

Thanks for your contribution!

@lizexu123 lizexu123 changed the title [Features] support hugging_face [Features] support hugging_face qwen3 dense model Aug 20, 2025
@lizexu123 lizexu123 changed the title [Features] support hugging_face qwen3 dense model [Features] support hugging face qwen3 dense model Aug 20, 2025
@lizexu123 lizexu123 changed the title [Features] support hugging face qwen3 dense model [Features] support hugging face qwen3 dense and qwen2 model Aug 21, 2025
@@ -0,0 +1,222 @@
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
Collaborator

Delete these two unit-test files and use Binhan's unit-test style instead. You copied the entire V0 file here and modified it, which leaves a lot of duplicated, redundant code; when new unit tests are later added for V0, V1 won't pick them up.

Collaborator Author

done

)
param.copy_(loaded_weight, False)

def bias_loader(self, param, loaded_weight, loaded_shard_id: Optional[str] = None):
Collaborator

Can this bias loader be merged with the weight loader above? The only difference is that one is 2-D and the other is 1-D.

Collaborator Author

Merged.
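The merge suggested above can be sketched like so, using NumPy as a stand-in for Paddle tensors (names, shapes, and the sharding axis are illustrative, not the PR's actual code): a single loader covers both cases because a 2-D weight and a 1-D bias differ only in rank.

```python
import numpy as np

def load_weight_or_bias(param: np.ndarray, loaded: np.ndarray,
                        shard_offset: int) -> None:
    """Unified loader sketch: 2-D weights and 1-D biases are both
    sliced along axis 0 and copied into the destination parameter."""
    shard = loaded[shard_offset:shard_offset + param.shape[0]]
    np.copyto(param, shard)
```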

if self.nranks != 1:
hugging_face_format = self.fd_config.load_config.hugging_face_format
if hugging_face_format:
loaded_weight = loaded_weight.transpose([1, 0])
Collaborator

Isn't the bias 1-dimensional?

Collaborator Author

Removed this check.
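For context on the transpose being discussed: Hugging Face checkpoints store linear weights as [out_features, in_features], while Paddle's linear layers expect [in_features, out_features], so only 2-D weights need the transpose([1, 0]); a 1-D bias has no layout to convert. A minimal NumPy sketch (helper name hypothetical):

```python
import numpy as np

def to_paddle_layout(hf_tensor: np.ndarray) -> np.ndarray:
    """Convert a Hugging Face [out, in] linear weight to Paddle's
    [in, out] layout; 1-D biases are layout-free and pass through."""
    if hf_tensor.ndim == 2:
        return hf_tensor.transpose(1, 0)
    return hf_tensor  # bias: nothing to transpose
```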

if self.nranks > 1:
set_weight_attrs(self.linear.weight, {"output_dim": False})

def weight_loader(self, param, loaded_weight, loaded_shard_id: Optional[str] = None):
Collaborator

Can this be deleted?

Collaborator Author

done

return fn


def slice_fn(weight_or_paramter, output_dim, start, end, step=1):
Collaborator

Newly added functions need type annotations for every parameter; output_dim, for example, is hard to understand.

Collaborator Author

done
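A possible annotated form of slice_fn, following the review comment above (a sketch only; the semantics of output_dim are inferred, not confirmed by the PR):

```python
import numpy as np

def slice_fn(weight_or_parameter: np.ndarray, output_dim: bool,
             start: int, end: int, step: int = 1) -> np.ndarray:
    """Slice a tensor along its sharding axis.

    Args:
        weight_or_parameter: tensor to slice.
        output_dim: if True, shard along the last axis (output
            features); if False, shard along the first axis.
        start: start index along the chosen axis.
        end: end index (exclusive) along the chosen axis.
        step: slice stride along the chosen axis.
    """
    axis = weight_or_parameter.ndim - 1 if output_dim else 0
    index = [slice(None)] * weight_or_parameter.ndim
    index[axis] = slice(start, end, step)
    return weight_or_parameter[tuple(index)]
```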

self.is_unified_ckpt = check_unified_ckpt(self.model)

self.override_name_from_config()
self.read_model_config()
Collaborator

Does model_format now rely entirely on auto-detection here? If a model's config.json has no torch_dtype, can it not be identified correctly? And is there no way to force-specify that a model is a torch model?

@lizexu123 lizexu123 closed this Aug 24, 2025