Conversation

Collaborator

@bukejiyu bukejiyu commented Sep 9, 2025

This PR adds support for loading Qwen-series offline-quantized FP8 weights and ERNIE-series offline-quantized FP8 weights with the following code:

# The original snippet omits the imports and sampling_params; the lines below
# assume FastDeploy-style entry points and use a placeholder model path.
from fastdeploy import LLM, SamplingParams

model_name_or_path = "path/to/offline-quantized-fp8-checkpoint"  # placeholder
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(
    model=model_name_or_path,
    num_gpu_blocks_override=1024,
    tensor_parallel_size=1,
    load_choices="default_v1",
    use_cudagraph=False,
)
output = llm.generate(
    prompts="who are you",
    use_tqdm=True,
    sampling_params=sampling_params,
)


paddle-bot bot commented Sep 9, 2025

Thanks for your contribution!

--ignore=tests/ce
--ignore=tests/operators/test_fused_moe.py
--ignore=tests/operators/test_w4afp8_gemm.py
--ignore=tests/model_loader/test_model_cache.py
Collaborator

Why was this unit test added to the ignore list? If there is an issue, it should be fixed.

**extra_weight_attrs,
"weight_loader": extra_weight_attrs.get("weight_loader", default_weight_loader(layer.fd_config)),
"model_format": extra_weight_attrs.get("model_format", ""),
"weight_need_transpose": extra_weight_attrs.get("model_format") == "torch",
Collaborator

I don't recommend this change. As I understand it, model_format and weight_need_transpose are coupled: once the model formats are unified in the future, this transpose operation will no longer exist. With this change, however, the transpose logic becomes independent of the model format, and a later developer taking over this code after the formats are unified will find it hard to realize that the transpose logic needs to be removed.

Collaborator Author

But currently not every torch-format model needs the transpose. For example, offline-quantized FP8 torch weights do not need to be transposed, so the old check was already incorrect. With this change, weight_need_transpose is still strongly tied to the model format, but the quant method's create_weights can decide, based on the model and quantization type, whether a transpose is actually needed; see the sketch below.
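A minimal sketch of the approach described above, assuming a hypothetical quant-method class; the class name, attribute, and surrounding layout are illustrative and not taken from this PR.

class Fp8OfflineQuantMethod:
    # Assumption for illustration: offline-quantized FP8 checkpoints already
    # store weights in the expected layout, so no transpose is needed for them.
    requires_transpose_for_torch = False

    def create_weights(self, layer, **extra_weight_attrs):
        model_format = extra_weight_attrs.get("model_format", "")
        # The quant method, not the loader, decides whether torch-format
        # weights still need a [1, 0] transpose.
        extra_weight_attrs["weight_need_transpose"] = (
            model_format == "torch" and self.requires_transpose_for_torch
        )
        # ... allocate the quantized weight / scale parameters here ...
        return extra_weight_attrs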

if getattr(model_config, "num_hidden_layers", None) is None:
    raise ValueError("num_hidden_layers is None")

quantization_config = model_config.quantization_config
Collaborator

Take a look at the changes to work_process.py in #4051; there is a conflict.

Collaborator Author

Already handled.

setattr(config_obj, config_attr_name, origin_value)


def rename_offline_ckpt_suffix_to_fd_suffix(
Collaborator

Why is this block of code needed? Please explain the reason and give the full background.

Collaborator Author

The FP8 quantization suffixes do not seem to be consistent across models. For example, Llama's FP8 scale is called weight_scale (vLLM handles this mapping specifically in llama4.py), while Qwen's scale is called weight_scale_inv. So I added a dedicated method that maps checkpoint suffixes to FD suffixes.
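A rough, self-contained sketch of such a suffix mapping; the table contents, function shape, and suffix choices are assumptions for illustration and do not reproduce the PR's actual implementation.

# Hypothetical mapping from checkpoint-specific FP8 scale suffixes to one
# FD-internal suffix; the real table in the PR may differ.
_CKPT_TO_FD_SUFFIX = {
    "weight_scale_inv": "weight_scale",  # e.g. Qwen-style FP8 checkpoints
    "weight_scale": "weight_scale",      # e.g. Llama-style FP8 checkpoints
}


def rename_offline_ckpt_suffix_to_fd_suffix(weight_name: str) -> str:
    # Rename an offline-quantized checkpoint key to the FD-internal suffix.
    for ckpt_suffix, fd_suffix in _CKPT_TO_FD_SUFFIX.items():
        if weight_name.endswith(ckpt_suffix):
            return weight_name[: -len(ckpt_suffix)] + fd_suffix
    return weight_name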

@yuanlehome
Collaborator

Please update the branch; there may be code conflicts.

if weight_need_transpose:
    loaded_weight = get_tensor(loaded_weight)
    loaded_weight = loaded_weight.transpose([1, 0])
    param.weight_need_transpose = False
Collaborator

Is there a particular reason for setting this back to False here?

Collaborator Author
@bukejiyu bukejiyu Sep 15, 2025

Because weights that are fused on disk are loaded via a recursive call, weight_loader is invoked twice; with this change the transpose happens only once, as illustrated below.
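A simplified, self-contained illustration of the double-call scenario described above; the FakeParam class, the gate/up split, and the use of numpy stand in for the real loader machinery and are assumptions for illustration only.

import numpy as np


class FakeParam:
    # Minimal stand-in for a parameter carrying the transpose flag.
    def __init__(self):
        self.weight_need_transpose = True
        self.shards = {}


def weight_loader(param, loaded_weight, shard_id=None):
    # Transpose at most once, even though the fused-on-disk path below
    # re-enters weight_loader once per shard.
    if getattr(param, "weight_need_transpose", False):
        loaded_weight = loaded_weight.transpose()
        param.weight_need_transpose = False
    if shard_id is None:
        # Fused-on-disk case: split the concatenated tensor, then recurse per shard.
        gate, up = np.split(loaded_weight, 2, axis=0)
        weight_loader(param, gate, shard_id="gate")
        weight_loader(param, up, shard_id="up")
        return
    param.shards[shard_id] = loaded_weight


param = FakeParam()
weight_loader(param, np.ones((8, 4)))  # transposed once to (4, 8), then split into two (2, 8) shards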

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 29ed617 into PaddlePaddle:develop Sep 15, 2025
23 of 28 checks passed