[XPU] Support W4A8C8-TP4-300B Model #4068

iosmers · 2025-09-11T11:28:03Z

本PR实现XPU适配ERNIE-4.5-300B-A47B-W4A8C8-TP4-Paddle模型
启动命令：
export XPU_VISIBLE_DEVICES="4,5,6,7"
python -m fastdeploy.entrypoints.openai.api_server --model ./PaddlePaddle/ERNIE-4.5-300B-A47B-W4A8C8-TP4-Paddle --port 8188
--tensor-parallel-size 4 --max-model-len 32768 --max-num-seqs 1 --quantization "W4A8" --gpu-memory-utilization 0.9

paddle-bot · 2025-09-11T11:28:09Z

Thanks for your contribution!

fastdeploy/model_executor/layers/moe/fused_moe_xpu_backend.py

CLAassistant · 2025-09-25T11:15:47Z

All committers have signed the CLA.

custom_ops/xpu_ops/src/ops/moe_topk_select.cc

custom_ops/xpu_ops/src/ops/weight_quantize_xpu.cc

fastdeploy/inter_communicator/engine_worker_queue.py

fastdeploy/model_executor/layers/attention/attention.py

fastdeploy/model_executor/layers/backends/xpu/moe/fused_moe.py

fastdeploy/model_executor/layers/backends/xpu/quantization/kv_cache.py

fastdeploy/model_executor/layers/quantization/kv_cache.py

fastdeploy/model_executor/layers/quantization/w4a8.py

hong19860320 · 2025-09-26T07:07:57Z

custom_ops/xpu_ops/src/ops/weight_quantize_xpu.cc

        PD_CHECK(ret == 0);
        return {out, scale};
-    } else {
+    }


这个代码风格有问题吧？

hong19860320 · 2025-09-26T07:10:49Z

fastdeploy/model_executor/layers/attention/attention.py


        if fd_config.quant_config and hasattr(fd_config.quant_config, "kv_cache_quant_type"):
            self.quant_method: QuantMethodBase = fd_config.quant_config.get_quant_method(self)
+            print(f"quant_method: {self.quant_method}")


是不是得走log 打印？

hong19860320 · 2025-09-26T07:11:40Z

fastdeploy/model_executor/layers/attention/attention.py

                default_initializer=paddle.nn.initializer.Constant(0),
            )

+    def calculate_md5(self, arr):


这个新增的接口用在哪？

hong19860320 · 2025-09-26T07:13:48Z

fastdeploy/model_executor/layers/backends/xpu/moe/fused_moe.py

        XPU compute Fused MoE.
        """
-        from fastdeploy.model_executor.ops.xpu import xpu_moe_layer
+        # from fastdeploy.model_executor.ops.xpu import xpu_moe_layer


无用的代码删掉

hong19860320 · 2025-09-26T07:39:00Z

fastdeploy/model_executor/layers/backends/xpu/quantization/kv_cache.py

+            or self.quant_type == XPUKvCacheQuantzationTypes.FP8_ZP
+            or self.quant_type == XPUKvCacheQuantzationTypes.BLOCK_WISE_FP8
+        ):
+            self.max_bound = 448.0


是不是应该报错不支持

hong19860320 · 2025-09-26T07:39:09Z

fastdeploy/model_executor/layers/backends/xpu/quantization/kv_cache.py

+        ):
+            self.max_bound = 448.0
+        elif self.quant_type == XPUKvCacheQuantzationTypes.INT4_ZP:
+            self.max_bound = 7.0


hong19860320 · 2025-09-26T07:40:39Z

fastdeploy/model_executor/layers/backends/xpu/quantization/kv_cache.py

+        cache_v_scale_tensor = get_tensor(state_dict.pop(self.cache_v_scale_name)).cast("float32").reshape_([-1])
+
+        if self.cache_quant_config.has_zero_point:  # cache_int4_zp
+            cache_k_scale = 1.0 / cache_k_scale_tensor


为什么是倒数关系？是不是加一下注释。

hong19860320 · 2025-09-26T07:44:21Z

fastdeploy/model_executor/layers/backends/xpu/quantization/kv_cache.py

+        use for loader v1
+        """
+        if layer.cache_k_scale._is_initialized():
+            layer.cache_k_out_scale.set_value(1 / layer.cache_k_scale)


加个注释说明一下吧

hong19860320

LGTM, 先合入一版，有问题在后续PR补上吧

gongshaotian

LGTM

XiaoguangHu01

LGTM

zhupengyang reviewed Sep 11, 2025

View reviewed changes

fastdeploy/model_executor/layers/moe/fused_moe_xpu_backend.py Outdated Show resolved Hide resolved

fastdeploy/model_executor/layers/moe/fused_moe_xpu_backend.py Outdated Show resolved Hide resolved

iosmers force-pushed the new_support_w4a8 branch from 027483b to aeb4eb7 Compare September 26, 2025 04:36

support w4a8

dd28881

iosmers force-pushed the new_support_w4a8 branch from aeb4eb7 to dd28881 Compare September 26, 2025 04:47

iosmers added 2 commits September 26, 2025 04:48

delete ep block attn

b1102cb

delete moe_topk_select

01908db

iosmers force-pushed the new_support_w4a8 branch from d5ce629 to 01908db Compare September 26, 2025 06:32

iosmers changed the title ~~xpu support w4a8~~ [XPU] Support W4A8C8-TP4-300B Model Sep 26, 2025

update note

3f64940

iosmers force-pushed the new_support_w4a8 branch from b4fc62a to 3f64940 Compare September 26, 2025 07:18

iosmers added 2 commits September 26, 2025 07:19

update

46958ec

fix conflict

0505c07

zhupengyang reviewed Sep 26, 2025

View reviewed changes

hong19860320 reviewed Sep 26, 2025

View reviewed changes

iosmers added 5 commits September 28, 2025 08:47

delte useless info

de6c2fc

update

3596071

add some note

81a3130

fix some format

3fd9241

update scale info

b1d48f9

hong19860320 previously approved these changes Sep 28, 2025

View reviewed changes

add ans baseline

64c73ef

iosmers dismissed hong19860320’s stale review via 64c73ef October 9, 2025 08:05

EmmonsCurse and others added 2 commits October 9, 2025 20:42

Merge branch 'develop' into new_support_w4a8

3d4c9b7

Merge branch 'develop' into new_support_w4a8

105ac8d

EmmonsCurse approved these changes Oct 10, 2025

View reviewed changes

DDDivano approved these changes Oct 10, 2025

View reviewed changes

qingqing01 approved these changes Oct 10, 2025

View reviewed changes

gongshaotian approved these changes Oct 10, 2025

View reviewed changes

hong19860320 approved these changes Oct 10, 2025

View reviewed changes

XiaoguangHu01 approved these changes Oct 10, 2025

View reviewed changes

EmmonsCurse merged commit 20c7b74 into PaddlePaddle:develop Oct 10, 2025
31 of 45 checks passed

[XPU] Support W4A8C8-TP4-300B Model #4068

[XPU] Support W4A8C8-TP4-300B Model #4068

Uh oh!

Conversation

iosmers commented Sep 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

paddle-bot bot commented Sep 11, 2025

Uh oh!

Uh oh!

Uh oh!

CLAassistant commented Sep 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hong19860320 left a comment

Choose a reason for hiding this comment

Uh oh!

gongshaotian left a comment

Choose a reason for hiding this comment

Uh oh!

XiaoguangHu01 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

iosmers commented Sep 11, 2025 •

edited

Loading

CLAassistant commented Sep 25, 2025 •

edited

Loading