
Conversation

@bukejiyu (Collaborator) commented Jul 30, 2025

qwen3moe support default loader v1
todo:
1. Add unit tests for the new loader.

New-loader model test with topp=0:

  • qwen3moe: token-by-token alignment (a sketch of this check follows below)

Old-loader model tests: CI passed
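
For illustration, a minimal sketch of such a token-by-token alignment check (a hypothetical helper, not part of this PR's test code):

# Hypothetical sketch: compare greedy (topp=0) generations from the old and
# new loaders and require identical token ids at every decoding step.
def assert_token_aligned(tokens_old: list, tokens_new: list) -> None:
    assert len(tokens_old) == len(tokens_new), "generation lengths differ"
    for step, (a, b) in enumerate(zip(tokens_old, tokens_new)):
        assert a == b, f"first divergence at step {step}: {a} != {b}"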

paddle-bot (bot) commented Jul 30, 2025

Thanks for your contribution!

@bukejiyu force-pushed the refactor_moe branch 4 times, most recently from 1429b97 to 0a03421, on August 1, 2025 at 07:31
@bukejiyu changed the title from "qwen3moe support" to "qwen3moe" on Aug 1, 2025
Comment on lines 49 to 60
layer.weight = layer.create_parameter(
shape=layer.weight_shape,
dtype=layer.weight_dtype,
is_bias=False,
default_initializer=paddle.nn.initializer.Constant(0),
)

layer.weight = layer.create_parameter(
shape=layer.weight_shape,
dtype=layer.weight_dtype,
is_bias=False,
)
Collaborator:

Why is layer.weight assigned twice here? That looks like a bug.

Collaborator (Author):

done
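
For reference, a minimal sketch of the fix: layer.weight is created exactly once, keeping the zero initializer from the first of the two original calls.

layer.weight = layer.create_parameter(
    shape=layer.weight_shape,
    dtype=layer.weight_dtype,
    is_bias=False,
    default_initializer=paddle.nn.initializer.Constant(0),
)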

Comment on lines 655 to 657
# if fd_config.quant_config:
# self.quant_method = fd_config.quant_config.get_quant_method(self)
# self.quant_method.create_weights(self)
Collaborator:

Please delete the commented-out code.

Collaborator (Author):

done

Comment on lines 677 to 707
# self.init_weight()

# def init_weight(self):
# """
# Initialize the weights and biases.
# """
# if self.skip_quant:
# self.weight_dtype = self._dtype

# self.weight = self.create_parameter(
# shape=self.weight_shape,
# dtype=self.weight_dtype,
# is_bias=False,
# default_initializer=paddle.nn.initializer.Constant(0),
# )

# self.bias = None
# if self.with_bias:
# self.bias = self.create_parameter(
# shape=[self.hidden_size],
# dtype=self._dtype,
# is_bias=True,
# )

# if self.nranks > 0:
# # row parallel
# _set_var_distributed(self.weight, split_axis=0)

# # smooth quant
# self.linear_shift = None
# self.linear_smooth = None
Collaborator:

Is this code still needed?

Collaborator (Author):

done

weight_key_map=weight_key_map,
)

self.gate = ReplicatedLinear(
Collaborator:

Why is there an extra self.gate here?

Collaborator (Author):

The gate was pulled out of FusedMoE to align with vLLM, so that all linear-related weight creation is controlled in linear.py (see the toy sketch below).
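
A toy sketch of the resulting structure, using plain Paddle layers (ToyMoEBlock and its internals are illustrative, not FastDeploy's real classes; in the PR the gate is a ReplicatedLinear created via linear.py):

import paddle
import paddle.nn as nn

class ToyMoEBlock(nn.Layer):
    """Illustrative only: the router gate is a plain linear owned by the block,
    mirroring vLLM, and the expert container consumes precomputed gate logits."""

    def __init__(self, hidden_size: int, num_experts: int, inter_size: int):
        super().__init__()
        # the gate lives outside the fused-MoE container
        self.gate = nn.Linear(hidden_size, num_experts, bias_attr=False)
        self.experts = nn.LayerList(
            [nn.Sequential(nn.Linear(hidden_size, inter_size), nn.GELU(),
                           nn.Linear(inter_size, hidden_size))
             for _ in range(num_experts)]
        )

    def forward(self, x):
        gate_out = self.gate(x)                  # router logits from the external gate
        top1 = paddle.argmax(gate_out, axis=-1)  # top-1 routing, for brevity
        out = paddle.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = (top1 == i).astype(x.dtype).unsqueeze(-1)
            out = out + mask * expert(x)         # the real FusedMoE fuses this dispatch
        return out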

fake_hidden_states: Optional[paddle.Tensor] = None


class Ernie4_5_VMoeBlock(nn.Layer):
Collaborator:

Ernie4_5_VMoeBlock -> Ernie4_5_VLMoeBlock?

}

self.fused_moe = FusedMoE(
self.expert = FusedMoE(
Collaborator:

Some places call it self.experts and others self.expert?

Collaborator (Author):

done

Comment on lines 176 to 178
if self.weight_key not in state_dict:
# TODO(bukejiyu): Temporary hack for Ernie4.5-VL, remove after loader refactor
self.weight_key = self.weight_key + "_1"
Collaborator:

Why is this written here?

Collaborator (Author):

Changed it so that a special weight_key can be passed in (a sketch of the idea follows).
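
A minimal sketch of the revised approach (resolve_weight_key is a hypothetical helper name, not the merged code):

from typing import Optional

def resolve_weight_key(state_dict: dict, default_key: str,
                       override_key: Optional[str] = None) -> str:
    """Pick the checkpoint key for a weight. Callers with unusual checkpoints
    (e.g. Ernie4.5-VL's "<key>_1" naming) pass the special key explicitly
    instead of the loader guessing a suffix."""
    key = override_key or default_key
    if key not in state_dict:
        raise KeyError(f"weight key {key!r} not found in checkpoint")
    return key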

)

if forward_meta.max_enc_len_this_time:
if forward_meta.max_len_tensor_cpu[1]:
Collaborator:

Why was this changed?

Collaborator (Author):

DeepSeek doesn't run unless this is changed. I'll ask Ruian to review this part as well.

Comment on lines +55 to +62
# mlp.gate.weight is precision-sensitive, so we cast it to float32 for computation
if layer.weight.dtype != weights.dtype:
weights = weights.cast(layer.weight.dtype)
Collaborator:

Wasn't weight_dtype already specified when self.gate was initialized? Why is the cast needed here?

Collaborator (Author):

Because the weights on disk are still bf16.

Comment on lines +77 to +79
# mlp.gate.weight is precision-sensitive, so we cast it to float32 for computation
if param.dtype != loaded_weight.dtype:
loaded_weight = loaded_weight.cast(param.dtype)
Collaborator:

Why is the cast needed here?

Collaborator (Author):

Because with the new loader, ordinary linear layers go through this function to load weights, and the gate weight on disk is bf16 while the param is float32, so a cast is needed (a sketch of the pattern follows).
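
A minimal sketch of that load-time pattern, assuming Paddle's in-place Tensor.copy_ (load_gate_weight is an illustrative name):

import paddle

def load_gate_weight(param: paddle.Tensor, loaded_weight: paddle.Tensor) -> None:
    # the gate param is kept in float32 for routing precision, while the
    # checkpoint tensor may be bf16, so cast before copying into the param
    if param.dtype != loaded_weight.dtype:
        loaded_weight = loaded_weight.cast(param.dtype)  # e.g. bf16 -> fp32
    assert param.shape == loaded_weight.shape
    param.copy_(loaded_weight, False)  # in-place copy into the existing parameter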

Collaborator:

Aren't they fp32 on disk? [screenshot]

Collaborator (Author):

We can revisit this later if needed; this PR only aims to stay consistent with develop.

if hasattr(layer, "nranks") and layer.nranks > 0:
split_axis = extra_weight_attrs.get("split_axis")
_set_var_distributed(layer.weight, split_axis=split_axis)
set_weight_attrs(layer.weight, {"output_dim": extra_weight_attrs.get("output_dim")})
Collaborator:

What does output_dim mean?

Collaborator (Author):

It controls which dimension is split for tensor parallelism (see the sketch below).
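
A toy sketch of what that attribute drives at load time (shard_for_tp is an illustrative helper, not the loader's real code):

import paddle

def shard_for_tp(loaded_weight: paddle.Tensor, output_dim: bool,
                 tp_rank: int, tp_size: int) -> paddle.Tensor:
    # output_dim=True  -> shard along the output (last) axis, column-parallel style
    # output_dim=False -> shard along the input (first) axis, row-parallel style
    axis = loaded_weight.ndim - 1 if output_dim else 0
    block = loaded_weight.shape[axis] // tp_size
    return paddle.slice(loaded_weight, axes=[axis],
                        starts=[tp_rank * block], ends=[(tp_rank + 1) * block])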

self.quant_method.create_weights(
self,
split_axis=1,
output_dim=True,
Collaborator:

It would be best to add some comments: what the layout is, which dimension split_axis refers to, and what output_dim means.

Collaborator (Author):

done
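
For readers of this thread, an illustrative version of the annotated call (the comments assume a 2-D weight laid out [input_dim, output_dim]):

self.quant_method.create_weights(
    self,
    split_axis=1,      # partition the parameter along axis 1 across TP ranks
    output_dim=True,   # the split axis is the output dim (column parallel), so
)                      # each rank produces a slice of the outputs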

self.quant_method.create_weights(
self,
split_axis=0,
output_dim=False,
Collaborator:

Same as above.

Collaborator (Author):

done


class UnquantizedFusedMoEMethod(MoEMethodBase):
def create_weights(self, layer: nn.Layer, **extra_weight_attrs):
from fastdeploy.platforms import current_platform
Collaborator:

Why is this imported here?

Collaborator (Author):

Moved it to the top of the file; done (see below).
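
i.e., a sketch of the adjustment:

# moved to module scope, so the import runs once instead of on every
# create_weights call
from fastdeploy.platforms import current_platform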

self.down_proj_weight_shape = [layer.num_experts, layer.moe_intermediate_size, layer.hidden_size]
else:
return self.apply_tp(layer, x, gate_out)
self.up_gate_proj_weight_shape = [layer.num_experts, layer.moe_intermediate_size * 2, layer.hidden_size]
Collaborator:

Do all non-GPU hardware backends use this weight layout?

Collaborator (Author):

Yes. All hardware we currently support takes the else branch (see the sketch below).
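
A small sketch of that else-branch layout, with illustrative sizes (the numbers are examples, not a real config):

# non-GPU ("else") expert weight layout: all experts stacked on axis 0
num_experts, hidden_size, moe_intermediate_size = 64, 2048, 768  # example sizes
up_gate_proj_weight_shape = [num_experts, moe_intermediate_size * 2, hidden_size]
down_proj_weight_shape = [num_experts, moe_intermediate_size, hidden_size]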

tp_size={self.tp_size}."
)

def weight_loader(self, param, loaded_weight, expert_id, shard_id: Optional[str] = None):
Collaborator:

Please add parameter annotations, e.g. what shard_id means. Also, "id" usually suggests an int rather than a string.

Collaborator (Author):

weight_loader already includes assert shard_id in ["gate", "down", "up"]; it just uses the same name as vLLM (an annotated sketch follows).
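
For the record, a sketch of how the signature could be annotated (the docstring is illustrative, not the merged code):

from typing import Optional
import paddle

def weight_loader(self, param: paddle.Tensor, loaded_weight: paddle.Tensor,
                  expert_id: int, shard_id: Optional[str] = None) -> None:
    """Load one expert shard into a fused MoE parameter.

    Args:
        param: destination fused parameter (up_gate_proj or down_proj weights).
        loaded_weight: checkpoint tensor for a single expert's projection.
        expert_id: index of the expert this shard belongs to.
        shard_id: which projection this is, named as in vLLM:
            "gate" / "up" (the halves of up_gate_proj) or "down".
    """
    assert shard_id in ["gate", "down", "up"], f"unexpected shard_id: {shard_id}"
    # per-shard slicing/copy logic follows here (omitted in this sketch)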

"up_gate_proj_expert_in_scale_key": f"{prefix}.experts.{{}}.up_gate_proj.activation_scale",
"down_proj_expert_in_scale_key": f"{prefix}.experts.{{}}.down_proj.activation_scale",
}
else:
Collaborator:

What moe_quant_type does the else branch correspond to? A comment would help.

Collaborator (Author):

done

@@ -401,834 +401,834 @@ resampler_model.mlp.weight
resampler_model.mlp.bias
Collaborator:

Is this file auto-generated or manually edited?

Collaborator (Author):

Auto-generated, then manually tweaked; otherwise the RL side couldn't pass CI.

@bukejiyu force-pushed the refactor_moe branch 2 times, most recently from 8f2a735 to 2534428, on August 5, 2025 at 17:25
@Jiang-Jia-Jun merged commit 20839ab into PaddlePaddle:develop on Aug 6, 2025
16 of 21 checks passed
gzy19990617 pushed a commit to gzy19990617/FastDeploy that referenced this pull request Aug 7, 2025
Jiang-Jia-Jun pushed a commit that referenced this pull request Aug 8, 2025
* fix noaux_tc op

* fix

* update

* fix qk norm

* fix linear for prequant loader

* test

* fix

* fix

* rm some print

* fix noaux_tc op

* test

* Fix the confused enable_early_stop when only set early_stop_config (#3214)

* fix the confused early_stop_config when only set early_stop_config

* pre-commit

* write a general method

* Add ci case for min token and max token (#3229)

Co-authored-by: xujing43 <xujing43@baidu.com>

* add some evil cases (#3240)

* add repetition early stop cases

* add repetition early stop cases

* add bad cases

* add bad cases

* add evil cases

* qwen3_moe (#3084)

* [Feature] support seed parameter (#3161)

* support seed

* fix

* add SamplingMetadata seed test

* The next_tokens values are inconsistent!

* add air and rejection seed test

* fix

* add SamplingParams seed test

* fix seed=0

* Default to default

* fix

* fix args_utils

* fix review

* fix review

* fix

* fix

* add xpu,gcu,iluvatar support seed

* fix

* 【Fix Bug】Fix bug in fa3 centralized-deployment support (#3235)

* fix fa3 centralized-deployment bug

* add qknorm parameter

* fix qk norm

* fix

* update

* fix linear for prequant loader

* fix

* fix

* rm some print

* fix

* fix moe init weight&scale

* fix moe init weight&scale

---------

Co-authored-by: bukejiyu <395822456@qq.com>
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: Zero Rains <linjunlu@zerorains.top>
Co-authored-by: xjkmfa <108254620+xjkmfa@users.noreply.github.com>
Co-authored-by: xujing43 <xujing43@baidu.com>
Co-authored-by: Divano <dddivano@outlook.com>
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com>
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com>
Co-authored-by: yangjianfengo1 <125249383+yangjianfengo1@users.noreply.github.com>
Co-authored-by: qingqing01 <dangqingqing@baidu.com>