support qwen3moe #3084
Conversation
Thanks for your contribution!
Force-pushed from 1429b97 to 0a03421
```python
layer.weight = layer.create_parameter(
    shape=layer.weight_shape,
    dtype=layer.weight_dtype,
    is_bias=False,
    default_initializer=paddle.nn.initializer.Constant(0),
)

layer.weight = layer.create_parameter(
    shape=layer.weight_shape,
    dtype=layer.weight_dtype,
    is_bias=False,
)
```
Why is layer.weight assigned twice here? This looks like a bug.

done
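A plausible shape of the fix, assuming only a single `create_parameter` call is kept (which of the two calls survives is not visible in this hunk):

```python
# Assumed fix (not the PR's confirmed final code): create the parameter
# exactly once; checkpoint loading overwrites the values afterwards, so
# a second assignment is redundant.
layer.weight = layer.create_parameter(
    shape=layer.weight_shape,
    dtype=layer.weight_dtype,
    is_bias=False,
    default_initializer=paddle.nn.initializer.Constant(0),
)
```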
```python
# if fd_config.quant_config:
#     self.quant_method = fd_config.quant_config.get_quant_method(self)
#     self.quant_method.create_weights(self)
```
Please delete this commented-out code.

done
```python
# self.init_weight()

# def init_weight(self):
#     """
#     Initialize the weights and biases.
#     """
#     if self.skip_quant:
#         self.weight_dtype = self._dtype
#
#     self.weight = self.create_parameter(
#         shape=self.weight_shape,
#         dtype=self.weight_dtype,
#         is_bias=False,
#         default_initializer=paddle.nn.initializer.Constant(0),
#     )
#
#     self.bias = None
#     if self.with_bias:
#         self.bias = self.create_parameter(
#             shape=[self.hidden_size],
#             dtype=self._dtype,
#             is_bias=True,
#         )
#
#     if self.nranks > 0:
#         # row parallel
#         _set_var_distributed(self.weight, split_axis=0)
#
#     # smooth quant
#     self.linear_shift = None
#     self.linear_smooth = None
```
Is this code still needed?

done
```python
    weight_key_map=weight_key_map,
)

self.gate = ReplicatedLinear(
```
Why is there a new self.gate here?

The gate was pulled out of FusedMoE to align with vLLM, so that the creation of all linear-related weights is controlled in linear.py.
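For context, a minimal sketch of that layout. `ReplicatedLinear` and `FusedMoE` are this PR's classes, but the constructor arguments and forward signature below are assumptions based on the surrounding diff, not exact signatures:

```python
import paddle
import paddle.nn as nn

# Import paths for ReplicatedLinear / FusedMoE omitted: they are the
# PR's own modules, and the exact paths are not shown in this hunk.
class MoeBlockSketch(nn.Layer):
    def __init__(self, fd_config, prefix: str):
        super().__init__()
        # The router gate is now an ordinary linear layer, so its weight
        # creation is governed by linear.py rather than by FusedMoE.
        self.gate = ReplicatedLinear(
            fd_config=fd_config,
            prefix=f"{prefix}.gate",
            input_size=fd_config.model_config.hidden_size,
            output_size=fd_config.model_config.moe_num_experts,
            with_bias=False,
            weight_dtype="float32",  # precision-sensitive; see the cast discussion below
        )
        self.experts = FusedMoE(fd_config=fd_config, prefix=prefix)

    def forward(self, hidden_states: paddle.Tensor) -> paddle.Tensor:
        gate_out = self.gate(hidden_states)  # router logits
        return self.experts(hidden_states, gate_out)
```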
```python
fake_hidden_states: Optional[paddle.Tensor] = None


class Ernie4_5_VMoeBlock(nn.Layer):
```
Should Ernie4_5_VMoeBlock be Ernie4_5_VLMoeBlock?
```diff
 }

-        self.fused_moe = FusedMoE(
+        self.expert = FusedMoE(
```
Some places call it self.experts and others self.expert?

done
```python
if self.weight_key not in state_dict:
    # TODO(bukejiyu): Temporary hack for Ernie4.5-VL, remove after loader refactor
    self.weight_key = self.weight_key + "_1"
```
Why does this have to live here?

Changed it so that a special weight_key can be passed in instead.
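Roughly, that replaces the in-place hack with an optional constructor argument; a hypothetical sketch (the parameter name and default are assumptions, not the PR's exact API):

```python
# Hypothetical sketch: callers such as Ernie4.5-VL can pass the special
# key (e.g. f"{prefix}.weight_1") directly, instead of the loader
# appending "_1" after the fact.
def __init__(self, fd_config, prefix: str, weight_key: Optional[str] = None):
    self.weight_key = weight_key or f"{prefix}.weight"
```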
```diff
 )

-        if forward_meta.max_enc_len_this_time:
+        if forward_meta.max_len_tensor_cpu[1]:
```
Why change this?

DeepSeek does not run without this change. I will also ask Ruian to review this part.
```python
# mlp.gate.weight is precision-sensitive, so we cast it to float32 for computation
if layer.weight.dtype != weights.dtype:
    weights = weights.cast(layer.weight.dtype)
```
Didn't the initialization of self.gate already specify weight_dtype? Why is the cast still needed here?

Because the weights on disk are still bf16.
```python
# mlp.gate.weight is precision-sensitive, so we cast it to float32 for computation
if param.dtype != loaded_weight.dtype:
    loaded_weight = loaded_weight.cast(param.dtype)
```
Why is the cast needed here?

With the new loader, ordinary linear layers go through this function to load their weights. The gate weight is bf16 on disk but the param is float32, so a cast is needed.

Let's change it later if the need arises; this PR only intends to stay consistent with develop.
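Putting the two halves of that explanation together, a sketch of the situation (the function name and the `set_value` call are assumptions; the cast mirrors the diff above):

```python
# The gate parameter is created as float32 because it is
# precision-sensitive, but checkpoints still store it in bfloat16, so
# the generic loader must cast before assigning.
def default_weight_loader(param, loaded_weight):
    if param.dtype != loaded_weight.dtype:
        loaded_weight = loaded_weight.cast(param.dtype)  # bf16 -> float32 for the gate
    param.set_value(loaded_weight)
```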
```python
if hasattr(layer, "nranks") and layer.nranks > 0:
    split_axis = extra_weight_attrs.get("split_axis")
    _set_var_distributed(layer.weight, split_axis=split_axis)
    set_weight_attrs(layer.weight, {"output_dim": extra_weight_attrs.get("output_dim")})
```
What does output_dim mean?

It controls which dimension is split for tensor parallelism.
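In other words, the attribute tells the generic weight loader which dimension of the checkpoint tensor to slice per TP rank. A sketch under that assumption (the helper name and slicing details are illustrative, not the PR's exact code):

```python
import paddle

def sharded_weight_loader(param, loaded_weight, tp_rank: int, tp_size: int):
    # output_dim=True  -> shard the output dimension (column parallel);
    # output_dim=False -> shard the input dimension (row parallel).
    output_dim = getattr(param, "output_dim", None)
    if output_dim is not None and tp_size > 1:
        dim = len(loaded_weight.shape) - 1 if output_dim else 0
        block = loaded_weight.shape[dim] // tp_size
        loaded_weight = paddle.slice(
            loaded_weight, axes=[dim], starts=[tp_rank * block], ends=[(tp_rank + 1) * block]
        )
    param.set_value(loaded_weight)
```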
```python
self.quant_method.create_weights(
    self,
    split_axis=1,
    output_dim=True,
```
It would be good to add some comments: what the layout is, which dimension split_axis refers to, and what output_dim means.

done
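For example, the requested comments might read like this (the wording and the [input_size, output_size] layout are assumptions; the PR's actual comments may differ):

```python
self.quant_method.create_weights(
    self,
    # weight layout is [input_size, output_size];
    # split_axis=1 shards the output dimension across TP ranks (column parallel)
    split_axis=1,
    # output_dim=True marks the checkpoint tensor as needing to be
    # sliced along its output dimension when loaded under TP
    output_dim=True,
)
```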
```python
self.quant_method.create_weights(
    self,
    split_axis=0,
    output_dim=False,
```
Same as above.

done
```python
class UnquantizedFusedMoEMethod(MoEMethodBase):
    def create_weights(self, layer: nn.Layer, **extra_weight_attrs):
        from fastdeploy.platforms import current_platform
```
Why is this imported here?

Moved it to the top of the file; done.
```python
self.up_gate_proj_weight_shape = [layer.num_experts, layer.moe_intermediate_size * 2, layer.hidden_size]
self.down_proj_weight_shape = [layer.num_experts, layer.moe_intermediate_size, layer.hidden_size]
```

```python
else:
    return self.apply_tp(layer, x, gate_out)
```
Is the weight layout like this on all hardware other than GPU?

Yes, all the hardware we currently support goes through the else branch.
```python
        tp_size={self.tp_size}."
    )

def weight_loader(self, param, loaded_weight, expert_id, shard_id: Optional[str] = None):
```
Please add annotations for the input parameters, e.g. what shard_id means. Also, "id" usually suggests an int rather than a string.

Every weight_loader already carries `assert shard_id in ["gate", "down", "up"]`; the name simply matches vLLM's.
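A sketch of the requested annotations (the docstring text is illustrative; the assert is the one quoted in the reply above):

```python
from typing import Optional

import paddle

def weight_loader(
    self,
    param: paddle.Tensor,
    loaded_weight: paddle.Tensor,
    expert_id: int,
    shard_id: Optional[str] = None,
) -> None:
    """Load one expert's checkpoint tensor into a fused MoE parameter.

    Args:
        param: destination fused parameter (up_gate_proj or down_proj).
        loaded_weight: tensor read from the checkpoint for this expert.
        expert_id: integer index of the expert being loaded.
        shard_id: which projection the tensor belongs to, following
            vLLM's naming: "gate" and "up" are the two halves of
            up_gate_proj, "down" is down_proj.
    """
    assert shard_id in ["gate", "down", "up"]
```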
| "up_gate_proj_expert_in_scale_key": f"{prefix}.experts.{{}}.up_gate_proj.activation_scale", | ||
| "down_proj_expert_in_scale_key": f"{prefix}.experts.{{}}.down_proj.activation_scale", | ||
| } | ||
| else: |
Which moe_quant_type does the else branch handle? A comment would help.

done
```diff
@@ -401,834 +401,834 @@ resampler_model.mlp.weight
 resampler_model.mlp.bias
```
Is this file auto-generated, or edited by hand?

Auto-generated and then hand-edited; otherwise the RL side cannot pass CI.
Force-pushed from 8f2a735 to 2534428
* fix noaux_tc op
* fix
* update
* fix qk norm
* fix linear for prequant loader
* test
* fix
* fix
* rm some print
* fix noaux_tc op
* test
* Fix the confused enable_early_stop when only set early_stop_config (#3214)
* fix the confused early_stop_config when only set early_stop_config
* pre-commit
* write a general method
* Add ci case for min token and max token (#3229) Co-authored-by: xujing43 <xujing43@baidu.com>
* add some evil cases (#3240)
* add repitation early stop cases
* add repitation early stop cases
* add bad cases
* add bad cases
* add evil cases
* qwen3_moe (#3084)
* [Feature] support seed parameter (#3161)
* support seed
* fix
* add SamplingMetadata seed test
* The next_tokens values are inconsistent!
* add air and rejection seed test
* fix
* add SamplingParams seed test
* fix seed=0
* Default to defualt
* fix
* fix args_utils
* fix review
* fix review
* fix
* fix
* add xpu,gcu,iluvatar support seed
* fix
* [Fix Bug] Fix fa3 centralized-deployment support bug (#3235)
* fix fa3 centralized-deployment bug
* add qknorm parameter
* fix qk norm
* fix
* update
* fix linear for prequant loader
* fix
* fix
* rm some print
* fix
* fix moe init weight&scale
* fix moe init weight&scale

---------

Co-authored-by: bukejiyu <395822456@qq.com>
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: Zero Rains <linjunlu@zerorains.top>
Co-authored-by: xjkmfa <108254620+xjkmfa@users.noreply.github.com>
Co-authored-by: xujing43 <xujing43@baidu.com>
Co-authored-by: Divano <dddivano@outlook.com>
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com>
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com>
Co-authored-by: yangjianfengo1 <125249383+yangjianfengo1@users.noreply.github.com>
Co-authored-by: qingqing01 <dangqingqing@baidu.com>

qwen3moe supports the default loader v1.

TODO:
1. Add unit tests for the new loader.

New-loader model test (topp=0):
Old-loader model test: CI passed.