[V1 Loader] support weight_only #3413
Conversation
Thanks for your contribution!
fastdeploy/worker/worker_process.py (outdated)
```python
elif args.quantization != "None":
    quantization_config = {}
    if load_config.load_choices == LoadChoices.DEFAULT_V1:
        quantization_config["is_dyn_quant"] = True
```
Is it reasonable for this to always be set to True? Offline quantization also has a quantization_config, so perhaps the distinction can only be made inside quant_config?
Offline quantization never enters the args.quantization != "None" branch, and all quantization-related weight creation happens in the quant method, where the quant config is almost the only thing available; it is hard to make the distinction anywhere else.
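One way to carry the distinction inside the quant config itself, instead of a free-floating is_dyn_quant entry, is an explicit mode flag on the config object. A minimal sketch — the names QuantMode and WeightOnlyConfig are hypothetical, not FastDeploy APIs:

```python
from dataclasses import dataclass
from enum import Enum


class QuantMode(Enum):
    OFFLINE = "offline"   # weights are already quantized on disk
    DYNAMIC = "dynamic"   # bf16 weights on disk, quantized at load time


@dataclass
class WeightOnlyConfig:
    algo: str = "weight_only_int8"
    mode: QuantMode = QuantMode.OFFLINE

    @property
    def is_dynamic(self) -> bool:
        # quant methods can branch on the config instead of a global flag
        return self.mode is QuantMode.DYNAMIC


cfg = WeightOnlyConfig(mode=QuantMode.DYNAMIC)
print(cfg.is_dynamic)  # True
```

With this shape, the weight-creation code in the quant method only ever consults the config, which addresses the concern that the flag cannot be distinguished elsewhere.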
```python
layer.up_gate_proj_weight,
layer.down_proj_weight,
getattr(layer, self.added_weight_attrs[0]),
getattr(layer, self.added_weight_attrs[1]),
```
We would prefer to keep this as it was; it is clearer.
But the names differ before and after quantization.
They may differ, because w4a8 also inherits this method, and we have not yet decided how w4a8 will be handled.
```python
self.ffn1_scale_shape = [layer.num_local_experts, layer.moe_intermediate_size * 2]
self.ffn2_scale_shape = [layer.num_local_experts, layer.hidden_size]
```
Do not use ffn1/ffn2 anywhere; use up_gate/down instead.
done
```python
is_channel_wise: bool = False,
has_zero_point: bool = False,
is_permuted: bool = True,
is_dyn_quant: bool = False,
```
Remove the is_dyn_quant variable everywhere in this PR.
But then there is no way to tell what kind of weight was created; the best we can do is rename the variable.
```python
layer.weight.value().get_tensor()._clear()
del layer.weight
```
Wrap these two lines into a single utils function and call that instead.
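Such a helper could look like the sketch below. The name free_tensor is hypothetical, and the tiny stand-in classes only simulate a Paddle tensor so the sketch is self-contained; the real helper would operate on actual paddle tensors:

```python
def free_tensor(layer, attr_name: str) -> None:
    """Release the underlying storage of a layer attribute, then drop it."""
    tensor = getattr(layer, attr_name)
    tensor.value().get_tensor()._clear()  # free the backing memory
    delattr(layer, attr_name)


# --- tiny stand-ins so the sketch runs without Paddle ---
class _Storage:
    def __init__(self):
        self.cleared = False

    def _clear(self):
        self.cleared = True


class _Tensor:
    def __init__(self):
        self._storage = _Storage()

    def value(self):
        return self

    def get_tensor(self):
        return self._storage


class _Layer:
    pass


layer = _Layer()
layer.weight = _Tensor()
storage = layer.weight.value().get_tensor()
free_tensor(layer, "weight")
print(storage.cleared, hasattr(layer, "weight"))  # True False
```

Centralizing the clear-and-delete pair in one function keeps the release semantics consistent across every quant method that drops the original weight.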
```python
if (
    current_platform.is_cuda()
    or current_platform.is_xpu()
    or current_platform.is_iluvatar()
    or current_platform.is_gcu()
    or current_platform.is_dcu()
    or current_platform.is_maca()
):
    self.forward = self.forward_cuda
else:
    raise NotImplementedError
```
Since every platform takes the same branch anyway, why bother with the check?
done
```python
if w.dtype != self.weight_dtype:
    w = w.cast(self.weight_dtype)

def weight_loader(self, param, loaded_weight, loaded_shard_id: Optional[str] = None):
```
Why was this weight_loader deleted?
What problem would there be if KVBatchLinear kept its own weight_loader?
```python
model_sublayer_name = re.sub(r"\.(up_gate_proj_weight|down_proj_weight|weight)$", "", model_param_name)
if "kv_b_proj" in model_sublayer_name:
    kv_model_sublayer_name = model_sublayer_name.replace("kv_b_proj", "kv_b_proj_bmm")
```
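The remapping can be illustrated in isolation: the parameter name is stripped of its trailing weight attribute to obtain the sublayer key, and kv_b_proj sublayers are additionally mapped to a kv_b_proj_bmm key (assuming, as the snippet suggests, that the checkpoint name and the registered fused-bmm sublayer name diverge). The example parameter names below are illustrative:

```python
import re


def to_sublayer_name(model_param_name: str) -> str:
    # drop the trailing weight attribute to get the sublayer key
    name = re.sub(r"\.(up_gate_proj_weight|down_proj_weight|weight)$", "", model_param_name)
    if "kv_b_proj" in name:
        # checkpoint names use kv_b_proj; the fused layer registers as kv_b_proj_bmm
        name = name.replace("kv_b_proj", "kv_b_proj_bmm")
    return name


print(to_sublayer_name("layers.0.self_attn.kv_b_proj.weight"))
# layers.0.self_attn.kv_b_proj_bmm
print(to_sublayer_name("layers.0.mlp.up_gate_proj_weight"))
# layers.0.mlp
```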
Why don't the keys match up in this block?
```python
if fd_config.model_config.moe_use_aux_free:
    self.e_score_correction_bias = self.create_parameter(
        shape=[1, fd_config.model_config.moe_num_experts],
        dtype="float32",
        default_initializer=paddle.nn.initializer.Constant(0),
    )
else:
    self.e_score_correction_bias = None
```
After this change, the key for e_score_correction_bias changes, right? Won't that break RL's names_mapping?
I reverted this part; reverting it has little impact on the mapping.
```python
self.infer_to_train_mapping[
    f"{base_name}.{layer_idx}.mlp.{moe_tag}_fused_moe.experts.gate_correction_bias"
] = f"{base_name}.{layer_idx}.mlp.moe_statics.e_score_correction_bias"
self.infer_to_train_mapping[f"{base_name}.{layer_idx}.mlp.gate_correction_bias"] = (
```
So the bias no longer distinguishes text from vision here?
It does; at load time, though, the bias is loaded as one fused tensor, and the network splits the bias parameter itself. RL just needs to set its fused weight here.
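The load-fused-then-split behavior described above can be sketched with plain slicing. The expert counts and the split_fused_bias name are hypothetical, purely for illustration:

```python
def split_fused_bias(fused_bias, num_text_experts: int):
    # the checkpoint stores one fused correction bias; after loading,
    # the network carves out per-modality (text / image) views
    text_bias = fused_bias[:num_text_experts]
    image_bias = fused_bias[num_text_experts:]
    return text_bias, image_bias


fused = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]   # e.g. 4 text experts + 2 image experts
text, image = split_fused_bias(fused, num_text_experts=4)
print(text, image)  # [0.1, 0.2, 0.3, 0.4] [0.5, 0.6]
```

Under this scheme the infer_to_train mapping only ever needs the single fused key, which is why RL can set the fused weight in one place.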
```python
        **extra_weight_attrs,
        "tensor_track": TensorTracker(shape=layer.down_proj_weight.shape, output_dim=False),
    },
)
```
Are markers like output_dim supported under both TP parallelism and EP parallelism?
EP on-disk weights do not use the output_dim attribute; TP is supported.
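The role of an output_dim marker under TP can be shown with a toy shard function over a nested-list weight in an assumed [out, in] layout (shapes and the shard_for_tp name are illustrative, not FastDeploy APIs): output_dim=True shards the output axis, as a column-parallel up_gate_proj would, while output_dim=False shards the input axis, matching the down_proj tracker above.

```python
def shard_for_tp(weight, tp_rank: int, tp_size: int, output_dim: bool):
    """Slice a [out, in] weight for one TP rank along the axis output_dim selects."""
    if output_dim:
        # column-parallel layers (e.g. up_gate_proj): split the output axis
        rows = len(weight) // tp_size
        return weight[tp_rank * rows:(tp_rank + 1) * rows]
    # row-parallel layers (e.g. down_proj): split the input axis
    cols = len(weight[0]) // tp_size
    return [row[tp_rank * cols:(tp_rank + 1) * cols] for row in weight]


w = [[1, 2, 3, 4], [5, 6, 7, 8]]          # toy weight, shape [2, 4]
print(shard_for_tp(w, 0, 2, True))        # [[1, 2, 3, 4]]
print(shard_for_tp(w, 1, 2, False))       # [[3, 4], [7, 8]]
```

EP, by contrast, distributes whole experts across ranks, so no per-tensor axis split (and hence no output_dim) is needed when reading expert weights from disk.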
New file (89 lines):

```python
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
```
Name the file with a test_ prefix.
conftest.py manages pytest configuration and fixtures; it is not a test file, and pytest discovers conftest.py automatically.
```python
2,
1024,
marks=[pytest.mark.core_model],
),
```
Could you please add a "Qwen2-7B-Instruct" model here, so the changes from PR #3502 get tested together with this one?
```python
all_param_mapping = general_params_mapping + text_expert_params_mapping + image_expert_params_mapping

params_dict = dict(self.named_parameters())
after_loading_fn = process_weights_after_loading(dict(self.named_sublayers()))
```
Rename process_weights_after_loading to process_weights_after_loading_fn.
done


Depends on paddle develop builds from 2025-08-17 or later.
Adds support for the dynamic-quantization loader.
Load performance for MoE-series models improves by roughly 30%.
Supported models: [qwen3/qwen3moe/deepseekv3/ernie text/ernie vl]