[v1 loader]qwen Offline fp8 #4036
Conversation
Thanks for your contribution!
Force-pushed from 12e3417 to 1b47536.
tests/cov_pytest.ini (outdated)
    --ignore=tests/ce
    --ignore=tests/operators/test_fused_moe.py
    --ignore=tests/operators/test_w4afp8_gemm.py
    --ignore=tests/model_loader/test_model_cache.py
Why was this unit test added to the ignore list? If it has a problem, it should be fixed instead.
    **extra_weight_attrs,
    "weight_loader": extra_weight_attrs.get("weight_loader", default_weight_loader(layer.fd_config)),
    "model_format": extra_weight_attrs.get("model_format", ""),
    "weight_need_transpose": extra_weight_attrs.get("model_format") == "torch",
I don't recommend changing it this way. As I understand it, model_format and weight_need_transpose are coupled: once the model formats are unified in the future, this transpose operation will disappear entirely. With this change, the transpose logic appears to be independent of the model format, so a later developer taking over this code will have a hard time realizing that the transpose logic should be removed once the formats are unified.
But right now not every torch-format model needs the transpose. For example, offline-quantized FP8 torch weights do not need it, so the old check was already incorrect. With this change weight_need_transpose is still strongly tied to the model format, but the quant method's create_weights can decide whether a transpose is needed based on both the model format and the quantization type (see the sketch below).
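A minimal sketch of the approach suggested above, assuming a hypothetical Fp8OfflineQuantMethod class and an is_checkpoint_fp8_serialized flag on the quant config (both illustrative, not the actual FastDeploy API): create_weights derives weight_need_transpose from both the checkpoint format and the quantization type, so the transpose decision stays local to the quantization code.

class Fp8OfflineQuantMethod:
    """Illustrative quant method; the real FastDeploy classes may differ."""

    def __init__(self, quant_config):
        self.quant_config = quant_config

    def create_weights(self, layer, **extra_weight_attrs):
        model_format = extra_weight_attrs.get("model_format", "")
        # torch-format checkpoints normally require a transpose, but offline
        # FP8 checkpoints are assumed to already be in the expected layout.
        need_transpose = (
            model_format == "torch"
            and not getattr(self.quant_config, "is_checkpoint_fp8_serialized", False)
        )
        extra_weight_attrs["weight_need_transpose"] = need_transpose
        # ... create the quantized weight and scale parameters here ...

This keeps the format/transpose coupling visible in one place: when formats are unified, only the quant method's create_weights needs to change.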
Force-pushed from 6a08047 to f548940.
Force-pushed from f548940 to 5fbe0b0.
    if getattr(model_config, "num_hidden_layers", None) is None:
        raise ValueError("num_hidden_layers is None")

    quantization_config = model_config.quantization_config
Take a look at the changes to work_process.py in #4051; there is a conflict.
Already handled.
    setattr(config_obj, config_attr_name, origin_value)


def rename_offline_ckpt_suffix_to_fd_suffix(
Why is this block of code needed? Please explain the reasoning and the full context.
The FP8 quantization suffixes don't seem to be consistent across models. For example, LLaMA's FP8 scale is called weight_scale (vLLM handles that mapping specifically in llama4.py), while Qwen's scale is called weight_scale_inv. So I added a dedicated method that maps checkpoint suffixes to FD suffixes (a sketch follows).
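To make the intent concrete, here is a hedged sketch of such a mapping, assuming a simple suffix-replacement table; the actual rename_offline_ckpt_suffix_to_fd_suffix in this PR may have a different signature, and the FD-side suffix name below is an assumption.

# Illustrative table: checkpoint-specific FP8 scale suffixes mapped to a
# single FD-side suffix. The FD-side name "weight_scale" is an assumption.
CKPT_TO_FD_SUFFIX = {
    "weight_scale_inv": "weight_scale",  # Qwen-style FP8 checkpoints
    "weight_scale": "weight_scale",      # LLaMA-style FP8 checkpoints
}


def rename_offline_ckpt_suffix_to_fd_suffix(ckpt_param_name: str) -> str:
    """Map an offline-quantized checkpoint parameter name to the FD naming."""
    for ckpt_suffix, fd_suffix in CKPT_TO_FD_SUFFIX.items():
        if ckpt_param_name.endswith(ckpt_suffix):
            return ckpt_param_name[: -len(ckpt_suffix)] + fd_suffix
    return ckpt_param_name

For example, "model.layers.0.mlp.down_proj.weight_scale_inv" would be renamed to end in "weight_scale" before the weights are handed to the loader.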
Please update the branch; there may be code conflicts.
    if weight_need_transpose:
        loaded_weight = get_tensor(loaded_weight)
        loaded_weight = loaded_weight.transpose([1, 0])
        param.weight_need_transpose = False
Is there a reason you set this back to False here?
Because weights that are already fused on disk go through a recursive call path, weight_loader is invoked twice for the same parameter; with this change the transpose happens only once (sketched below).
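A small runnable sketch of that behavior (the Param class and the numpy stand-in for get_tensor are illustrative, not the actual FastDeploy types): the first weight_loader call transposes and clears the flag, so the second call triggered by the fused-on-disk path does not transpose again.

import numpy as np


class Param:
    """Stand-in for a layer parameter carrying the transpose flag."""

    def __init__(self, need_transpose=True):
        self.weight_need_transpose = need_transpose
        self.data = None


def get_tensor(x):
    # Stand-in for FastDeploy's get_tensor helper.
    return np.asarray(x)


def weight_loader(param, loaded_weight):
    if getattr(param, "weight_need_transpose", False):
        loaded_weight = get_tensor(loaded_weight).transpose([1, 0])
        # Clear the flag so the second (recursive) invocation skips the transpose.
        param.weight_need_transpose = False
    param.data = get_tensor(loaded_weight)


# The fused-on-disk path ends up calling weight_loader twice for the same
# parameter; only the first call applies the transpose.
p = Param()
weight_loader(p, np.zeros((4, 8)))
weight_loader(p, p.data)
assert p.data.shape == (8, 4)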
This PR adds support for loading Qwen-series FP8 offline-quantized weights and ERNIE-series FP8 offline-quantized weights via the following commands.