PR looks good to me. I noticed that the function serialization is also part of this PR. Out of the 8% perf gain, do we have a breakdown of how much can be attributed to the function serialization and how much to the metadata optimization? Just wondering how much we are actually gaining from that part.
Also, a minor thing: since we are now timing the refit separately, do you mind removing this print, which spams the logs for large models?
@yfw Almost all of the 8% perf gain comes from the function serialization. The "pass list of keys only during refitting" change may not offer any speed-up, but it results in cleaner and more readable code. Little improvement is to be expected from replacing a dictionary of (key, offset) pairs with a key list (which requires reconstructing the offsets locally) during serialization.
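To make the key-list idea concrete, here is a minimal sketch of local offset reconstruction. It assumes that every worker already knows the size of each parameter and that tensors are packed back to back in key order; `build_offsets`, `local_sizes`, and the parameter names are hypothetical and not taken from this PR.

```python
from typing import Dict, List, Tuple


def build_offsets(keys: List[str], sizes: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Rebuild (offset, size) per key, assuming contiguous packing in key order."""
    offsets: Dict[str, Tuple[int, int]] = {}
    cursor = 0
    for key in keys:
        size = sizes[key]          # size is known locally, so it need not be sent
        offsets[key] = (cursor, size)
        cursor += size             # next tensor starts where this one ends
    return offsets


# Hypothetical parameter sizes (in elements) known on the receiving side.
local_sizes = {"layer0.weight": 4096, "layer0.bias": 64, "layer1.weight": 8192}

# Only the ordered key list is serialized and sent, not a key -> offset dict.
keys_sent = ["layer0.weight", "layer0.bias", "layer1.weight"]
offsets = build_offsets(keys_sent, local_sizes)
```

Under these assumptions the payload shrinks to a list of strings, while the receiver does a small amount of extra work to recompute the offsets, which is consistent with the expectation of little to no speed-up from this change alone.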
yuki-97
left a comment
Thanks @ZhiyuLi-Nvidia, LGTM!
guyueh1
left a comment
One small comment; otherwise LGTM.
Done. Could you take another look, @yfw?
Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com> Signed-off-by: Zhiyu Li <zhiyul@nvidia.com> Signed-off-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com> Signed-off-by: tpoisonooo <khj.application@aliyun.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com> Signed-off-by: Zhiyu Li <zhiyul@nvidia.com> Signed-off-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com> Signed-off-by: Zhiyu Li <zhiyul@nvidia.com> Signed-off-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com> Signed-off-by: Qidong Su <qidongs@nvidia.com>
What does this PR do?
fix: maintain fp32 mlp.router.expert_bias even with bf16 enabled
track refitting time inside prepare_for_generation
refit metadata optimization: pass list of keys only during refitting
benchmark: refit performance improves by 8%, and the code is cleaner
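As an illustration of tracking refit time separately from the rest of the preparation step, here is a minimal sketch using a timing context manager. The names `timed`, `timings`, and `prepare_for_generation_sketch` are hypothetical stand-ins, and the body of the timed region is a placeholder rather than the actual refit logic from this PR.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(name: str, store: Dict[str, float]) -> Iterator[None]:
    """Accumulate the wall-clock time of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[name] = store.get(name, 0.0) + (time.perf_counter() - start)


def prepare_for_generation_sketch(store: Dict[str, float]) -> None:
    """Hypothetical stand-in: only the refit portion is timed under its own key."""
    with timed("refit", store):
        sum(range(10_000))  # placeholder for the weight refit work
    # Any remaining preparation work would run here, outside the refit timer.


timings: Dict[str, float] = {}
prepare_for_generation_sketch(timings)
```

Recording the refit under its own key means the duration can be reported once per step instead of printing per-tensor messages, which is one way to avoid flooding the logs for large models.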