[dump] support npu fusion patch#39238
zheliuyu wants to merge 2 commits into huggingface:main from zheliuyu:main
Conversation
ArthurZucker
left a comment
I recommend using something like #36853! We can add documentation about this if you want!
Do you mean loading the accelerated APIs of npu through
yes, via the
Thanks for your suggestion.
Please consider these points. I look forward to your reply.
[2025.07.16] Experiment: Test the time-cost statistics after adding different npu fusion kernels.
Experimental design
Start an SFT task through verl's run_qwen2_5_05b_sft_peft_sp2_npu.sh. This task uses the
Result
Notes:
On average, the rms norm patch gives an improvement of ~5.49% and the silu patch gives ~0.72%. Enabling both patches at the same time gives ~6.21%.
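For readers who want to reproduce this kind of comparison, here is a minimal sketch of how a mean improvement percentage could be computed from per-step timings. It assumes the figures are relative reductions in mean step time; the thread does not state the exact formula. The function name and the sample timings are hypothetical placeholders, not values from this experiment.

```python
from statistics import mean


def mean_improvement_pct(baseline_times, patched_times):
    """Relative reduction of the mean step time, in percent.

    Assumption: "improvement" means (baseline - patched) / baseline.
    """
    base = mean(baseline_times)
    patched = mean(patched_times)
    return (base - patched) / base * 100.0


# Placeholder timings in seconds per step (NOT measured data).
baseline = [2.00, 2.10, 1.90]
with_patch = [1.90, 1.95, 1.85]

print(f"mean improvement: {mean_improvement_pct(baseline, with_patch):.2f}%")
```

Comparing means over the same number of steps, with warm-up steps excluded, keeps one-off compilation or caching costs from skewing the result.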
Please share any suggestions on the modification plan for this part. :) Thanks so much. @ArthurZucker @FightingZhen
ArthurZucker
left a comment
Hey!
Thanks for the feedback!
1. Some users may not be able to access huggingface-hub, if npu fusion kernels are obtained through _KERNEL_MAPPING.
2. _KERNEL_MAPPING contains many acceleration modules for GPU, and the addition of acceleration modules for third-party devices has disrupted its original architecture. I am worried that it will make _KERNEL_MAPPING increasingly complex.
Regarding your comments, we want to make sure that both points are addressed!
So:
- Let's isolate the kernels and make sure we register them in _KERNEL_MAPPING using npu as the device
- Let's maybe think of a better API / Design! But the goal is to have many kernel mappings, which will be good defaults! And allow users to register their own mapping!
In a way, even if it is not via kernels, I want to make sure we set a good precedent! The current PR does not really scale well with new models or with the rest of our code!
I agree with your viewpoint. The release of transformers v0.45.0 gave me some inspiration, and I am currently refactoring this PR.
Nice! Eager to see 🤗
Hey @zheliuyu, any follow-up here? Seems like the community is interested!
Progress was delayed by some other tasks. Let's restart with this PR: huggingface/kernels#146 \(^▽^)/


What does this PR do?
An attempt for #39105
Who can review?
WIP