doc(kernels): update kernels integration documentation #42277
mfuntowicz wants to merge 5 commits into main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
LysandreJik
left a comment
It's a good start! I'm inviting @MekkCyber and @danieldk to also contribute here wrt the overall integration
Small additional comment: imo we should try to have a good, coherent doc here, and cross-link it everywhere else it makes sense (guides/docs like performance, optimization, etc.)
MekkCyber
left a comment
Thanks a lot @mfuntowicz! This was very much needed.
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
stevhliu
left a comment
Thanks for kicking this off, I think we can polish this a bit further!
- the Key Benefits and Supported Operations list interrupts the narrative flow and gets in the way of the learning path. I think this may be better at the end in a Reference section, or even a link to where users can find all the supported ops (a list may not scale well as it grows)
- it'd be nice to integrate Key Benefits into the intro paragraph of the doc to emphasize the benefits up front, instead of as a list
- it would also make more sense to move the Requirements higher up so users know them upfront
- I think it'd flow better if we string the Advanced Features together with the content in the Quick Start. Currently it feels a bit disjointed and doesn't really build off of or connect to what comes before
- it may be useful to include an example that shows how to find out which kernels are loaded
- maybe organize it like this (a rough sketch of the loading paths follows the outline):
  - # Kernels
    - intro paragraph about what it is and the key benefits
    - requirements and installation
  - ## Enabling kernels
    - different ways of specifying kernels in `from_pretrained`:
      - `use_kernels=True`
      - `attn_implementation` for attention kernels
      - `KernelConfig` for device-specific kernels
  - ## Mode configuration
    - switching between inference, training, and torch.compile kernels
  - ## Automatic kernel replacement
    - explanation about how it works
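For the Enabling kernels part, the loading paths could be anchored with a sketch along these lines (the model id is a placeholder, and the `KernelConfig` path is only noted in a comment since its exact fields should come from the API reference):

```python
from transformers import AutoModelForCausalLM

# Blanket opt-in: let supported ops be swapped for optimized Hub kernels.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",  # placeholder checkpoint
    use_kernels=True,
)

# Attention-only: load an attention kernel directly from the Hub.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    attn_implementation="kernels-community/flash-attn",
)

# Device-specific kernels would go through a KernelConfig passed to
# from_pretrained; see the API reference for the exact fields.
```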
MekkCyber
left a comment
Thanks! Looks good to me.
stevhliu
left a comment
Thanks! I recommend simplifying some of the sections (combining them or deleting them) to avoid having too many headers in the right sidebar as it can be a bit overwhelming. I don't think we have to divide the doc up so granularly.
I also added an example of how to explicitly set `Mode.INFERENCE` with `kernelize`, but I'm not 100% sure this is the correct way to do it. 😅 Would appreciate a look over that as well!
Also, remember to add it to the toctree!
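For reference, the rough shape of that example, as a sketch assuming the `kernels` library's `kernelize`/`Mode` API (worth double-checking against the kernels docs; model id is a placeholder):

```python
import torch
from kernels import Mode, kernelize
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",  # placeholder checkpoint
    dtype=torch.float16,
    device_map="cuda",
    use_kernels=True,
)

# Explicitly re-kernelize with inference-mode kernels instead of the default.
model = kernelize(model, mode=Mode.INFERENCE)
```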
# Hugging Face Transformers Kernels

Kernels in Transformers are used to optimize the performance of models with custom layers from the Hub, with very little effort.

The Transformers Kernels integration provides high-performance, optimized kernel implementations for common transformer operations. By enabling kernels with a single `use_kernels=True` flag, you can achieve significant speedups for model inference and training with minimal code changes. The system leverages specialized CUDA, Triton, ROCm, Metal, and XPU kernels distributed through the Hugging Face Hub, automatically replacing standard PyTorch operations while maintaining full compatibility. Kernels are mode-aware (automatically switching between training and inference optimizations), support multiple hardware backends (NVIDIA, AMD, Apple Silicon, Intel), and are fully customizable via `KernelConfig` for advanced use cases.
Suggested change:

The [Kernels](https://huggingface.co/docs/kernels/en/index) integration pulls specialized kernels distributed on the [Hub](https://huggingface.co/kernels-community) into Transformers to speed up inference and training. The kernels automatically replace standard PyTorch operations, switch between training and inference optimizations, and are configurable.

Multiple hardware backends are supported, as shown in the table below. Individual kernels may have specific requirements; consult the kernel repository documentation for more detailed compatibility information.

| hardware platform | devices |
|---|---|
| NVIDIA (CUDA) | modern architectures with compute capability 7.0+ (Volta, Turing, Ampere, Hopper, Blackwell) |
| AMD (ROCm) | ROCm devices |
| Apple Silicon (Metal) | M-series chips (M1, M2, M3, M4 and newer) |
| Intel GPUs (XPU) | Intel Data Center GPU Max Series and compatible devices |

> [!TIP]
> Check out the kernels [Hub](https://huggingface.co/kernels-community) for popular kernel implementations like FlashAttention, MegaBlocks, Liger, and more.
For more information on optimizing transformer performance, see the [Performance and Optimization guide](../performance).
I didn't see a Performance and Optimization guide anywhere.

Suggested change: remove this line.
## Requirements

### Software Dependencies

Install the `kernels` package to enable this feature:

```bash
pip install "kernels>=0.9.0"
```

Upgrade to the latest version for the newest features and hardware support:

```bash
pip install --upgrade kernels
```

Please note that 0.9.0 is the minimum version required for this feature.
We recommend using the most recent version to get the best performance and bug fixes.
Suggested change:

Install version 0.9.0 or later to enable the kernel integration in Transformers.

```bash
pip install "kernels>=0.9.0"
```

We recommend upgrading to the latest version for the newest features and hardware support.

```bash
pip install --upgrade kernels
```
### Hardware Compatibility

Kernels support multiple hardware platforms with varying levels of optimization:

- **NVIDIA GPUs (CUDA)**: Modern architectures with compute capability 7.0+ (Volta, Turing, Ampere, Hopper, Blackwell)
- **AMD GPUs (ROCm)**: Compatible with ROCm-supported devices
- **Apple Silicon (Metal)**: M-series chips (M1, M2, M3, M4 and newer)
- **Intel GPUs (XPU)**: Intel Data Center GPU Max Series and compatible devices

Individual kernel implementations may have specific requirements. Consult the kernel repository documentation for detailed compatibility information.

## Quick Start

### Basic Usage

Let `kernels` pull and replace supported operations with optimized kernels when loading any model:
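The excerpt's code sample is not shown above; a minimal sketch of what it would contain (the model id is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM

# Opt in to optimized Hub kernels at load time with a single flag.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",  # placeholder; any supported checkpoint works
    dtype=torch.float16,
    device_map="cuda",
    use_kernels=True,
)
```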
Suggested change:

## Loading kernels

Set `use_kernels=True` in [`~PreTrainedModel.from_pretrained`] to pull and replace supported operations with kernels.
- **Backward Compatibility**: Models work identically with or without kernels enabled
- **Dynamic Replacement**: Kernel replacement happens at model load time and persists for the model's lifetime
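As a sanity check of the backward-compatibility claim, outputs with and without kernels can be compared directly (a sketch; the model id is a placeholder, and small numerical differences are expected from reordered floating-point ops):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Kernels should not change results.", return_tensors="pt").to("cuda")

# Load the same checkpoint twice: once as-is, once with Hub kernels enabled.
baseline = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.float16, device_map="cuda")
kernelized = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.float16, device_map="cuda", use_kernels=True)

with torch.no_grad():
    ref = baseline(**inputs).logits
    out = kernelized(**inputs).logits

# Allow a small tolerance: optimized kernels may reorder floating-point ops.
torch.testing.assert_close(out, ref, rtol=1e-2, atol=1e-2)
```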
## Troubleshooting

Suggested change: add an intro under the header.

Kernel integration depends on hardware, drivers, and package versions working together. Refer to the sections below to troubleshoot common failures.
### Installation Issues

If you encounter import errors:

```bash
pip install kernels
```

Suggested change:

### Installation issues

If you encounter import errors, make sure the kernels library is installed.

### Kernel Loading Failures

Suggested change: `### Kernel loading failures`
- Verify your CUDA/ROCm/Metal drivers are up to date
- Consult the kernel repository documentation for known issues

### Device Compatibility

Suggested change: `### Device compatibility`
## Additional Resources

- **Kernels Library**: [github.com/huggingface/kernels](https://github.com/huggingface/kernels) - Core kernels implementation and kernel builder tools
- **Community Kernels**: [huggingface.co/kernels-community](https://huggingface.co/kernels-community) - Browse available kernel implementations
- **Performance Guide**: See the [Performance and Optimization documentation](../performance) for comprehensive optimization strategies
- **API Reference**: Detailed `KernelConfig` documentation for advanced configuration options

Suggested change:

## Resources

Visit the [kernels](https://github.com/huggingface/kernels) library for core kernel implementations and kernel builder tools.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Feel free to ping me again for another review whenever you're ready!
Add some more content to the kernels integration in Transformers.