doc(kernels): update kernels integration documentation#42277

Open
mfuntowicz wants to merge 5 commits into main from doc-kernels

Conversation

@mfuntowicz
Member

Add some more content to the kernels integration in Transformers.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@LysandreJik LysandreJik left a comment


It's a good start! I'm inviting @MekkCyber and @danieldk to also contribute here wrt the overall integration

Small additional comment: imo we should try to have a good, coherent doc here, and cross link it everywhere else where it makes sense (guides/docs like performance, optimization, etc)

Contributor

@MekkCyber MekkCyber left a comment


Thanks a lot @mfuntowicz ! This was very much needed

mfuntowicz and others added 2 commits November 19, 2025 11:13
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Member

@stevhliu stevhliu left a comment


Thanks for kicking this off, I think we can polish this a bit further!

  • the Key Benefits and Supported Operations list interrupts the narrative flow and gets in the way of the learning path. i think this may be better at the end in a Reference section or even a link to where users can find all the supported ops (may not scale well when the list grows)
  • it'd be nice to integrate Key Benefits into the intro paragraph of the doc to emphasize its benefits, instead of a list
  • it would also make more sense to move the Requirements higher up so users know upfront
  • i think it'd flow better if we string the Advanced Features together with the content in the Quick Start. currently it feels a bit disjointed and doesn't really build off of or connect to what comes before
  • may be useful to include an example that shows you how to find out which kernels are loaded
  • maybe organize it like this:
# Kernels

intro paragraph about what it is and the key benefits
requirements and installation

## Enabling kernels
different ways of specifying kernels in `from_pretrained`
`use_kernels=True`
`attn_implementation` for attention kernels
`KernelConfig` for device-specific kernels

## Mode configuration
switching between inference, training, and torch.compile kernels

## Automatic kernel replacement
explanation about how it works
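
The "Enabling kernels" part of this outline could be sketched roughly as follows. This is a hedged sketch only: the model id and attention implementation are illustrative, and it assumes a recent `transformers` release with the `kernels` package installed.

```python
# Rough sketch of the "Enabling kernels" section outlined above. Assumes a
# recent `transformers` release with the `kernels` package installed; the
# model id and attention implementation are illustrative only.
def load_model(model_id="openai-community/gpt2", attn="sdpa"):
    from transformers import AutoModelForCausalLM

    # use_kernels=True pulls matching kernels from the Hub and swaps them in
    # for supported operations; attn_implementation selects the attention
    # kernel separately.
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        use_kernels=True,
        attn_implementation=attn,
    )
```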

@MekkCyber MekkCyber requested a review from stevhliu December 10, 2025 08:52
Contributor

@MekkCyber MekkCyber left a comment


Thanks ! looks good to me

Member

@stevhliu stevhliu left a comment


Thanks! I recommend simplifying some of the sections (combining them or deleting them) to avoid having too many headers in the right sidebar as it can be a bit overwhelming. I don't think we have to divide the doc up so granularly.

I also added an example of how to explicitly set Mode.INFERENCE with kernelize, but I'm not 100% sure this is the correct way to do it. 😅 Would appreciate a look over that as well!

Also, remember to add it to the toctree!
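
For reference, the explicit `Mode.INFERENCE` call being discussed looks roughly like this. A sketch only, not the PR's verbatim example; it assumes the `kernels` package is installed.

```python
# Sketch of explicitly requesting inference-mode kernels with kernelize, as
# discussed above. Illustrative, not the doc's verbatim example; `kernels`
# must be installed for the import inside the function to succeed.
def kernelize_for_inference(model):
    from kernels import Mode, kernelize

    # kernelize replaces supported layers with kernels registered for the
    # given mode and returns the (modified) model.
    return kernelize(model, mode=Mode.INFERENCE)
```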

Comment thread docs/source/en/kernel_doc/overview.md Outdated
# Hugging Face Transformers Kernels

Kernels in Transformers optimize model performance with custom layers from the Hub, requiring very little effort.
The Transformers Kernels integration provides high-performance, optimized kernel implementations for common transformer operations. By enabling kernels with a single `use_kernels=True` flag, you can achieve significant speedups for model inference and training with minimal code changes. The system leverages specialized CUDA, Triton, ROCm, Metal, and XPU kernels distributed through the Hugging Face Hub, automatically replacing standard PyTorch operations while maintaining full compatibility. Kernels are mode-aware (automatically switching between training and inference optimizations), support multiple hardware backends (NVIDIA, AMD, Apple Silicon, Intel), and are fully customizable via `KernelConfig` for advanced use cases.
Member


Suggested change
The Transformers Kernels integration provides high-performance, optimized kernel implementations for common transformer operations. By enabling kernels with a single `use_kernels=True` flag, you can achieve significant speedups for model inference and training with minimal code changes. The system leverages specialized CUDA, Triton, ROCm, Metal, and XPU kernels distributed through the Hugging Face Hub, automatically replacing standard PyTorch operations while maintaining full compatibility. Kernels are mode-aware (automatically switching between training and inference optimizations), support multiple hardware backends (NVIDIA, AMD, Apple Silicon, Intel), and are fully customizable via `KernelConfig` for advanced use cases.
The [Kernels](https://huggingface.co/docs/kernels/en/index) integration pulls specialized kernels distributed on the [Hub](https://huggingface.co/kernels-community) into Transformers to speed up inference and training. The kernels automatically replace standard PyTorch operations, switch between training and inference optimizations, and are configurable.
Multiple hardware backends are supported as shown in the table below. Individual kernels may have specific requirements. Consult the kernel repository documentation for more detailed compatibility information.
| hardware platform | devices |
|---|---|
| NVIDIA (CUDA) | modern architectures with compute capability 7.0+ (Volta, Turing, Ampere, Hopper, Blackwell) |
| AMD (ROCm) | ROCm devices |
| Apple Silicon (Metal) | M-series chips (M1, M2, M3, M4 and newer) |
| Intel GPUs (XPU) | Intel Data Center GPU Max Series and compatible devices |
> [!TIP]
> Check out the Kernel [Hub](https://huggingface.co/kernels-community) for popular kernel implementations like FlashAttention, MegaBlocks, Liger, and more.


For more information on optimizing transformer performance, see the [Performance and Optimization guide](../performance).
Member


I didn't see a Performance and Optimization guide anywhere

Suggested change
For more information on optimizing transformer performance, see the [Performance and Optimization guide](../performance).

Comment on lines +7 to +24
## Requirements

### Software Dependencies

Install the `kernels` package to enable this feature:

```bash
pip install kernels>=0.9.0
```

Upgrade to the latest version for the newest features and hardware support:

```bash
pip install --upgrade kernels
```

Please note the 0.9.0 is the minimum version required for this feature.
We do recommend using the most recent version to get the best performances and bug fixes.
Member


Suggested change
## Requirements
### Software Dependencies
Install the `kernels` package to enable this feature:
```bash
pip install kernels>=0.9.0
```
Upgrade to the latest version for the newest features and hardware support:
```bash
pip install --upgrade kernels
```
Please note the 0.9.0 is the minimum version required for this feature.
We do recommend using the most recent version to get the best performances and bug fixes.
Install version 0.9.0 to enable the kernel integration in Transformers.
```bash
pip install kernels>=0.9.0
```
We recommend upgrading to the latest version for the newest features and hardware support.
```bash
pip install --upgrade kernels
```
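
Since 0.9.0 is the stated minimum, an installation check like the following can catch version problems early. Stdlib only; the helper names are ours, not part of `transformers` or `kernels`.

```python
# Stdlib-only helpers (our own, not part of transformers or kernels) to check
# that an installed `kernels` version meets the 0.9.0 minimum stated above.
from importlib import metadata

def parse_version(v):
    """Parse up to three leading numeric components of a version string."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def kernels_meets_minimum(minimum=(0, 9, 0)):
    """True if the installed `kernels` package is at least `minimum`."""
    try:
        return parse_version(metadata.version("kernels")) >= minimum
    except metadata.PackageNotFoundError:
        return False
```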

Comment on lines +26 to +41
### Hardware Compatibility

Kernels support multiple hardware platforms with varying levels of optimization:

- **NVIDIA GPUs (CUDA)**: Modern architectures with compute capability 7.0+ (Volta, Turing, Ampere, Hopper, Blackwell)
- **AMD GPUs (ROCm)**: Compatible with ROCm-supported devices
- **Apple Silicon (Metal)**: M-series chips (M1, M2, M3, M4 and newer)
- **Intel GPUs (XPU)**: Intel Data Center GPU Max Series and compatible devices

Individual kernel implementations may have specific requirements. Consult the kernel repository documentation for detailed compatibility information.

## Quick Start

### Basic Usage

Let `kernels` pull and replace supported operations with optimized kernels when loading any model:
Member


Suggested change
### Hardware Compatibility
Kernels support multiple hardware platforms with varying levels of optimization:
- **NVIDIA GPUs (CUDA)**: Modern architectures with compute capability 7.0+ (Volta, Turing, Ampere, Hopper, Blackwell)
- **AMD GPUs (ROCm)**: Compatible with ROCm-supported devices
- **Apple Silicon (Metal)**: M-series chips (M1, M2, M3, M4 and newer)
- **Intel GPUs (XPU)**: Intel Data Center GPU Max Series and compatible devices
Individual kernel implementations may have specific requirements. Consult the kernel repository documentation for detailed compatibility information.
## Quick Start
### Basic Usage
Let `kernels` pull and replace supported operations with optimized kernels when loading any model:
## Loading kernels
Set `use_kernels=True` in [`~PreTrainedModel.from_pretrained`] to pull and replace supported operations with kernels.
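
As a minimal sketch of that call (the model id is just an example, and this assumes `transformers` plus `kernels` are installed):

```python
# Minimal sketch of the loading call described above; the model id is an
# example and `transformers` plus `kernels` are assumed to be installed.
def load_with_kernels(model_id="openai-community/gpt2"):
    from transformers import AutoModelForCausalLM
    return AutoModelForCausalLM.from_pretrained(model_id, use_kernels=True)
```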

- **Backward Compatibility**: Models work identically with or without kernels enabled
- **Dynamic Replacement**: Kernel replacement happens at model load time and persists for the model's lifetime

## Troubleshooting
Member


Suggested change
## Troubleshooting
## Troubleshooting
Kernel integration depends on hardware, drivers, and package versions working together. Refer to the sections below to troubleshoot common failures.

Comment on lines +237 to +239
### Installation Issues

If you encounter import errors:
Member


Suggested change
### Installation Issues
If you encounter import errors:
### Installation issues
If you encounter import errors, make sure the kernels library is installed.

pip install kernels
```

### Kernel Loading Failures
Member


Suggested change
### Kernel Loading Failures
### Kernel loading failures

- Verify your CUDA/ROCm/Metal drivers are up to date
- Consult the kernel repository documentation for known issues

### Device Compatibility
Member


Suggested change
### Device Compatibility
### Device compatibility

Comment on lines +256 to +261
## Additional Resources

- **Kernels Library**: [github.com/huggingface/kernels](https://github.com/huggingface/kernels) - Core kernels implementation and kernel builder tools
- **Community Kernels**: [huggingface.co/kernels-community](https://huggingface.co/kernels-community) - Browse available kernel implementations
- **Performance Guide**: See the [Performance and Optimization documentation](../performance) for comprehensive optimization strategies
- **API Reference**: Detailed `KernelConfig` documentation for advanced configuration options
Member


Suggested change
## Additional Resources
- **Kernels Library**: [github.com/huggingface/kernels](https://github.com/huggingface/kernels) - Core kernels implementation and kernel builder tools
- **Community Kernels**: [huggingface.co/kernels-community](https://huggingface.co/kernels-community) - Browse available kernel implementations
- **Performance Guide**: See the [Performance and Optimization documentation](../performance) for comprehensive optimization strategies
- **API Reference**: Detailed `KernelConfig` documentation for advanced configuration options
## Resources
Visit the [kernels](https://github.com/huggingface/kernels) library for core kernel implementations and kernel builder tools.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
@stevhliu
Member

Feel free to ping me again for another review whenever you're ready!
