doc(kernels): update kernels integration documentation#42277

Open
mfuntowicz wants to merge 5 commits into main from doc-kernels

Conversation

@mfuntowicz
Member

Add some more content to the kernels integration in Transformers.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@LysandreJik LysandreJik left a comment


It's a good start! I'm inviting @MekkCyber and @danieldk to also contribute here wrt the overall integration

Small additional comment: imo we should try to have a good, coherent doc here, and cross link it everywhere else where it makes sense (guides/docs like performance, optimization, etc)

Contributor

@MekkCyber MekkCyber left a comment


Thanks a lot @mfuntowicz ! This was very much needed

mfuntowicz and others added 2 commits November 19, 2025 11:13
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Member

@stevhliu stevhliu left a comment


Thanks for kicking this off, I think we can polish this a bit further!

  • the Key Benefits and Supported Operations list interrupts the narrative flow and gets in the way of the learning path. i think this may be better at the end in a Reference section or even a link to where users can find all the supported ops (may not scale well when the list grows)
  • it'd be nice to integrate Key Benefits into the intro paragraph of the doc to emphasize its benefits, instead of a list
  • it would also make more sense to move the Requirements higher up so users know upfront
  • i think it'd flow better if we string the Advanced Features together with the content in the Quick Start. currently it feels a bit disjointed and doesn't really build off of or connect to what comes before
  • may be useful to include an example that shows you how to find out which kernels are loaded
  • maybe organize it like this:
# Kernels

intro paragraph about what it is and the key benefits
requirements and installation

## Enabling kernels
different ways of specifying kernels in `from_pretrained`
`use_kernels=True`
`attn_implementation` for attention kernels
`KernelConfig` for device-specific kernels

## Mode configuration
switching between inference, training, and torch.compile kernels

## Automatic kernel replacement
explanation about how it works
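
The "Enabling kernels" part of this outline could be sketched roughly as follows. This is a hedged sketch only: the model id and attention implementation are illustrative, and it assumes a recent `transformers` release with the `kernels` package installed.

```python
# Rough sketch of the "Enabling kernels" section outlined above. Assumes a
# recent `transformers` release with the `kernels` package installed; the
# model id and attention implementation are illustrative only.
def load_model(model_id="openai-community/gpt2", attn="sdpa"):
    from transformers import AutoModelForCausalLM

    # use_kernels=True pulls matching kernels from the Hub and swaps them in
    # for supported operations; attn_implementation selects the attention
    # kernel separately.
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        use_kernels=True,
        attn_implementation=attn,
    )
```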

@MekkCyber MekkCyber requested a review from stevhliu December 10, 2025 08:52
Contributor

@MekkCyber MekkCyber left a comment


Thanks ! looks good to me

Member

@stevhliu stevhliu left a comment


Thanks! I recommend simplifying some of the sections (combining them or deleting them) to avoid having too many headers in the right sidebar as it can be a bit overwhelming. I don't think we have to divide the doc up so granularly.

I also added an example of how to explicitly set Mode.INFERENCE with kernelize, but I'm not 100% sure this is the correct way to do it. 😅 Would appreciate a look over that as well!

Also, remember to add it to the toctree!
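
For reference, the explicit `Mode.INFERENCE` call being discussed looks roughly like this. A sketch only, not the PR's verbatim example; it assumes the `kernels` package is installed.

```python
# Sketch of explicitly requesting inference-mode kernels with kernelize, as
# discussed above. Illustrative, not the doc's verbatim example; `kernels`
# must be installed for the import inside the function to succeed.
def kernelize_for_inference(model):
    from kernels import Mode, kernelize

    # kernelize replaces supported layers with kernels registered for the
    # given mode and returns the (modified) model.
    return kernelize(model, mode=Mode.INFERENCE)
```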

Comment thread docs/source/en/kernel_doc/overview.md Outdated
# Hugging Face Transformers Kernels

Kernels in Transformers optimize model performance with custom layers from the Hub, requiring very little effort.
The Transformers Kernels integration provides high-performance, optimized kernel implementations for common transformer operations. By enabling kernels with a single `use_kernels=True` flag, you can achieve significant speedups for model inference and training with minimal code changes. The system leverages specialized CUDA, Triton, ROCm, Metal, and XPU kernels distributed through the Hugging Face Hub, automatically replacing standard PyTorch operations while maintaining full compatibility. Kernels are mode-aware (automatically switching between training and inference optimizations), support multiple hardware backends (NVIDIA, AMD, Apple Silicon, Intel), and are fully customizable via `KernelConfig` for advanced use cases.
Member


Suggested change
The Transformers Kernels integration provides high-performance, optimized kernel implementations for common transformer operations. By enabling kernels with a single `use_kernels=True` flag, you can achieve significant speedups for model inference and training with minimal code changes. The system leverages specialized CUDA, Triton, ROCm, Metal, and XPU kernels distributed through the Hugging Face Hub, automatically replacing standard PyTorch operations while maintaining full compatibility. Kernels are mode-aware (automatically switching between training and inference optimizations), support multiple hardware backends (NVIDIA, AMD, Apple Silicon, Intel), and are fully customizable via `KernelConfig` for advanced use cases.
The [Kernels](https://huggingface.co/docs/kernels/en/index) integration pulls specialized kernels distributed on the [Hub](https://huggingface.co/kernels-community) into Transformers to speed up inference and training. The kernels automatically replace standard PyTorch operations, switch between training and inference optimizations, and are configurable.
Multiple hardware backends are supported as shown in the table below. Individual kernels may have specific requirements. Consult the kernel repository documentation for more detailed compatibility information.
| hardware platform | devices |
|---|---|
| NVIDIA (CUDA) | modern architectures with compute capability 7.0+ (Volta, Turing, Ampere, Hopper, Blackwell) |
| AMD (ROCm) | ROCm devices |
| Apple Silicon (Metal) | M-series chips (M1, M2, M3, M4 and newer) |
| Intel GPUs (XPU) | Intel Data Center GPU Max Series and compatible devices |
> [!TIP]
> Check out the Kernel [Hub](https://huggingface.co/kernels-community) for popular kernel implementations like FlashAttention, MegaBlocks, Liger, and more.


For more information on optimizing transformer performance, see the [Performance and Optimization guide](../performance).
Member


I didn't see a Performance and Optimization guide anywhere

Suggested change
For more information on optimizing transformer performance, see the [Performance and Optimization guide](../performance).

Comment on lines +7 to +24
## Requirements

### Software Dependencies

Install the `kernels` package to enable this feature:

```bash
pip install kernels>=0.9.0
```

Upgrade to the latest version for the newest features and hardware support:

```bash
pip install --upgrade kernels
```

Please note the 0.9.0 is the minimum version required for this feature.
We do recommend using the most recent version to get the best performances and bug fixes.
Member


Suggested change
## Requirements
### Software Dependencies
Install the `kernels` package to enable this feature:
```bash
pip install kernels>=0.9.0
```
Upgrade to the latest version for the newest features and hardware support:
```bash
pip install --upgrade kernels
```
Please note the 0.9.0 is the minimum version required for this feature.
We do recommend using the most recent version to get the best performances and bug fixes.
Install version 0.9.0 to enable the kernel integration in Transformers.
```bash
pip install kernels>=0.9.0
```
We recommend upgrading to the latest version for the newest features and hardware support.
```bash
pip install --upgrade kernels
```
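
Since 0.9.0 is the stated minimum, an installation check like the following can catch version problems early. Stdlib only; the helper names are ours, not part of `transformers` or `kernels`.

```python
# Stdlib-only helpers (our own, not part of transformers or kernels) to check
# that an installed `kernels` version meets the 0.9.0 minimum stated above.
from importlib import metadata

def parse_version(v):
    """Parse up to three leading numeric components of a version string."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def kernels_meets_minimum(minimum=(0, 9, 0)):
    """True if the installed `kernels` package is at least `minimum`."""
    try:
        return parse_version(metadata.version("kernels")) >= minimum
    except metadata.PackageNotFoundError:
        return False
```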

Comment on lines +26 to +41
### Hardware Compatibility

Kernels support multiple hardware platforms with varying levels of optimization:

- **NVIDIA GPUs (CUDA)**: Modern architectures with compute capability 7.0+ (Volta, Turing, Ampere, Hopper, Blackwell)
- **AMD GPUs (ROCm)**: Compatible with ROCm-supported devices
- **Apple Silicon (Metal)**: M-series chips (M1, M2, M3, M4 and newer)
- **Intel GPUs (XPU)**: Intel Data Center GPU Max Series and compatible devices

Individual kernel implementations may have specific requirements. Consult the kernel repository documentation for detailed compatibility information.

## Quick Start

### Basic Usage

Let `kernels` pull and replace supported operations with optimized kernels when loading any model:
Member


Suggested change
### Hardware Compatibility
Kernels support multiple hardware platforms with varying levels of optimization:
- **NVIDIA GPUs (CUDA)**: Modern architectures with compute capability 7.0+ (Volta, Turing, Ampere, Hopper, Blackwell)
- **AMD GPUs (ROCm)**: Compatible with ROCm-supported devices
- **Apple Silicon (Metal)**: M-series chips (M1, M2, M3, M4 and newer)
- **Intel GPUs (XPU)**: Intel Data Center GPU Max Series and compatible devices
Individual kernel implementations may have specific requirements. Consult the kernel repository documentation for detailed compatibility information.
## Quick Start
### Basic Usage
Let `kernels` pull and replace supported operations with optimized kernels when loading any model:
## Loading kernels
Set `use_kernels=True` in [`~PreTrainedModel.from_pretrained`] to pull and replace supported operations with kernels.
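
As a minimal sketch of that call (the model id is just an example, and this assumes `transformers` plus `kernels` are installed):

```python
# Minimal sketch of the loading call described above; the model id is an
# example and `transformers` plus `kernels` are assumed to be installed.
def load_with_kernels(model_id="openai-community/gpt2"):
    from transformers import AutoModelForCausalLM
    return AutoModelForCausalLM.from_pretrained(model_id, use_kernels=True)
```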

- **Backward Compatibility**: Models work identically with or without kernels enabled
- **Dynamic Replacement**: Kernel replacement happens at model load time and persists for the model's lifetime

## Troubleshooting
Member


Suggested change
## Troubleshooting
## Troubleshooting
Kernel integration depends on hardware, drivers, and package versions working together. Refer to the sections below to troubleshoot common failures.

Comment on lines +237 to +239
### Installation Issues

If you encounter import errors:
Member


Suggested change
### Installation Issues
If you encounter import errors:
### Installation issues
If you encounter import errors, make sure the kernels library is installed.

pip install kernels
```

### Kernel Loading Failures
Member


Suggested change
### Kernel Loading Failures
### Kernel loading failures

- Verify your CUDA/ROCm/Metal drivers are up to date
- Consult the kernel repository documentation for known issues

### Device Compatibility
Member


Suggested change
### Device Compatibility
### Device compatibility

Comment on lines +256 to +261
## Additional Resources

- **Kernels Library**: [github.com/huggingface/kernels](https://github.com/huggingface/kernels) - Core kernels implementation and kernel builder tools
- **Community Kernels**: [huggingface.co/kernels-community](https://huggingface.co/kernels-community) - Browse available kernel implementations
- **Performance Guide**: See the [Performance and Optimization documentation](../performance) for comprehensive optimization strategies
- **API Reference**: Detailed `KernelConfig` documentation for advanced configuration options
Member


Suggested change
## Additional Resources
- **Kernels Library**: [github.com/huggingface/kernels](https://github.com/huggingface/kernels) - Core kernels implementation and kernel builder tools
- **Community Kernels**: [huggingface.co/kernels-community](https://huggingface.co/kernels-community) - Browse available kernel implementations
- **Performance Guide**: See the [Performance and Optimization documentation](../performance) for comprehensive optimization strategies
- **API Reference**: Detailed `KernelConfig` documentation for advanced configuration options
## Resources
Visit the [kernels](https://github.com/huggingface/kernels) library for core kernel implementations and kernel builder tools.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
@stevhliu
Member

Feel free to ping me again for another review whenever you're ready!
