18 changes: 18 additions & 0 deletions docs/source/_toctree.yml
@@ -46,3 +46,21 @@
  - local: cli
    title: Kernels CLI
  title: API Reference
- sections:
  - local: cli-init
    title: kernels init
  - local: cli-upload
    title: kernels upload
  - local: cli-benchmark
    title: kernels benchmark
  - local: cli-check
    title: kernels check
  - local: cli-versions
    title: kernels versions
  - local: cli-generate-readme
    title: kernels generate-readme
  - local: cli-lock
    title: kernels lock
  - local: cli-download
    title: kernels download
  title: CLI Reference
147 changes: 147 additions & 0 deletions docs/source/cli-benchmark.md
@@ -0,0 +1,147 @@
# kernels benchmark

Use `kernels benchmark` to run benchmark scripts shipped with a kernel repository.

The command:

- Downloads the kernel repo at a specific **branch** or **version**
- Runs all `benchmarks/benchmark*.py` scripts
- Times each `benchmark_*` workload and prints a results table
- Optionally saves results as JSON

## Installation

`kernels benchmark` requires extra dependencies:

```bash
uv pip install 'kernels[benchmark]' # or pip install 'kernels[benchmark]'
```

## Example

```bash
kernels benchmark kernels-community/activation --version 1
```

Example output:

```text
Downloading kernels-community/activation@v1...
Running benchmark.py...

GPU Apple M3 Max (30 cores)
CPU Apple M3 Max
OS Darwin 25.2.0
PyTorch 2.10.0

Running SiluWorkloads on mps

┌───────────────┬────────────┬─────┬───────────┬────────────┬───────────┬───────────┬───────────┬───────────┬────────────┬───────────┬─────────┐
│ Benchmark     │ Workload   │ N   │ Speedup   │ Mean(ms)   │ Std(ms)   │ Min(ms)   │ Max(ms)   │ IQR(ms)   │ Outliers   │ Ref(ms)   │ Match   │
├───────────────┼────────────┼─────┼───────────┼────────────┼───────────┼───────────┼───────────┼───────────┼────────────┼───────────┼─────────┤
│ SiluWorkloads │ large      │ 100 │ 1.72x     │ 6.5153     │ 0.4343    │ 6.2883    │ 8.4699    │ 0.1701    │ 8          │ 11.2048   │ ✓       │
│ SiluWorkloads │ medium     │ 100 │ 2.48x     │ 1.1813     │ 0.3976    │ 1.04      │ 4.2146    │ 0.0698    │ 5          │ 2.9332    │ ✓       │
│ SiluWorkloads │ small      │ 100 │ 1.96x     │ 0.4909     │ 0.2175    │ 0.4407    │ 2.6438    │ 0.0085    │ 16         │ 0.9622    │ ✓       │
└───────────────┴────────────┴─────┴───────────┴────────────┴───────────┴───────────┴───────────┴───────────┴────────────┴───────────┴─────────┘

large: 1.72x faster (95% CI: 6.4302-6.6004ms vs ref 11.2048ms) ✓ significant
medium: 2.48x faster (95% CI: 1.1034-1.2592ms vs ref 2.9332ms) ✓ significant
small: 1.96x faster (95% CI: 0.4483-0.5335ms vs ref 0.9622ms) ✓ significant

Kernel: 2385e44 Benchmark: 5b53516
```
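The summary lines under the table can be reproduced from the table columns. The arithmetic below is inferred from the example numbers rather than taken from the `kernels` source: the speedup equals `Ref(ms) / Mean(ms)`, and each interval matches a normal-approximation 95% confidence interval for the mean, `Mean ± 1.96 × Std / √N`. The helper names `speedup` and `ci95` are illustrative only.

```python
import math


def speedup(ref_mean_ms: float, kernel_mean_ms: float) -> float:
    # Reference time divided by kernel time (values above 1 mean the kernel is faster).
    return ref_mean_ms / kernel_mean_ms


def ci95(mean_ms: float, std_ms: float, n: int) -> tuple:
    # Normal-approximation 95% interval for the mean: mean ± 1.96 * std / sqrt(n).
    half_width = 1.96 * std_ms / math.sqrt(n)
    return mean_ms - half_width, mean_ms + half_width


# Numbers copied from the "large" row of the example table.
mean, std, n, ref = 6.5153, 0.4343, 100, 11.2048
low, high = ci95(mean, std, n)
print(f"large: {speedup(ref, mean):.2f}x faster (95% CI: {low:.4f}-{high:.4f}ms vs ref {ref}ms)")
# prints: large: 1.72x faster (95% CI: 6.4302-6.6004ms vs ref 11.2048ms)
```

The same two formulas also reproduce the `medium` and `small` lines from their rows.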

## Usage

For a Hub repository, you must specify which revision to benchmark, either with a flag or with an `@...` suffix on the repository ID:

```bash
kernels benchmark <repo_id> --version <N>
kernels benchmark <repo_id> --branch <name>
kernels benchmark <repo_id>@v<N>
kernels benchmark <repo_id>@<branch>
```
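The four forms pair up: `@v<N>` is shorthand for `--version <N>`, and `@<branch>` is shorthand for `--branch <name>`. The hypothetical `parse_repo_spec` helper below sketches that mapping. It is not the CLI's own parser. This page does not say how the CLI treats a branch whose name itself looks like `v<N>`, and the sketch simply reads such a suffix as a version.

```python
import re


def parse_repo_spec(spec: str) -> tuple:
    """Hypothetical: map 'repo@v1' / 'repo@main' onto --version / --branch."""
    repo_id, sep, rev = spec.partition("@")
    if not sep:
        # No shorthand suffix; the revision must come from --version or --branch.
        return repo_id, {}
    match = re.fullmatch(r"v(\d+)", rev)
    if match:
        return repo_id, {"version": int(match.group(1))}
    return repo_id, {"branch": rev}


for spec in ["kernels-community/activation@v1", "kernels-community/activation@main"]:
    print(spec, "->", parse_repo_spec(spec))
```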

## Examples

Benchmark a tagged kernel version:

```bash
kernels benchmark kernels-community/activation --version 1
```

Equivalent shorthand:

```bash
kernels benchmark kernels-community/activation@v1
```

Benchmark a branch:

```bash
kernels benchmark kernels-community/activation --branch main
```

Tune warmup and iteration count:

```bash
kernels benchmark kernels-community/activation@v1 --warmup 20 --iterations 200
```

Save results to a file (JSON):

```bash
kernels benchmark kernels-community/activation@v1 --output results.json
```

Benchmark a local kernel checkout (must contain `benchmarks/`):

```bash
kernels benchmark ./my_kernel
```

## Output

- By default, a table is printed (timings in ms).
- `--output <file>.json` writes a JSON payload to disk.

## Writing Benchmark Scripts

Benchmark scripts must live under `benchmarks/` in the kernel repository and match `benchmark*.py`.
Each script should define one or more subclasses of `kernels.benchmark.Benchmark`.

Minimal example (`benchmarks/benchmark_activation.py`):

```python
import torch

from kernels.benchmark import Benchmark


class ActivationBenchmark(Benchmark):
    seed = 0

    def setup(self):
        # Input packs both halves: x[..., :512] goes through SiLU, x[..., 512:] is the multiplier
        self.x = torch.randn(128, 1024, device=self.device, dtype=torch.float16)
        self.out = torch.empty(128, 512, device=self.device, dtype=torch.float16)

    def benchmark_silu_and_mul(self):
        # The kernel writes its result into self.out
        self.kernel.silu_and_mul(self.out, self.x)

    def verify_silu_and_mul(self):
        # Return reference tensor; runner compares with self.out
        return torch.nn.functional.silu(self.x[..., :512]) * self.x[..., 512:]
```

The runner will:

- Call `setup()` once per workload (or `setup_<workload>()` if present)
- Run `--warmup` warm-up iterations before timing
- Time `--iterations` calls of `benchmark_<workload>()`
- If `verify_<workload>()` exists, compare the reference tensor it returns with `self.out` using `torch.allclose(..., atol=1e-2)` and report the speedup over the reference computation
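To make these steps concrete, here is a pure-Python sketch of one workload's lifecycle. It is not the `kernels` runner. The function names, default counts, list-based `allclose`, and the `ToyWorkloads` class are all assumptions for illustration. The sketch also uses wall-clock `time.perf_counter` timing rather than device-synchronized GPU timing. Speedup is computed as reference time divided by kernel time, which matches the example table (11.2048 / 6.5153 ≈ 1.72).

```python
import statistics
import time


def allclose(actual, expected, atol=1e-2, rtol=1e-5):
    # Same tolerance rule as torch.allclose: |a - e| <= atol + rtol * |e|.
    return len(actual) == len(expected) and all(
        abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected)
    )


def time_ms(fn, iterations):
    # Wall-clock timings in milliseconds; a GPU runner would also need
    # to synchronize the device around each call.
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1e3)
    return timings


def run_workload(bench, name, warmup=10, iterations=100):
    """Hypothetical sketch of the runner steps listed above (pure Python)."""
    # Prefer setup_<workload>() and fall back to the shared setup().
    setup = getattr(bench, f"setup_{name}", None) or bench.setup
    setup()
    workload = getattr(bench, f"benchmark_{name}")
    for _ in range(warmup):  # warm-up calls, not timed
        workload()
    mean_ms = statistics.mean(time_ms(workload, iterations))
    result = {"workload": name, "n": iterations, "mean_ms": mean_ms}
    verify = getattr(bench, f"verify_{name}", None)
    if verify is not None:
        # Compare the output the workload left in bench.out with the reference.
        if not allclose(bench.out, verify()):
            raise RuntimeError(f"{name}: output does not match the reference")
        ref_ms = statistics.mean(time_ms(verify, iterations))
        result.update(ref_ms=ref_ms, speedup=ref_ms / mean_ms)
    return result


class ToyWorkloads:
    # Stand-in for a kernels.benchmark.Benchmark subclass (no torch, no GPU).
    def setup_small(self):
        self.x = [float(i) for i in range(1_000)]
        self.out = [0.0] * len(self.x)

    def benchmark_small(self):
        for i, v in enumerate(self.x):
            self.out[i] = 2.0 * v

    def verify_small(self):
        return [v + v for v in self.x]


print(run_workload(ToyWorkloads(), "small", warmup=2, iterations=5))
```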

## Troubleshooting

- If the repo does not contain a `benchmarks/` directory (or no `benchmark*.py` files), the command exits with an error.
- If a benchmark script defines no `Benchmark` subclasses, the command exits with an error.
- If `verify_<workload>()` exists and the outputs do not match, the command exits with an error.
65 changes: 65 additions & 0 deletions docs/source/cli-check.md
@@ -0,0 +1,65 @@
# kernels check

Use `kernels check` to verify that a kernel on the Hub meets Python ABI and operating system compatibility requirements.

## What It Checks

- Python ABI compatibility (default: 3.9)
- Operating system compatibility (defaults: macOS 15.0 or later, `manylinux_2_28`)
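For the manylinux requirement, PEP 600 defines a `manylinux_X_Y` tag as targeting systems with glibc X.Y or newer. A library therefore qualifies only if none of the glibc symbol versions it links against is newer than X.Y. Under that reading, passing `--manylinux manylinux_2_31` accepts libraries that need glibc up to 2.31. The sketch below shows only this version comparison. It is not how `kernel-abi-check` is implemented. It also takes a hypothetical, already-extracted list of `GLIBC_*` symbol versions instead of reading a `.so` file.

```python
import re


def manylinux_glibc(tag: str) -> tuple:
    # PEP 600 tags encode a glibc version: manylinux_<major>_<minor>.
    match = re.fullmatch(r"manylinux_(\d+)_(\d+)", tag)
    if match is None:
        raise ValueError(f"not a PEP 600 manylinux tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))


def glibc_compatible(symbol_versions: list, tag: str) -> bool:
    """True if no GLIBC_x.y[.z] requirement is newer than the tag's glibc."""
    limit = manylinux_glibc(tag)
    for version in symbol_versions:
        match = re.fullmatch(r"GLIBC_(\d+)\.(\d+)(?:\.\d+)?", version)
        # Non-numeric entries such as GLIBC_PRIVATE are ignored in this sketch.
        if match and (int(match.group(1)), int(match.group(2))) > limit:
            return False
    return True


# Hypothetical symbol versions a shared library might require.
needed = ["GLIBC_2.2.5", "GLIBC_2.17", "GLIBC_2.29"]
print(glibc_compatible(needed, "manylinux_2_28"))  # False: 2.29 is newer than 2.28
print(glibc_compatible(needed, "manylinux_2_31"))  # True
```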

## Usage

```bash
kernels check <repo_id> [--revision <rev>] [--macos <version>] [--manylinux <version>] [--python-abi <version>]
```

## Installation

`kernels check` requires an additional dependency:

```bash
uv pip install kernel-abi-check # or pip install kernel-abi-check
```

## Examples

Check a kernel on the Hub:

```bash
kernels check kernels-community/flash-attn3
```

Check a specific revision:

```bash
kernels check kernels-community/flash-attn3 --revision v2
```

Check with custom compatibility requirements:

```bash
kernels check kernels-community/flash-attn3 --python-abi 3.10 --manylinux manylinux_2_31
```

## Example Output

```text
Checking variant: torch210-metal-aarch64-darwin
Dynamic library _example_kernel_metal_2juixjwdznbhy.abi3.so:
🐍 Python ABI 3.9 compatible
🍏 compatible with macOS 15.0
Checking variant: torch29-metal-aarch64-darwin
Dynamic library _example_kernel_metal_vtlnpevkb6uum.abi3.so:
🐍 Python ABI 3.9 compatible
🍏 compatible with macOS 15.0
```

## Options

| Option | Default | Description |
| -------------- | ---------------- | ----------------------------------- |
| `--revision` | `main` | Branch, tag, or commit SHA to check |
| `--macos` | `15.0` | Minimum macOS version to require |
| `--manylinux` | `manylinux_2_28` | Manylinux version to require |
| `--python-abi` | `3.9` | Python ABI version to require |

50 changes: 50 additions & 0 deletions docs/source/cli-download.md
@@ -0,0 +1,50 @@
# kernels download

Use `kernels download` to download kernels that have been locked in a project's `kernels.lock` file.

## Usage

```bash
kernels download <project_dir> [--all-variants]
```

## What It Does

- Reads the `kernels.lock` file from the specified project directory
- Downloads each locked kernel at its pinned revision (SHA)
- Installs the appropriate variant for your platform (or all variants with `--all-variants`)

## Examples

Download kernels for the current project:

```bash
kernels download .
```

Download all build variants (useful for CI or multi-platform builds):

```bash
kernels download . --all-variants
```

Download kernels for a specific project:

```bash
kernels download /path/to/my-project
```

## Options

| Option | Description |
| ---------------- | ----------------------------------------------------------------------------------------- |
| `--all-variants` | Download all build variants of each kernel instead of just the current platform's variant |

## Prerequisites

Your project directory must contain a `kernels.lock` file. Generate one using [`kernels lock`](cli-lock.md).

## See Also

- [kernels lock](cli-lock.md) - Generate the lock file
- [kernels versions](cli-versions.md) - View available kernel versions
61 changes: 61 additions & 0 deletions docs/source/cli-generate-readme.md
@@ -0,0 +1,61 @@
# kernels generate-readme

Use `kernels generate-readme` to automatically generate documentation snippets for a kernel's public functions.

## Usage

```bash
kernels generate-readme <repo_id> [--revision <rev>]
```

## What It Does

- Downloads the specified kernel from the Hub
- Inspects the kernel's public API
- Generates markdown documentation snippets showing function signatures and usage

## Examples

Generate README snippets for a kernel:

```bash
kernels generate-readme kernels-community/activation > README.md
```

## Example Output

README.md snippet for `kernels-community/activation`:

```md
---
tags:
- kernels
---

## Functions

### Function `fatrelu_and_mul`

`(out: torch.Tensor, x: torch.Tensor, threshold: float = 0.0) -> None`

No documentation available.

### Function `gelu`

`(out: torch.Tensor, x: torch.Tensor) -> None`

No documentation available.

### Function `gelu_and_mul`

`(out: torch.Tensor, x: torch.Tensor) -> None`

No documentation available.

### Function `gelu_fast`

`(out: torch.Tensor, x: torch.Tensor) -> None`

No documentation available.

...
```
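Each signature line in the output has the same shape as Python's `str(inspect.signature(fn))`, and functions without a docstring get the placeholder text. The hypothetical `readme_functions_section` helper below reproduces that layout with the standard `inspect` module on a stand-in module. It illustrates the output format only and is not the command's actual implementation. Treating underscore-prefixed names as private is also an assumption.

```python
import inspect
import types


def readme_functions_section(module: types.ModuleType) -> str:
    """Hypothetical sketch: render a '## Functions' section like the one above."""
    lines = ["## Functions", ""]
    # getmembers returns (name, value) pairs sorted by name.
    for name, fn in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue  # assumption: underscore-prefixed names are private
        doc = inspect.getdoc(fn) or "No documentation available."
        lines += [f"### Function `{name}`", "", f"`{inspect.signature(fn)}`", "", doc, ""]
    return "\n".join(lines)


# Stand-in module (list instead of torch.Tensor); the real command inspects the downloaded kernel.
demo = types.ModuleType("demo")


def gelu(out: list, x: list) -> None:
    pass


def fatrelu_and_mul(out: list, x: list, threshold: float = 0.0) -> None:
    pass


demo.gelu = gelu
demo.fatrelu_and_mul = fatrelu_and_mul
print(readme_functions_section(demo))
```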