24 commits
43007fc
Expand building Claude skill to cover general ET building from source
psiddh Mar 9, 2026
f7d0f37
Refactor building skill from reference manual to action-oriented flow
Mar 10, 2026
c6ba3b0
Update .claude/skills/building/SKILL.md
psiddh Mar 10, 2026
f4390de
Update .claude/skills/building/SKILL.md
psiddh Mar 10, 2026
cb41d23
Update .claude/skills/building/SKILL.md
psiddh Mar 10, 2026
e033764
Update .claude/skills/building/SKILL.md
psiddh Mar 10, 2026
d6c134b
Update .claude/skills/building/SKILL.md
psiddh Mar 10, 2026
ffc0722
Update .claude/skills/building/SKILL.md
psiddh Mar 10, 2026
c6b9d34
Fix Cadence CPU runner CMake build
aliafzal Mar 10, 2026
286ccef
Skip Samsung jobs that require secrets on forked PRs (#18064)
GregoryComer Mar 10, 2026
c85bfe1
Autoglob: don't expose mode labels (#18069)
manuelcandales Mar 10, 2026
179f84e
Harden building skill from e2e testing
Mar 10, 2026
d75acb2
Add routing table to building skill for Android/iOS/model targets
Mar 10, 2026
62d417d
Update .claude/skills/building/SKILL.md
psiddh Mar 10, 2026
74d0c7e
Address PR review comments on building skill
Mar 10, 2026
8c0a60b
Re-export Q_ANNOTATION_KEY from quantizer annotators package
manuelcandales Mar 10, 2026
6209f27
Fix fresh-Mac gaps: Xcode CLT, conda shell hook, Python version fallback
Mar 10, 2026
d2e8919
Remove Xcode CLT prerequisite — not in ET docs, rarely needed
Mar 10, 2026
dda73d3
Add WASM/Emscripten compiler flags to runtime_wrapper.bzl
s09g Mar 10, 2026
3baf6c2
Remove extern "C" wrapping and fix format specifiers for ARM embedded…
Ninja91 Mar 10, 2026
bad1aec
Xnnpack disable workspace nonlock (#17780)
Froskekongen Mar 10, 2026
cedfe4c
Fix heap-buffer-overflow in constant_pad_nd (#18018)
psiddh Mar 10, 2026
518daa8
Revise ethos doc links in CMakeLists.txt (#18075)
psiddh Mar 10, 2026
da50e3f
Merge branch 'main' into expand-building-claude-skill
psiddh Mar 10, 2026
222 changes: 211 additions & 11 deletions .claude/skills/building/SKILL.md
Original file line number Diff line number Diff line change
@@ -1,23 +1,223 @@
---
name: building
description: Build ExecuTorch runners or C++ libraries. Use when compiling runners for Llama, Whisper, or other models, or building the C++ runtime.
description: Build ExecuTorch from source — Python package, C++ runtime, runners, cross-compilation, and backend-specific builds. Use when compiling anything in the ExecuTorch repo, diagnosing build failures, or setting up platform-specific builds.
---

# Building
# Building ExecuTorch

## Runners (Makefile)
## Step 1: Ensure Python environment (detect and fix automatically)

**Path A — conda (preferred):**
```bash
# Initialize conda for non-interactive shells (required in Claude Code / CI)
eval "$(conda shell.bash hook 2>/dev/null)"

# Check if executorch conda env exists; create if not
conda env list 2>/dev/null | grep executorch || \
ls "$(conda info --base 2>/dev/null)/envs/" 2>/dev/null | grep executorch || \
conda create -yn executorch python=3.12

# Activate
conda activate executorch
```

**Path B — no conda (fall back to venv):**
```bash
# Find a compatible Python (3.10–3.13). On macOS with only Homebrew Python 3.14+,
# install a compatible version first: brew install python@3.12
python3.12 -m venv .executorch-venv # or python3.11, python3.10, python3.13
source .executorch-venv/bin/activate
pip install --upgrade pip
```
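Path B's "find a compatible Python" step can be automated. A minimal sketch that probes `PATH` from the newest supported version down:

```shell
# Probe for the first supported interpreter (3.13 down to 3.10) on PATH
PY=""
for v in 3.13 3.12 3.11 3.10; do
  if command -v "python$v" >/dev/null 2>&1; then
    PY="python$v"
    break
  fi
done
if [ -n "$PY" ]; then
  echo "found $PY"
else
  echo "no supported Python found; try: brew install python@3.12" >&2
fi
```

If a match is found, use it for the `venv` command above, e.g. `"$PY" -m venv .executorch-venv`.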

**Then verify (either path):**

Run `python --version` and `cmake --version`. Fix automatically:
- **Python not 3.10–3.13**: recreate the env with a correct Python version.
- **cmake missing or < 3.24**: run `pip install 'cmake>=3.24'` inside the env.
- **cmake >= 4.0**: works in practice, no action needed.

Parallel jobs: `$(sysctl -n hw.ncpu)` on macOS, `$(nproc)` on Linux.
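The version check and job-count logic above can be combined into one portable snippet (a sketch; the fallback job count of 4 is an arbitrary assumption):

```shell
# Portable parallel-job count: nproc on Linux, sysctl on macOS, else 4
NJOBS="$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)"

# Warn on an unsupported interpreter (the build needs Python 3.10-3.13)
PYVER="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null || echo 0.0)"
case "$PYVER" in
  3.10|3.11|3.12|3.13) echo "Python $PYVER OK; using $NJOBS jobs" ;;
  *) echo "Python $PYVER unsupported; recreate the env" >&2 ;;
esac
```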

## Step 2: Build

Route based on what the user asks for:
- User mentions **Android** → skip to [Cross-compilation: Android](#cross-compilation)
- User mentions **iOS** or **frameworks** → skip to [Cross-compilation: iOS](#cross-compilation)
- User mentions a **model name** (llama, whisper, etc.) → skip to [LLM / ASR model runner](#llm--asr-model-runner-simplest-path-for-running-models)
- User mentions **C++ runtime** or **cmake** → skip to [C++ runtime](#c-runtime-standalone)
- Otherwise → default to **Python package** below

### Python package (default)
```bash
make help # list all targets
make llama-cpu # Llama
make whisper-metal # Whisper on Metal
make gemma3-cuda # Gemma3 on CUDA
conda activate executorch
./install_executorch.sh --editable # editable install from source
```
This handles everything: submodules, deps, C++ build, Python install. Takes ~10 min on Apple Silicon.

For subsequent rebuilds (deps already present): `pip install -e . --no-build-isolation`

For minimal install (skip example deps): `./install_executorch.sh --minimal`

Enable additional backends:
```bash
CMAKE_ARGS="-DEXECUTORCH_BUILD_COREML=ON -DEXECUTORCH_BUILD_MPS=ON" ./install_executorch.sh --editable
```

Verify: `python -c "from executorch.exir import to_edge_transform_and_lower; print('OK')"`
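A slightly more defensive version of that verify step (a sketch; it prints a diagnostic instead of raising when the package is missing):

```python
import importlib.util

# find_spec returns None when the package is not importable from this env
if importlib.util.find_spec("executorch") is None:
    print("executorch not installed in this environment")
else:
    from executorch.exir import to_edge_transform_and_lower  # noqa: F401
    print("OK")
```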

### LLM / ASR model runner (simplest path for running models)

```bash
conda activate executorch
make <model>-<backend>
```

Available targets (run `make help` for full list):

| Target | Backend | macOS | Linux |
|--------|---------|-------|-------|
| `llama-cpu` | CPU | yes | yes |
| `llama-cuda` | CUDA | — | yes |
| `llama-cuda-debug` | CUDA (debug) | — | yes |
| `llava-cpu` | CPU | yes | yes |
| `whisper-cpu` | CPU | yes | yes |
| `whisper-metal` | Metal | yes | — |
| `whisper-cuda` | CUDA | — | yes |
| `parakeet-cpu` | CPU | yes | yes |
| `parakeet-metal` | Metal | yes | — |
| `parakeet-cuda` | CUDA | — | yes |
| `voxtral-cpu` | CPU | yes | yes |
| `voxtral-cuda` | CUDA | — | yes |
| `voxtral-metal` | Metal | yes | — |
| `voxtral_realtime-cpu` | CPU | yes | yes |
| `voxtral_realtime-cuda` | CUDA | — | yes |
| `voxtral_realtime-metal` | Metal | yes | — |
| `gemma3-cpu` | CPU | yes | yes |
| `gemma3-cuda` | CUDA | — | yes |
| `sortformer-cpu` | CPU | yes | yes |
| `sortformer-cuda` | CUDA | — | yes |
| `silero-vad-cpu` | CPU | yes | yes |
| `clean` | — | yes | yes |

Output: `cmake-out/examples/models/<model>/<runner>`

## C++ Libraries (CMake)
### C++ runtime (standalone)

**With presets (recommended):**

| Platform | Command |
|----------|---------|
| macOS | `cmake -B cmake-out --preset macos` (uses Xcode generator — requires Xcode) |
| Linux | `cmake -B cmake-out --preset linux -DCMAKE_BUILD_TYPE=Release` |
| Windows | `cmake -B cmake-out --preset windows -T ClangCL` |

Then: `cmake --build cmake-out --config Release -j$(sysctl -n hw.ncpu)` (macOS) or `cmake --build cmake-out -j$(nproc)` (Linux)

**LLM libraries via workflow presets** (configure + build + install in one command):
```bash
cmake --workflow --preset llm-release # CPU
cmake --workflow --preset llm-release-metal # Metal (macOS)
cmake --workflow --preset llm-release-cuda # CUDA (Linux/Windows)
```

**Manual CMake (custom flags):**
```bash
cmake -B cmake-out \
-DCMAKE_BUILD_TYPE=Release \
-DEXECUTORCH_BUILD_XNNPACK=ON \
-DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
-DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
-DEXECUTORCH_BUILD_EXTENSION_FLAT_TENSOR=ON \
-DEXECUTORCH_BUILD_EXTENSION_NAMED_DATA_MAP=ON \
-DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
-DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON
cmake --build cmake-out --parallel "$(nproc 2>/dev/null || sysctl -n hw.ncpu)"
```

Run `cmake --list-presets` to see all available presets.

### Cross-compilation

**iOS/macOS frameworks:**
```bash
./scripts/build_apple_frameworks.sh --coreml --mps --xnnpack
```
Link in Xcode with `-all_load` linker flag.

**Android:**

Requires the `ANDROID_NDK` environment variable set to an NDK root (typically installed by Android Studio or a standalone NDK download).
```bash
cmake --list-presets # list presets
cmake --workflow --preset llm-release # LLM CPU
cmake --workflow --preset llm-release-metal # LLM Metal
# Verify NDK is available
echo $ANDROID_NDK # must point to NDK root, e.g. ~/Library/Android/sdk/ndk/<version>
export ANDROID_ABIS=arm64-v8a BUILD_AAR_DIR=aar-out
mkdir -p $BUILD_AAR_DIR && sh scripts/build_android_library.sh
```
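Before invoking the script, it's worth failing fast on a bad NDK path. A sketch (the toolchain-file check assumes the standard NDK directory layout):

```shell
# An NDK root normally contains build/cmake/android.toolchain.cmake
if [ -z "${ANDROID_NDK:-}" ]; then
  echo "ANDROID_NDK is unset" >&2
elif [ ! -f "${ANDROID_NDK}/build/cmake/android.toolchain.cmake" ]; then
  echo "ANDROID_NDK does not look like an NDK root: ${ANDROID_NDK}" >&2
else
  echo "NDK OK: ${ANDROID_NDK}"
fi
```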

## Key build options

Most commonly needed flags (full list: `CMakeLists.txt`):

| Flag | What it enables |
|------|-----------------|
| `EXECUTORCH_BUILD_XNNPACK` | XNNPACK CPU backend |
| `EXECUTORCH_BUILD_COREML` | Core ML (macOS/iOS) |
| `EXECUTORCH_BUILD_MPS` | MPS GPU (macOS/iOS) |
| `EXECUTORCH_BUILD_METAL` | Metal compute (macOS, requires EXTENSION_TENSOR) |
| `EXECUTORCH_BUILD_CUDA` | CUDA GPU (Linux/Windows, requires EXTENSION_TENSOR) |
| `EXECUTORCH_BUILD_KERNELS_OPTIMIZED` | Optimized kernels |
| `EXECUTORCH_BUILD_KERNELS_QUANTIZED` | Quantized kernels |
| `EXECUTORCH_BUILD_EXTENSION_MODULE` | Module extension (requires DATA_LOADER + FLAT_TENSOR + NAMED_DATA_MAP) |
| `EXECUTORCH_BUILD_EXTENSION_LLM` | LLM extension |
| `EXECUTORCH_BUILD_TESTS` | Unit tests (`ctest --test-dir cmake-out --output-on-failure`) |
| `EXECUTORCH_BUILD_DEVTOOLS` | DevTools (Inspector, ETDump) |
| `EXECUTORCH_OPTIMIZE_SIZE` | Size-optimized build (`-Os`, no exceptions/RTTI) |
| `CMAKE_BUILD_TYPE` | `Release` or `Debug` (5–10x slower). Some presets (e.g. `llm-release`) set this; others require it explicitly. |

## Troubleshooting

| Symptom | Fix |
|---------|-----|
| Missing headers / `CMakeLists.txt not found` in third-party | `git submodule sync --recursive && git submodule update --init --recursive` |
| Mysterious failures after `git pull` or branch switch | `rm -rf cmake-out/ pip-out/ && git submodule sync && git submodule update --init --recursive` |
| `conda env list` PermissionError | Use `CONDA_NO_PLUGINS=true conda env list` or check env dir directly |
| CMake >= 4.0 | Works in practice despite `< 4.0` in docs; only fix if build actually fails |
| `externally-managed-environment` / PEP 668 error | You're using the system Python. Activate your conda env or venv first. |
| pip conflicts with torch versions | Fresh conda env; or `./install_executorch.sh --use-pt-pinned-commit` |
| Missing `Python.h` (Linux) | `sudo apt install python3.X-dev` |
| Missing operator registrations at runtime | Link kernel libs with `-Wl,-force_load,<lib>` (macOS) or `-Wl,--whole-archive <lib> -Wl,--no-whole-archive` (Linux) |
| `install_executorch.sh` fails on Intel Mac | No prebuilt PyTorch wheels; use `--use-pt-pinned-commit --minimal` |
| XNNPACK build errors about cpuinfo/pthreadpool | Ensure `EXECUTORCH_BUILD_CPUINFO=ON` and `EXECUTORCH_BUILD_PTHREADPOOL=ON` (both ON by default) |
| Duplicate kernel registration abort | Only link one `gen_operators_lib` per target |

## Build output

**From `./install_executorch.sh` (Python package):**

| Artifact | Location |
|----------|----------|
| Python package | `site-packages/executorch` |

**From CMake builds** (`cmake --install` with `CMAKE_INSTALL_PREFIX=cmake-out`):

| Artifact | Location |
|----------|----------|
| Core runtime | `cmake-out/lib/libexecutorch.a` |
| XNNPACK backend | `cmake-out/lib/libxnnpack_backend.a` |
| executor_runner | `cmake-out/executor_runner` (Ninja/Make) or `cmake-out/Release/executor_runner` (Xcode) |
| Model runners | `cmake-out/examples/models/<model>/<runner>` |

**From cross-compilation:**

| Artifact | Location |
|----------|----------|
| iOS frameworks | `cmake-out/*.xcframework` |
| Android AAR | `aar-out/` |

## Tips
- Always use `Release` for benchmarking; `Debug` is 5–10x slower
- `ccache` is auto-detected if installed (`brew install ccache`)
- `Ninja` is faster than Make (`-G Ninja`) — but `--preset macos` uses Xcode generator
- For LLM workflows, `make <model>-<backend>` is the simplest path
- After `git pull`, clean and re-init submodules before rebuilding
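The last two tips combine into a small "clean slate" helper (a sketch; run it from the repo root):

```shell
# Wipe build outputs, then re-sync submodules after a pull or branch switch
rm -rf cmake-out/ pip-out/
git submodule sync --recursive && git submodule update --init --recursive \
  || echo "submodule sync failed -- are you at the repo root?" >&2
```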
6 changes: 4 additions & 2 deletions .github/workflows/pull.yml
@@ -1057,7 +1057,8 @@ jobs:

test-samsung-quantmodels-linux:
name: test-samsung-quantmodels-linux
# if: github.event.pull_request.head.repo.full_name == github.repository || github.event_name != 'pull_request'
# Skip this job if the pull request is from a fork (secrets are not available)
if: github.event.pull_request.head.repo.full_name == github.repository || github.event_name != 'pull_request'
uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
permissions:
id-token: write
@@ -1094,7 +1095,8 @@ jobs:

test-samsung-models-linux:
name: test-samsung-models-linux
# if: github.event.pull_request.head.repo.full_name == github.repository || github.event_name != 'pull_request'
# Skip this job if the pull request is from a fork (secrets are not available)
if: github.event.pull_request.head.repo.full_name == github.repository || github.event_name != 'pull_request'
uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
permissions:
id-token: write
9 changes: 8 additions & 1 deletion backends/cadence/build_cadence_runner.sh
@@ -31,12 +31,19 @@ main() {

local example_dir=backends/cadence
local build_dir="cmake-out/${example_dir}"
local cmake_prefix_path="${PWD}/cmake-out/lib/cmake/ExecuTorch;${PWD}/cmake-out/third-party/gflags"
# Detect lib vs lib64
if [ -d "${PWD}/cmake-out/lib64/cmake/ExecuTorch" ]; then
libdir="lib64"
else
libdir="lib"
fi
local cmake_prefix_path="${PWD}/cmake-out/${libdir}/cmake/ExecuTorch;${PWD}/cmake-out/third-party/gflags"
rm -rf ${build_dir}
CXXFLAGS="-fno-exceptions -fno-rtti" cmake -DCMAKE_PREFIX_PATH="${cmake_prefix_path}" \
-DCMAKE_BUILD_TYPE=Release \
-DEXECUTORCH_CADENCE_CPU_RUNNER=ON \
-DEXECUTORCH_ENABLE_LOGGING=ON \
-DPYTHON_EXECUTABLE="$(which python3)" \
-B"${build_dir}" \
"${example_dir}"
cmake --build "${build_dir}" --config Release -j16
18 changes: 3 additions & 15 deletions backends/cadence/generic/operators/CMakeLists.txt
@@ -79,21 +79,9 @@ target_include_directories(
)

# Custom ops that are needed to run the test model.
add_library(
custom_ops
"quantized_add_out.cpp"
"quantized_linear_out.cpp"
"quantized_conv2d_nchw_out.cpp"
"quantized_conv2d_nhwc_out.cpp"
"quantized_relu_out.cpp"
"quantized_layer_norm.cpp"
"quantize_per_tensor.cpp"
"quantized_fully_connected_out.cpp"
"dequantize_per_tensor.cpp"
"quantized_matmul_out.cpp"
"op_requantize_out.cpp"
"im2row_out.cpp"
)
file(GLOB custom_ops_srcs "*.cpp")
add_library(custom_ops ${custom_ops_srcs})

target_include_directories(
custom_ops PUBLIC ${ROOT_DIR}/.. ${CMAKE_BINARY_DIR}
${_common_include_directories}
4 changes: 1 addition & 3 deletions backends/cortex_m/ops/cmsis_scratch_buffer_context.h
@@ -7,10 +7,8 @@
*/
#pragma once

#include "cortex_m_ops_common.h"
extern "C" {
#include "arm_nnfunctions.h"
}
#include "cortex_m_ops_common.h"

namespace cortex_m {
namespace native {