35 commits
876a553 Add: document manual dependency scope design (uv-xiao, Apr 1, 2026)
2e0ee41 Update: tighten manual dependency design constraints (uv-xiao, Apr 2, 2026)
23a1fe2 docs: refine manual tensormap dependency design (uv-xiao, Apr 3, 2026)
37a0fae Update: add partial manual TensorMap scope support (uv-xiao, Apr 3, 2026)
54e5324 Add unmodified tensormap runtime baseline (uv-xiao, Apr 5, 2026)
bf1fc84 Restore zero-overhead auto path (uv-xiao, Apr 5, 2026)
c4b5fa8 Restore zero-overhead auto scope path (uv-xiao, Apr 5, 2026)
5b10117 Add manual scope guard regression tests (uv-xiao, Apr 5, 2026)
0bb19c9 Harden manual scope guard coverage (uv-xiao, Apr 5, 2026)
58d3c1a Add manual scope outer-write boundary test (uv-xiao, Apr 5, 2026)
eed20b5 Support: add partial-manual benchmark selector (uv-xiao, Apr 5, 2026)
a3b96ba Update: refresh manual-dep benchmark data (uv-xiao, Apr 5, 2026)
5787470 Update: cut manual scope overhead (uv-xiao, Apr 5, 2026)
bd9a760 Fix: stabilize partial-manual paged attention chunking (uv-xiao, Apr 5, 2026)
988eddc Fix: restore deferred manual submit path (uv-xiao, Apr 5, 2026)
a247f59 Update: move manual boundary discovery to submit (uv-xiao, Apr 7, 2026)
431a9ea Fix: remove manual scope membership scan (uv-xiao, Apr 7, 2026)
9f628e8 Update: speed up manual scope edge replay (uv-xiao, Apr 7, 2026)
c3e5951 Update: streamline partial-manual paged attention (uv-xiao, Apr 8, 2026)
77f548f Update: mark partial-manual paged attention boundaries (uv-xiao, Apr 8, 2026)
c1722c1 Update: refresh manual-dep benchmark findings (uv-xiao, Apr 8, 2026)
3e34a71 Update: refresh manual-dep benchmark findings (uv-xiao, Apr 8, 2026)
97e0242 Update: clarify manual-scope dependency model (uv-xiao, Apr 8, 2026)
e0aa3c4 Update: explain manual-scope design tradeoffs (uv-xiao, Apr 8, 2026)
29f1eb6 Refactor: remove branch-local unmodified runtime support (uv-xiao, Apr 8, 2026)
c72fa9d Fix: auto-pick free NPU for manual-scope tests (uv-xiao, Apr 8, 2026)
af357ad Fix: harden manual-scope metadata growth (uv-xiao, Apr 8, 2026)
abace34 Update: move manual external wiring to submit (uv-xiao, Apr 8, 2026)
4530e81 Update: collapse manual scope bookkeeping (uv-xiao, Apr 8, 2026)
9a9974d Update: refresh manual dependency design note (uv-xiao, Apr 8, 2026)
aed3526 Update: align manual dependency doc with fresh matrix (uv-xiao, Apr 8, 2026)
a5332ef Update: collapse manual scope_end scan (uv-xiao, Apr 8, 2026)
38bb942 Support: ignore local worktrees (uv-xiao, Apr 9, 2026)
01d0723 Fix: restore rebased manual benchmark paths (uv-xiao, Apr 9, 2026)
e5fa1bc Fix: align rebased unroll partial-manual ABI (uv-xiao, Apr 10, 2026)
1 change: 1 addition & 0 deletions .agents
9 changes: 5 additions & 4 deletions .claude/commands/perf-example-device.md
@@ -3,8 +3,9 @@ Benchmark the hardware performance of a single example at $ARGUMENTS.
Reference `tools/benchmark_rounds.sh` for the full implementation pattern (device log resolution, timing parsing, reporting format). This skill runs the same logic but for a single example only.

1. Verify `$ARGUMENTS` exists and contains `kernels/kernel_config.py` and `golden.py`
-2. Check `command -v npu-smi` — if not found, tell the user this requires hardware and stop
-3. **Detect platform**: Run `npu-smi info` and parse the chip name. Map `910B`/`910C` → `a2a3`, `950` → `a5`. If unrecognized, warn and default to `a2a3`
-4. Find the lowest-ID idle device (HBM-Usage = 0) from the `npu-smi info` output. If none, stop
-5. Run the example following the same pattern as `run_bench()` in `tools/benchmark_rounds.sh`:
+2. Require the example path to live under `examples/a2a3/` or `examples/a5/`. If it does not, stop and report that root-level `examples/{runtime}/...` paths are invalid.
+3. Check `command -v npu-smi` — if not found, tell the user this requires hardware and stop
+4. **Detect platform**: Infer the architecture from the example path (`examples/a2a3/...` → `a2a3`, `examples/a5/...` → `a5`). Use `npu-smi info` only as a sanity check; if the detected chip family conflicts with the path, report the mismatch and stop instead of silently switching platforms.
+5. Find the lowest-ID idle device (HBM-Usage = 0) from the `npu-smi info` output. If none, stop
+6. Run the example following the same pattern as `run_bench()` in `tools/benchmark_rounds.sh`:
   - Snapshot logs, run `run_example.py` with `-n 10`, find new log, parse timing, report results
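
The path requirement and path-based platform inference introduced in this hunk could be sketched roughly as below. This is only an illustration: `infer_platform` is a hypothetical helper name that does not appear in the PR, and the sketch assumes the example path is given relative to the repository root.

```python
from pathlib import PurePosixPath

# Valid architecture roots, taken from the rule text in this diff.
VALID_ARCHES = ("a2a3", "a5")


def infer_platform(example_path: str) -> str:
    """Hypothetical helper: map examples/<arch>/... to its platform name.

    Raises ValueError for anything outside examples/a2a3/ or examples/a5/,
    including root-level examples/<runtime>/... paths.
    """
    parts = PurePosixPath(example_path).parts
    if len(parts) < 2 or parts[0] != "examples" or parts[1] not in VALID_ARCHES:
        raise ValueError(
            f"invalid example path {example_path!r}: "
            "examples must live under examples/a2a3/ or examples/a5/"
        )
    return parts[1]
```

Raising rather than falling back to a default mirrors the change in this hunk, which replaces the old "warn and default to `a2a3`" behavior with a hard stop.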
8 changes: 5 additions & 3 deletions .claude/commands/profile.md
@@ -1,6 +1,8 @@
Run the example at $ARGUMENTS with profiling enabled on hardware.

1. Verify the directory exists and contains `kernels/kernel_config.py` and `golden.py`
-2. Run: `python examples/scripts/run_example.py -k $ARGUMENTS/kernels -g $ARGUMENTS/golden.py -p a2a3 --enable-profiling`
-3. If the test passes, report the swimlane output file location in `outputs/`
-4. Summarize the task statistics from the console output (per-function timing breakdown)
+2. Require the example path to live under `examples/a2a3/` or `examples/a5/`. If it does not, stop and report that root-level `examples/{runtime}/...` paths are invalid.
+3. Infer the platform from the example path (`examples/a2a3/...` → `a2a3`, `examples/a5/...` → `a5`).
+4. Run: `python examples/scripts/run_example.py -k $ARGUMENTS/kernels -g $ARGUMENTS/golden.py -p <platform> --enable-profiling`
+5. If the test passes, report the swimlane output file location in `outputs/`
+6. Summarize the task statistics from the console output (per-function timing breakdown)
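
The practical effect of this hunk is that `-p` is no longer hard-coded to `a2a3`. A minimal sketch of assembling the profiling invocation from an example directory and an inferred platform might look like this. `profiling_command` and the `my_example` directory are hypothetical; only the script path and flags come from the diff.

```python
import shlex
from typing import List


def profiling_command(example_dir: str, platform: str) -> List[str]:
    """Hypothetical helper: build the argv for the profiling run."""
    return [
        "python",
        "examples/scripts/run_example.py",
        "-k", f"{example_dir}/kernels",
        "-g", f"{example_dir}/golden.py",
        "-p", platform,  # previously fixed to "a2a3"
        "--enable-profiling",
    ]


# Illustrative directory name only; not a real example in the repository.
cmd = profiling_command("examples/a5/host_build_graph/my_example", "a5")
print(shlex.join(cmd))
```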
11 changes: 6 additions & 5 deletions .claude/commands/test-example-device.md
@@ -1,8 +1,9 @@
Run the hardware device test for the example at $ARGUMENTS.

1. Verify the directory exists and contains `kernels/kernel_config.py` and `golden.py`
-2. Check `command -v npu-smi` — if not found, tell the user to use `/test-example-sim` instead and stop
-3. **Detect platform**: Run `npu-smi info` and parse the chip name. Map `910B`/`910C` → `a2a3`, `950` → `a5`. If unrecognized, warn and default to `a2a3`
-4. Read `.github/workflows/ci.yml` to extract the current `-c` (pto-isa commit) flag from the `st-onboard-<platform>` job's `./ci.sh` invocation
-5. Run: `python examples/scripts/run_example.py -k $ARGUMENTS/kernels -g $ARGUMENTS/golden.py -p <platform> -c <commit>`
-6. Report pass/fail status with any error output
+2. Require the example path to live under `examples/a2a3/` or `examples/a5/`. If it does not, stop and report that root-level `examples/{runtime}/...` paths are invalid.
+3. Check `command -v npu-smi` — if not found, tell the user to use `/test-example-sim` instead and stop
+4. **Detect platform**: Infer the architecture from the example path (`examples/a2a3/...` → `a2a3`, `examples/a5/...` → `a5`). Use `npu-smi info` only as a sanity check; if the detected chip family conflicts with the path, report the mismatch and stop instead of silently switching platforms.
+5. Read `.github/workflows/ci.yml` to extract the current `-c` (pto-isa commit) flag from the `st-onboard-<platform>` job's `./ci.sh` invocation
+6. Run: `python examples/scripts/run_example.py -k $ARGUMENTS/kernels -g $ARGUMENTS/golden.py -p <platform> -c <commit>`
+7. Report pass/fail status with any error output
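
The new sanity check, where `npu-smi info` only confirms the platform implied by the path, could be sketched as follows. The chip-to-platform mapping is taken from the removed step. The helper name `check_chip_matches_path`, the substring matching, and the assumption that the chip name has already been parsed out of the `npu-smi info` output are all illustrative.

```python
# Chip-name fragments and the platform each one implies. The mapping comes from
# the removed step in this diff; matching them as substrings is an assumption.
CHIP_TO_PLATFORM = {"910B": "a2a3", "910C": "a2a3", "950": "a5"}


def check_chip_matches_path(path_platform: str, chip_name: str) -> None:
    """Hypothetical helper: fail loudly when hardware and path disagree."""
    for fragment, chip_platform in CHIP_TO_PLATFORM.items():
        if fragment in chip_name and chip_platform != path_platform:
            raise RuntimeError(
                f"chip {chip_name!r} implies {chip_platform}, "
                f"but the example path implies {path_platform}; stopping"
            )
    # Unrecognized chips fall through silently here; the diff does not say
    # how the real command treats them.
```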
9 changes: 5 additions & 4 deletions .claude/commands/test-example-sim.md
@@ -1,7 +1,8 @@
Run the simulation test for the example at $ARGUMENTS.

1. Verify the directory exists and contains `kernels/kernel_config.py` and `golden.py`
-2. Read `.github/workflows/ci.yml` to extract the current `-c` (pto-isa commit) flag from the `st-sim-*` jobs' `./ci.sh` invocations
-3. **Detect platform**: Infer the architecture from the example path (e.g., `examples/a2a3/...` → `a2a3sim`, `examples/a5/...` → `a5sim`). If the path doesn't contain an arch prefix, default to `a2a3sim`
-4. Run: `python examples/scripts/run_example.py -k $ARGUMENTS/kernels -g $ARGUMENTS/golden.py -p <platform> -c <commit>`
-5. Report pass/fail status with any error output
+2. Require the example path to live under `examples/a2a3/` or `examples/a5/`. If it does not, stop and report that root-level `examples/{runtime}/...` paths are invalid.
+3. Read `.github/workflows/ci.yml` to extract the current `-c` (pto-isa commit) flag from the `st-sim-*` jobs' `./ci.sh` invocations
+4. **Detect platform**: Infer the architecture from the example path (`examples/a2a3/...` → `a2a3sim`, `examples/a5/...` → `a5sim`).
+5. Run: `python examples/scripts/run_example.py -k $ARGUMENTS/kernels -g $ARGUMENTS/golden.py -p <platform> -c <commit>`
+6. Report pass/fail status with any error output
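
Extracting the `-c` value from a `./ci.sh` invocation could be done with a small text scan like the one below. This is a sketch under stated assumptions: the layout of `ci.yml` is not shown in this diff, so the function works on plain text lines, and `extract_commit_flag` is a hypothetical name. It does not select a specific job such as `st-sim-*`; that filtering is left out.

```python
import re
from typing import Optional


def extract_commit_flag(workflow_text: str) -> Optional[str]:
    """Hypothetical helper: return the value after -c on the first ./ci.sh line."""
    for line in workflow_text.splitlines():
        if "./ci.sh" not in line:
            continue
        # Match a standalone "-c <value>"; longer options such as "--config"
        # do not match because "-c" must be preceded and followed by whitespace.
        match = re.search(r"(?:^|\s)-c\s+(\S+)", line)
        if match:
            return match.group(1)
    return None
```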
5 changes: 5 additions & 0 deletions .claude/rules/architecture.md
@@ -24,6 +24,11 @@ See [docs/architecture.md](../../docs/architecture.md) for the full diagram, API

## Example / Test Layout

+Examples must live under `examples/{arch}/{runtime}/{name}/`. Valid example roots are
+`examples/a2a3/` and `examples/a5/`. Paths such as
+`examples/host_build_graph/<name>/` or `examples/tensormap_and_ringbuffer/<name>/`
+directly under `examples/` are invalid.
+
```text
my_example/
golden.py # generate_inputs() + compute_golden()
1 change: 1 addition & 0 deletions .gitignore
@@ -21,6 +21,7 @@ venv/
.claude/settings.local.json
.claude/worktrees
.claude/plans
+.worktrees/

# Git cloned dependencies (not tracked in repo)
examples/scripts/_deps/
17 changes: 0 additions & 17 deletions AGENTS.md

This file was deleted.

1 change: 1 addition & 0 deletions AGENTS.md
5 changes: 3 additions & 2 deletions CLAUDE.md
@@ -8,7 +8,7 @@ See [docs/developer-guide.md](docs/developer-guide.md) for full directory struct
| ---- | ----------------- |
| Platform Developer | `src/{arch}/platform/` |
| Runtime Developer | `src/{arch}/runtime/` |
-| Codegen Developer | `examples/` |
+| Codegen Developer | `examples/{arch}/` |

## Common Commands

@@ -32,8 +32,9 @@ clang-format -i <file>

## Important Rules

-1. **Consult `.claude/rules/` for coding conventions** (architecture, codestyle, terminology) — these are always-loaded guidelines. **Consult `.claude/skills/` for task-specific workflows** (e.g., `git-commit/` when committing, `testing/` when running tests)
+1. **Consult `.agents/rules/` for coding conventions** (architecture, codestyle, terminology) — these are always-loaded guidelines. **Consult `.agents/skills/` for task-specific workflows** (e.g., `git-commit/` when committing, `testing/` when running tests)
2. **Do not modify directories outside your assigned area** unless the user explicitly requests it
3. Create new subdirectories under your assigned directory as needed
4. When in doubt, ask the user before making changes to other areas
5. **Avoid including private information in documentation or code** such as usernames, absolute paths with usernames, or other personally identifiable information. Use relative paths or generic placeholders instead
+6. **Place examples under `examples/{arch}/{runtime}/{name}/`**. Do not create `examples/{runtime}/...` directly under `examples/`.
4 changes: 3 additions & 1 deletion docs/developer-guide.md
@@ -106,7 +106,9 @@ When preprocessor guards are used to isolate platform code paths, the `__aarch64

## Example / Test Layout

-Every example and device test follows this structure:
+Examples must live under `examples/{arch}/{runtime}/{name}/`, and device scenes must
+live under `tests/st/{arch}/{runtime}/{name}/`. Every example and device test follows
+this structure:

```text
my_example/