Merged
140 commits
773110d
[CI] Add test coverage reporting as PR comments
hughperkins Apr 10, 2026
05732ca
[CI] Add diff coverage reporting on PR comments
hughperkins Apr 10, 2026
bd993fa
[CI] Add diff coverage gate at 80% for changed lines
hughperkins Apr 10, 2026
ac86607
[CI] Fix diff-cover: use --format markdown instead of --md-report
hughperkins Apr 10, 2026
580738b
[CI] Fix diff-cover format flags
hughperkins Apr 10, 2026
331b31a
[CI] Exclude JIT-compiled kernel code from coverage
hughperkins Apr 10, 2026
cbe3c72
[Feature] Add kernel code coverage via AST rewriting
hughperkins Apr 10, 2026
dd0e850
[Test] Run kernel coverage e2e tests on both CPU and CUDA
hughperkins Apr 10, 2026
5e13cee
Fix stale coverage field after qd.init() re-initialization
hughperkins Apr 10, 2026
4406489
Fix off-by-one in AST rewriter unit test expectations
hughperkins Apr 10, 2026
caa3ee5
Harvest coverage probes before runtime reset, accumulate across qd.in…
hughperkins Apr 10, 2026
4427f13
Hook into PyQuadrants.clear() to harvest probes before runtime destru…
hughperkins Apr 10, 2026
8c5e615
Write arc data in .coverage.kernel when branch coverage is enabled
hughperkins Apr 10, 2026
b2ec13c
Detect arc mode from .coverage file and delete stale .coverage.kernel
hughperkins Apr 10, 2026
d47004c
Add simt e2e test with block.sync() and subgroup.shuffle
hughperkins Apr 10, 2026
fd27142
Test runtime-branched subgroup shuffle with coverage probes
hughperkins Apr 10, 2026
565da3d
Fix simt test arch filter: use [qd.cuda, qd.vulkan] not [qd.gpu]
hughperkins Apr 10, 2026
eadc831
Fix simt test: use arch=qd.gpu (already a list), not arch=[qd.gpu]
hughperkins Apr 10, 2026
77c3da4
Exempt coverage field from pure kernel violation checks
hughperkins Apr 10, 2026
795b3b5
Merge branch 'hp/kernel-coverage' into hp/pr-coverage-w-kernels
hughperkins Apr 10, 2026
0c4cc67
Enable kernel coverage in CI and merge data across test phases
hughperkins Apr 10, 2026
b480a06
Fix formatting and lint: black, ruff imports, pylint disable
hughperkins Apr 10, 2026
470fb92
Remove accidentally committed diff-cover.html artifact
hughperkins Apr 10, 2026
68c39fb
Add coverage artifacts to .gitignore
hughperkins Apr 10, 2026
98a17d5
Only import _kernel_coverage when QD_KERNEL_COVERAGE=1
hughperkins Apr 10, 2026
d9482bb
Suppress pyright import error for optional coverage dependency
hughperkins Apr 10, 2026
92f00d5
Fix IndexError in AST position reporter for coverage probe nodes
hughperkins Apr 11, 2026
72f01d8
Isolate pull-requests:write permission to coverage-comment job
hughperkins Apr 11, 2026
e83ed6c
Add comment explaining OVERALL coverage extraction
hughperkins Apr 11, 2026
c5d4a18
Cache QD_KERNEL_COVERAGE env var lookup at module load time
hughperkins Apr 11, 2026
3f7b30f
Skip coverage probes for autodiff kernels
hughperkins Apr 11, 2026
54a3293
Skip offline cache tests when QD_KERNEL_COVERAGE=1
hughperkins Apr 11, 2026
c02bace
Run offline cache tests before enabling kernel coverage in CI
hughperkins Apr 11, 2026
46acd3a
Skip concurrent-kernel and src-ll-cache-corruption tests under coverage
hughperkins Apr 11, 2026
5940f9d
Skip test_fe_ll_observations under kernel coverage
hughperkins Apr 11, 2026
81c1a08
Fix kernel coverage SQLite race with pytest-xdist
hughperkins Apr 11, 2026
0409574
Rename kernel coverage files to avoid pytest-cov combine conflict
hughperkins Apr 11, 2026
5016556
Add CUDA coverage to linux.yml CI workflow
hughperkins Apr 11, 2026
fe67ca0
Add --ignore-errors to coverage xml/report for temp file sources
hughperkins Apr 11, 2026
ed4ef06
Fix coverage-comment job: install coverage[toml] and add --ignore-errors
hughperkins Apr 11, 2026
382684f
Include hidden files in coverage artifact uploads
hughperkins Apr 11, 2026
e53aafb
Use pre-generated coverage.xml files instead of re-combining .coverage
hughperkins Apr 11, 2026
b7f9770
Add coverage path mapping for installed package
hughperkins Apr 11, 2026
bb3004b
fix: resolve actual package path for coverage measurement
hughperkins Apr 11, 2026
59d8a59
chore: exclude test files from diff-cover reports
hughperkins Apr 11, 2026
46fb3fe
feat: include test files in coverage measurement
hughperkins Apr 11, 2026
6fbf61a
revert: remove test dir from --cov measurement
hughperkins Apr 11, 2026
73958db
feat: show kernel branch coverage in PR diff report
hughperkins Apr 11, 2026
2ea9d35
chore: remove accidentally committed coverage data file
hughperkins Apr 11, 2026
fa8377c
chore: add _qd_kcov.* to .gitignore
hughperkins Apr 11, 2026
b671ed6
feat: add coverage_report.py for local and CI coverage reports
hughperkins Apr 11, 2026
6a6d61e
refactor: CI and devs share coverage_report.py for test + report
hughperkins Apr 11, 2026
7ef7449
refactor: separate test running from coverage post-processing
hughperkins Apr 11, 2026
1ab9a9c
feat: HTML diff coverage report as default for local dev
hughperkins Apr 11, 2026
948ce37
feat: add -o/--output flag for HTML coverage report path
hughperkins Apr 11, 2026
205e54b
fix: remove blank lines between code lines in HTML coverage report
hughperkins Apr 11, 2026
6327054
fix: post fresh coverage comment instead of sticky update
hughperkins Apr 11, 2026
ce6c367
feat: add collapsible annotated code sections to PR coverage comment
hughperkins Apr 11, 2026
97e5b8f
fix: use green/red circle emoji instead of tick/cross in coverage com…
hughperkins Apr 11, 2026
c811390
fix: default kernel coverage to arc mode when .coverage not yet written
hughperkins Apr 11, 2026
d45071c
feat: include git commit hash in coverage PR comment heading
hughperkins Apr 11, 2026
c855602
fix: correctly detect arc mode when .coverage is missing
hughperkins Apr 11, 2026
44c5b11
debug: add instrumentation to _detect_arc_mode for CI diagnosis
hughperkins Apr 11, 2026
f2b9c87
debug: temporarily restrict CI to test_kernel_coverage for faster ite…
hughperkins Apr 11, 2026
58bfaad
fix: remove debug instrumentation, restore full test suite
hughperkins Apr 11, 2026
1efd3bc
Merge remote-tracking branch 'origin/main' into hp/pr-coverage-w-kernels
hughperkins Apr 12, 2026
bc51d51
docs: add kernel code coverage user guide
hughperkins Apr 12, 2026
16debd6
docs: rewrite kernel coverage guide for library users
hughperkins Apr 12, 2026
6ac266c
docs: reorganize kernel coverage guide sections
hughperkins Apr 12, 2026
9a433e7
docs: add advanced usage section, clarify autodiff limitation
hughperkins Apr 12, 2026
4a8a670
feat: make coverage probe capacity configurable via QD_COVERAGE_MAX_P…
hughperkins Apr 12, 2026
24d608c
fix: guard against probe capacity overflow with warning
hughperkins Apr 12, 2026
4f3254d
docs: expand autodiff coverage section with concrete examples
hughperkins Apr 12, 2026
54db562
docs: trim edge case explanation in autodiff section
hughperkins Apr 12, 2026
683ff9b
test: add coverage tests for reinit survival, autodiff, and env var
hughperkins Apr 12, 2026
42c88da
fix: add debug logging to silent excepts, generalize pure-check exemp…
hughperkins Apr 12, 2026
52decab
style: fix import conventions and add missing type annotations
hughperkins Apr 12, 2026
4961717
fix: upgrade harvest failure log to warning, remove redundant copy, t…
hughperkins Apr 12, 2026
b187b37
refactor: replace report functions with renderer class hierarchy
hughperkins Apr 12, 2026
6221120
fix: restore annotated renderer to print grouped summary then annotat…
hughperkins Apr 12, 2026
c68e312
fix: reinit with same arch in test_kernel_coverage_survives_reinit
hughperkins Apr 12, 2026
e8c4200
fix: use stable directory for coverage files instead of assuming CWD
hughperkins Apr 12, 2026
fe0222f
fix: emit minimal entry/exit arcs instead of fabricated inter-line arcs
hughperkins Apr 12, 2026
e8cbf37
test: add coverage tests for while/with/try, dedup, func, multi-kernel
hughperkins Apr 12, 2026
ceb6805
refactor: consolidate kernel coverage tests
hughperkins Apr 12, 2026
c968085
style: rewrap comments and docstrings to 120 columns
hughperkins Apr 12, 2026
cb088dc
style: fix black formatting, ruff import sorting, pylint no-else-return
hughperkins Apr 12, 2026
f8fb179
Fix two CI test failures in kernel coverage tests
hughperkins Apr 13, 2026
436d992
Fix CI: remove flaky autodiff probe assertion, fix pyright type error
hughperkins Apr 13, 2026
ca918ee
Fix pyright: suppress type mismatch on qd.field() assignment
hughperkins Apr 13, 2026
7c2e1b3
Merge branch 'main' into hp/pr-coverage-w-kernels
hughperkins Apr 13, 2026
d1ce0bf
docs: mention pytest-cov in kernel coverage intro
hughperkins Apr 15, 2026
890f9dd
feat: auto-enable kernel coverage when pytest-cov is active
hughperkins Apr 16, 2026
82243d8
docs: remove example section from kernel coverage guide
hughperkins Apr 16, 2026
3f305d4
docs: clarify branch coverage description
hughperkins Apr 16, 2026
c79b862
docs: move offline cache interaction out of limitations section
hughperkins Apr 16, 2026
e89b53b
docs: simplify autodiff coverage section
hughperkins Apr 16, 2026
392c1ba
docs: simplify autodiff edge case wording
hughperkins Apr 16, 2026
5d20c67
docs: fix autodiff edge case to be per-call, not per-kernel
hughperkins Apr 16, 2026
17a467d
fix: eagerly allocate coverage field in qd.init() for thread safety
hughperkins Apr 16, 2026
2d4d966
Merge remote-tracking branch 'origin/main' into hp/pr-coverage-w-kernels
hughperkins Apr 17, 2026
76678c6
fix(test): filter pytest artifacts from API test assertions
hughperkins Apr 17, 2026
4d0e423
fix(ci): generate coverage artifacts even when tests fail
hughperkins Apr 17, 2026
6315287
fix(test): revert clock accuracy iteration count to 200000
hughperkins Apr 17, 2026
03aac48
fix(ci): disable kernel coverage for torch tests to fix DLPack byte_o…
hughperkins Apr 17, 2026
08637e1
fix(test): skip snode layout offset test when kernel coverage is enabled
hughperkins Apr 17, 2026
b7c6ead
Merge branch 'main' into hp/pr-coverage-w-kernels
hughperkins Apr 17, 2026
9237abe
fix(ci): disable kernel coverage for CUDA tests to fix DLPack field f…
hughperkins Apr 17, 2026
570aa65
fix(test): widen clock accuracy tolerance to ±2 for CI GPU jitter
hughperkins Apr 17, 2026
f773b39
Revert "fix(test): widen clock accuracy tolerance to ±2 for CI GPU ji…
hughperkins Apr 17, 2026
f37d280
fix(test): widen clock accuracy tolerance to ±2 for CI GPU jitter
hughperkins Apr 17, 2026
c3a8c75
Merge origin/main into hp/pr-coverage-w-kernels, resolve conflict in …
hughperkins Apr 17, 2026
baffc00
fix: check QD_KERNEL_COVERAGE at call time instead of module load
hughperkins Apr 17, 2026
cbe01ec
fix(ci): add phase 3 to run coverage-skipped tests with QD_KERNEL_COV…
hughperkins Apr 17, 2026
2ef75b5
fix: enforce 80% diff coverage gate in coverage_report.py
hughperkins Apr 17, 2026
b38b311
fix: correct inverted fallback condition in combine_coverage()
hughperkins Apr 17, 2026
932bd9a
fix: add proper locking to _harvest_field()
hughperkins Apr 17, 2026
e951652
fix: snapshot _accumulated_lines under lock in flush()
hughperkins Apr 17, 2026
01150ba
fix: move ensure_field_allocated() after CWD restore in init()
hughperkins Apr 17, 2026
76f2f53
fix: auto-enable --cov-branch when kernel coverage is active
hughperkins Apr 17, 2026
ccf2dad
style: fix black formatting for _kernel_coverage_enabled()
hughperkins Apr 17, 2026
79f29b5
fix: acquire _lock in get_field() to prevent TOCTOU race
hughperkins Apr 17, 2026
edd8e92
fix(ci): ensure coverage comment is posted even when below 80% gate
hughperkins Apr 17, 2026
dbef8ad
Merge branch 'main' into hp/pr-coverage-w-kernels
hughperkins Apr 18, 2026
8bc367e
ci: retrigger CI after runner timeout flake
hughperkins Apr 18, 2026
ef2f2bd
docs: note that CI posts new coverage comments (not edit-last) for ch…
hughperkins Apr 23, 2026
484089d
fix(ci): add continue-on-error to CPU coverage download and guard XML…
hughperkins Apr 23, 2026
2c21f36
fix: skip '\ No newline at end of file' marker in diff parser
hughperkins Apr 23, 2026
572fcc8
Merge branch 'main' into hp/pr-coverage-w-kernels
hughperkins Apr 23, 2026
352b029
fix(ci): run kernel coverage tests on CUDA with QD_KERNEL_COVERAGE=1
hughperkins Apr 23, 2026
d6d50e2
Fix CUDA CI: pass short filename to run_tests.py
hughperkins Apr 23, 2026
d3e4bad
fix: include QD_KERNEL_COVERAGE in fastcache key
hughperkins Apr 23, 2026
50b711d
fix: communicate arc/line mode from pytest plugin via env var
hughperkins Apr 23, 2026
245de20
Merge origin/main into hp/pr-coverage-w-kernels (resolve conflict in …
hughperkins Apr 23, 2026
b80cb55
Unwrap hard-wrapped lines in kernel_coverage.md
hughperkins Apr 24, 2026
f4052b1
Merge branch 'main' into hp/pr-coverage-w-kernels
hughperkins Apr 24, 2026
3c3191a
Rewrap code comments at 120 chars instead of ~80
hughperkins Apr 24, 2026
920b379
Revert test_intrinsics.py changes to match origin/main
hughperkins Apr 24, 2026
fa460fe
Merge branch 'main' into hp/pr-coverage-w-kernels
hughperkins Apr 24, 2026
786e1dc
Merge branch 'main' into hp/pr-coverage-w-kernels
hughperkins Apr 25, 2026
110 changes: 110 additions & 0 deletions .github/workflows/linux.yml
@@ -6,6 +6,8 @@ on:
- reopened
- synchronize
workflow_dispatch:
permissions:
Collaborator Author: this block can be removed now, right?

contents: read
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref }}
cancel-in-progress: true
@@ -21,6 +23,8 @@ jobs:
runs-on: ${{ matrix.OS }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
Collaborator Author: why are we changing this?

- name: Python check
uses: actions/setup-python@v4
with:
@@ -37,3 +41,109 @@ jobs:
- name: Linux test
run: |
bash .github/workflows/scripts_new/linux/4_test.sh
- name: Upload wheel for CUDA job
if: always()
uses: actions/upload-artifact@v4
with:
name: linux-wheel
path: dist/*.whl
- name: Upload CPU coverage data
if: always()
uses: actions/upload-artifact@v4
with:
name: coverage-cpu
path: |
coverage.xml
pytest-coverage.txt

test-cuda:
name: Linux CUDA Test
needs: build
runs-on: gpu-t4-4-core
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Python check
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Install CUDA libraries
run: |
sudo apt-get install -y libcusolver-dev-12-8 libcusolver-12-8 libcusparse-dev-12-8 libcusparse-12-8 libnvjitlink-12-8 libcublas-12-8
echo "/usr/local/cuda/targets/x86_64-linux/lib" | sudo tee /etc/ld.so.conf.d/cuda-targets.conf
sudo ldconfig
- name: Download wheel
uses: actions/download-artifact@v4
with:
name: linux-wheel
- name: Install quadrants
run: |
set -x
mkdir -p dist
mv *.whl dist/
pip install dist/*.whl
- name: Install test requirements
run: |
pip install --group test
pip install -r requirements_test_xdist.txt
- name: Run CUDA tests with coverage
run: |
bash .github/workflows/scripts_new/linux/4_test_cuda.sh
- name: Upload CUDA coverage data
if: always()
uses: actions/upload-artifact@v4
with:
name: coverage-cuda
path: coverage.xml

coverage-comment:
if: github.event_name == 'pull_request' && always()
needs: [build, test-cuda]
runs-on: ubuntu-latest
permissions:
pull-requests: write
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Python check
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Download CPU coverage
uses: actions/download-artifact@v4
continue-on-error: true
with:
name: coverage-cpu
path: coverage-cpu
- name: Download CUDA coverage
uses: actions/download-artifact@v4
continue-on-error: true
with:
name: coverage-cuda
path: coverage-cuda
- name: Generate coverage report
continue-on-error: true
run: |
COV_XMLS=""
if [ -f coverage-cpu/coverage.xml ]; then
COV_XMLS="coverage-cpu/coverage.xml"
fi
if [ -f coverage-cuda/coverage.xml ]; then
COV_XMLS="$COV_XMLS coverage-cuda/coverage.xml"
fi
if [ -z "$COV_XMLS" ]; then
echo "No coverage XML files found, skipping report"
exit 0
fi

python tests/coverage_report.py --report-only \
--compare-branch=origin/${{ github.base_ref }} \
--coverage-xml $COV_XMLS \
--format markdown > coverage-comment.md
- name: Post coverage comment
if: always() && hashFiles('coverage-comment.md') != ''
run: gh pr comment ${{ github.event.pull_request.number }} --body-file coverage-comment.md
env:
GH_TOKEN: ${{ github.token }}
15 changes: 12 additions & 3 deletions .github/workflows/scripts_new/linux/4_test.sh
@@ -7,9 +7,18 @@ pip install -r requirements_test_xdist.txt
export QD_LIB_DIR="$(python -c 'import quadrants as ti; print(ti.__path__[0])' | tail -n 1)/_lib/runtime"
./build/quadrants_cpp_tests --gtest_filter=-AMDGPU.*

TEST_EXIT=0

# Phase 1: run all tests except torch-dependent ones
python tests/run_tests.py -v -r 1 -m "not needs_torch"
python tests/run_tests.py -v -r 3 --coverage -m "not needs_torch" || TEST_EXIT=$?

# Phase 2: install torch, run only torch tests
pip install torch --index-url https://download.pytorch.org/whl/cpu
python tests/run_tests.py -v -r 1 -m needs_torch
QD_KERNEL_COVERAGE=0 python tests/run_tests.py -v -r 3 --coverage --cov-append -m needs_torch || TEST_EXIT=$?

# Phase 3: run tests that are skipped under kernel coverage (offline cache, snode layout, FE-LL observations,
# etc.) without --coverage so QD_KERNEL_COVERAGE stays 0.
QD_KERNEL_COVERAGE=0 python tests/run_tests.py -v -r 3 -m "not needs_torch" || TEST_EXIT=$?

python tests/coverage_report.py --collect-only

exit $TEST_EXIT
20 changes: 20 additions & 0 deletions .github/workflows/scripts_new/linux/4_test_cuda.sh
@@ -0,0 +1,20 @@
#!/bin/bash

set -ex

TEST_EXIT=0

# Disable kernel-level coverage on CUDA: it changes field memory layout and breaks dlpack tests
# (ValueError: Expected zero byte_offset). Python code coverage (--cov) still runs.
QD_KERNEL_COVERAGE=0 python tests/run_tests.py -v -r 1 --arch cuda --coverage -m "not needs_torch" || TEST_EXIT=$?

pip install torch --index-url https://download.pytorch.org/whl/cu128
QD_KERNEL_COVERAGE=0 python tests/run_tests.py -v -r 1 --arch cuda --coverage --cov-append -m needs_torch || TEST_EXIT=$?

# Run kernel coverage tests on CUDA with coverage enabled — these are skipped by the phases above
# (QD_KERNEL_COVERAGE=0) and include GPU-only tests like test_kernel_coverage_simt_e2e.
QD_KERNEL_COVERAGE=1 python tests/run_tests.py -v -r 1 --arch cuda --coverage --cov-append test_kernel_coverage.py || TEST_EXIT=$?

python tests/coverage_report.py --collect-only

exit $TEST_EXIT
5 changes: 5 additions & 0 deletions .gitignore
@@ -65,8 +65,13 @@ __pycache__
/python/test_env
/CHANGELOG.md
/.coverage
/.coverage.*
Collaborator Author: 👍

_qd_kcov.*
/coverage.xml
/coverage-report.html
/htmlcov
/diff-cover.*
/pytest-coverage.txt
libpython_path.txt
.vscode
_build
8 changes: 8 additions & 0 deletions docs/source/user_guide/index.md
@@ -57,6 +57,14 @@ graph
perf_dispatch
```

```{toctree}
:caption: Testing
:maxdepth: 1
:titlesonly:

kernel_coverage
```

```{toctree}
:caption: Reference
:maxdepth: 1
103 changes: 103 additions & 0 deletions docs/source/user_guide/kernel_coverage.md
@@ -0,0 +1,103 @@
# Kernel code coverage

Standard Python coverage tools only measure host-side code. Quadrants kernel coverage goes further — it tracks which lines actually execute *inside* compiled kernels on the device (CPU or GPU), including which branches of `if`/`else` blocks are taken at runtime.

The coverage data is written in the standard `coverage.py` format, so it works with `coverage report`, `pytest-cov`, `diff-cover`, and IDE coverage viewers out of the box.

## Prerequisites

Kernel coverage requires the `coverage` Python package:

```bash
pip install coverage
```

## Enabling kernel coverage

### Automatic with pytest-cov

If you use `pytest-cov`, kernel coverage is enabled automatically — no configuration needed. Quadrants ships a pytest plugin that detects `--cov` and sets `QD_KERNEL_COVERAGE=1` for you. Just run:

```bash
pytest --cov=my_package --cov-branch tests/
```

To disable kernel coverage while still collecting Python coverage, opt out explicitly:

```bash
QD_KERNEL_COVERAGE=0 pytest --cov=my_package --cov-branch tests/
```
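The auto-enable behaviour described above can be sketched as a small pytest hook. This is a minimal sketch under stated assumptions, not the actual `quadrants.pytest_plugin` code: the helper `maybe_enable_kernel_coverage` is hypothetical, and the hook assumes pytest-cov stores `--cov` values under the `cov_source` option dest. Because it uses `setdefault`, an explicit user value such as `QD_KERNEL_COVERAGE=0` is never overwritten, which is what makes the opt-out above work.

```python
import os
from typing import MutableMapping, Optional, Sequence


def maybe_enable_kernel_coverage(
    cov_sources: Optional[Sequence[object]],
    env: MutableMapping[str, str] = os.environ,
) -> bool:
    """Hypothetical helper: turn on kernel coverage when --cov is active.

    Returns True if kernel coverage ends up enabled.
    """
    if not cov_sources:
        # pytest-cov is not active for this run; leave the environment untouched.
        return False
    # setdefault keeps an explicit user choice (e.g. QD_KERNEL_COVERAGE=0) intact.
    env.setdefault("QD_KERNEL_COVERAGE", "1")
    return env["QD_KERNEL_COVERAGE"] == "1"


def pytest_configure(config) -> None:
    # Assumption: pytest-cov registers --cov with dest="cov_source". The default
    # makes this a no-op when pytest-cov is not installed.
    maybe_enable_kernel_coverage(config.getoption("cov_source", default=None))
```

Setting the variable this late only works if the flag is read when kernels are compiled rather than at import time, which is what `_kernel_coverage_enabled()` in the `_func_base.py` hunk does.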

### Manual with any script

For scripts outside pytest, set the `QD_KERNEL_COVERAGE` environment variable:

```bash
QD_KERNEL_COVERAGE=1 python my_simulation.py
```

This works with any script that uses quadrants kernels — no changes to your code are needed.

When the process exits, quadrants writes one or more `_qd_kcov.<pid>` files in the working directory containing the collected coverage data.

## Viewing results

### With coverage.py

Combine the kernel coverage files and produce a report using the standard `coverage` tool:

```bash
# Combine all kernel coverage files into .coverage
coverage combine _qd_kcov.*

# Terminal summary
coverage report --show-missing

# HTML report
coverage html
```

### With pytest-cov

When using `pytest-cov`, kernel coverage is enabled automatically (see above). After the run, merge the kernel coverage data with the Python coverage data:

```bash
coverage combine _qd_kcov.* .coverage
```

## Key properties

- **Zero overhead when disabled.** The coverage module is never imported unless `QD_KERNEL_COVERAGE=1` is set. There is no cost in normal operation.
- **Branch coverage.** Probes inside `if`/`else` bodies fire only when that branch is taken, so the report shows which branches actually ran at runtime, not just which kernels ran or which conditionals appear in the source.
- **Works with pytest-xdist.** Each worker writes to a separate file; combine them afterward.
- **Survives `qd.init()` resets.** Coverage data is accumulated across multiple `qd.init()` calls within the same process.
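The accumulation in the last bullet can be illustrated with a small, device-free sketch. It assumes probe hits live in a flat array indexed by probe id (matching the `field[probe_id] = 1` stores described under "Under the hood") plus a host-side table mapping each probe id to a `(filename, line)` pair; `CoverageAccumulator` and its methods are hypothetical names. Before the runtime is torn down, nonzero entries are folded into per-file line sets under a lock, so a fresh field after `qd.init()` starts empty without losing earlier hits.

```python
import threading
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


class CoverageAccumulator:
    """Hypothetical host-side accumulator for probe hits across runtime resets."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._probe_locations: List[Tuple[str, int]] = []  # probe id -> (filename, line)
        self._accumulated: Dict[str, Set[int]] = defaultdict(set)

    def register_probe(self, filename: str, line: int) -> int:
        with self._lock:
            self._probe_locations.append((filename, line))
            return len(self._probe_locations) - 1

    def harvest(self, field_values: Sequence[int]) -> None:
        """Fold a snapshot of the probe field into the accumulated lines.

        Intended to run before the runtime (and its field) is destroyed, e.g. from a reset hook.
        """
        with self._lock:
            for probe_id, hit in enumerate(field_values):
                if hit and probe_id < len(self._probe_locations):
                    filename, line = self._probe_locations[probe_id]
                    self._accumulated[filename].add(line)

    def snapshot(self) -> Dict[str, Set[int]]:
        # Copy under the lock so a concurrent harvest cannot change what the caller writes out.
        with self._lock:
            return {f: set(lines) for f, lines in self._accumulated.items()}


acc = CoverageAccumulator()
acc.register_probe("k.py", 10)
acc.register_probe("k.py", 12)
acc.harvest([1, 0])  # first session: only line 10 ran
acc.harvest([0, 1])  # after re-init, a fresh field: only line 12 ran
print(acc.snapshot())
```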

## Advanced usage

### Probe capacity

There is a limit of 100,000 coverage probes per process (one probe per unique source line per kernel/func). If you hit the limit — for example in a very large codebase with many kernels — increase it via the environment variable:

```bash
QD_COVERAGE_MAX_PROBES=500000 QD_KERNEL_COVERAGE=1 python my_simulation.py
```
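A limit like this is typically read once from the environment and enforced when probe ids are handed out. The sketch below is an assumption about how such a guard could look, not the quadrants implementation: `ProbeAllocator` is hypothetical, the 100,000 default comes from the text above, one id is issued per unique (kernel/func, line) pair, and overflow produces a single warning plus a `None` id meaning "leave this line uninstrumented" rather than an error.

```python
import os
import warnings
from typing import Dict, Optional, Tuple

DEFAULT_MAX_PROBES = 100_000


class ProbeAllocator:
    """Hypothetical allocator: one probe id per unique (kernel/func, source line)."""

    def __init__(self, max_probes: Optional[int] = None) -> None:
        if max_probes is None:
            max_probes = int(os.environ.get("QD_COVERAGE_MAX_PROBES", DEFAULT_MAX_PROBES))
        self.max_probes = max_probes
        self._ids: Dict[Tuple[str, int], int] = {}
        self._warned = False

    def probe_id(self, kernel_name: str, line: int) -> Optional[int]:
        key = (kernel_name, line)
        if key in self._ids:
            # The same line in the same kernel/func reuses its existing probe.
            return self._ids[key]
        if len(self._ids) >= self.max_probes:
            if not self._warned:
                warnings.warn(
                    f"coverage probe capacity ({self.max_probes}) exceeded; "
                    "raise QD_COVERAGE_MAX_PROBES to cover the remaining lines"
                )
                self._warned = True
            return None  # caller skips instrumenting this line
        self._ids[key] = len(self._ids)
        return self._ids[key]
```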

## Coverage and autodiff

The forward pass is covered. The backward pass is not, because instrumenting it would interfere with gradient computation. This is normally fine — the backward pass is auto-generated and replays the same control flow, so forward coverage is sufficient.

One edge case: kernel calls inside a `qd.ad.Tape` with `validation=True` will not be covered.

## Offline cache interaction

Coverage probes change the compiled kernel, so the offline cache will see them as new kernels and recompile. This is expected and does not affect correctness, but the first run with coverage enabled will be slower if you normally rely on cached kernels.

## CI integration

The CI workflow posts a diff coverage report as a PR comment on each push. A **new comment** is created each time (rather than editing the previous one) so that the PR timeline shows a clear chronological sequence of commits and their corresponding coverage results.

## Under the hood

When `QD_KERNEL_COVERAGE=1` is set, quadrants rewrites the Python AST of each `@qd.kernel` and `@qd.func` before compilation. It inserts lightweight probe statements (`field[probe_id] = 1`) at each source line. These probes compile as ordinary field stores and execute on the device alongside your kernel code.

At process exit, the probe data is read back from the device and written to a `.coverage`-compatible file.
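The rewriting step can be approximated in plain Python with `ast.NodeTransformer`. This is a simplified sketch, not the quadrants rewriter: it instruments an ordinary function rather than a `@qd.kernel`, uses a Python list bound to the global `_qd_cov` in place of the device field (the name matches the global discussed in the `_func_base.py` review comment; everything else is assumed), requires Python 3.9+ for the `ast.Subscript` form, and places one probe before every statement in every statement block. Since a probe inside an `if` or `else` body only runs when that body runs, the recorded lines already show which branch was taken.

```python
import ast
import textwrap
from typing import Dict, List, Tuple

FIELD_VAR_NAME = "_qd_cov"  # global name the probes store into


class ProbeInserter(ast.NodeTransformer):
    """Insert `_qd_cov[probe_id] = 1` in front of every statement of every statement block."""

    def __init__(self) -> None:
        self.probe_lines: Dict[int, int] = {}  # probe id -> source line number

    def _probe(self, stmt: ast.stmt) -> ast.stmt:
        probe_id = len(self.probe_lines)
        self.probe_lines[probe_id] = stmt.lineno
        target = ast.Subscript(
            value=ast.Name(id=FIELD_VAR_NAME, ctx=ast.Load()),
            slice=ast.Constant(probe_id),
            ctx=ast.Store(),
        )
        # Give the probe the same source position as the statement it guards.
        return ast.copy_location(ast.Assign(targets=[target], value=ast.Constant(1)), stmt)

    def _instrument(self, body: List[ast.stmt]) -> List[ast.stmt]:
        new_body: List[ast.stmt] = []
        for stmt in body:
            new_body.append(self._probe(stmt))  # runs only if control reaches stmt
            new_body.append(self.visit(stmt))  # recurse into nested blocks
        return new_body

    def generic_visit(self, node: ast.AST) -> ast.AST:
        # Statement blocks live in these fields (def, if/else, for/while, with, try/finally).
        for name in ("body", "orelse", "finalbody"):
            stmts = getattr(node, name, None)
            if isinstance(stmts, list) and stmts and isinstance(stmts[0], ast.stmt):
                setattr(node, name, self._instrument(stmts))
        # except handlers and match cases carry their own `body` lists.
        for child in getattr(node, "handlers", []) + getattr(node, "cases", []):
            self.generic_visit(child)
        return node


def instrument(func_source: str) -> Tuple[ast.Module, Dict[int, int]]:
    """Parse one function definition and instrument its body."""
    tree = ast.parse(textwrap.dedent(func_source))
    inserter = ProbeInserter()
    inserter.visit(tree.body[0])
    ast.fix_missing_locations(tree)
    return tree, inserter.probe_lines


SOURCE = """
def clamp(x):
    if x < 0:
        y = 0
    else:
        y = x
    return y
"""

tree, probe_lines = instrument(SOURCE)
cov = [0] * len(probe_lines)  # stand-in for the device-side probe field
namespace = {FIELD_VAR_NAME: cov}
exec(compile(tree, "<clamp>", "exec"), namespace)
namespace["clamp"](5)
hit_lines = sorted(probe_lines[i] for i, hit in enumerate(cov) if hit)
print(hit_lines)  # [3, 6, 7]: the `else` body (line 6) ran, line 4 did not
```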
11 changes: 11 additions & 0 deletions pyproject.toml
@@ -99,6 +99,9 @@ test = [
"pyright",
]

[project.entry-points.pytest11]
quadrants = "quadrants.pytest_plugin"

[project.urls]
Homepage = "https://github.com/Genesis-Embodied-AI/quadrants"

@@ -120,6 +123,14 @@ requires = [
# things, without doing full c++ build
build-backend = "setuptools.build_meta"

[tool.coverage.paths]
source = [
"python/quadrants",
"*/site-packages/quadrants",
]

[tool.coverage.report]

[tool.pytest.ini_options]
filterwarnings = [
"ignore:Calling non-taichi function",
2 changes: 2 additions & 0 deletions python/quadrants/lang/_fast_caching/src_hasher.py
@@ -1,4 +1,5 @@
import json
import os
import warnings
from typing import Any, Iterable, Sequence

@@ -49,6 +50,7 @@ def create_cache_key(
kernel_source_info.filepath,
str(kernel_source_info.start_lineno),
"pruned",
"kcov" if os.environ.get("QD_KERNEL_COVERAGE") == "1" else "",
)
)
return cache_key
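The extra `"kcov"` component ensures that instrumented and uninstrumented builds of the same kernel source never share a cache entry. Below is a hedged, self-contained sketch of that idea: `make_cache_key` and its parameters are hypothetical, and hashing a JSON-encoded list with SHA-256 is an assumption, since the hunk does not show how `create_cache_key` combines its parts.

```python
import hashlib
import json
import os
from typing import Mapping, Optional


def make_cache_key(
    source: str,
    filepath: str,
    start_lineno: int,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical cache key that changes when kernel coverage probes are enabled."""
    if env is None:
        env = os.environ
    parts = [
        source,
        filepath,
        str(start_lineno),
        # Instrumented kernels contain extra probe stores, so they must not share an
        # entry with the uninstrumented build of the same source.
        "kcov" if env.get("QD_KERNEL_COVERAGE") == "1" else "",
    ]
    # json.dumps gives an unambiguous encoding of the list before hashing.
    return hashlib.sha256(json.dumps(parts).encode("utf-8")).hexdigest()
```

With the `== "1"` comparison used in the hunk, an unset variable and `QD_KERNEL_COVERAGE=0` produce the same key, so turning coverage off reuses the normal cache entries.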
18 changes: 18 additions & 0 deletions python/quadrants/lang/_func_base.py
@@ -4,6 +4,7 @@
import ast
import inspect
import math
import os
import sys
import textwrap
import types
@@ -21,6 +22,11 @@

import numpy as np


def _kernel_coverage_enabled() -> bool:
return os.environ.get("QD_KERNEL_COVERAGE") == "1"


from quadrants._lib import core as _qd_core
from quadrants._lib.core.quadrants_python import KernelLaunchContext
from quadrants.lang import _kernel_impl_dataclass, impl
@@ -246,9 +252,21 @@ def get_tree_and_ctx(

autodiff_mode = current_kernel.autodiff_mode

_kcov = None
if _kernel_coverage_enabled() and autodiff_mode == _qd_core.AutodiffMode.NONE:
from . import ( # pylint: disable=import-outside-toplevel
_kernel_coverage as _kcov,
)

tree = _kcov.rewrite_ast(tree, function_source_info.filepath, function_source_info.start_lineno)

quadrants_callable = current_kernel.quadrants_callable
is_pure = quadrants_callable is not None and quadrants_callable.is_pure
global_vars = self._get_global_vars(self.func)
if _kcov is not None:
cov_field = _kcov.get_field()
if cov_field is not None:
global_vars[_kcov.FIELD_VAR_NAME] = cov_field
Comment on lines +255 to +269

🔴 A TOCTOU race condition exists between rewrite_ast() and get_field() in get_tree_and_ctx() (_func_base.py lines 251–262): rewrite_ast() injects _qd_cov[probe_id] = 1 nodes into the kernel AST and releases _lock, but a concurrent qd.reset() can null _cov_field before get_field() acquires _lock, causing the global_vars injection to be skipped while the AST already references _qd_cov — producing NameError: name '_qd_cov' is not defined at kernel compilation time. To fix, hold _lock across both rewrite_ast() and the subsequent get_field() + global_vars update as one atomic operation.

Extended reasoning...

What the bug is and how it manifests

In get_tree_and_ctx() (python/quadrants/lang/_func_base.py, lines 248–262), kernel coverage support is applied in two separate steps guarded by two independent lock acquisitions. First, _kcov.rewrite_ast() is called: it acquires _lock, rewrites the kernel AST to insert _qd_cov[probe_id] = 1 store-nodes, then releases _lock before returning the modified tree. Later, _kcov.get_field() is called in a separate with _lock: block. If get_field() returns None, the if cov_field is not None: guard on line 261 skips adding _qd_cov to global_vars. The compiled kernel then references _qd_cov (injected by rewrite_ast()) without the field being defined in scope, raising NameError: name '_qd_cov' is not defined.

The specific code path that triggers it

get_field() returns None whenever _cov_field_prog is not impl.get_runtime()._prog. This condition is set by _harvest_field(), which is invoked via the _hooked_clear() hook during qd.reset() or qd.init(). _harvest_field() acquires _lock, sets _cov_field_prog = None, nullifies _cov_field, then releases _lock. Critically, qd.reset() does not acquire compilation_lock, so Thread B calling qd.reset() can race freely with Thread A in the middle of get_tree_and_ctx().

Step-by-step proof of the race

  1. Thread A calls a kernel function; get_tree_and_ctx() enters.
  2. Thread A calls _kcov.rewrite_ast(): acquires _lock, inserts _qd_cov[0] = 1 into the AST, releases _lock, returns the modified tree.
  3. Race window opens — Thread A has not yet called get_field().
  4. Thread B calls qd.reset() → _hooked_clear() → _harvest_field(): acquires _lock, sets _cov_field_prog = None, clears _cov_field, releases _lock.
  5. Thread A calls _kcov.get_field(): acquires _lock, sees _cov_field_prog is None ≠ current prog, returns None, releases _lock.
  6. Thread A's guard if cov_field is not None: is false → global_vars[FIELD_VAR_NAME] is never set.
  7. Thread A proceeds to compile the already-rewritten AST, which references _qd_cov → NameError at compilation.

Why existing code does not prevent it

The module-level _lock in _kernel_coverage.py correctly serializes individual calls to rewrite_ast(), get_field(), and _harvest_field(), but it does not hold across the gap between these calls. The compilation_lock in impl.py is an RLock that serializes concurrent kernel compilations against each other, but qd.reset() never touches compilation_lock, so it cannot prevent _harvest_field() from running between rewrite_ast() and get_field(). There is no mechanism that atomically ties AST rewriting to the subsequent field injection.

Impact

Any multithreaded program that compiles kernels on one thread while calling qd.reset() or qd.init() on another — a natural pattern when resetting between benchmark iterations or test cases — can hit this race. The failure is a hard NameError at kernel compilation rather than a silent data-quality issue, so it will surface as a crash. Because the timing window is short (between two function calls), the race is non-deterministic and may be hard to reproduce reliably.

How to fix it

Extend _kernel_coverage.py's API with a combined rewrite_ast_and_get_field() function (or expose a context-manager that holds _lock for the caller) that performs AST rewriting and returns the coverage field in a single critical section. Alternatively, in _func_base.py, refactor so that the single lock acquisition covers both the rewrite_ast() mutation and the subsequent global_vars injection, ensuring the AST and the global_vars dictionary are always consistent.
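The first suggested fix can be sketched as a self-contained model of the pattern. This is not quadrants code: `_lock`, `_cov_field`, and `FIELD_VAR_NAME` mirror names from the comment, but `allocate_field`, `harvest_field`, `inject_probes` (a stub that prepends one probe), `rewrite_ast_and_get_field`, and `prepare_kernel` are hypothetical. The field is read in the same critical section as the rewrite, and the tree is only instrumented when a field exists, so the AST can never reference `_qd_cov` without a matching entry in `global_vars`.

```python
import ast
import threading
from typing import Any, Dict, Optional, Tuple

FIELD_VAR_NAME = "_qd_cov"
_lock = threading.Lock()
_cov_field: Optional[Any] = None  # stand-in for the coverage field of the current runtime


def allocate_field(field: Any) -> None:
    """Called when the runtime is (re)initialized."""
    global _cov_field
    with _lock:
        _cov_field = field


def harvest_field() -> None:
    """Called from the reset hook: after reading out probe hits, drop the field."""
    global _cov_field
    with _lock:
        _cov_field = None


def inject_probes(tree: ast.Module) -> ast.Module:
    # Stub for the real AST rewriter: prepend a single `_qd_cov[0] = 1` probe.
    tree.body.insert(0, ast.parse(f"{FIELD_VAR_NAME}[0] = 1").body[0])
    return tree


def rewrite_ast_and_get_field(tree: ast.Module) -> Tuple[ast.Module, Optional[Any]]:
    """Rewrite the AST and fetch the field in one critical section.

    If no field is available (for example, a reset just harvested it), the tree is
    returned unmodified, so it never references FIELD_VAR_NAME without a global.
    """
    with _lock:
        field = _cov_field
        if field is None:
            return tree, None
        return inject_probes(tree), field


def prepare_kernel(tree: ast.Module, global_vars: Dict[str, Any]) -> ast.Module:
    tree, field = rewrite_ast_and_get_field(tree)
    if field is not None:
        # The field reference is held locally, so a later reset cannot undo this injection.
        global_vars[FIELD_VAR_NAME] = field
    return tree
```

This only closes the NameError window described above. Whether a field captured just before a concurrent reset is still safe to write to is a separate question that the sketch does not address.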


template_vars = {}
if is_kernel or is_real_function: