
Infrastructure to test DaCe's codegen (in)deterministic behavior#2590

Open
kotsaloscv wants to merge 11 commits into main from dace_deterministic_codegen_test

@kotsaloscv
Contributor

@kotsaloscv kotsaloscv commented May 3, 2026

Adds infrastructure to detect non-determinism in gt4py's DaCe codegen. Two new parametrized nox sessions run gt4py's own DaCe test selection twice with isolated build caches, then byte-compare the generated sources under each program's src/. A session succeeds when the generated code is identical between the two runs and fails on any diff. Supports the dace_cpu, dace_gpu (CUDA), and HIP backends. Driven entirely from gt4py's own nox sessions, so no external repo is needed.

Layout

  • noxfile.py — two new sessions, test_cartesian_determinism and test_next_determinism, parametrized over device (plus meshlib for test_next). Each calls a shared _run_dace_determinism_check helper that runs gt4py's regular DaCe pytest selection twice with GT4PY_BUILD_CACHE_DIR pointed at isolated workdirs, then invokes the comparator. Tagged dace + determinism, so nox -t determinism runs every variant.
  • scripts/dace_deterministic_codegen.py — pure-stdlib comparison library plus a thin argparse CLI. Snapshots each cache and compares <program>/src/{cpu,cuda}/... byte-for-byte (HIP files under src/cuda/hip/ are covered transparently). Raises distinct exception types (with matching CLI exit codes) for an actual mismatch, an unsupported backend, no programs cached, and source not retained, each with an actionable diagnostic. Reusable for ad-hoc comparison of any two existing caches.
  • ci/cscs-ci-dace-determinism.yml — CI driver, included from cscs-ci.yml.
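For reviewers who want a feel for the comparison step, here is a minimal sketch of a snapshot-and-compare pass. It is not the code in scripts/dace_deterministic_codegen.py. The function names are hypothetical, and it assumes that program folders sit directly under the cache directory and carry the same names in both runs.

```python
import hashlib
from pathlib import Path


def snapshot_sources(cache_dir: Path) -> dict[str, str]:
    """Map '<program>/src/...' relative paths to SHA-256 digests of the file bytes.

    Assumption: each direct child directory of cache_dir is one cached program.
    """
    digests: dict[str, str] = {}
    for program_dir in sorted(p for p in cache_dir.iterdir() if p.is_dir()):
        src_dir = program_dir / "src"
        if not src_dir.is_dir():
            continue  # sources were not retained for this program
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                rel = path.relative_to(cache_dir).as_posix()
                digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(run1: dict[str, str], run2: dict[str, str]) -> list[str]:
    """Return the sorted relative paths that are missing in one run or differ in content."""
    return sorted(k for k in run1.keys() | run2.keys() if run1.get(k) != run2.get(k))
```

An empty list from `diff_snapshots` would correspond to a passing session; any entry names a file whose bytes differ or that exists in only one of the two caches.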

DaCe build folder mode

The check requires DACE_compiler_build_folder_mode=development. gt4py configures DaCe for production mode by default, which strips the generated src/ tree after compilation and leaves nothing to compare. The nox sessions set this env var automatically; the comparator raises NoSourceFilesObservedError with an env-var hint if it encounters caches built without it.
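As a rough illustration of the environment each of the two runs needs, the sketch below builds a per-run environment from the two variables named in this description. The helper name `determinism_run_env` is hypothetical and is not taken from the noxfile.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def determinism_run_env(
    cache_dir: Path, base_env: Optional[Mapping[str, str]] = None
) -> dict[str, str]:
    """Return a copy of the environment with an isolated cache and retained sources."""
    env = dict(os.environ if base_env is None else base_env)
    # Each run gets its own build cache so the two runs cannot share artifacts.
    env["GT4PY_BUILD_CACHE_DIR"] = str(cache_dir)
    # Development mode keeps the generated src/ tree after compilation.
    env["DACE_compiler_build_folder_mode"] = "development"
    return env
```

The returned mapping could be passed as `env=` to `subprocess.run` for each test invocation, with a different `cache_dir` per run.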

CI

New dace-determinism stage in ci/cscs-ci-dace-determinism.yml, included from cscs-ci.yml.

Local use

./noxfile.py -s 'test_next_determinism-3.10(cpu, nomesh)'   # one variant
nox -t determinism                                          # all variants
python scripts/dace_deterministic_codegen.py --run1 PATH --run2 PATH
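When scripting around the CLI, the distinct exit codes let a caller tell a real mismatch from a setup problem. The sketch below shows one way such a mapping could look. Only NoSourceFilesObservedError is named in this PR; the other exception names and all numeric codes are placeholders chosen for illustration, not the script's actual values.

```python
import sys
from typing import Callable


class CodegenMismatchError(Exception):  # hypothetical name
    pass


class UnsupportedBackendError(Exception):  # hypothetical name
    pass


class NoProgramsCachedError(Exception):  # hypothetical name
    pass


class NoSourceFilesObservedError(Exception):  # name taken from the PR description
    pass


# Placeholder codes: one distinct non-zero value per failure class.
EXIT_CODES = {
    CodegenMismatchError: 1,
    UnsupportedBackendError: 2,
    NoProgramsCachedError: 3,
    NoSourceFilesObservedError: 4,
}


def run_cli(compare: Callable[[str, str], None], run1: str, run2: str) -> int:
    """Run a comparison callable and translate known failures into exit codes."""
    try:
        compare(run1, run2)
    except tuple(EXIT_CODES) as exc:
        print(f"{type(exc).__name__}: {exc}", file=sys.stderr)
        return EXIT_CODES[type(exc)]
    return 0
```

A wrapper script could then call `sys.exit(run_cli(...))` and let CI branch on the returned code.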

@kotsaloscv kotsaloscv self-assigned this May 3, 2026
@kotsaloscv kotsaloscv marked this pull request as ready for review May 8, 2026 08:22
@kotsaloscv kotsaloscv requested a review from tehrengruber May 8, 2026 08:22