From 92ba30bee519bc738a49fce0f337fa9e510f4eb8 Mon Sep 17 00:00:00 2001
From: Geoffroy Lesur
Date: Sat, 18 Oct 2025 16:27:02 +0200
Subject: [PATCH 1/3] add documentation for CI add fix proposed to issue with mpi IO in FAQs

---
 doc/source/faq.rst | 9 ++
 doc/source/index.rst | 1 +
 doc/source/testing.rst | 117 +++++++++++++++++++++
 doc/source/testing/idfxTest.rst | 173 ++++++++++++++++++++++++++++++++
 4 files changed, 300 insertions(+)
 create mode 100644 doc/source/testing.rst
 create mode 100644 doc/source/testing/idfxTest.rst

diff --git a/doc/source/faq.rst b/doc/source/faq.rst
index 4c7bc056..198d7f54 100644
--- a/doc/source/faq.rst
+++ b/doc/source/faq.rst
@@ -61,6 +61,15 @@ How can I stop the code without loosing the current calculation?
 I'm doing performance measures. How do I disable all outputs in *Idefix*?
   Add ``-nowrite`` when you call *Idefix* executable.
 
+VTK output appears corrupted when running with MPI (OpenMPI)
+  Some OpenMPI configurations (notably OpenMPI 4 with the ompio component) can produce corrupted VTK/VTU output when running with MPI enabled. This appears to be caused by bugs in OpenMPI's ompio I/O component.
+  Disable ompio so OpenMPI falls back to ROMIO (MPICH's MPI-IO), which is typically more stable:
+
+  .. code-block:: console
+
+    mpirun --mca io ^ompio ...
+
+  This has resolved intermittent corruption for several users. See issue #348 for discussion and reports.
 
 Developement
 ------------
diff --git a/doc/source/index.rst b/doc/source/index.rst
index 411d73d6..d36ab39b 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -122,6 +122,7 @@ The Idefix collaboration benefited from funding from the “Programme National d
    reference
    modules
    programmingguide
+   testing
    performances
    kokkos
    contributing
diff --git a/doc/source/testing.rst b/doc/source/testing.rst
new file mode 100644
index 00000000..44ff88d0
--- /dev/null
+++ b/doc/source/testing.rst
@@ -0,0 +1,117 @@
+Continuous Integration (CI) tests
+=================================
+
+This document describes the GitHub Actions continuous-integration setup used to run the Idefix
+test-suite. The CI is implemented by two workflows checked in .github/workflows:
+
+- .github/workflows/idefix-ci.yml
+- .github/workflows/idefix-ci-jobs.yml
+
+Overview
+--------
+
+The CI is split into two layers:
+
+- A top-level workflow (.github/workflows/idefix-ci.yml) that:
+
+  - runs a Linter job (pre-commit) on push / PR / manual dispatch,
+  - then calls a reusable workflow for different compiler/backends (intel, gcc, cuda)
+    providing two inputs: TESTME_OPTIONS and IDEFIX_COMPILER.
+
+- A reusable workflow (.github/workflows/idefix-ci-jobs.yml) that:
+
+  - defines the actual test jobs grouped by physics domain (ShocksHydro, ParabolicHydro,
+    ShocksMHD, ParabolicMHD, Fargo, ShearingBox, SelfGravity, Planet, Dust, Braginskii,
+    Examples, Utils),
+  - runs test scripts on self-hosted runners,
+  - expects the repository to be checked out with submodules,
+  - invokes the repository-provided CI helper scripts to configure / build / run tests.
+
+Key configuration points
+------------------------
+
+- Inputs passed from the top-level workflow:
+
+  - TESTME_OPTIONS (string): flags forwarded to the per-test runner (examples: -cuda, -Werror,
+    -intel, -all).
+  - IDEFIX_COMPILER (string): which compiler the tests should use (e.g. icc, gcc, nvcc).
+
+- Environment variables set by the reusable workflow:
+
+  - IDEFIX_COMPILER, TESTME_OPTIONS, PYTHONPATH, IDEFIX_DIR
+
+- Linter job:
+
+  - Runs only when repository is the main project (not arbitrary forks).
+  - Uses actions/setup-python and runs pre-commit (pre-commit/action@v3 and pre-commit-ci/lite).
+  - Prevents regressions in style and common mistakes before running heavy test jobs.
+
+- Test execution:
+
+  - All test jobs call the repository script scripts/ci/run-tests with a test directory
+    and the TESTME_OPTIONS flags. Example invocation (from the workflows):
+    scripts/ci/run-tests $IDEFIX_DIR/test/HD/sod -all $TESTME_OPTIONS
+
+  - The reusable workflow is written to execute many test directories in separate job steps,
+    so each physics group is kept logically separated in CI logs.
+
+Runners and prerequisites
+-------------------------
+
+- The heavy numerical tests run on self-hosted runners (see runs-on: self-hosted).
+  The CI assumes appropriate hardware and dependencies are available on those runners
+  (compilers, MPI, GPUs when CUDA/HIP flags are used, required system libraries).
+
+- The workflows check out the repository and its submodules. Submodules must be available
+  on the CI machines.
+
+How tests are driven (testme scripts)
+-------------------------------------
+
+Each test directory contains a small Python "testMe" driver that uses the helper Python
+class documented in the repository:
+
+- See the test helper documentation: :doc:`idfxTest `
+
+That helper (idfxTest) is responsible for:
+
+- parsing TESTME_OPTIONS-like flags (precision, MPI, CUDA, reconstruction, vector potential, etc.),
+- calling configure / compile / run,
+- performing standard python checks and non-regression (RMSE) comparisons against
+  reference dumps,
+- optionally creating / updating reference dumps (init mode).
+
+Practical examples
+------------------
+
+- Example of a CI invocation (triggered by workflows):
+
+  - Top-level workflow calls the reusable jobs workflow for each compiler/back-end, e.g.
+    TESTME_OPTIONS="-cuda -Werror" IDEFIX_COMPILER=nvcc
+
+- Running tests locally (developer machine)
+  - You can mimic what CI does by calling the repository helper script directly. Example:
+    scripts/ci/run-tests /path/to/idefix/test/HD/sod -all -mpi -dec 2 2 -reconstruction 3 -single
+
+Notes for maintainers
+---------------------
+
+- The reusable jobs workflow contains a commented concurrency block for optional cancellation
+  of in-flight runs; consider enabling it if you want to auto-cancel redundant CI runs.
+- Because tests are run on self-hosted runners, ensure the pools have the required compilers,
+  MPI stacks and GPU drivers for the requested TESTME_OPTIONS.
+- Keep TESTME_OPTIONS in sync with the options understood by the test helper documented in
+  :doc:`idfxTest `.
+
+Relevant files
+--------------
+
+- Workflow entry point: .github/workflows/idefix-ci.yml
+- Reusable jobs: .github/workflows/idefix-ci-jobs.yml
+- Test helper documentation: :doc:`idfxTest `
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+
+   testing/idfxTest.rst
\ No newline at end of file
diff --git a/doc/source/testing/idfxTest.rst b/doc/source/testing/idfxTest.rst
new file mode 100644
index 00000000..93171102
--- /dev/null
+++ b/doc/source/testing/idfxTest.rst
@@ -0,0 +1,173 @@
+=========
+idfxTest
+=========
+
+.. autoclass:: idfxTest
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+Overview
+--------
+
+The ``idfxTest`` class provides a high-level interface for automating the configuration, compilation, execution, and regression testing of Idefix simulations. It is designed to be used in test scripts (such as ``testme.py``) to streamline the testing workflow, including handling reference files and plotting differences.
+
+Constructor and Command-Line Options
+------------------------------------
+
+The constructor parses command-line arguments using ``argparse``. These options can be passed directly to the test script or via the command line. The following options are available:
+
+.. list-table::
+   :header-rows: 1
+
+   * - Option
+     - Attribute
+     - Description
+   * - ``-noplot``
+     - ``noplot``
+     - Disable plotting in standard tests (default: True).
+   * - ``-ploterr``
+     - ``ploterr``
+     - Enable plotting of differences when regression tests fail.
+   * - ``-cmake OPT [OPT ...]``
+     - ``cmake``
+     - Extra CMake options (list of strings).
+   * - ``-definitions FILE``
+     - ``definitions``
+     - Specify a custom ``definitions.hpp`` file.
+   * - ``-dec NX NY NZ``
+     - ``dec``
+     - MPI domain decomposition (list of integers).
+   * - ``-check``
+     - ``check``
+     - Only perform regression tests without compilation.
+   * - ``-cuda``
+     - ``cuda``
+     - Enable CUDA backend for Nvidia GPUs.
+   * - ``-intel``
+     - ``intel``
+     - Use Intel OneAPI compilers.
+   * - ``-hip``
+     - ``hip``
+     - Enable HIP backend for AMD GPUs.
+   * - ``-single``
+     - ``single``
+     - Enable single precision.
+   * - ``-vectPot``
+     - ``vectPot``
+     - Enable vector potential formulation.
+   * - ``-reconstruction N``
+     - ``reconstruction``
+     - Set reconstruction scheme (2=PLM, 3=LimO3, 4=PPM).
+   * - ``-idefixDir PATH``
+     - ``idefixDir``
+     - Set the directory for Idefix source files (default: ``$IDEFIX_DIR``).
+   * - ``-mpi``
+     - ``mpi``
+     - Enable MPI parallelism.
+   * - ``-all``
+     - ``all``
+     - Run the full test suite with multiple configurations.
+   * - ``-init``
+     - ``init``
+     - Reinitialize reference files for non-regression tests.
+   * - ``-Werror``
+     - ``Werror``
+     - Treat compiler warnings as errors.
+
+Main Methods
+------------
+
+.. list-table::
+   :header-rows: 1
+
+   * - Method
+     - Description
+   * - ``configure``
+     - Runs CMake to configure the build system for Idefix, using options set by command-line flags (e.g., precision, MPI, CUDA, etc.).
+   * - ``compile``
+     - Compiles the Idefix code using ``make`` with the specified number of parallel jobs.
+   * - ``run``
+     - Executes the Idefix binary, optionally with MPI, using the provided input file and runtime options.
+   * - ``checkOnly``
+     - Performs regression testing only, without compiling or running the code (useful for checking outputs after a manual run).
+   * - ``standardTest``
+     - Runs any Python-based standard tests (e.g., ``testidefix.py``) present in the test directory for additional validation.
+   * - ``nonRegressionTest``
+     - Compares the output dump file to a reference file using RMSE; fails if the error exceeds the tolerance.
+   * - ``compareDump``
+     - Compares two arbitrary dump files using the same logic as ``nonRegressionTest``.
+   * - ``makeReference``
+     - Copies the specified output file to the reference directory, updating the reference for future regression tests.
+
+Usage Example
+-------------
+
+Below is an example inspired by ``testme.py`` from ``test/HD/sod/testme.py``. This demonstrates a typical workflow for running tests and performing regression checks.
+
+.. code-block:: python
+
+    import pytools.idfx_test as tst
+
+    name = "dump.0001.dmp"
+
+    def testMe(test):
+        test.configure()
+        test.compile()
+        inifiles = ["idefix.ini", "idefix-hll.ini", "idefix-hllc.ini", "idefix-tvdlf.ini"]
+        if test.reconstruction == 4:
+            inifiles = ["idefix-rk3.ini", "idefix-hllc-rk3.ini"]
+
+        # Loop over all ini files for this test
+        for ini in inifiles:
+            test.run(inputFile=ini)
+            if test.init:
+                test.makeReference(filename=name)
+            test.standardTest()
+            test.nonRegressionTest(filename=name)
+
+    test = tst.idfxTest()
+
+    if not test.all:
+        if test.check:
+            test.checkOnly(filename=name)
+        else:
+            testMe(test)
+    else:
+        test.noplot = True
+        for rec in range(2, 5):
+            test.vectPot = False
+            test.single = False
+            test.reconstruction = rec
+            test.mpi = False
+            testMe(test)
+
+        # Test in single precision
+        test.reconstruction = 2
+        test.single = True
+        testMe(test)
+
+How to Run
+----------
+
+You can run the test script from the command line, passing any of the supported options. For example:
+
+.. code-block:: bash
+
+    python testme.py -mpi -dec 2 2 -reconstruction 3 -single -ploterr -idefixDir /path/to/idefix
+
+This will configure, compile, and run the test in MPI mode with a 2x2 domain decomposition, third-order reconstruction, single precision, and plotting enabled for regression errors.
+
+Reference File Management
+-------------------------
+
+- Reference files are stored in ``$IDEFIX_DIR/reference/``.
+- The filename is generated based on precision, reconstruction, input file, and vector potential settings.
+- Use the ``-init`` option (``test.init``) to regenerate reference files (dangerous: overwrites existing references).
+
+Regression Testing
+------------------
+
+- The ``nonRegressionTest`` method compares output dumps to reference files using RMSE.
+- If the error exceeds the tolerance, the test fails and (optionally) plots the difference.
+
From da7b00f2970a687af8dfee146235c67a7c2ff378 Mon Sep 17 00:00:00 2001
From: Geoffroy Lesur
Date: Sat, 18 Oct 2025 16:55:03 +0200
Subject: [PATCH 2/3] Update doc/source/faq.rst

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 doc/source/faq.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/source/faq.rst b/doc/source/faq.rst
index 198d7f54..4dd19831 100644
--- a/doc/source/faq.rst
+++ b/doc/source/faq.rst
@@ -71,8 +71,8 @@ VTK output appears corrupted when running with MPI (OpenMPI)
 
   This has resolved intermittent corruption for several users. See issue #348 for discussion and reports.
 
-Developement
-------------
+Development
+-----------
 
 I have a serious bug (e.g. segmentation fault), in my setup, how do I proceed?
   Add ``-DIdefix_DEBUG=ON`` to ``cmake`` and recompile to find out exactly where the code crashes (see :ref:`debugging`).
From f96950fbefab90e177834a69b09a9ee589ae07df Mon Sep 17 00:00:00 2001
From: Geoffroy Lesur
Date: Sat, 18 Oct 2025 17:32:35 +0200
Subject: [PATCH 3/3] fix linting errors

---
 doc/source/testing.rst | 2 +-
 doc/source/testing/idfxTest.rst | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/doc/source/testing.rst b/doc/source/testing.rst
index 44ff88d0..888ac205 100644
--- a/doc/source/testing.rst
+++ b/doc/source/testing.rst
@@ -114,4 +114,4 @@ Relevant files
    :maxdepth: 2
    :caption: Contents:
 
-   testing/idfxTest.rst
\ No newline at end of file
+   testing/idfxTest.rst
diff --git a/doc/source/testing/idfxTest.rst b/doc/source/testing/idfxTest.rst
index 93171102..b425d3fe 100644
--- a/doc/source/testing/idfxTest.rst
+++ b/doc/source/testing/idfxTest.rst
@@ -170,4 +170,3 @@ Regression Testing
 
 - The ``nonRegressionTest`` method compares output dumps to reference files using RMSE.
 - If the error exceeds the tolerance, the test fails and (optionally) plots the difference.
-
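
Note on the RMSE comparison mentioned in the Regression Testing section: the sketch below illustrates the general idea of an RMSE-based non-regression check. It is not the ``idfxTest`` implementation. The function names ``rmse`` and ``passes_regression`` and the default tolerance are hypothetical, chosen only for illustration, and the fields are plain Python sequences rather than Idefix dump data.

```python
import math


def rmse(values, reference):
    """Root-mean-square error between two equal-length sequences of floats."""
    if len(values) != len(reference):
        raise ValueError("sequences must have the same length")
    if not values:
        raise ValueError("sequences must not be empty")
    squared = [(v - r) ** 2 for v, r in zip(values, reference)]
    return math.sqrt(sum(squared) / len(squared))


def passes_regression(values, reference, tolerance=1e-10):
    """Return True when the RMSE stays within the (hypothetical) tolerance."""
    return rmse(values, reference) <= tolerance


if __name__ == "__main__":
    reference = [1.0, 0.5, 0.25]
    output = [1.0, 0.5, 0.25]
    print("regression check passed:", passes_regression(output, reference))
```

A real check would apply the same comparison to each field stored in the dump and report which field exceeded the tolerance.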