BUG: output files after #0000 written incompletely #259

@birnstiel

Describe the issue:

I am running a slightly modified version of the VSI test (3D, different resolution, lower scale height, dumps and VTK outputs every 10 orbits). According to the log file, the outputs are written without problems:

Vtk: Write file data.0002.vtk...done in 3.206488e+00 s.
Dump: Write file n 2...done in 1.527347e+00 s.

However, the output directory listing shows that the vtk and dmp files after the first output are very small:

6,6G  8. Sep 13:57 data.0000.vtk
8,0K  8. Sep 18:56 data.0001.vtk
8,0K  9. Sep 00:06 data.0002.vtk
7,6G  8. Sep 13:57 dump.0000.dmp
 56K  8. Sep 18:56 dump.0001.dmp
 56K  9. Sep 00:06 dump.0002.dmp
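
A listing like this can also be checked automatically. The following is a minimal sketch, not part of Idefix: the helper name `find_truncated` and the threshold `ratio` are hypothetical, and it assumes that all outputs of one kind should have roughly the same size as the largest one.

```python
from pathlib import Path


def find_truncated(directory, pattern, ratio=0.5):
    """Return (name, size_in_bytes) for files much smaller than the largest match.

    `ratio` is an arbitrary illustrative threshold, not an Idefix setting.
    """
    files = sorted(Path(directory).glob(pattern))
    sizes = {f.name: f.stat().st_size for f in files}
    if not sizes:
        return []
    largest = max(sizes.values())
    # Keep only the files that fall below the chosen fraction of the largest file.
    return [(name, size) for name, size in sizes.items() if size < ratio * largest]


if __name__ == "__main__":
    # Output directory taken from the [Output] block of the log in this report.
    for file_pattern in ("data.*.vtk", "dump.*.dmp"):
        for name, size in find_truncated("/idefix_VSI3D", file_pattern):
            print(f"suspiciously small: {name} ({size} bytes)")
```

Applied to the listing above, such a check would be expected to flag every file except `data.0000.vtk` and `dump.0000.dmp`.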

I checked the dump file: the header and coordinates appear to have been written correctly, but reading fails at the data of the first field, Vc-RHO.
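
For scale, here is a rough estimate of how large one field should be. This is only a sketch under stated assumptions: the grid size is taken from the `[Grid]` block of the log, the dump is assumed to store each field as 8-byte doubles over the active cells only, and the field count of 4 (density plus three velocity components for an isothermal hydro run) is a guess, not something the report confirms.

```python
# Grid dimensions from the [Grid] block of idefix.ini shown in the log.
NX1, NX2, NX3 = 1280, 384, 512

# Assumptions, not confirmed by the report:
BYTES_PER_DOUBLE = 8  # the build reports double precision
N_FIELDS = 4          # guessed: RHO, VX1, VX2, VX3


def field_bytes(nx1, nx2, nx3, bytes_per_value=BYTES_PER_DOUBLE):
    """Size in bytes of one cell-centred field without ghost cells."""
    return nx1 * nx2 * nx3 * bytes_per_value


def gib(n_bytes):
    """Convert bytes to GiB (the unit `ls -h` abbreviates as G)."""
    return n_bytes / 2**30


per_field = field_bytes(NX1, NX2, NX3)
total = N_FIELDS * per_field

print(f"cells:      {NX1 * NX2 * NX3}")
print(f"one field:  {gib(per_field):.3f} GiB")
print(f"{N_FIELDS} fields:   {gib(total):.2f} GiB")
```

Under these assumptions a single field takes about 1.9 GiB and four fields about 7.5 GiB, which is close to the 7,6G of `dump.0000.dmp`. A 56K file is far too small to contain even one field, which fits with the file ending shortly after the header and coordinates.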

Is this something you have seen before, or could it be an issue with the file system?
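
One quick way to narrow down the file system question is to look at the free space on the output volume. The sketch below uses only the Python standard library; the helper name `free_space_report` is hypothetical, and the check does not see per-user quotas, which a cluster file system may enforce separately.

```python
import shutil


def free_space_report(path):
    """Return total, used and free space in GiB for the volume containing `path`."""
    usage = shutil.disk_usage(path)
    return {
        "total_gib": usage.total / 2**30,
        "used_gib": usage.used / 2**30,
        "free_gib": usage.free / 2**30,
    }


if __name__ == "__main__":
    # "." is a placeholder; point this at the vtk_dir/dmp_dir of the run.
    report = free_space_report(".")
    for key, value in report.items():
        print(f"{key}: {value:.1f}")
```

If the free space is well above the roughly 14 GiB that one vtk plus one dump occupied at output 0000, a full disk becomes a less likely explanation, although a quota limit would still have to be ruled out separately.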

Error message:

No response

runtime information:

This is how the code is built in the slurm script:

module load spack/2024.04
module load cmake/3.20.2-gcc-11.4.1 cuda/11.8.0
module load openmpi/5.0.0-gcc-11.4.1-cuda11.8

# this is for an H100
cmake $IDEFIX_DIR -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_HOPPER90=ON
make -j 2

The dump file header reads: Idefix 2.1.01-2f15373c Dump Data little endian.

The beginning of the log file follows:

-- Setting default Kokkos CXX standard to 17
-- Kokkos version: 4.3.1
-- The project name is: Kokkos
-- Using internal gtest for testing
-- Compiler Version: 11.8.89
-- kokkos_launch_compiler (/idefix/src/kokkos/bin/kokkos_launch_compiler) is enabled...
-- Using -std=c++17 for C++17 standard as feature
-- Built-in Execution Spaces:
--     Device Parallel: Kokkos::Cuda
--     Host Parallel: NoTypeDefined
--       Host Serial: SERIAL
-- 
-- Architectures:
--  HOPPER90
-- Using internal desul_atomics copy
-- Kokkos Backends: SERIAL;CUDA
-- Idefix final configuration
--     MHD:  OFF
--     MPI:  OFF
--     HDF5: OFF
--     Reconstruction: Linear
--     Precision: Double
--     Version: 2.1.01-2f15373c
--     Problem definitions: 'definitions.hpp'
-- Configuring done (1.0s)
-- Generating done (0.4s)
-- Build files have been written to: /idefix-setups/idefix_VSI
[  3%] Built target kokkossimd
[  3%] Built target AlwaysCheckGit
[  6%] Built target impl_git_version
[ 40%] Built target kokkoscore
[ 43%] Built target kokkoscontainers
[ 46%] Building CXX object CMakeFiles/idefix.dir/src/output/dump.cpp.o
[ 46%] Building CXX object CMakeFiles/idefix.dir/src/dataBlock/dumpToFile.cpp.o
[ 47%] Building CXX object CMakeFiles/idefix.dir/src/output/vtk.cpp.o
[ 49%] Building CXX object CMakeFiles/idefix.dir/src/input.cpp.o
[ 50%] Linking CXX executable idefix
[100%] Built target idefix
Starting job
I'm on Host [...].physik.uni-muenchen.de
It's now So 8. Sep 13:56:51 CEST 2024
                                  .:HMMMMHn:.  ..:n..
                                .H*'``     `'%HM'''''!x.
         :x                    x*`           .(MH:    `#h.
        x.`M                   M>        :nMMMMMMMh.     `n.
         *kXk..                XL  nnx:.XMMMMMMMMMMML   .. 4X.
          )MMMMMx              'M   `^?M*MMMMMMMMMMMM:HMMMHHMM.
          MMMMMMMX              ?k    'X ..'*MMMMMMM.#MMMMMMMMMx
         XMMMMMMMX               4:    M:MhHxxHHHx`MMx`MMMMMMMMM>
         XM!`   ?M                `x   4MM'`''``HHhMMX  'MMMMMMMM
         4M      M                 `:   *>     `` .('MX   '*MMMM'
          MX     `X.nnx..                        ..XMx`     'M*X
           ?h.    ''```^'*!Hx.     :Mf     xHMh  M**MMM      4L`
            `*Mx           `'*n.x. 4M>   :M` `` 'M    `       %
             '%                ``*MHMX   X>      !
            :!                    `#MM>  X>      `   :x
           :M                        ?M  `X     .  ..'M
           XX                       .!*X  `x   XM( MMx`h
          'M>::                        `M: `+  MMX XMM `:
          'M> M                         'X    'MMX ?MMk.Xx..
          'M> ?L                     ...:!     MMX.H**'MMMM*h
           M>  #L                  :!'`MM.    . X*`.xHMMMMMnMk.
           `!   #h.      :L           XM'*hxHMM*MhHMMMMMMMMMM'#h
           +     XMh:    4!      x   :f   MM'   `*MMMMMMMMMM%  `X
           M     Mf``tHhxHM      M>  4k xxX'      `#MMMMMMMf    `M .>
          :f     M   `MMMMM:     M>   M!MMM:         '*MMf'     'MH*
          !     Xf   'MMMMMX     `X   X>'h.`          :P*Mx.   .d*~..
        :M      X     4MMMMM>     !   X~ `Mh.      .nHL..M#'%nnMhH!'`
       XM      d>     'X`'**h     'h  M   ^'MMHH+*'`  ''''   `'**'
    %nxM>      *x+x.:. XL.. `k     `::X
:nMMHMMM:.  X>  Mn`*MMMMMHM: `:     ?MMn.
    `'**MML M>  'MMhMMMMMMMM  #      `M:^*x
         ^*MMttnnMMMMMMMMMMMH>.        M:.4X
                        `MMMM>X   (   .MMM:MM!   .
                          `'''4x.dX  +^ `''MMMMHM?L..
                                ``'           `'`'`'`

              Idefix version 2.1.01-2f15373c
              Built against Kokkos 40301
              Compiled on Sep  8 2024 at 13:56:42


Main: initialization stage.
Main: initialisation finished.
Main: running on [...].physik.uni-muenchen.de
-----------------------------------------------------------------------------
Input Parameters using input file idefix.ini:
-----------------------------------------------------------------------------
[Boundary]
	X1-beg		userdef
	X1-end		outflow
	X2-beg		outflow
	X2-end		outflow
	X3-beg		periodic
	X3-end		periodic
[Gravity]
	Mcentral		1.0
	gravCst		1
	potential		central
	skip		1
[Grid]
	X1-grid		1	1.0	1280	l	3.0
	X2-grid		1	1.2707963267948965	384	u	1.8707963267948966
	X3-grid		1	0.0	512	u	1.5707963267948966
[Hydro]
	csiso		userdef
	solver		hllc
[Output]
	dmp		62.831853071795865
	dmp_dir		/idefix_VSI3D
	log		100
	vtk		62.831853071795865
	vtk_dir		/idefix_VSI3D
[Setup]
	epsilon		0.05
[TimeIntegrator]
	CFL		0.8
	CFL_max_var		1.1
	check_nan		100
	first_dt		1.e-3
	max_runtime		-1
	maxdivB		1e-06
	nstages		2
	tstop		1256.6370614359
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
Input: Kokkos configuration
Device Execution Space:
  KOKKOS_ENABLE_CUDA: yes
Cuda Options:
  KOKKOS_ENABLE_CUDA_LAMBDA: yes
  KOKKOS_ENABLE_CUDA_LDG_INTRINSIC: yes
  KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE: no
  KOKKOS_ENABLE_CUDA_UVM: no
  KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA: yes
  KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC: yes

Cuda Runtime Configuration:
macro  KOKKOS_ENABLE_CUDA      : defined
macro  CUDA_VERSION          = 11080 = version 11.8
Kokkos::Cuda[ 0 ] NVIDIA H100 NVL capability 9.0, Total Global Memory: 93.12 G, Shared Memory per Block: 48 K : Selected
-----------------------------------------------------------------------------
Input: Compiled with DOUBLE PRECISION arithmetic.
Input: DIMENSIONS=3.
Input: COMPONENTS=3.
Grid: full grid size is 
	 Direction X1: userdef	1....1280....3	outflow
	 Direction X2: outflow	1.2708....384....1.8708	outflow
	 Direction X3: periodic	0....512....1.5708	periodic
Hydro: solving HD equations.
Hydro: Reconstruction: 2nd order (PLM Van Leer)
EquationOfState: isothermal with user-defined cs function.
RiemannSolver: hllc (HD).
Gravity: ENABLED.
Gravity: G=1.
Gravity: central mass gravitational potential ENABLED with M=1
TimeIntegrator: using 2nd Order (RK2) integrator.
TimeIntegrator: Using adaptive dt with CFL=0.8 .
Main: Creating initial conditions.
Vtk: Write file data.0000.vtk...done in 9.42678 s.
Dump: Write file n 0...done in 8.82041 s.
Main: Cycling Time Integrator...
TimeIntegrator:             time |            cycle |        time step | cell (updates/s)
TimeIntegrator:     0.000000e+00 |                0 |     1.000000e-03 |              N/A
TimeIntegrator:     1.686485e-01 |              100 |     1.716926e-03 |     7.347258e+08
