On the ITER clusters, using @munechika-koyo's camera (which is big: 250,000 LOS) and an edge emissivity field from JINTRAC, the computation time (for res = 10 cm, just for testing) does not seem to be reduced by parallelization.
Minimal working example (tofu 1.4.2-a5):
In [1]: import tofu as tf
/home/ITER/vezined/ToFu_All/tofu/tofu/__init__.py:95: UserWarning:
The following subpackages are not available:
- tofu.mag
=> see tofu.dsub[<subpackage>] for details.
warnings.warn(msg)
In [2]: cam = tf.load('/home/ITER/munechk/public/MyTofu/output/ITER_test_camera_config.npz')
Loaded from:
/home/ITER/munechk/public/MyTofu/output/ITER_test_camera_config.npz
In [3]: multi = tf.imas2tofu.MultiIDSLoader(user='hoeneno', tokamak='convert', shot=134000, run=29, ids=['core_sources', 'equilibrium', 'edge_sources'])
Getting ids [occ] tokamak user version shot run refshot refrun
------------ ----- ------- ------- ------- ------ --- ------- ------
core_sources [0] convert hoeneno 3 134000 29 -1 -1
edge_sources [0] " " " " " " "
equilibrium [0] " " " " " " "
In [4]: _dshort = {'core_sources': {'1drhotn': 'source[identifier.name=radiation].profiles_1d[time].grid.rho_tor_norm',
   ...:                             '1deEnergy': 'source[identifier.name=radiation].profiles_1d[time].electrons.energy'},
   ...:            'equilibrium': {'2dpsi': 'time_slice[time].profiles_2d[0].psi',
   ...:                            '2dmeshR': 'time_slice[time].profiles_2d[0].r',
   ...:                            '2dmeshZ': 'time_slice[time].profiles_2d[0].z'}}
In [5]: multi.set_shortcuts(dshort=_dshort)
In [6]: plasma = multi.to_Plasma2D(shapeRZ=('R', 'Z'))
/home/ITER/vezined/ToFu_All/tofu/tofu/imas2tofu/_core.py:1972: UserWarning: The following data could not be retrieved:
- equilibrium:
2dB : '2dBT'
2dBR : list index out of range
2dBT : list index out of range
2dBZ : list index out of range
2djT : list index out of range
2dmeshFaces : list index out of range
2dmeshNodes : list index out of range
2dphi : list index out of range
2dpsi : list index out of range
2drhopn : '2dpsi'
2drhotn : '2dphi'
2dtheta : '2dmeshNodes'
strike0 : 'strike0R'
strike0R : list index out of range
strike0Z : list index out of range
strike1 : 'strike1R'
strike1R : list index out of range
strike1Z : list index out of range
x0 : 'x0R'
x0R : list index out of range
x0Z : list index out of range
x1 : 'x1R'
x1R : list index out of range
x1Z : list index out of range
- core_sources:
1dbrem : No / several matching signals for: - source[]['identifier', 'name'] = bremsstrahlung - nb.of matches: 0
1dline : No / several matching signals for: - source[]['identifier', 'name'] = lineradiation - nb.of matches: 0
1dprad : '1dbrem'
1dpsi : No / several matching signals for: - source[]['identifier', 'name'] = lineradiation - nb.of matches: 0
1drhopn : '1dpsi'
1drhotn : No / several matching signals for: - source[]['identifier', 'name'] = lineradiation - nb.of matches: 0
warnings.warn(msg)
In [7]: %timeit sig_sum, units = cam.calc_signal_from_Plasma2D(plasma, quant='edge_sources.2dradiation', plot=False, res=0.1, method='sum', minimize='calls', num_threads=1)
12.8 s ± 61.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [8]: %timeit sig_sum, units = cam.calc_signal_from_Plasma2D(plasma, quant='edge_sources.2dradiation', plot=False, res=0.1, method='sum', minimize='calls', num_threads=10)
13 s ± 88.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The computation time is virtually the same.
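For reference, the two-point comparison above could be extended into a small scaling test (a sketch, reusing the cam and plasma objects from the session above):

import time

for n in (1, 2, 4, 10):
    t0 = time.perf_counter()
    sig, units = cam.calc_signal_from_Plasma2D(
        plasma, quant='edge_sources.2dradiation', plot=False,
        res=0.1, method='sum', minimize='calls', num_threads=n,
    )
    # With working OpenMP parallelization the time should drop (sub-linearly)
    # with n; the two points measured above (1 and 10 threads) are both ~13 s.
    print(f"num_threads={n}: {time.perf_counter() - t0:.2f} s")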
Checking the presence of openmp:
I checked that openmp is available by reproducing the test from tofu's setup.py:
In [11]: omp_test = r"""
...: #include <omp.h>
...: #include <stdio.h>
...: int main() {
...: #pragma omp parallel
...: printf("Hello from thread %d, nthreads %d\n", omp_get_thread_num(),
...: omp_get_num_threads());
...: }
...: """
...:
...:
...: def check_for_openmp(cc_var):
...: import tempfile
...:
...: tmpdir = tempfile.mkdtemp()
...: curdir = os.getcwd()
...: os.chdir(tmpdir)
...:
...: filename = r"test.c"
...: with open(filename, "w") as file:
...: file.write(omp_test)
...: with open(os.devnull, "w") as fnull:
...: result = subprocess.call(
...: [cc_var, "-fopenmp", filename], stdout=fnull, stderr=fnull
...: )
...:
...: os.chdir(curdir)
...: # clean up
...: shutil.rmtree(tmpdir)
...: return result
...:
...:
In [12]: import os
In [13]: import subprocess, shutil
In [14]: openmp_installed = not check_for_openmp("cc")
In [15]: openmp_installed
Out[15]: True
So openmp is apparently available, at least at compile time.
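Note that this test only proves the compiler accepts the -fopenmp flag; it does not prove that multiple threads are spawned at runtime. A possible complement (a sketch, reusing the same omp_test source as above) would be to run the compiled test program and count the threads it actually spawns:

def count_omp_threads(cc="cc", nthreads=10):
    import os, subprocess, tempfile
    with tempfile.TemporaryDirectory() as tmpdir:
        src = os.path.join(tmpdir, "test.c")
        exe = os.path.join(tmpdir, "test")
        with open(src, "w") as f:
            f.write(omp_test)  # same C source as in setup.py
        subprocess.check_call([cc, "-fopenmp", src, "-o", exe])
        # Ask for nthreads explicitly, then count the "Hello" lines printed,
        # one per thread actually spawned:
        env = dict(os.environ, OMP_NUM_THREADS=str(nthreads))
        out = subprocess.check_output([exe], env=env).decode()
        return len(out.splitlines())

If count_omp_threads() returned 10, OpenMP would be shown to run multithreaded on the node, narrowing the problem down to how tofu was built or how the session is scheduled.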
But if I open another terminal in parallel and monitor the CPU usage during the execution of the two %timeit commands above using
top -u vezined
I see that the CPU usage is effectively capped at 100%, meaning that despite the presence of openmp and num_threads=10, we are limited to a single CPU.
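A quick way to check whether the process itself is pinned to one core (rather than tofu failing to spawn threads) is to inspect the CPU affinity from inside the same ipython session (os.sched_getaffinity is Linux-only, which should be fine on the cluster):

import os
# CPUs this process is allowed to run on; affinity/cgroup limits show up
# here, unlike os.cpu_count(), which reports all CPUs on the node:
print(len(os.sched_getaffinity(0)), "of", os.cpu_count(), "CPUs usable")

If this prints "1 of N", the limitation comes from the environment, not from tofu.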
Possible causes, in my opinion:
- I suspect this is due to the fact that we are running from inside the ipython console, and that ipython was allocated only one CPU when it was first started.
=> this page seems to hold some valuable information on that point:
http://ipython.org/ipython-doc/stable/parallel/parallel_intro.html
- Or it could be that the system admins allocate, by default, only one CPU per user on the cluster (a quick check is sketched below).
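Both hypotheses can be tested from the same session by inspecting the environment; the exact scheduler variables depend on the batch system the cluster runs, so the SLURM name below is only an example:

import os
# A value of 1 in any of these would point to an environment-level limit
# rather than a tofu bug (OMP_THREAD_LIMIT caps even an explicit
# num_threads request):
for var in ("OMP_NUM_THREADS", "OMP_THREAD_LIMIT", "SLURM_CPUS_PER_TASK"):
    print(var, "=", os.environ.get(var))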
What do you think, @lasofivec?