Skip to content

Faster los calc signal sum calls for newcalc=False#308

Merged
lasofivec merged 3 commits intodevelfrom
Faster_LOSCalcSignalSumCalls
Jan 6, 2020
Merged

Faster los calc signal sum calls for newcalc=False#308
lasofivec merged 3 commits intodevelfrom
Faster_LOSCalcSignalSumCalls

Conversation

@Didou09
Copy link
Copy Markdown
Member

@Didou09 Didou09 commented Dec 2, 2019

Motivations:

In the case where newcalc is False, we use a for loop over the LOS to compute the sum (integral).
I wanted to know whether there might be more efficient function available in numpy.
I asked the question on Stackoverflow.

It exists, and here it is: np.add.reduceat()

Ref:
https://stackoverflow.com/questions/59079141/perform-numpy-sum-or-scipy-integrate-simps-on-large-splitted-array-efficient/59085492#59085492

This PR is interesting because the algorithm used when newcalc=False is very similar to the one used when (newcalc=True, method='sum', minimize='calls').
So acceleration in one case may be useful in the other.

Main changes:

  • Replaced the for loop in Rays.calc_signal(newcalc=False) and Rays.calc_signal_from_Plasma2D(newcalc=False) by np.add.reduceat()
  • While testing, a bug appeared in Plasma2D._get_indtu(), when t has only one time step. Fixed.

Benefits:

A small benchmark on ITER (git + python setup.py build_ext) yielded (on a case with 250,000 LOS but only res=0.1):

In [1]: import tofu as tf
m/home/ITER/vezined/ToFu_All/tofu/tofu/__init__.py:95: UserWarning: 
The following subpackages are not available:
    - tofu.mag
  => see tofu.dsub[<subpackage>] for details.
  warnings.warn(msg)

In [2]: multi = tf.imas2tofu.MultiIDSLoader(user='hoeneno', tokamak='convert', shot=134000, run=29, ids=['edge_sources'])
Getting ids   [occ]  tokamak  user     version  shot    run  refshot  refrun
------------  -----  -------  -------  -------  ------  ---  -------  ------
edge_sources  [0]    convert  hoeneno  3        134000  29   -1       -1    

In [3]: _dshort = {'core_sources': {'1drhotn':'source[identifier.name=radiation].profiles_1d[time].grid.rho_tor_norm',
   ...:    ...:                             '1deEnergy':'source[identifier.name=radiation].profiles_1d[time].electrons.energy'
   ...:    ...:                             },
   ...:    ...:            'equilibrium': {'2dpsi': 'time_slice[time].profiles_2d[0].psi',
   ...:    ...:                            '2dmeshR': 'time_slice[time].profiles_2d[0].r',
   ...:    ...:                            '2dmeshZ': 'time_slice[time].profiles_2d[0].z'}}
   ...:                            

In [4]: multi.set_shortcuts(dshort=_dshort)

In [5]: plasma = multi.to_Plasma2D()

In [6]: cam = tf.load('/home/ITER/munechk/public/MyTofu/output/ITER_test_camera_config.npz')
sLoaded from:
    /home/ITER/munechk/public/MyTofu/output/ITER_test_camera_config.npz

In [7]: sig_oldbasic, units = cam.calc_signal_from_Plasma2D(plasma, quant='edge_sources.2dradiation', plot=False, res=0.1, newcalc=False, ufunc=False)

In [8]: sig_oldufunc, units = cam.calc_signal_from_Plasma2D(plasma, quant='edge_sources.2dradiation', plot=False, res=0.1, newcalc=False, ufunc=True, fill_value=0.)

In [9]: sig_newbasic, units = cam.calc_signal_from_Plasma2D(plasma, quant='edge_sources.2dradiation', plot=False, res=0.1, newcalc=True, method='sum', minimize='calls', ufunc=False)

In [10]: np.allclose(sig_oldbasic.data, sig_oldufunc.data)
Out[10]: True

In [11]: np.allclose(sig_newbasic.data, sig_oldufunc.data)
Out[11]: True

In [12]: %timeit sig_oldbasic, units = cam.calc_signal_from_Plasma2D(plasma, quant='edge_sources.2dradiation', plot=False, res=0.1, newcalc=False, ufunc=False)
11.8 s ± 90.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [13]: %timeit sig_oldufunc, units = cam.calc_signal_from_Plasma2D(plasma, quant='edge_sources.2dradiation', plot=False, res=0.1, newcalc=False, ufunc=True, fill_value=0.)
8.77 s ± 19.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [14]: %timeit sig_newbasic, units = cam.calc_signal_from_Plasma2D(plasma, quant='edge_sources.2dradiation', plot=False, res=0.1, newcalc=True, method='sum', minimize='calls', ufunc=False)
9.94 s ± 19.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Notice the 25% speed gain when newcalc=False.

The comparison with newcalc=True and a look at the code shows that in _GG.pyx, in the case (method='sum', minimize='calls'), the summing operation (l. 2991-2997) is done is a loop which is not parallelized, we could either:

  • Parallelize this loop ?
  • Quick hotfix : use the same function np.add.reduceat() to suppress the loop ?

What do you think @lasofivec ?

P.S.: keep in mind that currently, running tofu on the ITER clusters in ipython does not make use of the parallelization (cf. issue #307 ), which explains why in the above benchmark newcalc=True is not faster than newcalc=True, since it runs on only one CPU

VEZINET Didier added 2 commits December 2, 2019 15:36
@Didou09 Didou09 requested a review from lasofivec December 2, 2019 15:17
@Didou09 Didou09 self-assigned this Dec 2, 2019
@pep8speaks
Copy link
Copy Markdown

pep8speaks commented Dec 2, 2019

Hello @Didou09! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-12-02 15:20:40 UTC

Comment thread tofu/data/_core.py
if t is not None:
indt = np.digitize(t, tbinall)
indtu = np.unique(indt)
if len(t) == len(tall) and np.allclose(t, tall):
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle trivial case first, particularly suited for cases with t.size = 1

Comment thread tofu/geom/_core.py
coefs=None,
ind=None,
out=object,
returnas=object,
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much more explicit variable name, especially since we tend to use 'out' to store temporary output inside functions.
It as the case in the other method by the way (calc_signal_from_Plasma2D()), out was redefined at line 6254

Comment thread tofu/geom/_core.py
)
* reseff[ii]
)
sig = np.add.reduceat(val, np.r_[0, indpts],
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More concise and numerically faster, only advantages :-)
Usable in _GG.LOS_calc_signal(method='sum', minimize='calls') ?
Usable elsewhere ?

Compatible with parallelization ?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, only functions that don't require gil can be parallelized, calling numpy functions requires the gil, so it is not possible. This is why it wasn't parallelized with newcalc=True

Comment thread tofu/geom/_core.py
indpts[0], np.diff(indpts), pts.shape[1] - indpts[-1]
]
vect = np.repeat(self.u, nbrep, axis=1)
if fill_value is None:
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid nans, they are not handled by np.add.reduceat() => we set fill_value to 0 instead in this case (if not forced by the user)

@lasofivec
Copy link
Copy Markdown
Collaborator

It is not possible to parallelize the code that uses numpy function (GIL-requirement).
So I went for the "hotfix" which is not such a hotfix, more an enhancement.
I tried the new version, which seems to be working, I will create a new PR.

@lasofivec lasofivec merged commit 00d1e05 into devel Jan 6, 2020
@lasofivec lasofivec deleted the Faster_LOSCalcSignalSumCalls branch January 6, 2020 17:44
@lasofivec
Copy link
Copy Markdown
Collaborator

Oh, one comment: in your benchmark, you put the parameter ufunc = [True/False] which is not a valid parameter for the function. I wonder if you were using the wrong version of tofu ? or a newer that was never pushed ?

@Didou09
Copy link
Copy Markdown
Member Author

Didou09 commented Jan 7, 2020

It was a temporary version where all options were possible.
Since the becnhmark showed that using the ufunc (Universal Function, in this case np.add.reduceat) was significantly faster, I went for it and cleaned the function to leave only the ufunc=True implementation for newcalc=False, before pushing. That's why you didn't see it, it was a temporary version for benchmarking before pushing.

Copy link
Copy Markdown
Member Author

@Didou09 Didou09 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine for me when tests are passing,
But there seems to be an issue with array contiguity in unit tests, do you see why ?

Sorry, this comment was intended for the other PR, my mistake

@Didou09 Didou09 mentioned this pull request Jan 30, 2020
@Didou09 Didou09 mentioned this pull request Mar 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants