Faster LOS calc signal sum calls for newcalc=False #308
…educeat and made Plasma2D.get_intu() more robust vs one-time step
```python
if t is not None:
    indt = np.digitize(t, tbinall)
    indtu = np.unique(indt)
if len(t) == len(tall) and np.allclose(t, tall):
```
Handle the trivial case first, particularly suited for cases with t.size = 1.
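A minimal sketch of what "handle the trivial case first" could look like. The names `t`, `tall`, `tbinall` come from the diff above; the helper function itself is illustrative, not the actual tofu code:

```python
import numpy as np

def get_time_indices(t, tall, tbinall):
    """Map requested times t onto the reference time base tall.

    Illustrative sketch: check the trivial case (t matches the full
    time base, which also covers t.size == 1) before falling back to
    the more expensive np.digitize call.
    """
    t = np.atleast_1d(t)
    # Trivial case first: t is exactly the full time base
    if t.size == tall.size and np.allclose(t, tall):
        indt = np.arange(tall.size)
    else:
        # General case: bin each requested time into tbinall
        indt = np.digitize(t, tbinall)
    return indt, np.unique(indt)
```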
```python
coefs=None,
ind=None,
out=object,
returnas=object,
```
Much more explicit variable name, especially since we tend to use 'out' to store temporary output inside functions.
It was the case in the other method, by the way (calc_signal_from_Plasma2D()): out was redefined at line 6254.
```python
    )
    * reseff[ii]
)
sig = np.add.reduceat(val, np.r_[0, indpts],
```
More concise and numerically faster, only advantages :-)
Usable in _GG.LOS_calc_signal(method='sum', minimize='calls')?
Usable elsewhere?
Compatible with parallelization?
Nope: only functions that don't require the GIL can be parallelized, and calling NumPy functions requires the GIL, so it is not possible. This is why it wasn't parallelized with newcalc=True.
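For reference, the speedup discussed here comes from replacing a per-LOS Python loop with a single vectorized call. A self-contained comparison, with synthetic data and hypothetical names (`val` is the concatenated sampled values, `indpts` the segment start indices, as in the diff):

```python
import numpy as np

# Synthetic stand-in for values sampled along all LOS, concatenated,
# with indpts giving the start index of each LOS segment after the first
rng = np.random.default_rng(0)
val = rng.random(1000)
indpts = np.array([250, 520, 800])

# Loop version: one sum per LOS (the newcalc=False approach)
bounds = np.r_[0, indpts, val.size]
sig_loop = np.array([val[bounds[i]:bounds[i + 1]].sum()
                     for i in range(len(bounds) - 1)])

# Vectorized version: one call, as proposed in the PR
sig_reduceat = np.add.reduceat(val, np.r_[0, indpts])

assert np.allclose(sig_loop, sig_reduceat)
```

np.add.reduceat sums val[0:250], val[250:520], val[520:800] and val[800:] in a single C-level pass, which is where the gain comes from.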
```python
    indpts[0], np.diff(indpts), pts.shape[1] - indpts[-1]
]
vect = np.repeat(self.u, nbrep, axis=1)
if fill_value is None:
```
Avoid NaNs: they are not handled by np.add.reduceat(), so we set fill_value to 0 in this case instead (unless forced by the user).
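A small illustration of why the NaNs matter here (a sketch, with made-up data; the actual fill_value handling in tofu is more involved):

```python
import numpy as np

# A NaN stands for a sampled point outside the plasma domain
val = np.array([1.0, np.nan, 2.0, 3.0])
starts = np.array([0, 2])

# np.add.reduceat propagates the NaN into the first segment sum
assert np.isnan(np.add.reduceat(val, starts)[0])

# Replacing out-of-domain points by fill_value = 0 keeps the sums finite
# without changing the integral (a zero contributes nothing to the sum)
val0 = np.where(np.isnan(val), 0.0, val)
sig = np.add.reduceat(val0, starts)
# sig is now [1.0, 5.0]
```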
> It is not possible to parallelize the code that uses NumPy functions (GIL requirement).

> Oh, one comment: in your benchmark, you put the parameter

It was a temporary version where all options were possible.
Motivations:
In the case where newcalc is False, we use a for loop over the LOS to compute the sum (integral).
I wanted to know whether a more efficient function was available in numpy,
so I asked the question on Stack Overflow.
It exists: np.add.reduceat().
Ref:
https://stackoverflow.com/questions/59079141/perform-numpy-sum-or-scipy-integrate-simps-on-large-splitted-array-efficient/59085492#59085492
This PR is interesting because the algorithm used when newcalc=False is very similar to the one used when (newcalc=True, method='sum', minimize='calls').
So an acceleration in one case may be useful in the other.
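To make the integral concrete: each per-LOS sum must be scaled by that LOS's effective step (the `reseff[ii]` factor visible in the diff), and that scaling vectorizes as well. A sketch with made-up names and data:

```python
import numpy as np

rng = np.random.default_rng(1)
npts_per_los = np.array([4, 6, 5])          # points sampled on each LOS
val = rng.random(npts_per_los.sum())        # emissivity at each sampled point
reseff = np.array([0.1, 0.2, 0.15])         # effective step length per LOS

# Start index of each LOS segment in the concatenated array
starts = np.r_[0, np.cumsum(npts_per_los)[:-1]]

# Riemann-sum approximation of each line integral, fully vectorized:
# segmented sum via reduceat, then scale by each LOS's step
sig = np.add.reduceat(val, starts) * reseff

# Equivalent per-LOS loop (the newcalc=False approach being replaced)
expected = np.array([val[s:s + n].sum() * r
                     for s, n, r in zip(starts, npts_per_los, reseff)])
assert np.allclose(sig, expected)
```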
Main changes:
Benefits:
A small benchmark on ITER (git + python setup.py build_ext) yielded the following, on a case with 250,000 LOS but only res=0.1:
Notice the 25% speed gain when newcalc=False.
The comparison with newcalc=True and a look at the code show that in _GG.pyx, in the case (method='sum', minimize='calls'), the summing operation (l. 2991-2997) is done in a loop which is not parallelized; we could either:
What do you think @lasofivec ?
P.S.: keep in mind that currently, running tofu on the ITER clusters in ipython does not make use of parallelization (cf. issue #307), which explains why in the above benchmark newcalc=True is not faster than newcalc=False, since it runs on only one CPU.