There's been some ambiguity (at least in my head) over whether the time in this loop
CONQUEST-release/src/calc_matrix_elements_module.f90, lines 531 to 539 at 4162a3c:
```fortran
do nsf1=1, nonef
   ind1=n_pts_in_block*(naba_atm1%ibegin_blk_orb(iprim_blk)-1+ &
        naba_atm1%ibeg_orb_atom(naba1,iprim_blk)-1+(nsf1-1))+1
   ii = ii+1
   factor_M=send_array(ii)
   call axpy(n_pts_in_block, factor_M, &
        gridfunctions(gridtwo)%griddata(ind2:ind2+n_pts_in_block-1), 1, &
        gridfunctions(gridone)%griddata(ind1:ind1+n_pts_in_block-1), 1)
end do
```
was being spent on the axpy call itself or on the data access. I had a look, and it appears to be very much the data access.
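To make concrete what "the data access" means here, the loop is essentially a series of unit-stride axpy calls over slices of two very large grid arrays, where the destination slice start is computed from neighbour-list indexing. A minimal C sketch of that access pattern (names like `accumulate_blocks` and `offsets` are illustrative, not the actual CONQUEST code):

```c
#include <stddef.h>

/* y[i] += a * x[i]: the BLAS axpy kernel with unit stride. */
static void axpy_kernel(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Illustrative sketch of the L531 loop: each iteration touches an
 * n_pts_in_block-long slice of a grid array whose start index comes
 * from neighbour-list bookkeeping. Across blocks and atoms these
 * slices are scattered over an array far larger than cache, so each
 * axpy streams its operands from main memory. */
void accumulate_blocks(size_t nonef, size_t n_pts_in_block,
                       const double *factors, const size_t *offsets,
                       const double *grid_src, double *grid_dst) {
    for (size_t nsf1 = 0; nsf1 < nonef; ++nsf1) {
        size_t ind1 = offsets[nsf1];           /* scattered start index */
        axpy_kernel(n_pts_in_block, factors[nsf1],
                    grid_src,                  /* source slice (fixed here) */
                    grid_dst + ind1);          /* destination slice */
    }
}
```

The arithmetic per element is trivial (one multiply-add), so whenever the slices miss cache, the loop's cost is dominated by moving the data rather than by the axpy itself.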
First, for reference, a profile with #195 using 1 OpenMP thread (so essentially develop) shows that the loop at L531 is the main hotspot in the code, with the time attributed to the line that calls axpy.
I moved the data accesses out of the axpy call to see where the time was actually being spent; here is the result.
Copying the data to a buffer obviously made the code a lot slower: the time in act_on_vectors_new is now 373s instead of 56s for the same benchmark run. However, the time spent in the axpy call itself is now tiny, which suggests to me that accessing the data stored in gridfunctions%griddata is the culprit.
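The shape of that experiment can be sketched as follows (a hypothetical C rendering, not the actual patch; `act_on_block` and the buffer names are illustrative): the slice is copied out of the big grid arrays into small scratch buffers, the axpy runs only on those buffers, and the result is copied back. The data access is thereby separated from the arithmetic in the profile:

```c
#include <stddef.h>

/* y[i] += a * x[i] on small, contiguous buffers. */
static void axpy_kernel(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Hypothetical sketch of the experiment: isolate the cost of touching
 * the grid arrays from the cost of the axpy arithmetic. buf_src/buf_dst
 * are caller-provided scratch buffers of length n_pts_in_block. */
void act_on_block(size_t n_pts_in_block, double factor,
                  const double *grid_src, double *grid_dst,
                  double *buf_src, double *buf_dst) {
    /* Explicit data access: pull the slices out of the big grid arrays... */
    for (size_t i = 0; i < n_pts_in_block; ++i) {
        buf_src[i] = grid_src[i];
        buf_dst[i] = grid_dst[i];
    }
    /* ...so the axpy only ever sees small, cache-resident buffers. */
    axpy_kernel(n_pts_in_block, factor, buf_src, buf_dst);
    /* Write the updated slice back into the grid. */
    for (size_t i = 0; i < n_pts_in_block; ++i)
        grid_dst[i] = buf_dst[i];
}
```

The extra copies roughly triple the traffic through the grid arrays, which is consistent with the overall slowdown observed, while the time charged to the axpy line itself collapses because its operands now sit in cache.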