Open
Labels
area: main-source (relating to the src/ directory, main Conquest source code)
improves: speed (speed-up of code)
Description
The second hotspot shown in the profiling in #197 is in CONQUEST-release/src/PAO_grid_transform_module.f90, lines 394 to 400 at commit 4162a3c:
    do m1=-l1,l1
       call pao_elem_derivative_2(direction,the_species,l1,acz,m1,x,y,z,val)
       if(position+(count1-1)*n_pts_in_block > gridfunctions(pao_fns)%size) &
            call cq_abort('single_pao_to_grad: position error ', &
            position, gridfunctions(pao_fns)%size)
       gridfunctions(pao_fns)%griddata(position+(count1-1)*n_pts_in_block) = val
       count1 = count1+1
Disregarding the if statement, which we can remove, the main issue appears to be that pao_elem_derivative_2 (and every function it calls) takes scalar arguments while being called from inside a loop nest. To speed this up, the loop nest should be moved to the bottom of the call chain, which would allow vectorization and remove much of the function-call overhead.
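As a rough illustration of what moving the loop to the bottom of the call chain could look like, here is a minimal, self-contained sketch. It does not use the real Conquest routines: `toy_elem_scalar` and `toy_elem_block` are hypothetical stand-ins for pao_elem_derivative_2 and a possible array-based replacement, and the formula inside them is made up purely so there is some arithmetic to compare. The point is only the interface change: the block version receives all the points at once and loops internally, so the compiler sees one simple loop it can try to vectorize.

```fortran
program loop_push_down_sketch
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 300)
  integer, parameter :: n_pts = 8      ! hypothetical number of grid points
  real(dp) :: x(n_pts), y(n_pts), z(n_pts)
  real(dp) :: val_scalar(n_pts), val_block(n_pts)
  integer :: i

  ! Made-up coordinates, only there to exercise both versions
  do i = 1, n_pts
     x(i) = 0.1_dp * real(i, dp)
     y(i) = 0.2_dp * real(i, dp)
     z(i) = 0.3_dp * real(i, dp)
  end do

  ! Current pattern: the caller loops and makes one call per point
  do i = 1, n_pts
     call toy_elem_scalar(x(i), y(i), z(i), val_scalar(i))
  end do

  ! Proposed pattern: one call, with the loop inside the callee
  call toy_elem_block(n_pts, x, y, z, val_block)

  ! Both versions should give the same numbers
  if (maxval(abs(val_scalar - val_block)) > 1.0e-12_dp) then
     print *, 'scalar and block versions disagree'
     error stop 1
  end if
  print *, 'scalar and block versions agree'

contains

  ! Hypothetical stand-in for a scalar routine such as pao_elem_derivative_2
  subroutine toy_elem_scalar(x, y, z, val)
    real(dp), intent(in)  :: x, y, z
    real(dp), intent(out) :: val
    real(dp) :: r2
    r2 = x*x + y*y + z*z
    val = x * exp(-r2)
  end subroutine toy_elem_scalar

  ! Same arithmetic, but the loop over points lives at the bottom of the
  ! call chain, giving the compiler a single loop to vectorize
  subroutine toy_elem_block(n, x, y, z, val)
    integer,  intent(in)  :: n
    real(dp), intent(in)  :: x(n), y(n), z(n)
    real(dp), intent(out) :: val(n)
    real(dp) :: r2
    integer :: i
    do i = 1, n
       r2 = x(i)*x(i) + y(i)*y(i) + z(i)*z(i)
       val(i) = x(i) * exp(-r2)
    end do
  end subroutine toy_elem_block

end program loop_push_down_sketch
```

In the real code the same change would have to be applied to every routine below pao_elem_derivative_2. If the bounds check is kept rather than removed, it could presumably be done once per block before the call instead of once per element.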