Thread exx_phi_on_grid #324
Conversation
tkoskela left a comment
This is a nice, minimal change that gets a parallel performance boost! I like it.
!print*,
xyz_offset = xyz + rst
!$omp parallel do collapse(3) schedule(runtime) default(none) &
!$omp shared(mx,my,mz,px,py,pz,grid_spacing,xyz_offset,pao,spec,phi_on_grid,i_dummy,exx_cartesian,extent) &
x = nx*grid_spacing + xyz_offset(1)
y = ny*grid_spacing + xyz_offset(2)
z = nz*grid_spacing + xyz_offset(3)
This is fine. If there's spare time, I'd like to try the following:
Precompute x, y, and z into arrays outside the loop. These are just 1D arrays, so the memory footprint should be small, and we would avoid redundant recomputation of y and z. We could use an array-of-structures data layout so that the x,y,z triples are aligned in memory (i.e. real, dimension(3,N) :: xyz, where N = max(px-mx, py-my, pz-mz), or something like that). Then inside this loop you could just use xyz(1,nx), xyz(2,ny), xyz(3,nz). I'm not sure whether this makes much performance difference. As we learned this week, "memory is expensive, flops are free".
I'm not sure this would be worth it. I've just tested the concept with a short program and there seems to be no difference. When accessing xyz in the nested loop, the varying values of nx, ny, and nz make the memory accesses non-contiguous, so we'd get lots of cache misses.
It should be possible to arrange the data such that the nx,ny,nz accesses are contiguous. Maybe I got it wrong in my comment. In principle I agree, if it seems like this isn't worth it, let's not spend much time on it.
Force-pushed eff92cd to cee9e0f
Force-pushed ec90a6a to 36848b7
Force-pushed 36848b7 to be386f1
Description
[Figure: speedup plot for exx_phi_on_grid]
These plots show the performance of test test_004_isol_C2H4_4proc_PBE0CRI for 1 MPI process.