Merged
22 changes: 11 additions & 11 deletions src/exx_evalpao.f90
@@ -60,7 +60,7 @@ subroutine exx_phi_on_grid(inode,atom,spec,extent,xyz,nsuppfuncs,phi_on_grid,r_i
real(double) :: grid_spacing
real(double) :: x, y, z, r
real(double) :: int, n, rest
- real(double) :: xyz_delta(3)
+ real(double) :: xyz_delta(3), xyz_offset(3)

integer :: count1, nsf1
integer :: ierr, stat
@@ -155,19 +155,21 @@ subroutine exx_phi_on_grid(inode,atom,spec,extent,xyz,nsuppfuncs,phi_on_grid,r_i
pz = pz -ijk(3)+1
end if overlap_box
!print*,
+ xyz_offset = xyz + rst
!$omp parallel do collapse(3) schedule(runtime) default(none) &
!$omp shared(mx,my,mz,px,py,pz,grid_spacing,xyz_offset,pao,spec,phi_on_grid,i_dummy,exx_cartesian,extent) &
Contributor: Shorter lines please 🥺
!$omp private(nx,ny,nz,x,y,z,count1,l1,acz,m1,pao_val)
grid_x_loop: do nx = mx, px
- x = xyz(1) + real(nx,double)*grid_spacing + rst(1)

grid_y_loop: do ny = my, py
- y = xyz(2) + real(ny,double)*grid_spacing + rst(2)

grid_z_loop: do nz = mz, pz
- z = xyz(3) + real(nz,double)*grid_spacing + rst(3)
+ x = nx*grid_spacing + xyz_offset(1)
+ y = ny*grid_spacing + xyz_offset(2)
+ z = nz*grid_spacing + xyz_offset(3)
Comment on lines +165 to +167

Contributor: This is fine. If there's spare time, I'd like to try the following: precompute x, y, and z into arrays outside the loop. These are just 1-D arrays, so the memory footprint should be small, and we would avoid redundant recomputation of y and z. We could use an Array of Structures layout so that the x,y,z triples are aligned in memory (i.e., real, dimension(3,N) :: xyz, where N = max(px-mx, py-my, pz-mz), or something like that). Then inside this loop you could just use xyz(1,nx), xyz(2,ny), xyz(3,nz). I'm not sure this makes much performance difference; as we learned this week, "memory is expensive, flops are free".

Contributor Author: I'm not sure this would be worth it. I've just tested the concept with a short program, and there seems to be no difference. When accessing xyz in the nested loop, the different values of nx, ny and nz make the memory accesses non-contiguous, so we'll get lots of cache misses.

Contributor: It should be possible to arrange the data such that the nx,ny,nz accesses are contiguous; maybe I got it wrong in my comment. In principle I agree: if it seems like this isn't worth it, let's not spend much time on it.
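The precompute idea discussed in this thread can be sketched outside Fortran as follows (Python with NumPy, purely illustrative; the spacing, offset and bounds are made-up stand-ins for the variables in the routine):

```python
import numpy as np

# Sketch of the reviewer's suggestion: build the grid coordinates once,
# outside the triple loop, as a (3, N) array so that the x, y, z values
# for a given grid index sit next to each other in memory.
grid_spacing = 0.5
offset = np.array([1.0, 2.0, 3.0])   # plays the role of xyz + rst
mx, px = -2, 2                       # same bounds reused for all axes here
idx = np.arange(mx, px + 1)

coords = offset[:, None] + idx[None, :] * grid_spacing   # shape (3, N)

# Inside the nested loops one would then read coords[0, nx - mx] and so on,
# instead of recomputing nx*grid_spacing + offset(1) on every iteration.
```

Whether this wins anything depends on the access pattern in the inner loop, which is exactly the doubt raised in the replies.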


!norm = sqrt((x-xyz(1))**2+(y-xyz(2))**2+(z-xyz(3))**2)
!if (norm <= r_h) then

- r = sqrt(x*x+y*y+z*z)
+ !r = sqrt(x*x+y*y+z*z)
!if(r < very_small) then
! r = zero
!end if
@@ -180,10 +182,7 @@ subroutine exx_phi_on_grid(inode,atom,spec,extent,xyz,nsuppfuncs,phi_on_grid,r_i
zeta_loop: do acz = 1, pao(spec)%angmom(l1)%n_zeta_in_angmom

magn_loop: do m1 = -l1, l1

- pao_val = zero
- y_val = zero

call evaluate_pao(i_dummy,spec,l1,acz,m1,x,y,z,pao_val,exx_cartesian)

! Put pao_val directly into phi_on_grid
@@ -198,6 +197,7 @@ subroutine exx_phi_on_grid(inode,atom,spec,extent,xyz,nsuppfuncs,phi_on_grid,r_i
end do grid_z_loop
end do grid_y_loop
end do grid_x_loop
+ !$omp end parallel do

end if
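The shape of the merged loop nest can be sketched as follows (Python, serial; in the actual code the nest runs under `!$omp parallel do collapse(3)`). The point of the change is that the loop-invariant sum xyz + rst is hoisted into xyz_offset once, leaving a single multiply-add per axis inside the nest. All values here are made up:

```python
import math

grid_spacing = 0.25
xyz = [0.1, 0.2, 0.3]
rst = [1.0, 1.0, 1.0]

# Hoist the loop-invariant part, as the diff does with xyz_offset = xyz + rst.
xyz_offset = [a + b for a, b in zip(xyz, rst)]

r_on_grid = {}
for nx in range(-1, 2):        # stands in for grid_x_loop: do nx = mx, px
    for ny in range(-1, 2):
        for nz in range(-1, 2):
            x = nx * grid_spacing + xyz_offset[0]
            y = ny * grid_spacing + xyz_offset[1]
            z = nz * grid_spacing + xyz_offset[2]
            # the real code evaluates the PAOs at (x, y, z) here
            r_on_grid[(nx, ny, nz)] = math.sqrt(x*x + y*y + z*z)
```

Because each (nx, ny, nz) iteration is independent, collapsing the three loops into one parallel iteration space is safe, which is what the OpenMP directive in the diff exploits.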

3 changes: 2 additions & 1 deletion src/ol_ang_coeff_subs.f90
@@ -881,7 +881,6 @@ subroutine evaluate_pao(i_vector,sp,l,nz,m,x,y,z,pao_val,system)
!
j = floor(r/del_r) + 1
!
- pao_val = zero
if( j+1 <= npts ) then
rr = real(j,double)*del_r
a = (rr - r)/del_r
@@ -893,6 +892,8 @@ subroutine evaluate_pao(i_vector,sp,l,nz,m,x,y,z,pao_val,system)
r3 = pao(sp)%angmom(l)%zeta(nz)%table2(j)
r4 = pao(sp)%angmom(l)%zeta(nz)%table2(j+1)
pao_val = a*r1 + b*r2 + c*r3 + d*r4
+ else
+ pao_val = zero
end if
!
if ( .not. cartesian ) then ! if want to work in Polar coordinates
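The four-term expression above is a cubic-spline table lookup (table holds the sampled radial values, table2 the second derivatives), and the change moves the pao_val = zero initialisation into an explicit out-of-range branch. A minimal sketch, assuming the standard splint-style coefficients for the parts of the hunk collapsed in this view:

```python
import math

def splint(table, table2, del_r, r):
    """Cubic-spline interpolation of a radial table at r.

    table  -- sampled values y_j at r_j = (j-1)*del_r (1-based, as in Fortran)
    table2 -- second derivatives at the same points
    Returns 0.0 beyond the end of the table, mirroring the new else branch.
    """
    npts = len(table)
    j = int(r // del_r) + 1          # Fortran: floor(r/del_r) + 1
    if j + 1 <= npts:
        rr = j * del_r
        a = (rr - r) / del_r
        b = 1.0 - a
        c = a * (a * a - 1.0) * del_r * del_r / 6.0
        d = b * (b * b - 1.0) * del_r * del_r / 6.0
        r1 = table[j - 1]            # shift to 0-based indexing
        r2 = table[j]
        r3 = table2[j - 1]
        r4 = table2[j]
        return a * r1 + b * r2 + c * r3 + d * r4
    else:
        return 0.0                   # out of range: pao_val = zero
```

With all second derivatives zero the spline reduces to linear interpolation between neighbouring table entries, which makes the formula easy to sanity-check by hand.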