276 use blas #323

Merged
connoraird merged 17 commits into 276-combine-nsf-loops from 276-use-blas on Feb 16, 2024

connoraird commented Feb 9, 2024

Description

  • Updates the first nested loop in the kernel to use a BLAS dot product
    • This required refactoring some allocatable arrays into 1D buffers and using pointer arrays to reference them
  • Some additional unused allocatable arrays were also removed

Speedup plot

This plot shows the performance of test test_004_isol_C2H4_4proc_PBE0CRI for 1 MPI process.

[Plot: 276-use-blas]

tkoskela left a comment

The changes in this PR look pretty straightforward to me. It would be interesting to time the loop do nsf3 = 1, ia%nsup with both versions of the code and see how much faster the BLAS call is. I guess you could do this with one thread so you don't have to worry about the thread safety of the timers. With the graphs you've shown in the other PRs, I'm not entirely convinced this is making the code significantly faster.

connoraird commented Feb 12, 2024

Results of timing nsf3 loop

| Threads | 276-combine-nsf-loops | 276-use-blas |
|---------|-----------------------|--------------|
| 1       | 2.47956 s             | 1.95329 s    |
| 2       | 1.45973 s             | 1.24583 s    |
| 4       | 0.77407 s             | 0.67395 s    |
| 8       | 0.46875 s             | 0.39146 s    |
| 16      | 0.27807 s             | 0.22739 s    |
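
To put the timings above in relative terms, here is a short Python sketch that computes the ratio between the two branches for each thread count. It is not part of the PR; the names `timings` and `speedup` are illustrative, and the numbers are copied directly from the table.

```python
# Timings of the nsf3 loop in seconds, copied from the table above.
# Keys are thread counts; values are (276-combine-nsf-loops, 276-use-blas).
timings = {
    1: (2.47956, 1.95329),
    2: (1.45973, 1.24583),
    4: (0.77407, 0.67395),
    8: (0.46875, 0.39146),
    16: (0.27807, 0.22739),
}


def speedup(before, after):
    """Return how many times faster `after` is compared with `before`."""
    return before / after


# Speedup of the BLAS branch over the combined-loops branch per thread count.
speedups = {threads: speedup(b, a) for threads, (b, a) in timings.items()}

for threads, s in speedups.items():
    before, after = timings[threads]
    saved = 100.0 * (before - after) / before  # percentage of loop time saved
    print(f"{threads:>2} threads: {s:.2f}x faster, {saved:.1f}% less time")
```

On these numbers the BLAS version of the loop is roughly 1.15x to 1.27x faster, with the largest gain at 1 thread and the smallest at 4 threads.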

[Plots: 276-combine-nsf-loops and 276-use-blas]

tkoskela mentioned this pull request Feb 16, 2024
tkoskela added the "improves: speed" (Speed-up of code) label Feb 16, 2024
connoraird marked this pull request as ready for review February 16, 2024 17:12
connoraird merged commit 41c1c9f into 276-combine-nsf-loops Feb 16, 2024