tkoskela left a comment
The changes in this PR look straightforward to me. It would be interesting to time the loop `do nsf3 = 1, ia%nsup` in both versions of the code and see how much faster the BLAS call is. You could probably do this with one thread so you don't have to worry about the thread safety of the timers. Based on the graphs you've shown in the other PRs, I'm not entirely convinced this makes the code significantly faster.
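A minimal sketch of such a single-threaded comparison, in Fortran. It rests on several assumptions. The real body of the `nsf3` loop is not shown in this thread, so a hypothetical kernel `c = c + a*b` stands in for it. The intrinsic `matmul` stands in for the BLAS call; in the real code this would be the library routine (for example `dgemm`, linked against a BLAS library). The module `nsf3_timing_sketch` and all procedure names are made up for illustration. If the build uses OpenMP, running with `OMP_NUM_THREADS=1` keeps the comparison single-threaded as suggested, so the timers do not need to be thread-safe.

```fortran
! Hypothetical sketch: time a hand-written loop against a library call that
! computes the same thing. This is NOT the actual CONQUEST nsf3 loop.
module nsf3_timing_sketch
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  private
  public :: kernel_loop, kernel_library, time_both, elapsed_seconds, speedup

contains

  ! Reference version: explicit loops computing c = c + a*b.
  ! The innermost loop runs over the first index (column-major order).
  subroutine kernel_loop(a, b, c)
    real(real64), intent(in)    :: a(:, :), b(:, :)
    real(real64), intent(inout) :: c(:, :)
    integer :: i, j, k
    do j = 1, size(c, 2)
      do k = 1, size(a, 2)
        do i = 1, size(c, 1)
          c(i, j) = c(i, j) + a(i, k) * b(k, j)
        end do
      end do
    end do
  end subroutine kernel_loop

  ! Library version: intrinsic matmul as a stand-in for the BLAS call
  ! (in the real code, e.g. dgemm linked against a BLAS library).
  subroutine kernel_library(a, b, c)
    real(real64), intent(in)    :: a(:, :), b(:, :)
    real(real64), intent(inout) :: c(:, :)
    c = c + matmul(a, b)
  end subroutine kernel_library

  ! Wall-clock seconds between two system_clock counts.
  pure function elapsed_seconds(count_start, count_end, count_rate) result(t)
    integer(int64), intent(in) :: count_start, count_end, count_rate
    real(real64) :: t
    t = real(count_end - count_start, real64) / real(count_rate, real64)
  end function elapsed_seconds

  ! Speedup of the new version relative to the reference version.
  pure function speedup(t_reference, t_new) result(s)
    real(real64), intent(in) :: t_reference, t_new
    real(real64) :: s
    s = t_reference / t_new
  end function speedup

  ! Run each version nrep times on the calling thread and return both
  ! wall-clock times plus the largest absolute difference between results.
  subroutine time_both(a, b, nrep, t_loop, t_lib, max_diff)
    real(real64), intent(in)  :: a(:, :), b(:, :)
    integer,      intent(in)  :: nrep
    real(real64), intent(out) :: t_loop, t_lib, max_diff
    real(real64), allocatable :: c_loop(:, :), c_lib(:, :)
    integer(int64) :: count_start, count_end, count_rate
    integer :: r

    allocate(c_loop(size(a, 1), size(b, 2)), c_lib(size(a, 1), size(b, 2)))
    c_loop = 0.0_real64
    c_lib = 0.0_real64
    call system_clock(count_rate=count_rate)

    call system_clock(count_start)
    do r = 1, nrep
      call kernel_loop(a, b, c_loop)
    end do
    call system_clock(count_end)
    t_loop = elapsed_seconds(count_start, count_end, count_rate)

    call system_clock(count_start)
    do r = 1, nrep
      call kernel_library(a, b, c_lib)
    end do
    call system_clock(count_end)
    t_lib = elapsed_seconds(count_start, count_end, count_rate)

    max_diff = maxval(abs(c_loop - c_lib))
  end subroutine time_both

end module nsf3_timing_sketch
```

A driver would fill `a` and `b` (for example with `random_number`), call `time_both` with enough repetitions that each timing is well above the clock resolution, and report `speedup(t_loop, t_lib)`. Checking `max_diff` first guards against comparing two versions that no longer compute the same result.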
Results of timing the `nsf3` loop
Branch updated from eff92cd to cee9e0f

Description
Speedup plot
This plot shows the performance of test `test_004_isol_C2H4_4proc_PBE0CRI` for 1 MPI process.