Skip to content

Conversation

@rprospero
Copy link
Contributor

This adds a ProductIterator class that enables iterating over two separate indices. As a bit of a refresher:

  • PairIterator looks at distinct pairs from a single source set
  • ZipIterator examines pairs of indices for different iterators joined together, so each index is seen once.
  • ProductIterator examines every possible combination of indices from two different sources.

We then use this product iterator to parallelise the siteRDF calculation.

@rprospero rprospero changed the title pref: Parallelize siteRDF with ProductIterator perf: Parallelize siteRDF with ProductIterator Apr 23, 2024
@trisyoungs
Copy link
Member

@rprospero and @RobBuchananCompPhys I may have an answer for you. I assumed that you were benchmarking previously by running the SiteRDF system test, and wondered if the reason behind the lack of improvement in parallel was because the calculation was only a tiny part of the overall test, and the biggest bottleneck was disk I/O. I just implemented a proper benchmark of the SiteRDF module in its current form, removing all other overhead including the I/O, and the results for the 1k water box are:

std::execution::seq          39.7 ms per iteration over 18 iterations
std::execution::par_seq    4.1 ms per iteration over 139 iterations

So almost a factor 10 improvement! For the large 5k argon box:

std::execution::seq          3927 ms per iteration over 1 iteration
std::execution::par_seq    128 ms per iteration over 5 iterations

A factor 30 improvement. Not too shabby! I can push up the new benchmarks on the end of this PR if you want to test for yourself.

@RobBuchananCompPhys
Copy link
Contributor

@rprospero and @RobBuchananCompPhys I may have an answer for you. I assumed that you were benchmarking previously by running the SiteRDF system test, and wondered if the reason behind the lack of improvement in parallel was because the calculation was only a tiny part of the overall test, and the biggest bottleneck was disk I/O. I just implemented a proper benchmark of the SiteRDF module in its current form, removing all other overhead including the I/O, and the results for the 1k water box are:

std::execution::seq          39.7 ms per iteration over 18 iterations
std::execution::par_seq    4.1 ms per iteration over 139 iterations

So almost a factor 10 improvement! For the large 5k argon box:

std::execution::seq          3927 ms per iteration over 1 iteration
std::execution::par_seq    128 ms per iteration over 5 iterations

A factor 30 improvement. Not too shabby! I can push up the new benchmarks on the end of this PR if you want to test for yourself.

Wow, great stuff. Would be good to get those benchmarks.

@rprospero rprospero marked this pull request as ready for review June 6, 2024 12:53
@rprospero rprospero requested a review from trisyoungs June 6, 2024 12:53
@rprospero rprospero marked this pull request as draft June 6, 2024 12:55
@rprospero
Copy link
Contributor Author

Just realised that this PR is no longer actually using the ProductIterator. I'm going to refactor to use it and check the benchmarks. If it's faster, we'll use it. Otherwise, I'll drop the class for now.

@rprospero
Copy link
Contributor Author

After benchmarking, the ProductIterator had comparable speed in series, but was slower in parallel than just using the inner loop. Ultimately, the iterator is supposed to pay for its built in performance penalty by smoothing out the cases where certain pairs take significantly longer to calculate than others. I've removed the unused class, since I'm now less convinced that it will ever be needed.

@rprospero rprospero marked this pull request as ready for review June 6, 2024 13:49
@RobBuchananCompPhys RobBuchananCompPhys merged commit 449c6fd into develop Jun 10, 2024
@RobBuchananCompPhys RobBuchananCompPhys deleted the productIterator branch June 10, 2024 13:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants