Fit many replicas in parallel #1153
Greetings from your nice fit 🤖 !
Check the report carefully, and please buy me a ☕ , or better, a GPU 😉!
mcseed: 3
sum_rules: "All"

genrep: False # true = generate MC replicas, false = use real data
Was this to save time because the generation of reps is slow? Or was it some other reason? I worry about this propagating if it's set in the example.
No, just because I didn't do it in the original.
It shouldn't be complicated because the logic is the same as with `1 -r 30`, but now in parallel. Doing it while wasting as little memory as possible might need some work (or might not; I need to check).
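For context, generating a Monte Carlo replica amounts to fluctuating the experimental central values according to the data covariance matrix. A minimal numpy sketch of the idea (illustrative only; the function name, shapes, and seed are assumptions, not the actual validphys provider code):

```python
import numpy as np

def generate_replicas(central, covmat, n_replicas, seed=3):
    """Sample n_replicas pseudodata sets from a multivariate Gaussian
    centred on the experimental values (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    # Result has shape (n_replicas, n_data): one row per MC replica,
    # so all replicas can be drawn in a single vectorised call.
    return rng.multivariate_normal(central, covmat, size=n_replicas)

# Toy data: 3 points with a diagonal covariance matrix
central = np.array([1.0, 2.0, 3.0])
covmat = np.diag([0.1, 0.2, 0.3])
replicas = generate_replicas(central, covmat, n_replicas=25)
print(replicas.shape)  # (25, 3)
```

Drawing all replicas in one call is what would let the provider hand X replicas to a parallel fit at once, instead of generating them one at a time.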
This is now ready. I'm running things like hyperopt to ensure nothing is broken. Nothing was broken in #1039, but this merge/rebase was not trivial, so I want to check. The main limitation is that at the moment one cannot run in parallel while generating pseudodata (so it is useless for a fit!). It shouldn't be difficult (especially now with #1081): it's just a matter of telling the provider we want X replicas. But since that wasn't part of the original PR that was already reviewed, it will go in a separate one.
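The parallel-replica idea itself can be illustrated schematically (this is not the n3fit implementation; names and shapes are made up for the sketch): give the network weights a leading replica axis, so one batched contraction evaluates every replica at once instead of looping:

```python
import numpy as np

n_replicas, n_in, n_out, n_points = 30, 4, 8, 100
rng = np.random.default_rng(0)

x = rng.normal(size=(n_points, n_in))            # input grid shared by all replicas
w = rng.normal(size=(n_replicas, n_in, n_out))   # one weight matrix per replica

# Batched evaluation: all replicas in a single contraction
out = np.einsum('pi,rio->rpo', x, w)

# Equivalent to fitting/evaluating the replicas one by one in a loop
loop = np.stack([x @ w[r] for r in range(n_replicas)])
assert np.allclose(out, loop)
print(out.shape)  # (30, 100, 8)
```

The memory concern mentioned above is visible here too: the batched version materialises all replica outputs at once, which is the price paid for avoiding the per-replica loop.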
The failed fit: https://vp.nnpdf.science/ccTgYKZbRdO7l97w8RAShg==/
RoyStegeman left a comment:
This looks good and I haven't noticed any problems while running the code. I left two comments for things I would personally change, but otherwise it's fine.
53a4914 to 8d1e363
Given that we are pushing back the NNPDF release a bit, I'd rather have this merged. But in doing the final tests I've noticed that while with TF 2.2 I didn't see a difference in speed, with 2.4.1 (the current conda default) this branch is about 10-15% slower, so I need to investigate that (since the main way of running the code will still be on CPU).
8b5dd60 to 19d235f
Ok, it was the usage of einsum.
That's actually good to know! I like einsum, but I wasn't aware it's significantly slower on CPU than the alternatives. But how does that explain the difference between TF 2.2 and 2.4.1 (or the related numpy version)?
Probably the speed difference was drowned out by something else, so I didn't notice it (both are faster now on Boogiepop than before).
Ok, tests performed:
- Standard fits.
- Non-standard fits: 25 replicas with only a few datasets, to check that 1) it runs, 2) positivity works well, 3) nothing obviously breaks.
- Benchmarks: while doing these I noted the einsum problem.
If I haven't forgotten anything important, this can be merged.
9b2edce to 04f2a22
First I need a good fit-bot run with the new data (#1218), then I'll test the bot on this branch, and then it can be merged.
Co-authored-by: Roy Stegeman <roystegeman@live.nl>
7e773f5 to 1e4ab93
I'll merge this as soon as #1211 is green-lighted.
A rebase of #1039 seemed far more complicated than manually porting the changes, so I did that instead.
Missing: a few review comments that I still need to implement, and making sure I haven't broken anything along the way.