Move replica axis to front everywhere#1877
Conversation
0090645 to
453a98f
Compare
scarlehoff
left a comment
There was a problem hiding this comment.
looks fairly harmless to me... if nothing crashed and you tested it with 1 and more than 1 replicas...
Could you run the qed example as well? (since it is the other one you did modify). But again, if it didn't crash it is probably ok. |
|
Greetings from your nice fit 🤖 !
Check the report carefully, and please buy me a ☕ , or better, a GPU 😉! |
|
I tested with 1 and 3 replicas, printing at every epoch for the first 200 and all digits match. It seems that in the fitbot there are small differences though, perhaps again differences were always there but not visible at only 200 epochs. |
|
The fitbot needs to be updated with the latest master though |
|
Ah ok, that explains it. do you want to wait for that? |
|
I did a manual comparison to the fit produced for #1883, and they're not exactly the same: https://vp.nnpdf.science/1vkMFHCmQd63Pnvwg2V23A==/ The dependencies did change a bit, but I didn't expect that would be in a way that would impact the result - I think it's worth rebasing this on top of #1883 and rerunning the fitbot, just to be sure. I had a look at the changes and they indeed seem harmless, but I would be more comfortable if that could be confirmed in a test. |
453a98f to
d5e14f9
Compare
|
Thanks for the check, will do! |
|
Greetings from your nice fit 🤖 !
Check the report carefully, and please buy me a ☕ , or better, a GPU 😉! |
|
Great, are you ok to merge as well @scarlehoff? |
This just puts the replica axis first wherever it wasn't yet.
It gives identical results, to the last digit, on:
In the MSR normalization layer, I do transpose them back to their old shape just because having the flavor axis first there makes for cleaner code. I think the performance hit there is negligible, but this can always be optimized later on.