Generate replicas when fitting models in parallel by scarlehoff · Pull Request #1261 · NNPDF/nnpdf

scarlehoff · 2021-06-01T10:58:43Z

As the title says, this enables using genrep=True when fitting models in parallel.

Since we already have a mechanism for generating more than one replica (the replica range), I've changed parallel_models: int to parallel_models: bool, and the replicas to fit are now given by the replica range. This way, running in parallel is the same as running sequentially in terms of seeds and data.
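
A minimal sketch of why seeds tied to the replica number make the two modes equivalent. The helper names (replica_seed, generate_pseudodata) and the seed formula are hypothetical illustrations, not the n3fit implementation:

```python
import numpy as np


def replica_seed(base_seed, replica):
    # Hypothetical rule: the seed depends only on the replica number.
    return base_seed + replica


def generate_pseudodata(central, sigma, seed):
    # Hypothetical stand-in for replica generation: Gaussian noise on the data.
    rng = np.random.default_rng(seed)
    return central + sigma * rng.standard_normal(central.shape)


central = np.array([1.0, 2.0, 3.0])
sigma = np.array([0.1, 0.2, 0.3])
replica_range = range(1, 5)

# Sequential mode: each replica is generated on its own, one run at a time.
sequential = [
    generate_pseudodata(central, sigma, replica_seed(42, r)) for r in replica_range
]

# Parallel mode: all replicas of the range are generated up front and stacked.
parallel = np.stack(
    [generate_pseudodata(central, sigma, replica_seed(42, r)) for r in replica_range]
)
```

Because the seed is a function of the replica number only, each row of parallel matches the corresponding sequential replica.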

The process is:

Generate all replicas before starting to fit (just as it was done before when using -r)
If parallel_models: True then all replicas are exactly the same (same trvl, same fktable, same invcovmat)
Create an output which is a stack of all the replicas (so the output is (n_replicas, n_data) instead of (1, n_data))
Fit normally (I left the output in PR #1153, "Fit many replicas in parallel", as (1, n_data) so that this could be implemented easily, and it worked out! :) )
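
The steps above can be sketched with plain NumPy. This is only an illustration of the shapes involved, assuming a chi2-like loss and random stand-ins for the model outputs; none of the names below come from model_trainer.py:

```python
import numpy as np

n_replicas, n_data = 3, 4
rng = np.random.default_rng(0)

# Shared by all replicas in parallel mode (same invcovmat for every replica).
invcovmat = np.eye(n_data)

# Per-replica pseudodata, generated before the fit starts.
pseudodata = rng.normal(size=(n_replicas, n_data))

# Stand-ins for the single-replica model outputs, each of shape (1, n_data).
single_outputs = [rng.normal(size=(1, n_data)) for _ in range(n_replicas)]

# Stack the single-replica outputs into one (n_replicas, n_data) array.
stacked = np.concatenate(single_outputs, axis=0)


def chi2_per_replica(pred, data, invcov):
    # One chi2 value per replica: diff_r^T . invcov . diff_r
    diff = pred - data
    return np.einsum("ri,ij,rj->r", diff, invcov, diff)


losses = chi2_per_replica(stacked, pseudodata, invcovmat)
```

Each entry of losses equals the chi2 that the corresponding replica would get if it were evaluated on its own.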

For reviewers: this is implemented in the first commit, in n3fit/src/n3fit/model_trainer.py. Most other changes rename variables to the plural form (like seed -> seeds), plus fixes for some typos I found (quite a few in checks.py).

Working on top of PR #1251 (which is smaller than this PR and the changes are even less impactful so please go to that one first)

A draft for now since I'd like to know how reproducible results are between running sequential / parallel / one by one and add that to the docs.

github-actions · 2021-06-01T13:31:20Z

Greetings from your nice fit 🤖 !
I have good news for you, I just finished my tasks:

Fit Name: NNBOT-7957741b2-2021-06-01
Fit Report: https://vp.nnpdf.science/aSJ2AzJ7QNmJQxEThgyf0Q==
Fit Data: https://data.nnpdf.science/fits/NNBOT-7957741b2-2021-06-01.tar.gz

Check the report carefully, and please buy me a ☕ , or better, a GPU 😉!

scarlehoff · 2021-06-14T08:28:05Z

Perfect also for full fits https://vp.nnpdf.science/h2_Zz2ySRU6BnqaOjgGiOg==/

(All 100 replicas were done in ~3.5 hours on 2 GPUs, so about 5 minutes per replica per GPU, using 11-12 GB of memory each.)

Radonirinaunimi · 2021-06-22T08:15:48Z

I have had a look at this and have been playing with it, and did not notice any issues. On a single RTX 2060 with 6 GB of memory, it took about 10 hrs to perform the exact same fits, and the results are exactly the same as those @scarlehoff reported. The changes LGTM and the comments I added both here and in #1251 are very minor (nothing conceptual).

scarlehoff marked this pull request as draft June 1, 2021 10:58

scarlehoff added the run-fit-bot label (Starts fit bot from a PR.) Jun 1, 2021

scarlehoff removed the run-fit-bot label (Starts fit bot from a PR.) Jun 1, 2021

scarlehoff marked this pull request as ready for review June 14, 2021 08:28

scarlehoff changed the title ~~[WIP] Generate replicas when fitting models in parallel~~ Generate replicas when fitting models in parallel Jun 14, 2021

Radonirinaunimi reviewed Jun 22, 2021

Comment thread doc/sphinx/source/n3fit/runcard_detailed.rst Outdated

Radonirinaunimi reviewed Jun 22, 2021

Comment thread doc/sphinx/source/n3fit/runcard_detailed.rst

Radonirinaunimi reviewed Jun 22, 2021

Comment thread n3fit/src/n3fit/performfit.py

scarlehoff force-pushed the use_n3pdf_interface_with_members branch from 99000fb to 67a22b2 Compare June 23, 2021 12:36

scarlehoff added 5 commits June 23, 2021 14:43

parallel replicas now accept genrep=true

2517a09

change docs, beautify

59eb42d

remove offset

cc182b6

apply comments

5020187

fix rebase

b6426c9

scarlehoff force-pushed the n3fit_replica_range_parallel branch from a510d7c to b6426c9 Compare June 23, 2021 12:56

isolate the checks that need replicas

66e0368

scarrazza merged commit 78e45f0 into use_n3pdf_interface_with_members Jun 23, 2021

scarrazza deleted the n3fit_replica_range_parallel branch June 23, 2021 18:06

Zaharid added the enhancement label (New feature or request) Oct 29, 2021

scarrazza merged 6 commits into use_n3pdf_interface_with_members from n3fit_replica_range_parallel
