Restart hyperopt #1824
Conversation
Thanks @Cmurilochem, do you understand why the test is failing in ubuntu?
Hi @RoyStegeman, I added a new test that compares the results of a restarted hyperopt run with those of a direct run. It then checks for output files, and so depends on paths and on where you run it from. Note: the test (as it is) is not expected to pass entirely, since (among other asserts) it requires that the final json `['results']` dictionaries of both runs match here. This relates to my comments above regarding the differences in the hyper losses for different folds. @goord gave me a nice idea on how to investigate this issue; further details to come.
That's actually a good question. Since you added the
Thanks a lot @Cmurilochem for this work! Regarding the issue you are facing now, are you sure that the other seeds (tr/vl, MC replicas) aren't also different? In any case, I think that there should be some (simple) ways to trick the random generators that it is starting from a
RoyStegeman
left a comment
Some comments but not a complete review
Thanks @Radonirinaunimi. This is something that I will need to check. Thanks for pointing this out. At least, the provisional change to
And these are my comments for now. Thanks for the work so far!
I suspect that this might be due to the fact that the seeds for the initial weights for each k-fold in different runs are inherently different (see below).
I suspect so as well - I noticed you froze the seed for the folds but probably not the tensorflow/numpy/python seeds. If the disagreement is due to the effect of random seeds it's of course not a problem for a real run of the hyperoptimization, and for your tests you already freeze them by setting `debug: true`, so that should also be fine.
Radonirinaunimi
left a comment
Hi @Cmurilochem, here is a quick review. These are mainly formatting/styling and asking clarifications for various points.
In relation to the replica dependence: in the multireplica case I suspect that `self._nn_seeds` (= the `nnseeds` argument in `ModelTrainer`) is already a list of seeds for each replica. I am not quite certain but, please, correct me if I am wrong.
Yes, that is correct! The status of master right now is that it can have different seeds per replica for `MCseed` and `NNseed` during a multireplica fit; the only seed that is always the same is the `TrVlseed`.
Hi @Radonirinaunimi. I have corrected for all @RoyStegeman's suggestions and now will proceed to yours. Thanks for your time in reviewing and for your excellent suggestions.
Co-authored-by: Tanjona Rabemananjara <rrabeman@nikhef.nl>
…irectories of tmp_path
RoyStegeman
left a comment
Thanks, looks good!
`HYPEROPT_SEED` is going to remain fixed for now?
Hi @RoyStegeman, that is what we initially thought and implemented so far. But if you think it would be better, I could try to add a new entry to the runcard so that (like other seeds) the user could have control over it. Please just let me know what you think.
Don't worry, I'm fine either way. I was just making sure it wasn't something you had forgotten about, as I had understood you planned to take it from a runcard seed.
Great! Maybe we could add this feature in the near future if we feel the need to do so. I will keep this in mind.
Thanks a lot @Cmurilochem! I guess the only minor thing missing in order to merge this PR is a small note in the documentation (at the end of https://docs.nnpdf.science/n3fit/hyperopt.html?highlight=hyperopt) describing how one can restart hyperopt.
Hi @Radonirinaunimi. I could add a note after "Changing the hyperoptimization target" and let you know after the commit has been done.
Ah good point! It needs to be documented of course. Completely forgot about that 😅
Thanks @Radonirinaunimi. Documentation added! Please, feel free to suggest any possible changes and/or additions.
Co-authored-by: Roy Stegeman <roystegeman@live.nl>
Hi @RoyStegeman, @Radonirinaunimi and @scarlehoff. Please, let me know whether I could merge this PR after the approval of Roy and Tanjona. Thank you all for your very valuable suggestions.
Yes, please merge this
This PR addresses the issue of restarting a hyperoptimization with the Hyperopt library, as discussed in #1800.
Comments on the initial changes made
1. `hyper_optimization/filetrials.py`
   - In the `FileTrials` class I have added the `from_pkl` and `to_pkl` methods. `from_pkl` is a `@classmethod` that is useful to create instances of the class when a `tries.pkl` file is available from a previous run.
   - The `to_pkl` method saves the current state of `FileTrials` to a pickle file, although this is currently already done for every trial in `hyperopt.fmin` directly via the `trials_save_file` argument.
   - A new attribute `self.pkl_file` is responsible for generating a `tries.pkl` file in the same directory as the `tries.json`.
   - A new attribute `self._rstate` is also added; it stores the last `numpy.random.Generator` of the hyperopt algorithm and is passed as `rstate` to the `hyperopt.fmin` function, so that we guarantee that by restarting we keep the same history as if we were doing a direct calculation. The initial fixed seed in `trials.rstate = np.random.default_rng(42)` here can still be relaxed and provided as input later.
2. `hyper_optimization/hyper_scan.py`
   - I added a new attribute to `HyperScanner`, named `self.restart_hyperopt`, which is set `true` in case of the `--continue` option in the `n3fit` command line (details to be discussed below).
   - I changed `hyper_scan_wrapper` to allow it to check if `hyperscanner.restart_hyperopt` is `true`. If so, it will generate an initial `FileTrials` instance (`trials`) from `tries.pkl`, which contains by construction the history of the previous hyperopt run and also the `trials.rstate` attribute with the previous numpy random generator.
3. `scripts/n3fit_exec.py`
   This is perhaps the most fragile of the changes and where I would need help to adapt it properly.
   - In `N3FitApp` I added a new parser option `--continue` that will be the keyword for hyperopt restarts.
   - In the `run` method I add a new `self.environment.restart = self.args["continue"]` attribute.
   - The idea is for `HyperScanner` to use it later in connection with `produce_hyperscanner`. If this is `true`, I then update `hyperscan_config` with `hyperscan_config.update({'restart': 'true'})`, and this will later be part of the `HyperScanner`'s `sampling_dict` argument.

Questions and requested feedback
- The changes in the `scripts/n3fit_exec.py` file to allow for `--continue` are not optimal. Maybe a more experienced developer could suggest a more convenient way to do so.
- There are differences in the obtained final losses for different k-folds. This might be due to the fact that the seeds for the initial weights for each k-fold in different runs are inherently different (see below).

For example, I have done a test in which I make a simple hyperoptimization with 2 trials, and then restart it to make another 2 trials (4 in total). Then, in a separate experiment with the same runcard, I calculate 4 trials directly and compare the results.
Looking at the above results, we can see that Restart 2/3 have the same hyperparameters as Direct 2/3, yet the two folds have different losses. It seems the first fold can still keep up with the losses, but not the second.
With the help of @goord and @APJansen, I investigated this issue and printed the generated random integers passed as `seeds` to generate the PDF models for each fold in `ModelTrainer.hyperparametrizable()`; see here. They are shown in the table below:

As foreseen, it is clear from the table that the seeds are different for the second fold every time we run a new calculation, despite the fact that the runs start with the same hyperparameters. This is clearly reflected in the different losses shown above. I suspect that if we want to make hyperopt runs completely reproducible, we could think of alternative ways to initialise the seeds.
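A toy illustration (not the n3fit code) of why per-fold seeds drift between runs: if each fold draws its seed from a shared random stream, anything that consumes the stream earlier (e.g. a previous trial) shifts every later fold's seed.

```python
import random


def fold_seeds_shared_stream(rng, n_folds):
    """Draw one model seed per fold from a shared stream (the fragile pattern)."""
    return [rng.randint(0, 2**31) for _ in range(n_folds)]


run_a = random.Random(42)  # direct run
run_b = random.Random(42)  # restarted run, same master seed
_ = run_b.randint(0, 2**31)  # one extra draw, e.g. consumed by an earlier trial

seeds_a = fold_seeds_shared_stream(run_a, 2)
seeds_b = fold_seeds_shared_stream(run_b, 2)
# Same master seed, but the streams are offset by one draw,
# so the per-fold seeds no longer line up between the two runs.
```

The offset shifts the whole sequence: run B's first fold receives what would have been run A's second-fold seed.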
Solution to the random integer issue described above
4. `model_trainer.py`
   To ensure that these `seeds` are generated in a reproducible way, @RoyStegeman helped me to devise a new approach that changes the way they are generated by defining:

With all the above modifications, I have repeated my previous 4-trial experiment. The results are shown below for both restart and direct runs:
As seen, we are now able to ensure that both the hyperparameter space and the initial weights for each k-fold are reproducible when restarting.

Note
As can be seen from the above (last) table, because the seeds used to generate the random integers for each k-fold are now derived from the fixed value `self._nn_seeds` here, the generated random integers will always be the same in every trial; see #1824 (comment). This is an important aspect to keep in mind.
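A sketch of the idea behind this fix (the function name and signature are illustrative, not the actual `ModelTrainer` code): deriving the per-fold integers from a fixed seed, rather than from an ambient generator whose state depends on history, makes them identical on every trial and every restart.

```python
import numpy as np


def fold_seeds(nn_seed, n_folds):
    """Derive per-fold model seeds deterministically from a fixed nn_seed.

    The generator is rebuilt from scratch on every call, so the returned
    integers do not depend on how many trials ran before -- which is exactly
    why they are the same in every trial, as noted above.
    """
    rng = np.random.default_rng(seed=nn_seed)
    return [int(rng.integers(0, 2**31)) for _ in range(n_folds)]
```

This trades per-trial seed variation for full reproducibility across restarts, which is the behaviour the note above points out.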