# Hyperopt loss #1726
Conversation
### Description of what happens with hyperoptimization

For the benefit of my future self and my colleagues @goord and @Cmurilochem I describe here exactly what happens during hyperoptimization. It mainly describes how it works in master; so far this branch only changes it on two points, as indicated.

#### Single Trial

A "trial" is the entire evaluation of one set of hyperparameters. One trial in the code is one execution of the function

#### Folds

Trials are scored by their

For each fold, the datasets in that fold are left out completely, and the model is trained as normal on the remaining k-1 folds. The model is trained on the k-1 folds (masked with the training mask).

**Early exit 1**: Then the

**Early exit 2**:

#### Loss

We get

#### Managing trials

Results are saved in

The trials start from the function

Inside

#### Possible issue

On a test with 2 trials varying only the nodes per layer, it tested (27,12,8) and then (30,36,8), but the best parameters printed were the default (25,20,8). I'm not sure what exactly happens in this final call to

#### Some thoughts on improvements

**Parallelizing trials, checkpointing**

Trials can be run in parallel using MongoDB, as described here.

**K-folding**

When using K-folding, we now have 3 classes of data at every fold:
Both 2 and 3 are not seen during training. The difference in how they are used is that 2 is used for the "early exit 1" above, while 3 is used to compute the hyper loss. If that was all, I'd say 2 is wasted, as this condition is very unlikely to trigger even with one replica. We also discussed a long time ago that perhaps K-folding won't be necessary anymore. The point of it is to have a more accurate validation loss, but if we run with, say, 100 replicas, it is already more accurate. We would still need 3 groups of data, to avoid the bias of joining 2 and 3; however, they could be split randomly per dataset (per replica). The advantage would be that it would require only a single run per trial (slightly bigger, but still nearly a 4x speedup), and it would simplify the code. Not sure though what the effect on the learning would be.

**Code improvements**
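The per-fold training and scoring described above can be sketched as follows. All names here (`score_trial`, `train_fn`, `loss_fn`) are hypothetical placeholders, not the actual n3fit code:

```python
import numpy as np

def score_trial(hyperparams, folds, train_fn, loss_fn):
    """Score one set of hyperparameters with k-folding (sketch).

    For each fold, its datasets are left out completely, the model is
    trained as normal on the remaining k-1 folds, and the hyper loss
    is computed on the held-out fold.
    """
    fold_losses = []
    for i, held_out in enumerate(folds):
        training_folds = [f for j, f in enumerate(folds) if j != i]
        model = train_fn(hyperparams, training_folds)  # normal training on k-1 folds
        fold_losses.append(loss_fn(model, held_out))   # evaluated on unseen data only
    # aggregate the per-fold losses into the trial's score (here: the mean)
    return float(np.mean(fold_losses))
```

Hyperopt would then call something like `score_trial` once per sampled hyperparameter set and keep the set with the lowest score.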
### Implementation of a new metric

As a continuation of the work by @APJansen, I intend to add a new hyperoptimization metric to the

where the first term represents our usual averaged-over-replicas hyper loss,

#### Refactoring
We talked again about refactoring this, and we decided to keep the changes minimal for now, just to allow implementation of the phi2 loss. We can refactor the code about hyperoptimization inside `model_trainer` later, once we have things running. We made some assumptions, listed below; are these correct @scarlehoff?

#### Assumptions

Previously

So now we assume instead that first the losses are computed per fold, so already having some kind of aggregation of the replicas. This is done in the training loop, since training is exited early if the loss is too high.

And then there are also penalties, which currently are all computed per replica. We assumed for now that this is general enough, and that we always want to take an average over the per-replica penalties and add this to the loss.
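The penalty assumption just described can be sketched like this (hypothetical names, not the actual `model_trainer` code):

```python
import numpy as np

def fold_loss_with_penalties(per_replica_losses, per_replica_penalties):
    """Sketch of the assumption above: the fold loss is an aggregate
    over replicas (here a plain mean), while the penalties are computed
    per replica; we average those over replicas and add them on top."""
    fold_loss = np.mean(per_replica_losses)       # per-fold aggregation of replicas
    avg_penalty = np.mean(per_replica_penalties)  # average of per-replica penalties
    return float(fold_loss + avg_penalty)
```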
Ah, since there were no questions I thought that the example I put there was enough. Not sure what to add (i.e., not sure what the showstoppers are?)
The thing is that, while

Regarding the assumptions:
No. But at the end of the fold (when you have an ensemble of models) you can easily create another model which is the average of all previous models (ideally discarding outliers) over the axis of the replicas. With that you can compute the phi.
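A minimal sketch of this suggestion (names, shapes, and the outlier criterion are assumptions, not code from the PR):

```python
import numpy as np

def replica_averaged_predictions(replica_predictions, n_discard=0):
    """Given predictions of shape (n_replicas, n_data) from the
    ensemble of models at the end of a fold, build the "average model"
    predictions by taking the mean over the replica axis, optionally
    discarding the most extreme replicas first. The phi of the fold
    could then be computed from these central predictions."""
    preds = np.asarray(replica_predictions, dtype=float)
    if n_discard > 0:
        # drop the replicas furthest from the per-point median prediction
        distance = np.abs(preds - np.median(preds, axis=0)).sum(axis=1)
        keep = np.argsort(distance)[: len(preds) - n_discard]
        preds = preds[keep]
    return preds.mean(axis=0)
```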
Yes. Let's not care about the penalties for the time being. I don't think they are needed.
Start implementation of
### Question regarding added tests

@scarlehoff, I added new tests into `test_hyperopt.py`. To test the calculation of
Make sure the seeds are also fixed for numpy. Note that when the fit is set to

However, while the tests should be robust in linux (they empirically seem to be), I'm not so sure about mac m1 (i.e., if you are locally running on a mac and then trying to test in the CI... not sure what will happen).
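Fixing the numpy seeds can look like the following generic sketch (for a TensorFlow fit one would additionally fix the TF seed, e.g. with `tf.keras.utils.set_random_seed`):

```python
import random

import numpy as np

# Fix the seeds of every RNG in play so the test is reproducible.
SEED = 42
random.seed(SEED)                  # python stdlib RNG
np.random.seed(SEED)               # legacy global numpy RNG
rng = np.random.default_rng(SEED)  # preferred: an explicit Generator

sample = rng.normal(size=3)        # identical on every run with the same SEED
```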
Thanks @scarlehoff. I have added a custom
Done. Just added docs in f88fbb5
RoyStegeman left a comment:
Thanks, the docs look good
Co-authored-by: Roy Stegeman <roystegeman@live.nl>
Improving hyperoptimization, experimenting with different hyperoptimization loss functions
Tasks done in this PR
- `HyperLoss` class with built-in methods that can automatically perform statistics over replicas and then folds. The user can select statistics via `replica_statistic` and `fold_statistic` in the runcard `kfold`.
- `loss_type` option in the runcard `kfold`.
- `Hyperopt(`
- `tries.json` (specifically within the `kfold_meta` entry): a matrix (folds x replicas) of calculated `hyper_losses_chi2` and a vector (folds) of `hyper_losses_phi2`.

### Description
The implemented `HyperLoss` is instantiated within `ModelTrainer` and later on used in `ModelTrainer.hyperparametrizable`. The user must pass three parameters that are set in the runcard:

- `loss_type`: the type of loss to be used. Options are `chi2` or `phi2`.
- `replica_statistic`: the statistic over replicas to be used within each fold. For `loss_type: chi2`, it can assume the usual statistics: `average`, `best_worst` and `std`. Note: `replica_statistic` is inactive if `loss_type: phi2` as
- `fold_statistic`: the statistic over folds. Options are: `average`, `best_worst` and `std`.

For `loss_type: phi2` and `fold_statistic: average`, the figure of merit to be minimised is actually:

The current implementation of $\varphi^2_{k}$ is based on
`validphys` functions. It is evaluated using only experimental data within the held-out fold (as expected).

### Runcard examples
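As a hypothetical sketch, the three parameters described above might appear in the `kfold` section of a runcard like this; the key names follow the description in this PR, while all values and the `partitions` entries are illustrative assumptions only:

```yaml
kfold:
  loss_type: phi2            # chi2 or phi2
  replica_statistic: average # ignored when loss_type is phi2
  fold_statistic: average    # average, best_worst or std
  partitions:                # illustrative placeholder folds
    - datasets: [dataset_A, dataset_B]
    - datasets: [dataset_C, dataset_D]
```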
- `hyper_loss` is set as the
- `hyper_loss` as the inverse of the max value of

### Notes
It must be merged after #1788 as the current `hyperopt_loss` branch has been created from `trvl-mask-layers`.