Automated selection of models via a best_chi2_worse_phi2 algorithm #1962

@Cmurilochem

Description

As a continuation of #1943, I have automated the selection of the best models using @juanrojochacon's hyperopt algorithm, in which the 1/$\varphi^{2}$ data are used to choose among the best $\chi^{2}$ hyperpoints. Here I simply refer to it as the best_chi2_worst_phi2 algorithm.

To this end, I wrote a post-fit script based primarily on the validphys vp_hyperoptplot.py module, structured so that implementing it in validphys will be easier later. I attach it here just in case: analysis_hyperopt.zip.

The core of the idea is presented in the code snippet below:

import numpy as np

args = {
    'loss_target': 'best_chi2_worst_phi2',  # select Juan & Roy's algorithm
    'max_phi2_points': 10,                  # keep the n lowest values of 1/phi2
    'threshold': 3.0,                       # trials with a larger loss are dropped beforehand
}

# `dataframe` holds one row per hyperopt trial; `best_idx` is the index of the lowest-loss trial
if args['loss_target'] == "best_chi2_worst_phi2":
    minimum = dataframe.loss[best_idx]
    std = np.std(dataframe.loss)
    lim_max = minimum + std
    # select rows with chi2 losses between the best point and lim_max
    selected_chi2 = dataframe[(dataframe.loss >= minimum) & (dataframe.loss <= lim_max)]
    # among the selected points, take the n lowest in 1/phi2
    selected_phi2 = selected_chi2.loss_reciprocal_phi2.nsmallest(args['max_phi2_points'])
    # look the points up by index, so rows outside the chi2 window are never picked up
    best_trial = dataframe.loc[selected_phi2.index]

Here I define an interval between the chi2 minimum and one standard deviation (std) above it, within which I then inspect the corresponding 1/phi2 values. From these I take the n hyperpoints with the lowest 1/phi2 and save the selected models in best_trial. In the attached zip file I take as an example the runs I discussed on Monday using 10 replicas (because they give me many more points with which to test the algorithm). The final plot is shown below:
[Image: best_chi2_worst_phi2_plot]

The yellow region marks the interval between the chi2 minimum (grey circle) and one standard deviation (std) of the loss data above it. I also asked the script to return the 10 models within this region with the lowest 1/phi2 values (cyan circles).
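For anyone who wants to try the selection without the attached script, here is a minimal, self-contained sketch on synthetic data. The helper name `select_best_chi2_worst_phi2` and the `n_std` parameter are hypothetical (they are not part of validphys or of the attached script), and the random trials are placeholders rather than real hyperopt output. Only the column names `loss` and `loss_reciprocal_phi2` are taken from the snippet above.

```python
import numpy as np
import pandas as pd


def select_best_chi2_worst_phi2(df, max_phi2_points=10, n_std=1.0):
    """Return the trials inside the chi2 window with the lowest 1/phi2.

    Hypothetical helper: `df` needs the columns `loss` (chi2) and
    `loss_reciprocal_phi2` (1/phi2).
    """
    minimum = df["loss"].min()
    # np.std on a plain array is the population standard deviation (ddof=0)
    lim_max = minimum + n_std * np.std(df["loss"].to_numpy())
    # every loss is >= minimum by construction, so only the upper bound is needed
    in_window = df[df["loss"] <= lim_max]
    # nsmallest returns at most `max_phi2_points` rows, sorted by 1/phi2
    return in_window.nsmallest(max_phi2_points, "loss_reciprocal_phi2")


# Synthetic trials, for illustration only (not real hyperopt output)
rng = np.random.default_rng(0)
trials = pd.DataFrame(
    {
        "loss": rng.uniform(1.5, 3.0, size=50),
        "loss_reciprocal_phi2": rng.uniform(0.5, 5.0, size=50),
    }
)
best_trials = select_best_chi2_worst_phi2(trials, max_phi2_points=10)
print(best_trials)
```

Exposing the window width as `n_std` makes it easy to check how the selection changes when the interval is widened or narrowed, for example by calling the helper with `n_std=0.5` or `n_std=2.0`.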

Questions

  • Is 1 std sufficient for our purposes? Note that for the analysis I selected a loss threshold of 3, so all models with higher losses were excluded from the DataFrame and from the analysis.
  • When looking at the 1/phi2 values, which option is more physically sound: (i) 1/<phi2> or (ii) <1/phi2>? Note that in the analysis I use <1/phi2>.
  • Is the idea to implement this later in validphys? I tried to run vp-hyperoptplot, but it always complains that it needs pandoc (even though I have pandoc installed).
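On the second question, one thing that can be said independently of the physics is that the two options are not interchangeable: for positive values, <1/phi2> is always greater than or equal to 1/<phi2> (Jensen's inequality), with equality only when all values coincide. The small sketch below illustrates this. The phi2 numbers are made up, and what the average runs over (for instance one value per fold) is an assumption here, not something taken from the script.

```python
import numpy as np

# Hypothetical phi2 values entering the average for a single hyperpoint
# (illustrative numbers only, not taken from any fit)
phi2 = np.array([0.5, 1.0, 2.0])

reciprocal_of_mean = 1.0 / phi2.mean()    # option (i):  1/<phi2>
mean_of_reciprocal = (1.0 / phi2).mean()  # option (ii): <1/phi2>

print(f"1/<phi2> = {reciprocal_of_mean:.4f}")
print(f"<1/phi2> = {mean_of_reciprocal:.4f}")
```

Because <1/phi2> is dominated by the smallest phi2 values in the average, the two options can rank hyperpoints differently when the spread of phi2 is large.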

I would appreciate any comments, and ideas for improvement are always welcome.
