Get covariant matrices from DataGroupSpec #1887

@Cmurilochem

Background

In the current state of #1726, I can calculate $\varphi^{2}_k$ for each fold $k$ with the help of validphys. However, one important point to discuss is that, so far, my $\varphi^{2}_k$ values are calculated using all input datasets contained in experiments_data, which validphys injects into performfit.

More on experiments_data

Just to clarify, experiments_data is a list of validphys.core.DataGroupSpec instances. For example, a HERACOMB instance of DataGroupSpec can be composed of the experimental datasets HERACOMBCCEM, HERACOMBCCEP, etc., which are in turn instances of validphys.core.DataSetSpec.

Group covariance matrices

I am currently extracting group covariance matrices from self.exp_info, e.g., via [exp_dict["covmat"] for exp_dict in self.exp_info[0]], which gives the covmats corresponding to each validphys.core.DataGroupSpec. (Note: I could also have taken the covmats from self.experimental["output"], which is a list of n3fit.model_gen.ObservableWrapper instances.)

Problem

Now, I would instead like to calculate $\varphi^{2}_k$ using only the experimental data within the hold-out fold (which is what we actually do when calculating hyper_losses with $\chi^{2}_k$).

The only way I found was to filter the DataSetSpecs of each element of experiments_data (a DataGroupSpec) against partition["datasets"], which contains the names of the datasets used for training/validation in that fold.
For example:

    from validphys.core import DataGroupSpec

    filtered_datagroupspec_list = []

    # loop over each `DataGroupSpec` in `experiments_data`
    for datagroup in experiments_data:
        filtered_datasetspec_list = []

        # each `DataGroupSpec` is composed of several `DataSetSpec` instances
        for dataset in datagroup.datasets:

            # exclude `DataSetSpec`s that are used for training/validation within that fold
            if dataset.name not in partition["datasets"]:
                filtered_datasetspec_list.append(dataset)

        # reduced `DataGroupSpec` holding only the datasets in the hold-out fold
        filtered_datagroupspec_list.append(
            DataGroupSpec(name=f"{datagroup.name}_red", datasets=filtered_datasetspec_list)
        )

Questions

  • The problem is that, once I have excluded datasets, my group covariance matrices no longer have the same dimensionality as before, so I have to process them as well. How could I do that? Part of the problem may be that I am not sure I understand what they truly represent and why they have to be defined per group. Why do I not have covmats for each experiment?
  • Would it be easier to calculate/get the covmats directly from the list of DataGroupSpecs (if this is not too computationally intensive)? I have not found a way to do so by examining validphys2/src/validphys/covmats.py. How could I do that?
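
As a point of reference for the first question: if a group covmat is ordered dataset by dataset, restricting it to a subset of datasets amounts to selecting the matching rows and columns. The sketch below illustrates this with plain NumPy. The function `sub_covmat`, the dataset names, and the sizes are hypothetical and not part of validphys, and the dataset-by-dataset ordering of the group covmat is an assumption that would need to be checked against how validphys actually builds it.

```python
import numpy as np


def sub_covmat(covmat, ndata_per_dataset, keep):
    """Return the block of `covmat` belonging to the datasets named in `keep`.

    Assumption (hypothetical): rows/columns of `covmat` are ordered dataset by
    dataset, following the iteration order of `ndata_per_dataset`.
    """
    indices = []
    start = 0
    for name, ndata in ndata_per_dataset.items():
        if name in keep:
            indices.extend(range(start, start + ndata))
        start += ndata
    if start != covmat.shape[0]:
        raise ValueError("dataset sizes do not add up to the covmat dimension")
    idx = np.array(indices, dtype=int)
    # np.ix_ selects the same set of rows and columns, keeping cross-correlations
    return covmat[np.ix_(idx, idx)]


# Toy group covmat: 6 data points split over three made-up datasets
rng = np.random.default_rng(0)
a = rng.normal(size=(6, 6))
group_covmat = a @ a.T  # symmetric, positive semi-definite by construction
sizes = {"DATASET_A": 2, "DATASET_B": 3, "DATASET_C": 1}

reduced = sub_covmat(group_covmat, sizes, keep={"DATASET_A", "DATASET_C"})
```

Here `reduced` has one row and column per data point of the retained datasets, and the off-diagonal block between `DATASET_A` and `DATASET_C` is preserved, so correlations between the remaining datasets are not lost.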

Does anyone have an idea? Thanks once again for your help!
