Merged
233 changes: 233 additions & 0 deletions doc/validphys2/guide.md
@@ -1942,6 +1942,239 @@ In addition, errors will be raised if the input directory is not a valid fit (fo
If the user wishes to add their own, non-standard files, then it is advisable to avoid using the fit name in these
files as the `fitrename` command will also rename these files.

### Fits with a Theory Covariance Matrix

Fits can be run with a contribution to the covariance matrix obtained from
performing scale variations. `validphys` also has flags which control whether
the covariance matrix used to calculate statistical estimators should include a
contribution from the theory covariance matrix. Getting the various flags that
control these behaviours correct in both the fit and `validphys` runcards is
critical to obtaining sensible results. Examples with explanations are provided
here to demonstrate how to run a fit with a theory covariance matrix and then
use the various `validphys` analysis tools on the fit.

#### Running a fit with Theory Covariance Matrix

In order to run a fit with a theory covariance matrix, the user must first
specify that the `datasets` are all part of the same `experiment`. An example of
how this is done in practice is provided by the `experiments` section of a
DIS-only fit runcard:

```yaml
experiments:
- experiment: BIGEXP
datasets:
- {dataset: NMCPD, frac: 0.5}
- {dataset: NMC, frac: 0.5}
- {dataset: SLACP, frac: 0.5}
- {dataset: SLACD, frac: 0.5}
- {dataset: BCDMSP, frac: 0.5}
- {dataset: BCDMSD, frac: 0.5}
- {dataset: CHORUSNU, frac: 0.5}
- {dataset: CHORUSNB, frac: 0.5}
- {dataset: NTVNUDMN, frac: 0.5}
- {dataset: NTVNBDMN, frac: 0.5}
- {dataset: HERACOMBNCEM, frac: 0.5}
- {dataset: HERACOMBNCEP460, frac: 0.5}
- {dataset: HERACOMBNCEP575, frac: 0.5}
- {dataset: HERACOMBNCEP820, frac: 0.5}
- {dataset: HERACOMBNCEP920, frac: 0.5}
- {dataset: HERACOMBCCEM, frac: 0.5}
- {dataset: HERACOMBCCEP, frac: 0.5}
- {dataset: HERAF2CHARM, frac: 0.5}
```

The `datasets` must all be part of a single `experiment` so that the theory
covariance matrix generated when the user runs `vp-setupfit` is compatible with
the fitting infrastructure.
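Because runcards are plain YAML, this single-experiment requirement is easy to
check programmatically before running `vp-setupfit`. A minimal sketch (not part
of `validphys`; it assumes PyYAML is installed and uses an abbreviated version
of the runcard above):

```python
import yaml  # PyYAML

# Abbreviated version of the experiments section shown above.
runcard = """
experiments:
  - experiment: BIGEXP
    datasets:
      - {dataset: NMC, frac: 0.5}
      - {dataset: SLACP, frac: 0.5}
"""

experiments = yaml.safe_load(runcard)["experiments"]

# The theory covariance machinery expects all datasets under one experiment.
assert len(experiments) == 1, "all datasets must live in a single experiment"
print(f"{len(experiments[0]['datasets'])} datasets in {experiments[0]['experiment']}")
# → 2 datasets in BIGEXP
```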

The next step is to specify the `theorycovmatconfig`. This namespace controls
which point prescription is used to generate the theory covariance matrix, and
whether the theory covariance matrix will be used in the sampling of the
pseudodata, the fitting of the data or both.

The different prescriptions for scale variation are 3-point, 5-point,
5bar-point, 7-point and 9-point. Depending on which prescription the user
decides to use, they must provide the correct number and combination of
`theoryids`. In addition, if the 5-point or 5bar-point prescription is being
used, the user must specify which of the two to use with the `fivetheories`
flag. There are also two options for the 7-point prescription: the default is
'Gavin's' prescription, but the user can also specify `seventheories: original`.

The various configuration options might seem overwhelming, so for each of the
prescriptions the appropriate `theoryids` and additional flags required are
provided below, ready to be pasted into a report runcard.

---------------------------------------------------------------------

##### 3-point
> **Contributor:** We definitively need to make it so that this information is stored somewhere and one does not need to duplicate it all the time.

```yaml
theorycovmatconfig:
theoryids:
- 163
- 180
- 173

```

##### 5-point

```yaml
theorycovmatconfig:
theoryids:
- 163
- 177
- 176
- 179
- 174
fivetheories: nobar

```

##### 5bar-point

```yaml
theorycovmatconfig:
theoryids:
- 163
- 180
- 173
- 175
- 178
fivetheories: bar

```

##### 7-point original

```yaml
theorycovmatconfig:
theoryids:
- 163
- 177
- 176
- 179
- 174
- 180
- 173
seventheories: original
```

##### 7-point Gavin (default)

```yaml
theorycovmatconfig:
theoryids:
- 163
- 177
- 176
- 179
- 174
- 180
- 173

```

##### 9-point

```yaml
theorycovmatconfig:
theoryids:
- 163
- 177
- 176
- 179
- 174
- 180
- 173
- 175
- 178
```
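The required number of `theoryids` per prescription can be summarised in a
small standalone sketch (illustrative only, not part of `validphys`; the counts
and flags simply restate the examples above):

```python
# Number of theoryids each point prescription expects, plus any extra
# runcard flag, as in the examples above.
PRESCRIPTIONS = {
    "3 point": (3, None),
    "5 point": (5, "fivetheories: nobar"),
    "5bar point": (5, "fivetheories: bar"),
    "7 point original": (7, "seventheories: original"),
    "7 point": (7, None),  # Gavin's prescription, the default
    "9 point": (9, None),
}

def check_theoryids(prescription, theoryids):
    """Return True if the right number of theoryids is supplied."""
    expected, _extra_flag = PRESCRIPTIONS[prescription]
    return len(theoryids) == expected

# The 9-point example above lists nine theory IDs:
print(check_theoryids("9 point", [163, 177, 176, 179, 174, 180, 173, 175, 178]))
# → True
```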

---------------------------------------------------------------------

Once the user has correctly specified the `theoryids` and additional flags for
their chosen prescription, they must specify which PDF will be used to generate
the theory 'points' required to construct the theory covariance matrix. The
user must additionally specify where the theory covariance matrix is to be
used: it can be used to sample the pseudodata by setting
`use_thcovmat_in_sampling: true`, and likewise it can be included in the
covariance matrix used in the fit by setting `use_thcovmat_in_fitting: true`.

Combining all of the above information: if one wanted to run a fit using the
theory covariance matrix, calculated with the 9-point prescription, in both the
fitting and the sampling, with `NNPDF31_nlo_as_0118` used to generate the
covariance matrix, then the complete `theorycovmatconfig` would be:

```yaml
theorycovmatconfig:
theoryids:
- 163
- 177
- 176
- 179
- 174
- 180
- 173
- 175
- 178
pdf: NNPDF31_nlo_as_0118
use_thcovmat_in_fitting: true
use_thcovmat_in_sampling: true
```
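Before submitting such a fit, the `theorycovmatconfig` flags can be
sanity-checked by parsing the runcard (a sketch, assuming PyYAML; note that
YAML parses `true` as the Python boolean `True`):

```python
import yaml  # PyYAML

# The theorycovmatconfig from the complete example above.
runcard = """
theorycovmatconfig:
  theoryids: [163, 177, 176, 179, 174, 180, 173, 175, 178]
  pdf: NNPDF31_nlo_as_0118
  use_thcovmat_in_fitting: true
  use_thcovmat_in_sampling: true
"""

config = yaml.safe_load(runcard)["theorycovmatconfig"]

# The 9-point prescription needs exactly nine theoryids.
assert len(config["theoryids"]) == 9
# Both usage flags should be booleans after parsing.
assert config["use_thcovmat_in_fitting"] is True
assert config["use_thcovmat_in_sampling"] is True
print("theorycovmatconfig looks consistent")
```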

#### Using `validphys` statistical estimators with theory covariance

Once a fit has been run with the theory covariance matrix, it is necessary to
include the theory covariance matrix in statistical estimators such as the chi²
in order to get meaningful values. This behaviour is controlled by the flag
`use_thcovmat_if_present`, which is set to `False` by default.

If the user specifies `use_thcovmat_if_present: True` then they must also
specify a corresponding `fit`. The configuration file for that `fit` will be
read; if it contains `use_thcovmat_in_fitting: True`, then `validphys` will
locate the theory covariance matrix used during the fit and add it to the
experimental covariance matrix, for use in statistical estimators such as the
chi². A simple example of this would be:

```yaml
dataset_input: {dataset: HERAF2CHARM}

use_thcovmat_if_present: True

fit: 190310-tg-nlo-global-7pts

use_cuts: "fromfit"

pdf:
from_: fit

theory:
from_: fit

theoryid:
from_: theory

actions_:
- dataset_chi2_table
```

It should be noted that any `dataset_input` specified in the same runcard as
`use_thcovmat_if_present: True` must have been fitted in the corresponding
`fit`. If the corresponding fit has `use_thcovmat_in_fitting: False`, then the
user will be warned and there will be no contribution from the theory
covariance matrix when calculating statistical estimators for that runcard.

When using the `vp-comparefits` application, the user **must** specify either
the command-line flag `--thcovmat_if_present` or `--no-thcovmat_if_present`,
which set `use_thcovmat_if_present` to `True` or `False` respectively.
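For example, a non-interactive invocation might look like the following (a
sketch only: the fit names are hypothetical, and `vp-comparefits` takes further
options, such as report metadata, that are omitted here):

```shell
# Compare a current fit against a reference fit, adding the theory
# covariance matrix to the statistical estimators where available.
vp-comparefits my_current_fit my_reference_fit --thcovmat_if_present
```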

If the user runs the interactive mode, `vp-comparefits -i`, and has not already
specified this on the command line, then they will be prompted to select
whether or not to use the theory covariance matrix, if available, in the
report.

Parallel mode
-------------

31 changes: 31 additions & 0 deletions validphys2/src/validphys/checks.py
@@ -103,6 +103,37 @@ def check_cuts_considered(use_cuts):
raise CheckError(f"Cuts must be computed for this action, but they are set to {use_cuts.value}")


@make_argcheck
def check_dataset_cuts_match_theorycovmat(dataset, fitthcovmat):
    if fitthcovmat:
        ds_index = fitthcovmat.load().index.get_level_values(1)

> **Contributor:** Ideally we would keep very expensive things outside the checks, and instead have a way of loading metadata on one side for verification purposes and full big objects on the other side. Anyway, that is just a comment.
>
> **Contributor (Author):** I guess ideally the actual cuts would be stored in the metadata, so that not only could the metadata be used here but also the check would be better than just comparing number of points.
>
> **Contributor:** Another option is knowing the namespace that produced the cuts, as per #224. But probably we should have both. At some point I thought about keeping the point index always relative to the full dataset (in e.g. experiments_index) but I think I decided against it because nnfit doesn't do that, and could be confusing. We should probably revisit that.
>
> **Contributor (Author, @wilsonmr, Apr 26, 2019):** yeah sure
>
> > At some point I thought about keeping the point index always relative to the full dataset
>
> Yeah I can see pros and cons for this - could be worth looking at though.

        ncovmat = (ds_index == dataset.name).sum()

        cuts = dataset.cuts
        if cuts:
            ndata = len(dataset.cuts.load())
        else:
            ndata = dataset.commondata.ndata
        check(ndata == ncovmat)


@make_argcheck
def check_experiment_cuts_match_theorycovmat(
        experiment, fitthcovmat):
    for dataset in experiment.datasets:
        if fitthcovmat:
            ds_index = fitthcovmat.load().index.get_level_values(1)
            ncovmat = (ds_index == dataset.name).sum()

            cuts = dataset.cuts
            if cuts:
                ndata = len(dataset.cuts.load())
            else:
                ndata = dataset.commondata.ndata
            check(ndata == ncovmat)


@make_argcheck
def check_have_two_pdfs(pdfs):
    check(len(pdfs) == 2, 'Expecting exactly two pdfs.')
10 changes: 6 additions & 4 deletions validphys2/src/validphys/comparefittemplates/comparecard.yaml
@@ -74,14 +74,14 @@ pdf_report:

template: report.md

experiments:
from_: fit

positivity:
from_: fit

dataspecs:
- theoryid:
- experiments:
from_: fit
theoryid:
from_: current
pdf:
from_: current
@@ -91,7 +91,9 @@
from_: current


- theoryid:
- experiments:
from_: fit
theoryid:
from_: reference
pdf:
from_: reference
9 changes: 5 additions & 4 deletions validphys2/src/validphys/comparefittemplates/report.md
@@ -4,6 +4,10 @@ Fit summary
------------------
{@ summarise_fits @}

Theory Covariance Summary
-------------------------
{@summarise_theory_covmat_fits@}

Dataset properties
------------------
{@current datasets_properties_table@}
@@ -44,10 +48,7 @@ $\chi^2$ by dataset comparisons

$\phi$ by experiment
--------------------
{@with dataspecs@}
### {@fit@}
{@plot_phi@}
{@endwith@}
{@plot_fits_experiments_phi@}

Experiment plots
---------------