Merged
233 changes: 233 additions & 0 deletions doc/validphys2/guide.md
@@ -1942,6 +1942,239 @@ In addition, errors will be raised if the input directory is not a valid fit (fo
If the user wishes to add their own, non-standard files, then it is advisable to avoid using the fit name in these
files as the `fitrename` command will also rename these files.

### Fits with a Theory Covariance Matrix

Fits can be run with a contribution to the covariance matrix obtained from
performing scale variations. `validphys` also has flags which control whether
the covariance matrix used to calculate statistical estimators should include a
contribution from the theory covariance matrix. Getting the various flags that
control these behaviours correct in both the fit and `validphys` runcards is
critical to obtaining sensible results. Examples with explanations are provided
here to demonstrate how to run a fit with a theory covariance matrix and then
use the various `validphys` analysis tools on the fit.

#### Running a fit with Theory Covariance Matrix

In order to run a fit with a theory covariance matrix, the user must first
specify that the `datasets` are all part of the same `experiment`. An example of
how this is done in practice is provided by the `experiments` section of a
DIS-only fit runcard:

```yaml
experiments:
- experiment: BIGEXP
datasets:
- {dataset: NMCPD, frac: 0.5}
- {dataset: NMC, frac: 0.5}
- {dataset: SLACP, frac: 0.5}
- {dataset: SLACD, frac: 0.5}
- {dataset: BCDMSP, frac: 0.5}
- {dataset: BCDMSD, frac: 0.5}
- {dataset: CHORUSNU, frac: 0.5}
- {dataset: CHORUSNB, frac: 0.5}
- {dataset: NTVNUDMN, frac: 0.5}
- {dataset: NTVNBDMN, frac: 0.5}
- {dataset: HERACOMBNCEM, frac: 0.5}
- {dataset: HERACOMBNCEP460, frac: 0.5}
- {dataset: HERACOMBNCEP575, frac: 0.5}
- {dataset: HERACOMBNCEP820, frac: 0.5}
- {dataset: HERACOMBNCEP920, frac: 0.5}
- {dataset: HERACOMBCCEM, frac: 0.5}
- {dataset: HERACOMBCCEP, frac: 0.5}
- {dataset: HERAF2CHARM, frac: 0.5}
```

The `datasets` must all be part of a single `experiment` so that the theory
covariance matrix generated when the user runs `vp-setupfit` is compatible with
the fitting infrastructure.
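Because runcards are plain YAML, this single-experiment requirement is easy to
check programmatically before running `vp-setupfit`. A minimal sketch (not part
of `validphys`; it assumes PyYAML is installed and uses an abbreviated version
of the runcard above):

```python
import yaml  # PyYAML

# Abbreviated version of the experiments section shown above.
runcard = """
experiments:
  - experiment: BIGEXP
    datasets:
      - {dataset: NMC, frac: 0.5}
      - {dataset: SLACP, frac: 0.5}
"""

experiments = yaml.safe_load(runcard)["experiments"]

# The theory covariance machinery expects all datasets under one experiment.
assert len(experiments) == 1, "all datasets must live in a single experiment"
print(f"{len(experiments[0]['datasets'])} datasets in {experiments[0]['experiment']}")
# → 2 datasets in BIGEXP
```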

The next step is to specify the `theorycovmatconfig`. This namespace controls
which point prescription is used to generate the theory covariance matrix, and
whether the theory covariance matrix will be used in the sampling of the
pseudodata, the fitting of the data or both.

The different prescriptions for scale variation are 3-point, 5-point,
5bar-point, 7-point and 9-point. Depending on which prescription the user
decides to use, they must provide the correct number and combination of
`theoryids`. In addition, if the 5-point or 5bar-point prescription is being
used, the user must specify which of the two to use with the `fivetheories`
flag. There are also two options for the 7-point prescription: the default is
'Gavin's' prescription, but the user can also specify `seventheories: original`.

The various configuration options might seem overwhelming, so for each of the
prescriptions the appropriate `theoryids` and additional flags required are
provided below, ready to be pasted into a report runcard.

---------------------------------------------------------------------

##### 3-point
> **Contributor:** We definitively need to make it so that this information is stored somewhere and one does not need to duplicate it all the time.

```yaml
theorycovmatconfig:
theoryids:
- 163
- 180
- 173

```

##### 5-point

```yaml
theorycovmatconfig:
theoryids:
- 163
- 177
- 176
- 179
- 174
fivetheories: nobar

```

##### 5bar-point

```yaml
theorycovmatconfig:
theoryids:
- 163
- 180
- 173
- 175
- 178
fivetheories: bar

```

##### 7-point original

```yaml
theorycovmatconfig:
theoryids:
- 163
- 177
- 176
- 179
- 174
- 180
- 173
seventheories: original
```

##### 7-point Gavin (default)

```yaml
theorycovmatconfig:
theoryids:
- 163
- 177
- 176
- 179
- 174
- 180
- 173

```

##### 9-point

```yaml
theorycovmatconfig:
theoryids:
- 163
- 177
- 176
- 179
- 174
- 180
- 173
- 175
- 178
```
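The required number of `theoryids` per prescription can be summarised in a
small standalone sketch (illustrative only, not part of `validphys`; the counts
and flags simply restate the examples above):

```python
# Number of theoryids each point prescription expects, plus any extra
# runcard flag, as in the examples above.
PRESCRIPTIONS = {
    "3 point": (3, None),
    "5 point": (5, "fivetheories: nobar"),
    "5bar point": (5, "fivetheories: bar"),
    "7 point original": (7, "seventheories: original"),
    "7 point": (7, None),  # Gavin's prescription, the default
    "9 point": (9, None),
}

def check_theoryids(prescription, theoryids):
    """Return True if the right number of theoryids is supplied."""
    expected, _extra_flag = PRESCRIPTIONS[prescription]
    return len(theoryids) == expected

# The 9-point example above lists nine theory IDs:
print(check_theoryids("9 point", [163, 177, 176, 179, 174, 180, 173, 175, 178]))
# → True
```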

---------------------------------------------------------------------

Once the user has correctly specified the `theoryids` and additional flags for
their chosen prescription, they must specify which PDF will be used to generate
the theory 'points' required to construct the theory covariance matrix. The
user must additionally specify where the theory covariance matrix is to be
used: it can be used to sample the pseudodata by setting
`use_thcovmat_in_sampling: true`, and likewise it can be included in the
covariance matrix used in the fit by setting `use_thcovmat_in_fitting: true`.

Combining all of the above information: if one wanted to run a fit using the
theory covariance matrix, calculated with the 9-point prescription, in both the
fitting and the sampling, with `NNPDF31_nlo_as_0118` used to generate the
covariance matrix, then the complete `theorycovmatconfig` would be:

```yaml
theorycovmatconfig:
theoryids:
- 163
- 177
- 176
- 179
- 174
- 180
- 173
- 175
- 178
pdf: NNPDF31_nlo_as_0118
use_thcovmat_in_fitting: true
use_thcovmat_in_sampling: true
```
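Before submitting such a fit, the `theorycovmatconfig` flags can be
sanity-checked by parsing the runcard (a sketch, assuming PyYAML; note that
YAML parses `true` as the Python boolean `True`):

```python
import yaml  # PyYAML

# The theorycovmatconfig from the complete example above.
runcard = """
theorycovmatconfig:
  theoryids: [163, 177, 176, 179, 174, 180, 173, 175, 178]
  pdf: NNPDF31_nlo_as_0118
  use_thcovmat_in_fitting: true
  use_thcovmat_in_sampling: true
"""

config = yaml.safe_load(runcard)["theorycovmatconfig"]

# The 9-point prescription needs exactly nine theoryids.
assert len(config["theoryids"]) == 9
# Both usage flags should be booleans after parsing.
assert config["use_thcovmat_in_fitting"] is True
assert config["use_thcovmat_in_sampling"] is True
print("theorycovmatconfig looks consistent")
```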

#### Using `validphys` statistical estimators with theory covariance

Once a fit has been run with the theory covariance matrix, it is necessary to
include the theory covariance matrix in statistical estimators such as the chi²
in order to get meaningful values. This behaviour is controlled by the flag
`use_thcovmat_if_present`, which is set to `False` by default.

If the user specifies `use_thcovmat_if_present: True` then they must also
specify a corresponding `fit`. The configuration file for that `fit` will be
read; if it contains `use_thcovmat_in_fitting: True`, then `validphys` will
locate the theory covariance matrix used during the fit and add it to the
experimental covariance matrix, for use in statistical estimators such as the
chi². A simple example of this would be:

```yaml
dataset_input: {dataset: HERAF2CHARM}

use_thcovmat_if_present: True

fit: 190310-tg-nlo-global-7pts

use_cuts: "fromfit"

pdf:
from_: fit

theory:
from_: fit

theoryid:
from_: theory

actions_:
- dataset_chi2_table
```

It should be noted that any `dataset_input` specified in the same runcard as
`use_thcovmat_if_present: True` must have been fitted in the corresponding
`fit`. If the corresponding fit has `use_thcovmat_in_fitting: False`, then the
user will be warned and there will be no contribution from the theory
covariance matrix when calculating statistical estimators for that runcard.

When using the `vp-comparefits` application, the user **must** specify either
the command-line flag `--thcovmat_if_present` or `--no-thcovmat_if_present`,
which set `use_thcovmat_if_present` to `True` or `False` respectively.
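For example, a non-interactive invocation might look like the following (a
sketch only: the fit names are hypothetical, and `vp-comparefits` takes further
options, such as report metadata, that are omitted here):

```shell
# Compare a current fit against a reference fit, adding the theory
# covariance matrix to the statistical estimators where available.
vp-comparefits my_current_fit my_reference_fit --thcovmat_if_present
```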

If the user runs the interactive mode, `vp-comparefits -i`, and has not already
specified this on the command line, then they will be prompted to select
whether or not to use the theory covariance matrix, if available, in the
report.

Parallel mode
-------------

31 changes: 31 additions & 0 deletions validphys2/src/validphys/checks.py
@@ -103,6 +103,37 @@ def check_cuts_considered(use_cuts):
raise CheckError(f"Cuts must be computed for this action, but they are set to {use_cuts.value}")


@make_argcheck
def check_dataset_cuts_match_theorycovmat(dataset, fitthcovmat):
    if fitthcovmat:
        ds_index = fitthcovmat.load().index.get_level_values(1)

> **Contributor:** Ideally we would keep very expensive things outside the checks, and instead have a way of loading metadata on one side for verification purposes and full big objects on the other side. Anyway, that is just a comment.
>
> **Contributor (Author):** I guess ideally the actual cuts would be stored in the metadata, so that not only could the metadata be used here but also the check would be better than just comparing number of points.
>
> **Contributor:** Another option is knowing the namespace that produced the cuts, as per #224. But probably we should have both. At some point I thought about keeping the point index always relative to the full dataset (in e.g. experiments_index) but I think I decided against it because nnfit doesn't do that, and could be confusing. We should probably revisit that.
>
> **Contributor (Author, @wilsonmr, Apr 26, 2019):** yeah sure
>
> > At some point I thought about keeping the point index always relative to the full dataset
>
> Yeah I can see pros and cons for this - could be worth looking at though.

        ncovmat = (ds_index == dataset.name).sum()

        cuts = dataset.cuts
        if cuts:
            ndata = len(dataset.cuts.load())
        else:
            ndata = dataset.commondata.ndata
        check(ndata == ncovmat)


@make_argcheck
def check_experiment_cuts_match_theorycovmat(
        experiment, fitthcovmat):
    for dataset in experiment.datasets:
        if fitthcovmat:
            ds_index = fitthcovmat.load().index.get_level_values(1)
            ncovmat = (ds_index == dataset.name).sum()

            cuts = dataset.cuts
            if cuts:
                ndata = len(dataset.cuts.load())
            else:
                ndata = dataset.commondata.ndata
            check(ndata == ncovmat)


@make_argcheck
def check_have_two_pdfs(pdfs):
    check(len(pdfs) == 2, 'Expecting exactly two pdfs.')
10 changes: 6 additions & 4 deletions validphys2/src/validphys/comparefittemplates/comparecard.yaml
@@ -74,14 +74,14 @@ pdf_report:

template: report.md

experiments:
from_: fit

positivity:
from_: fit

dataspecs:
- theoryid:
- experiments:
from_: fit
theoryid:
from_: current
pdf:
from_: current
@@ -91,7 +91,9 @@
from_: current


- theoryid:
- experiments:
from_: fit
theoryid:
from_: reference
pdf:
from_: reference
9 changes: 5 additions & 4 deletions validphys2/src/validphys/comparefittemplates/report.md
@@ -4,6 +4,10 @@ Fit summary
------------------
{@ summarise_fits @}

Theory Covariance Summary
-------------------------
{@summarise_theory_covmat_fits@}

Dataset properties
------------------
{@current datasets_properties_table@}
@@ -44,10 +48,7 @@ $\chi^2$ by dataset comparisons

$\phi$ by experiment
--------------------
{@with dataspecs@}
### {@fit@}
{@plot_phi@}
{@endwith@}
{@plot_fits_experiments_phi@}

Experiment plots
---------------