From 883ca61180a3ee865c6f85fed9ed9c8b27199169 Mon Sep 17 00:00:00 2001
From: RoyStegeman
Date: Wed, 1 Dec 2021 12:16:14 +0100
Subject: [PATCH 1/7] add description of different figures of merit to docs

---
 doc/sphinx/source/figuresofmerit/index.rst    | 149 ++++++++++++++++++
 doc/sphinx/source/index.rst                   |   1 +
 .../source/tutorials/thcov_tutorial.rst       |   1 +
 .../theorycovariance/theorycovarianceutils.py |   2 +-
 4 files changed, 152 insertions(+), 1 deletion(-)
 create mode 100644 doc/sphinx/source/figuresofmerit/index.rst

diff --git a/doc/sphinx/source/figuresofmerit/index.rst b/doc/sphinx/source/figuresofmerit/index.rst
new file mode 100644
index 0000000000..be9c09d05c
--- /dev/null
+++ b/doc/sphinx/source/figuresofmerit/index.rst
@@ -0,0 +1,149 @@
Chi square figures of merit
================================================================================

Within the NNPDF methodology various figures of merit are used, which can all
be used depending on the situation. To avoid confusion, it is important to
understand the differences between the various figures of merit, and to
understand which definition we are referring to in a given context. In
particular, it is worth stressing that whenever a figure of merit is discussed,
the :math:`t_0` method (discussed below) applies.

Here we provide an overview of the different figures of merit, and discuss
when each of them is used.


The basis of the loss functions: 𝜒²
--------------------------------------------------------------------------------
The figures of merit used in the NNPDF methodology are all variations of the
chi square distribution:

.. math::
    \chi^{2}=\sum_{i, j}^{N_{\text {dat }}}(D-P)_{i} \sigma_{i j}^{-1}(D-P)_{j},

where :math:`D_i` is the :math:`i`-th data point, :math:`P_i` is the convolution product
between the FastKernel tables (insert link here) for point :math:`i` and the PDF model, and
:math:`\sigma_{ij}` is the covariance matrix between datapoints :math:`i` and
:math:`j`.

The covariance matrix includes both uncorrelated and correlated experimental
statistical and systematic uncertainties, as given by the experimental
collaborations.

Note that this definition of :math:`\chi^2` is not used as a figure of merit
anywhere in the NNPDF methodology. Instead, variations of this :math:`\chi^2`
are used. These variations are based on considering only subsets of data,
thus limiting the datasets that are summed over, or on an adjustment to the
covariance matrix :math:`\sigma_{ij}`.


Avoiding bias: t₀ method
--------------------------------------------------------------------------------
The :math:`t_0` method introduced in https://arxiv.org/abs/0912.2276 aims to
remove systematic biases as a result of a naive treatment of multiplicative
uncertainties. This is done by redefining the covariance matrix in the
definition of :math:`\chi^2`, resulting in a covariance matrix
:math:`\sigma_{t_0}` and a corresponding figure of merit sometimes denoted by
:math:`\chi^2_{t_0}`, though often simply written as :math:`\chi^2`.

.. note::
   From NNPDF2.0 onwards the t₀ formalism has been used to define the figure of
   merit used during the fitting of the PDFs.


Missing higher order uncertainties
--------------------------------------------------------------------------------
Another source of uncertainties that we may want to include in the covariance
matrix are theoretical uncertainties, particularly missing higher order
uncertainties estimated through scale variations.
These uncertainties can be
considered in the figure of merit through the implementation of a 'theory
covariance matrix'. A paper discussing the formalism can be found here:
https://arxiv.org/abs/1905.04311. For a tutorial see :ref:`How to include a
theory covariance matrix in a fit`.


Future test: including PDF errors
--------------------------------------------------------------------------------
To test the generalization power of the NNPDF fitting framework in the region
where PDFs are not constrained by data, the 'future test' has been developed.
The figure of merit considered in a future test is again the :math:`\chi^2`;
however, in this case the covariance matrix is not the covariance matrix
corresponding to the datasets alone, but instead the sum of the covariance
matrix describing the data uncertainties and the covariance matrix describing
the PDF uncertainties.

For a more detailed discussion of the future test formalism see e.g.
https://arxiv.org/abs/2103.08606, or learn :ref:`How to run a Future Test
`.


Regularized covariance matrices
--------------------------------------------------------------------------------
To provide a decorrelated (diagonal) covariance matrix that is as close as
possible to a corresponding experimental covariance matrix, the decorrelation
procedure is applied. This procedure involves clipping the eigenvalues
until a target value of the stability metric :math:`Z_{\rm reg}` is achieved.
For instance, if the target value is chosen to be :math:`Z_{\rm reg}=4`, then
the clipping algorithm replaces the original experimental eigenvalues that
were smaller than :math:`1/Z_{\rm reg}^2=1/16` by
:math:`1/16`.

A more detailed discussion of the decorrelation procedure can be found in
sections 4.2 and 8.7 of the NNPDF4.0 paper :cite:p:`nnpdf40`.


The weighted fit method
--------------------------------------------------------------------------------
To determine whether a specific measurement is inconsistent with the global
dataset, one can produce a PDF determination that provides the best agreement
with this dataset. One may then check whether this best agreement does or does not
lead to the deterioration of the agreement with one or more of the other data
included in the global dataset.

When performing a weighted fit the figure of merit is hence redefined as

.. math::
    \chi^{2}=\frac{1}{N_{\text {dat }}-N_{\text {dat }}^{(j)}}
    \sum_{i \neq j}^{n_{\text {exp }}}N_{\text {dat }}^{(i)}\chi_{i}^{2}
    +w^{(j)} \chi_{j}^{2}

with :math:`w^{(j)}=N_{\rm dat}/N^{(j)}_{\rm dat}`.


Experimental, validation, and training 𝜒²
--------------------------------------------------------------------------------
When performing a PDF fit we distinguish three different definitions of the
:math:`\chi^2` loss function, namely the experimental loss
:math:`\chi^2_{\rm exp}`, the training loss :math:`\chi^2_{\rm tr}` and the
validation loss :math:`\chi^2_{\rm val}`, all of which are defined using the
:math:`t_0` method. Here the experimental loss is calculated with respect to the
experimental covariance matrix and corresponding central values, while the
training and validation losses are defined with respect to the pseudodata
replicas.

The training and validation losses are used for cross-validation in the
early stopping algorithm, and can further be adjusted to ensure positivity and
integrability of the resulting PDFs after the fit by adding a component to the
loss function.

More details of these loss functions and the role they play within the training
of the neural network can be found in the :ref:`methodology overview
`.


Hyperoptimized figure of merit
--------------------------------------------------------------------------------
To test the generalization power of a given methodology (a specific set of
hyperparameter values), we employ hyperoptimization; specifically, we use
K-folds cross-validation. The idea of K-folds cross-validation is to create
subsets of data representative of the global dataset, and then perform a
fit to :math:`K-1` subsets while using the :math:`K^{\rm th}` subset as a test
set to check the generalization performance after the neural network has been
trained. The figure of merit that is minimized during the hyperoptimization
routine is obtained by summing over all :math:`K` test losses that are obtained
after performing :math:`K` fits to each possible combination of :math:`K-1`
datasets.

For a more detailed description of the hyperoptimization loss see the
documentation of the :ref:`hyperoptimization algorithm`.


diff --git a/doc/sphinx/source/index.rst b/doc/sphinx/source/index.rst
index 187ad737c9..6f2ac624b1 100644
--- a/doc/sphinx/source/index.rst
+++ b/doc/sphinx/source/index.rst
@@ -123,6 +123,7 @@ Contents
    ./buildmaster.md
    data/index
    theory/index
+   figuresofmerit/index
    contributing/index
    releases
    ci/index

diff --git a/doc/sphinx/source/tutorials/thcov_tutorial.rst b/doc/sphinx/source/tutorials/thcov_tutorial.rst
index e5e25dfb7d..4a52d0f10c 100644
--- a/doc/sphinx/source/tutorials/thcov_tutorial.rst
+++ b/doc/sphinx/source/tutorials/thcov_tutorial.rst
@@ -1,3 +1,4 @@
+.. _thcov_tutorial:
 How to include a theory covariance matrix in a fit
 ==================================================

diff --git a/validphys2/src/validphys/theorycovariance/theorycovarianceutils.py b/validphys2/src/validphys/theorycovariance/theorycovarianceutils.py
index 8060391010..c942313fa4 100644
--- a/validphys2/src/validphys/theorycovariance/theorycovarianceutils.py
+++ b/validphys2/src/validphys/theorycovariance/theorycovarianceutils.py
@@ -1,5 +1,5 @@
 """
-theorycovariance.py
+theorycovarianceutils.py

 Low level utilities for theorycovariance module
 """

From 6e1ed875c1c08df5c909131d9b2bb303de3a5464 Mon Sep 17 00:00:00 2001
From: RoyStegeman
Date: Wed, 5 Jan 2022 13:35:47 +0100
Subject: [PATCH 2/7] update chi2 overview docs

---
 doc/sphinx/source/figuresofmerit/index.rst | 176 ++++++++++++---------
 1 file changed, 105 insertions(+), 71 deletions(-)

diff --git a/doc/sphinx/source/figuresofmerit/index.rst b/doc/sphinx/source/figuresofmerit/index.rst
index be9c09d05c..790749f65e 100644
--- a/doc/sphinx/source/figuresofmerit/index.rst
+++ b/doc/sphinx/source/figuresofmerit/index.rst
@@ -1,10 +1,10 @@
 Chi square figures of merit
 ================================================================================

Within the NNPDF methodology various figures of merit are used, each of which
can be used in different situations. To avoid confusion, it is important to
understand the differences between the various figures of merit, and to
understand which definition we are referring to in a given context.
In particular, it is worth stressing that whenever a figure of merit is discussed,
the :math:`t_0` method (discussed below) applies.

Here we provide an overview of the different figures of merit, and discuss
when each of them is used.


The basis of the loss functions: 𝜒²
--------------------------------------------------------------------------------
The :math:`\chi^2` figures of merit used in the NNPDF methodology are all
based on the chi square distribution:

.. math::
    \chi^{2}=\sum_{i, j}^{N_{\text {dat }}}(D-P)_{i} C_{i j}^{-1}(D-P)_{j},

where :math:`D_i` is the :math:`i`-th datapoint, :math:`P_i` is the prediction
of the corresponding datapoint calculated by performing the convolution product
between the :ref:`FastKernel tables` for point :math:`i` and the PDF
model, and :math:`C_{ij}` is the covariance matrix between datapoints :math:`i`
and :math:`j`.

The covariance matrix accounts for correlated systematic uncertainties,
normalization uncertainties, and statistical uncertainties as provided by the
experimental collaborations.

.. note::
    This definition of :math:`\chi^2` is not used as a figure of merit
    anywhere in the NNPDF methodology. Instead, variations of this :math:`\chi^2`
    are used.


Avoiding bias: t₀ method
--------------------------------------------------------------------------------
The :math:`t_0` method introduced in
`arXiv:0912.2276 <https://arxiv.org/abs/0912.2276>`_ aims to
remove systematic biases as a result of a naive treatment of multiplicative
uncertainties. This is done by redefining the covariance matrix in the
definition of :math:`\chi^2`, resulting in a :math:`t_0` covariance matrix
:math:`C_{t_0}` and a corresponding figure of merit sometimes denoted by
:math:`\chi^2_{t_0}`, though often simply written as :math:`\chi^2`.

.. note::
   From NNPDF2.0 onwards the t₀ formalism has been used to define the figure of
   merit used during the fitting of the PDFs.
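
To make the definitions above concrete, the following is a minimal NumPy sketch
of how such a figure of merit can be evaluated. All numbers are toy
placeholders rather than NNPDF data, and for the :math:`t_0` variant one would
simply build :math:`C_{t_0}` and use it in place of :math:`C`:

.. code-block:: python

    import numpy as np

    # Toy inputs: three datapoints with correlated uncertainties.
    data = np.array([1.02, 0.98, 1.10])     # D_i
    pred = np.array([1.00, 1.00, 1.05])     # P_i (FK tables convolved with a PDF)
    cov = np.array([[0.010, 0.002, 0.001],  # C_ij (placeholder values)
                    [0.002, 0.012, 0.003],
                    [0.001, 0.003, 0.015]])

    diff = data - pred
    # chi2 = (D - P)^T C^{-1} (D - P); solve() avoids forming the inverse explicitly.
    chi2 = diff @ np.linalg.solve(cov, diff)
    print(f"chi2 = {chi2:.3f} ({chi2 / len(data):.3f} per datapoint)")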
Missing higher order uncertainties
--------------------------------------------------------------------------------
Another source of uncertainties that we may want to include in the covariance
matrix are theoretical uncertainties, particularly missing higher order
uncertainties estimated through scale variations. These uncertainties can be
considered in the figure of merit through the implementation of a 'theory
covariance matrix'. A paper discussing the formalism can be found here:
`arXiv:1905.04311 <https://arxiv.org/abs/1905.04311>`_. For a tutorial see
:ref:`How to include a theory covariance matrix in a fit <thcov_tutorial>`.


Future test: including PDF errors
--------------------------------------------------------------------------------
To test the generalization power of the NNPDF fitting framework in the region
where PDFs are not constrained by data, the 'future test' has been developed.
The figure of merit considered in a future test is again the :math:`\chi^2`;
however, in this case the covariance matrix is not the covariance matrix
corresponding to the datasets alone, but instead the sum of the covariance
matrix describing the data uncertainties and the covariance matrix describing
the PDF uncertainties.

For a more detailed discussion of the future test formalism see e.g.
`arXiv:2103.08606 <https://arxiv.org/abs/2103.08606>`_, or learn
:ref:`How to run a Future Test `.


Regularized covariance matrices
--------------------------------------------------------------------------------
Information about the accuracy of the experimental uncertainty is generally not
available; nevertheless, inaccuracies in an experimental covariance matrix can
lead to problems during optimization. Simply making a conservative estimate of
the correlations does not always guarantee this problem is avoided, and this is
where the regularized covariance matrix comes in: it aims to provide a matrix
which is closely related to the original experimental covariance matrix while
avoiding the problems during optimization.

The function that performs the regularization is
:py:func:`validphys.calcutils.regularize_l2`. A regularized covariance matrix
cannot be generated while performing a fit, as it is necessary to produce
the corresponding :ref:`FastKernel tables` and include it in the theory
as a separate dataset. For instructions on how to do this see
:ref:`tutorialfktables`.

A more detailed discussion of the regularization procedure, and how it is used
within NNPDF, can be found in sections 4.2 and 8.7 of the NNPDF4.0 paper
:cite:p:`nnpdf40`.
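
The eigenvalue clipping described above can be illustrated in a few lines of
NumPy. This is only a conceptual sketch on a toy correlation matrix with target
:math:`Z_{\rm reg}=4`; it is not the
:py:func:`validphys.calcutils.regularize_l2` implementation:

.. code-block:: python

    import numpy as np

    def clip_eigenvalues(corr: np.ndarray, z_reg: float = 4.0) -> np.ndarray:
        """Raise eigenvalues of a correlation matrix below 1/z_reg**2 (sketch)."""
        eigvals, eigvecs = np.linalg.eigh(corr)           # corr is symmetric
        eigvals = np.clip(eigvals, 1.0 / z_reg**2, None)  # e.g. 1/16 for z_reg = 4
        return (eigvecs * eigvals) @ eigvecs.T            # V diag(lambda) V^T

    # Toy correlation matrix with an almost-degenerate direction.
    corr = np.array([[1.00, 0.99, 0.50],
                     [0.99, 1.00, 0.50],
                     [0.50, 0.50, 1.00]])
    print(np.linalg.eigvalsh(corr).min())                    # small eigenvalue
    print(np.linalg.eigvalsh(clip_eigenvalues(corr)).min())  # clipped to >= 1/16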
The weighted fit method
--------------------------------------------------------------------------------
To determine whether a specific dataset shows inconsistencies with the
global dataset, one can produce a PDF determination in which that dataset
is given an increased weight (usually equal to the combined weight of the other
datasets). The idea is that if -- in order to accommodate the dataset under
investigation -- the agreement with the other datasets deteriorates, this dataset
is likely inconsistent with the global dataset.

When performing a weighted fit the figure of merit is hence redefined as

.. math::
    \chi^{2}=\frac{1}{N_{\text {dat }}-N_{\text {dat }}^{(j)}}
    \sum_{i \neq j}^{n_{\text {exp }}}N_{\text {dat }}^{(i)}\chi_{i}^{2}
    +w^{(j)} \chi_{j}^{2}

with :math:`w^{(j)}=N_{\rm dat}/N^{(j)}_{\rm dat}`.

A dataset can be given an additional weight by explicitly writing a weight key
for a given dataset in the :ref:`n3fit runcard `. For example,
while the default weight is 1, one can set the weight of the
HERACOMB_SIGMARED_C dataset to 100 by adding the following to the runcard:

.. code-block:: yaml

    dataset_inputs:
    - {dataset: HERACOMB_SIGMARED_C, frac: 0.75, weight: 100}


Experimental, validation, and training 𝜒²
--------------------------------------------------------------------------------
When performing a PDF fit we generally distinguish three different definitions
of the :math:`\chi^2` loss function, namely the experimental loss
:math:`\chi^2_{\rm exp}`, the training loss :math:`\chi^2_{\rm tr}`, and the
validation loss :math:`\chi^2_{\rm val}`, all of which are defined using the
:math:`t_0` method. Here the experimental loss is calculated with respect to the
experimental covariance matrix and corresponding central values, while the
training and validation losses are defined with respect to the central values
of the pseudodata replicas.

The training and validation losses are used for cross-validation in the
early stopping algorithm, and can further be adjusted to ensure positivity and
integrability of the resulting PDFs after the fit by adding a component to the
loss function (see :ref:`below <lagrange-multipliers>`).
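
As a rough illustration of these three definitions, the sketch below evaluates
an experimental loss against the central values and training/validation losses
against a single pseudodata replica. The data, covariance, and mask are toy
placeholders; this is not the ``n3fit`` implementation:

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(seed=1234)

    # Toy central values, t0 covariance, and model predictions.
    data = np.array([1.02, 0.98, 1.10, 0.95])
    c_t0 = np.diag([0.010, 0.012, 0.015, 0.011])
    pred = np.array([1.00, 1.00, 1.05, 0.97])

    def chi2(d, p, cov):
        r = d - p
        return r @ np.linalg.solve(cov, r)

    # One pseudodata replica: the data fluctuated according to the covariance.
    replica = rng.multivariate_normal(data, c_t0)

    # Fixed training/validation mask over the datapoints (sketch of the split).
    tr = np.array([True, False, True, True])

    chi2_exp = chi2(data, pred, c_t0)                                 # vs central data
    chi2_tr = chi2(replica[tr], pred[tr], c_t0[np.ix_(tr, tr)])       # vs pseudodata
    chi2_val = chi2(replica[~tr], pred[~tr], c_t0[np.ix_(~tr, ~tr)])  # vs pseudodata
    print(chi2_exp, chi2_tr, chi2_val)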
More details of these loss functions and the role they play within the training
of the neural network can be found in the :ref:`methodology overview
`.


.. _lagrange-multipliers:

Positivity and integrability: Lagrange multipliers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generally in an NNPDF fit we will want to ensure positivity and integrability of
the resulting PDFs. This is enforced by means of Lagrange multipliers, which
provide an additional contribution to the definition of the chi squared
loss function.

For a discussion of how exactly the loss function is adjusted upon including
the Lagrange multipliers, see sections 3.1.3 and 3.1.4 of the NNPDF4.0 paper
:cite:p:`nnpdf40`.

An explanation of how the runcard should be adjusted to include the additional
positivity Lagrange multiplier can be found :ref:`elsewhere in the documentation
`.


Hyperoptimized figure of merit
--------------------------------------------------------------------------------
To test the generalization power of a given methodology (a specific set of
hyperparameter values), we employ hyperoptimization; specifically, we use
K-folds cross-validation. The idea of K-folds cross-validation is to create
subsets of data representative of the global dataset, and then perform a
fit to :math:`K-1` subsets while using the :math:`K^{\rm th}` subset as a test
set to check the generalization performance after the neural network has been
trained. The figure of merit that is minimized during the hyperoptimization
routine is obtained by summing over all :math:`K` test losses that are obtained
after performing :math:`K` fits to each possible combination of :math:`K-1`
datasets.

For a more detailed description of the hyperoptimization loss see the
documentation of the :ref:`hyperoptimization algorithm`.
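
Schematically, the hyperoptimization figure of merit combines the :math:`K`
held-out losses. The sketch below uses an average and a placeholder
``fit_and_evaluate`` function standing in for a full PDF fit; it is not the
``n3fit`` hyperoptimization code:

.. code-block:: python

    def hyperopt_loss(folds, fit_and_evaluate):
        """Combine the K held-out test losses (sketch of K-folds)."""
        losses = []
        for k, test_fold in enumerate(folds):
            training_folds = [f for i, f in enumerate(folds) if i != k]
            # `fit_and_evaluate` stands in for a fit to the K-1 training folds,
            # evaluated on the held-out fold.
            losses.append(fit_and_evaluate(training_folds, test_fold))
        return sum(losses) / len(losses)

    # Toy usage: the "fit" simply returns the mean of the held-out fold.
    folds = [[1.0, 1.2], [0.9, 1.1], [1.3, 1.0], [1.05, 0.95]]
    print(hyperopt_loss(folds, lambda train, test: sum(test) / len(test)))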
From 51c88d5ef0dec129a99f960de50fd9093085ef37 Mon Sep 17 00:00:00 2001
From: RoyStegeman
Date: Wed, 5 Jan 2022 13:37:41 +0100
Subject: [PATCH 3/7] add instruction for integrability in the runcard to docs

---
 doc/sphinx/source/figuresofmerit/index.rst   | 13 ++++---
 doc/sphinx/source/n3fit/runcard_detailed.rst | 41 +++++++++++++++++---
 2 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/doc/sphinx/source/figuresofmerit/index.rst b/doc/sphinx/source/figuresofmerit/index.rst
index 790749f65e..078ab67661 100644
--- a/doc/sphinx/source/figuresofmerit/index.rst
+++ b/doc/sphinx/source/figuresofmerit/index.rst
@@ -136,7 +136,7 @@ of the :math:`\chi^2` loss function, namely the experimental loss
 validation loss :math:`\chi^2_{\rm val}`, all of which are defined using the
 :math:`t_0` method. Here the experimental loss is calculated with respect to the
 experimental covariance matrix and corresponding central values, while the
-training and validation losses are defined with respect to the central values
+training and validation losses are defined with respect to the central values
 of the pseudodata replicas.

The training and validation losses are used for cross-validation in the
early stopping algorithm, and can further be adjusted to ensure positivity and
integrability of the resulting PDFs after the fit by adding a component to the
loss function (see :ref:`below <lagrange-multipliers>`).

More details of these loss functions and the role they play within the training
of the neural network can be found in the :ref:`methodology overview
`.

.. _lagrange-multipliers:

Positivity and integrability: Lagrange multipliers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generally in an NNPDF fit we will want to ensure positivity and integrability of
the resulting PDFs. This is enforced by means of Lagrange multipliers, which
provide an additional contribution to the definition of the chi squared
loss function.

For a discussion of how exactly the loss function is adjusted upon including
the Lagrange multipliers, see sections 3.1.3 and 3.1.4 of the NNPDF4.0 paper
:cite:p:`nnpdf40`.

An explanation of how the runcard should be adjusted to include the additional
positivity Lagrange multiplier can be found :ref:`here <positivity-label>`,
while the analogous information for integrability can be found
:ref:`here <integrability-label>`.


diff --git a/doc/sphinx/source/n3fit/runcard_detailed.rst b/doc/sphinx/source/n3fit/runcard_detailed.rst
index db01f16b21..5ae74b4501 100644
--- a/doc/sphinx/source/n3fit/runcard_detailed.rst
+++ b/doc/sphinx/source/n3fit/runcard_detailed.rst
@@ -11,6 +11,7 @@ In this section we fine-grain the explanation of the different parameters that e
 - :ref:`networkarch-label`
 - :ref:`optimizer-label`
 - :ref:`positivity-label`
+- :ref:`integrability-label`
 - :ref:`tensorboard-label`
 - :ref:`parallel-label`
 - :ref:`otheroptions-label`
@@ -209,11 +210,11 @@ the Neural Network as:

 .. code-block:: yaml

     parameters:
-      positivity:
-        threshold: 1e-6
-        multiplier: 1.05
-        initial: 14.5
-
+    positivity:
+      threshold: 1e-6
+      multiplier: 1.05
+      initial: 14.5
+
 Note that by defining the positivity in this way all datasets will share the
 same Lagrange multiplier. It is also possible to not define the positivity
 hyperparameters (or define them only partially).

If the replica reaches the maximum number of epochs with the positivity loss above
this value, it will be tagged as ``POS_VETO`` and the replica removed from postfit.


.. _integrability-label:

Integrability
-------------
Integrability in ``n3fit`` is enforced through a Lagrange multiplier; this is
the same basic concept as for positivity, and therefore the
input in the runcard is analogous to the case of positivity, where one can
apply the integrability constraints through an optional ``integrability``
dictionary as (note that, as opposed to positivity, for integrability no
threshold value can be set):

.. code-block:: yaml

    parameters:
      integrability:
        multiplier: 1.05
        initial: 14.5


Again similar to positivity, it is also possible to leave either the ``initial``
or ``multiplier`` keys empty and instead define a ``maxlambda`` per dataset:

.. code-block:: yaml

    integrability:
      integdatasets:
      - {dataset: INTEGXT8, maxlambda: 1e2}
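
Conceptually, the positivity and integrability contributions enter the training
loss as penalty terms scaled by Lagrange multipliers, along the lines of the
sketch below. The geometric update schedule shown is only an assumption for
illustration, suggested by the ``initial`` and ``multiplier`` keys; this is not
the ``n3fit`` implementation:

.. code-block:: python

    def training_loss(chi2_tr, pos_penalty, int_penalty, lambda_pos, lambda_int):
        """chi2 plus Lagrange-multiplier penalty terms (schematic)."""
        return chi2_tr + lambda_pos * pos_penalty + lambda_int * int_penalty

    # Assumed schedule: the multiplier grows during training from `initial`,
    # being scaled by `multiplier` at each update (cf. the runcard keys above).
    initial, multiplier = 14.5, 1.05
    lambda_pos = initial * multiplier ** 50  # e.g. after 50 multiplier updates
    print(training_loss(2.1, 0.010, 0.005, lambda_pos, lambda_pos))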
.. _tensorboard-label:

Inspecting and profiling the code

From 2a2c7ab0b9264210af6efac03a7a7efd0b7794f4 Mon Sep 17 00:00:00 2001
From: RoyStegeman
Date: Wed, 5 Jan 2022 13:45:32 +0100
Subject: [PATCH 4/7] make t0 a subsection of the basis chi squared description
 in docs

---
 doc/sphinx/source/figuresofmerit/index.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/sphinx/source/figuresofmerit/index.rst b/doc/sphinx/source/figuresofmerit/index.rst
index 078ab67661..807c7b6e1f 100644
--- a/doc/sphinx/source/figuresofmerit/index.rst
+++ b/doc/sphinx/source/figuresofmerit/index.rst
@@ -37,7 +37,7 @@

 Avoiding bias: t₀ method
---------------------------------------------------------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~
 The :math:`t_0` method introduced in
 `arXiv:0912.2276 <https://arxiv.org/abs/0912.2276>`_ aims to
 remove systematic biases as a result of a naive treatment of multiplicative

From 845a4fb40f29e41a16488c750af4db2ac282b80 Mon Sep 17 00:00:00 2001
From: Roy Stegeman
Date: Thu, 6 Jan 2022 16:16:24 +0100
Subject: [PATCH 5/7] replace chi square distribution with chi square statistic

Co-authored-by: Zaharid

---
 doc/sphinx/source/figuresofmerit/index.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/sphinx/source/figuresofmerit/index.rst b/doc/sphinx/source/figuresofmerit/index.rst
index 807c7b6e1f..1ab2136bd4 100644
--- a/doc/sphinx/source/figuresofmerit/index.rst
+++ b/doc/sphinx/source/figuresofmerit/index.rst
@@ -15,7 +15,7 @@
 The basis of the loss functions: 𝜒²
 --------------------------------------------------------------------------------
 The :math:`\chi^2` figures of merit used in the NNPDF methodology are all
-based on the chi square distribution:
+based on the chi square statistic:

 .. math::
     \chi^{2}=\sum_{i, j}^{N_{\text {dat }}}(D-P)_{i} C_{i j}^{-1}(D-P)_{j},

From 2dc7aa015039f8b893073c2368e279554bfb73ca Mon Sep 17 00:00:00 2001
From: RoyStegeman
Date: Thu, 6 Jan 2022 16:34:43 +0100
Subject: [PATCH 6/7] replace arxiv links with internal citation in docs

Replace arxiv links with internal BibTex citations in the figure of merit
page.

---
 doc/sphinx/source/figuresofmerit/index.rst |  6 +++---
 doc/sphinx/source/references.bib           | 15 +++++++++++++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/doc/sphinx/source/figuresofmerit/index.rst b/doc/sphinx/source/figuresofmerit/index.rst
index 807c7b6e1f..4477a260ee 100644
--- a/doc/sphinx/source/figuresofmerit/index.rst
+++ b/doc/sphinx/source/figuresofmerit/index.rst
@@ -39,7 +39,7 @@
 Avoiding bias: t₀ method
 ~~~~~~~~~~~~~~~~~~~~~~~~
 The :math:`t_0` method introduced in
-`arXiv:0912.2276 <https://arxiv.org/abs/0912.2276>`_ aims to
+:cite:p:`Ball:2009qv` aims to
 remove systematic biases as a result of a naive treatment of multiplicative
 uncertainties. This is done by redefining the covariance matrix in the
 definition of :math:`\chi^2`, resulting in a :math:`t_0` covariance matrix
 :math:`C_{t_0}` and a corresponding figure of merit sometimes denoted by

 Missing higher order uncertainties
 --------------------------------------------------------------------------------
 Another source of uncertainties that we may want to include in the covariance
 matrix are theoretical uncertainties, particularly missing higher order
 uncertainties estimated through scale variations. These uncertainties can be
 considered in the figure of merit through the implementation of a 'theory
 covariance matrix'. A paper discussing the formalism can be found here:
-`arXiv:1905.04311 <https://arxiv.org/abs/1905.04311>`_. For a tutorial see
+:cite:p:`AbdulKhalek:2019bux`. For a tutorial see
 :ref:`How to include a theory covariance matrix in a fit <thcov_tutorial>`.
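
As a minimal illustration of how a theory covariance matrix enters the figure
of merit, it is added to the experimental covariance matrix before inversion.
The matrices below are toy placeholders, not the ``validphys`` implementation:

.. code-block:: python

    import numpy as np

    # Toy experimental covariance and a fully correlated theory (MHOU) block.
    c_exp = np.diag([0.010, 0.012, 0.015])
    s_mhou = 0.004 * np.ones((3, 3))         # placeholder scale-variation estimate

    residual = np.array([0.02, -0.01, 0.05])  # D - P for three datapoints
    chi2 = residual @ np.linalg.solve(c_exp + s_mhou, residual)
    print(f"chi2 including the theory covariance = {chi2:.3f}")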
@@ -73,7 +73,7 @@ matrix describing the data uncertainties and the covariance matrix describing the PDF uncertainties. For a more detailed discussion of the future test formalism see e.g. -`arXiv:2103.08606 `_, or learn +:cite:p:`Cruz-Martinez:2021rgy`, or learn :ref:`How to run a Future Test ` diff --git a/doc/sphinx/source/references.bib b/doc/sphinx/source/references.bib index 9508e91a99..4bbe639bb8 100644 --- a/doc/sphinx/source/references.bib +++ b/doc/sphinx/source/references.bib @@ -10,6 +10,21 @@ @misc{zahari_kassabov_2019_2571601 url = {https://doi.org/10.5281/zenodo.2571601} } +@article{Ball:2009qv, + author = "Ball, Richard D. and Del Debbio, Luigi and Forte, Stefano and Guffanti, Alberto and Latorre, Jose I. and Rojo, Juan and Ubiali, Maria", + collaboration = "NNPDF", + title = "{Fitting Parton Distribution Data with Multiplicative Normalization Uncertainties}", + eprint = "0912.2276", + archivePrefix = "arXiv", + primaryClass = "hep-ph", + reportNumber = "EDINBURGH-2009-22, IFUM-950-FT, FREIBURG-PHENO-09-09, CP3-09-51", + doi = "10.1007/JHEP05(2010)075", + journal = "JHEP", + volume = "05", + pages = "075", + year = "2010" +} + @article{Cruz-Martinez:2021rgy, author = "Cruz-Martinez, Juan and Forte, Stefano and Nocera, Emanuele R.", title = "{Future tests of parton distributions}", From 9668104e4640f6896bb94c023d651bc742e5681f Mon Sep 17 00:00:00 2001 From: Roy Stegeman Date: Fri, 7 Jan 2022 10:39:06 +0100 Subject: [PATCH 7/7] Apply suggestions from code review comments from ZK on the chi squared overview Co-authored-by: Zaharid --- doc/sphinx/source/figuresofmerit/index.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/sphinx/source/figuresofmerit/index.rst b/doc/sphinx/source/figuresofmerit/index.rst index 2d461919ef..74f660f72f 100644 --- a/doc/sphinx/source/figuresofmerit/index.rst +++ b/doc/sphinx/source/figuresofmerit/index.rst @@ -21,9 +21,9 @@ based on the chi square statistic: \chi^{2}=\sum_{i, j}^{N_{\text {dat }}}(D-P)_{i} C_{i j}^{-1}(D-P)_{j}, where :math:`D_i` is the :math:`i`-th datapoint, :math:`P_i` is the prediction -of the corresponding datapoint calculated by performing the convolution product +of the corresponding datapoint calculated from the convolution product between the :ref:`FastKernel tables` for point :math:`i` and the PDF -model, and :math:`C_{ij}` is the covariance matrix between datapoints :math:`i` +model, and :math:`C_{ij}` is the covariance between datapoints :math:`i` and :math:`j`. The covariance matrix accounts for correlated systematic uncertainties, @@ -32,7 +32,7 @@ experimental collaborations. .. note:: This definition of :math:`\chi^2` is not used as a figure of merit - anywhere in the NNDPF methodology. Instead, variations of this :math:`\chi^2` + anywhere in NNDPF fits. Instead, variations discussed below are used.