From 4d5c0899028d319eba7193916b11beda00e5043f Mon Sep 17 00:00:00 2001 From: wilsonm Date: Wed, 18 Nov 2020 13:25:10 +0000 Subject: [PATCH 1/5] try and dissuade people from specifying data_input in runcard --- doc/sphinx/source/vp/dataspecification.rst | 58 ++++++++++++++++++++-- 1 file changed, 53 insertions(+), 5 deletions(-) diff --git a/doc/sphinx/source/vp/dataspecification.rst b/doc/sphinx/source/vp/dataspecification.rst index f6a6f080d9..e0f79f2f43 100644 --- a/doc/sphinx/source/vp/dataspecification.rst +++ b/doc/sphinx/source/vp/dataspecification.rst @@ -284,7 +284,7 @@ If we specify a grouping in the runcard, like so dataspecs: - pdf: NNPDF31_nnlo_as_0118 - - speclabel: "3.1 NNLO" + speclabel: "3.1 NNLO" use_cuts: internal @@ -373,7 +373,7 @@ input dataspecs: - pdf: NNPDF31_nnlo_as_0118 - - speclabel: "3.1 NNLO" + speclabel: "3.1 NNLO" use_cuts: internal @@ -400,8 +400,56 @@ Runcards that request actions that have been renamed will not work anymore. Generally, actions that were previously named ``experiments_*`` have been renamed to highlight the fact that they work with more general groupings. -Currently ``n3fit``, as well as the ``pseudodata``, ``closuretest`` and -``chi2grids`` modules have not been updated to use ``dataset_inputs`` and so -require ``experiments`` to be specified in the runcard. The C++ fitting code +If you are writing a runcard whereby you want to take the data from a ``fit``, +and either do not know whether the fit uses the new or old data specification or +require the runcard to be agnostic to the data specification in the fit, +there are a couple of options. + +First and foremost try using the ``fitinputcontext`` production rule to extract +the data from the fit. This production rule handles both styles of runcard +out of the box: + +.. code:: yaml + + metadata_group: nnpdf31_process + + fit: NNPDF31_nnlo_as_0118_DISonly + + dataspecs: + - pdf: NNPDF31_nnlo_as_0118 + speclabel: "3.1 NNLO" + + use_cuts: internal + + actions_: + - fitinputcontext dataspecs_groups_chi2_table + +The production rule sets the ``theoryid`` and ``data_input`` based on the +runcard for the specified ``fit``. Note that you can also use ``fitcontext`` +which does all of the above, and additionally sets the ``pdf`` to be the +``fitpdf``. + +In many cases where an action is prefixed with ``dataspecs``, indicating that +a table or plot will contain some results collected over the ``dataspecs``, +there will be a similar action prefixed with ``fits``, where instead the +results in the table or plot will have been collected over ``fits`` with +``fitcontext`` taken into account. + +.. warning:: + Whilst it is possible to specify ``data_input: {from_: fitinputcontext}`` + directly in the runcard, it is highly recommended **not** to do this where + possible. Specifying ``data_input`` explicitly at the level of the + runcard will overwrite any subsequent grouping which is done on the data + and instead each ``metadata_group`` will contain all of the datasets, which + will cause the resulting reports/actions to contain incorrect results + as well as taking a lot of time and resources to produce. The only exception + to this rule is when using the ``matched_datasets_from_dataspecs`` production + rule, however the user should take **extreme** care to not pollute any + namespace which will be used to compute actions which rely on dataset + grouping with ``data_input: {from_: fitinputcontext}``. + +Currently ``pseudodata`` and ``chi2grids`` modules have not been updated to +use ``dataset_inputs`` and so require ``experiments`` to be specified in the +runcard. The C++ fitting code ``nnfit`` is not scheduled to be updated to use ``dataset_inputs`` and so will always require ``experiments`` to be specified in the runcard. From b12acf63c051fe634fdc2022e681795f9f573688 Mon Sep 17 00:00:00 2001 From: wilsonm Date: Wed, 25 Nov 2020 14:11:50 +0000 Subject: [PATCH 2/5] added warnings in example runcards, update docs to better explain issue of taking data input from fitinputcontext --- doc/sphinx/source/vp/dataspecification.rst | 20 ++++++++++--------- .../examples/closure_templates/runcard.yaml | 2 ++ .../comparefittemplates/comparecard.yaml | 5 ++++- 3 files changed, 17 insertions(+), 10 deletions(-) diff --git a/doc/sphinx/source/vp/dataspecification.rst b/doc/sphinx/source/vp/dataspecification.rst index e0f79f2f43..e4b1f5e1c5 100644 --- a/doc/sphinx/source/vp/dataspecification.rst +++ b/doc/sphinx/source/vp/dataspecification.rst @@ -438,15 +438,17 @@ results in the table or plot will have been collected over ``fits`` with .. warning:: Whilst it is possible to specify ``data_input: {from_: fitinputcontext}`` directly in the runcard, it is highly recommended **not** to do this where - possible. Specifying ``data_input`` explicitly at the level of the - runcard will overwrite any subsequent grouping which is done on the data - and instead each ``metadata_group`` will contain all of the datasets, which - will cause the resulting reports/actions to contain incorrect results - as well as taking a lot of time and resources to produce. The only exception - to this rule is when using the ``matched_datasets_from_dataspecs`` production - rule, however the user should take **extreme** care to not pollute any - namespace which will be used to compute actions which rely on dataset - grouping with ``data_input: {from_: fitinputcontext}``. + possible. Taking a key `from_` a production rule causes that key to be + overwritten in inner namespaces. The grouping function, essentially returns a + namespace list with each item in the list specifying a different namespace + where ``data_input`` is defined as the datasets within that group. If + the user specifies ``data_input: {from_: fitinputcontext}`` in the runcard, + the inner ``data_input`` for each group will be overwritten and instead each + group will contain all of the datasets from the fit - which is incorrect. + It's rare that a user should need a runcard to be agnostic to whether the + fit used the old or new data specification - instead take either + ``dataset_inputs`` or ``experiments`` directly ``from_: fit`` depending on + whether the fit uses new or old data specification respectively. Currently ``pseudodata`` and ``chi2grids`` modules have not been updated to use ``dataset_inputs`` and so require ``experiments`` to be specified in the diff --git a/validphys2/examples/closure_templates/runcard.yaml b/validphys2/examples/closure_templates/runcard.yaml index 7d497a62fc..29e6d3018c 100644 --- a/validphys2/examples/closure_templates/runcard.yaml +++ b/validphys2/examples/closure_templates/runcard.yaml @@ -75,6 +75,8 @@ dataspecs: from_: closure speclabel: from_: closure + # WARNING: do not blindly copy and paste this: it can overwrite the datasets + # for any actions which rely on grouping datasets. data_input: from_: fitinputcontext diff --git a/validphys2/src/validphys/comparefittemplates/comparecard.yaml b/validphys2/src/validphys/comparefittemplates/comparecard.yaml index c2fbdfc80e..e6221f4d70 100644 --- a/validphys2/src/validphys/comparefittemplates/comparecard.yaml +++ b/validphys2/src/validphys/comparefittemplates/comparecard.yaml @@ -82,6 +82,8 @@ description: from_: fit dataspecs: + # WARNING: do not blindly copy and paste this: it can overwrite the datasets + # for any actions which rely on grouping datasets. - data_input: from_: fitinputcontext theoryid: @@ -93,7 +95,8 @@ dataspecs: speclabel: from_: current - + # WARNING: do not blindly copy and paste this: it can overwrite the datasets + # for any actions which rely on grouping datasets. - data_input: from_: fitinputcontext theoryid: From 400dc8c642a465d5b91b2466452baa450c95906d Mon Sep 17 00:00:00 2001 From: wilsonm Date: Thu, 26 Nov 2020 14:43:28 +0000 Subject: [PATCH 3/5] distilled useful warning and put explanation at bottom for the avid reader --- doc/sphinx/source/vp/dataspecification.rst | 24 ++++++++++++---------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/doc/sphinx/source/vp/dataspecification.rst b/doc/sphinx/source/vp/dataspecification.rst index e4b1f5e1c5..41212d9aaa 100644 --- a/doc/sphinx/source/vp/dataspecification.rst +++ b/doc/sphinx/source/vp/dataspecification.rst @@ -438,20 +438,22 @@ results in the table or plot will have been collected over ``fits`` with .. warning:: Whilst it is possible to specify ``data_input: {from_: fitinputcontext}`` directly in the runcard, it is highly recommended **not** to do this where - possible. Taking a key `from_` a production rule causes that key to be - overwritten in inner namespaces. The grouping function, essentially returns a - namespace list with each item in the list specifying a different namespace - where ``data_input`` is defined as the datasets within that group. If - the user specifies ``data_input: {from_: fitinputcontext}`` in the runcard, - the inner ``data_input`` for each group will be overwritten and instead each - group will contain all of the datasets from the fit - which is incorrect. - It's rare that a user should need a runcard to be agnostic to whether the - fit used the old or new data specification - instead take either - ``dataset_inputs`` or ``experiments`` directly ``from_: fit`` depending on - whether the fit uses new or old data specification respectively. + possible. Instead take either ``dataset_inputs`` or ``experiments`` + directly ``from_: fit`` depending on whether the fit uses new or old data + specification respectively. (See below for a detailed explanation). Currently ``pseudodata`` and ``chi2grids`` modules have not been updated to use ``dataset_inputs`` and so require ``experiments`` to be specified in the runcard. The C++ fitting code ``nnfit`` is not scheduled to be updated to use ``dataset_inputs`` and so will always require ``experiments`` to be specified in the runcard. + +.. seealso:: Why not to use ``data_input: {from_: fitinputcontext}``? + + Taking a key ``from_`` a production rule causes that key to be + overwritten in inner namespaces. The grouping function, essentially returns a + namespace list with each item in the list specifying a different namespace + where ``data_input`` is defined as the datasets within that group. If + the user specifies ``data_input: {from_: fitinputcontext}`` in the runcard, + the inner ``data_input`` for each group will be overwritten and instead each + group will contain all of the datasets from the fit - which is incorrect. From 38d697587c7e804d7a39e2c95fd5c4765212a14b Mon Sep 17 00:00:00 2001 From: wilsonm Date: Thu, 26 Nov 2020 14:48:25 +0000 Subject: [PATCH 4/5] link issue to docs --- doc/sphinx/source/vp/dataspecification.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/doc/sphinx/source/vp/dataspecification.rst b/doc/sphinx/source/vp/dataspecification.rst index 41212d9aaa..6a13d76cc5 100644 --- a/doc/sphinx/source/vp/dataspecification.rst +++ b/doc/sphinx/source/vp/dataspecification.rst @@ -457,3 +457,5 @@ always require ``experiments`` to be specified in the runcard. the user specifies ``data_input: {from_: fitinputcontext}`` in the runcard, the inner ``data_input`` for each group will be overwritten and instead each group will contain all of the datasets from the fit - which is incorrect. + This is regarded as a bug, the relevant issue is: + https://github.com/NNPDF/reportengine/issues/38 From 95b7f884b19bcd74e9f7fb857bae7e3e6b1bc9e6 Mon Sep 17 00:00:00 2001 From: wilsonmr <33907451+wilsonmr@users.noreply.github.com> Date: Wed, 2 Dec 2020 13:44:37 +0000 Subject: [PATCH 5/5] Apply grammatical suggestions from code review Co-authored-by: Cameron Voisey <32741139+voisey@users.noreply.github.com> --- doc/sphinx/source/vp/dataspecification.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/sphinx/source/vp/dataspecification.rst b/doc/sphinx/source/vp/dataspecification.rst index 6a13d76cc5..21274e5811 100644 --- a/doc/sphinx/source/vp/dataspecification.rst +++ b/doc/sphinx/source/vp/dataspecification.rst @@ -442,7 +442,7 @@ results in the table or plot will have been collected over ``fits`` with directly ``from_: fit`` depending on whether the fit uses new or old data specification respectively. (See below for a detailed explanation). -Currently ``pseudodata`` and ``chi2grids`` modules have not been updated to +Currently the ``pseudodata`` and ``chi2grids`` modules have not been updated to use ``dataset_inputs`` and so require ``experiments`` to be specified in the runcard. The C++ fitting code ``nnfit`` is not scheduled to be updated to use ``dataset_inputs`` and so will @@ -451,8 +451,8 @@ always require ``experiments`` to be specified in the runcard. .. seealso:: Why not to use ``data_input: {from_: fitinputcontext}``? Taking a key ``from_`` a production rule causes that key to be - overwritten in inner namespaces. The grouping function, essentially returns a - namespace list with each item in the list specifying a different namespace + overwritten in inner namespaces. The grouping function essentially returns a + namespace list with each item in the list specifying a different namespace, where ``data_input`` is defined as the datasets within that group. If the user specifies ``data_input: {from_: fitinputcontext}`` in the runcard, the inner ``data_input`` for each group will be overwritten and instead each