Re-implementation of ATLAS single top by jacoterh · Pull Request #2189 · NNPDF/nnpdf

jacoterh · 2024-10-29T11:45:36Z

I am opening this PR as draft to make discussion easier - it is not ready to merge yet. I have a question regarding the old implementation of the systematics in the ATLAS_SINGLETOP_7TEV_T-Y-NORM dataset.

Looking at the legacy implementation, I find 17 systematics per datapoint while the experimental paper only reports 7, see Table XV in [1406.7844]. Now, 3 of these correspond to correlated statistical uncertainties, which leaves 14 systematics, which is twice the number of real systematics! I went back to the old common data implementation (the c++ implementation) and found that both the upper and lower bounds are stored, e.g. relevant in case of asymmetric uncertainties. See line 208 in the c++ script.

However, my understanding was that symmetrising was performed before writing the commondata! So I am not sure why we should keep both bounds for each source.

scarlehoff · 2024-10-29T12:11:00Z

if you get the same covmat (and t0 covmat) symmetrizing I'd say we can keep the symmetrized version and drop the other

but @enocera might know better

enocera · 2024-10-29T12:37:37Z

Dear @jacoterh @scarlehoff , let me say three things.

Operational consideration: symmetrise the data uncertainties and forget about the previous implementation.
Historical consideration: the reason why we used to store both the left and right uncertainties is that, at some point, the CMS collaboration provided us with a recipe to account for asymmetric uncertainties. The recipe is found in Eq. (6) of https://arxiv.org/pdf/1703.01630. This recipe is implemented for all the (single) top data.
Personal consideration: although the recipe was recommended by the CMS collaboration, and was indeed implemented in NNPDF4.0, with hindsight I consider it wiser to treat asymmetric uncertainties consistently across all data sets, specifically by symmetrising them according to the D'Agostini recipe. Therefore I recommend symmetrising the uncertainties, as @jacoterh suggests. The legacy version will remain there and we will be able to check later whether the different treatment of asymmetric uncertainties has any effect on the determination of PDFs.
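
For readers unfamiliar with the symmetrisation mentioned above, here is a minimal sketch of one commonly quoted form of the D'Agostini prescription. The function name and the sign convention (the lower variation is passed as a negative number) are assumptions made for illustration; the filter actually used for the commondata may differ in detail.

```python
import numpy as np


def symmetrise(delta_plus, delta_minus):
    """Return (shift, sigma) for an asymmetric uncertainty.

    delta_plus  : upper variation (>= 0)
    delta_minus : lower variation, signed (<= 0)
    """
    delta_plus = np.asarray(delta_plus, dtype=float)
    delta_minus = np.asarray(delta_minus, dtype=float)
    # Half the signed sum: how far the central value moves.
    semi_diff = (delta_plus + delta_minus) / 2.0
    # Half the width of the original asymmetric interval.
    average = (delta_plus - delta_minus) / 2.0
    sigma = np.sqrt(average**2 + 2.0 * semi_diff**2)
    return semi_diff, sigma


# Hypothetical numbers: one source, +0.3 / -0.1 around a central value of 10.
shift, sigma = symmetrise(0.3, -0.1)
shifted_central = 10.0 + shift
```

With this kind of recipe each source contributes a single symmetric column plus a shift of the central value, rather than two columns (upper and lower) per source. A symmetric input returns a zero shift and the input itself as sigma.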

jacoterh · 2024-10-30T14:01:15Z

Many thanks @enocera for clarifying. For this dataset we have both a stat covmat and a breakdown of the separate sources of systematics. Usually I would convert the covmat to artificial systematics and add the statistics to the systematic entry even though they're not strictly systematics. But how can we have both artificial systematics and a breakdown of systematics in the commondata? I'm not sure how to make these two compatible. I hope it's clear, otherwise I can clarify in the code-meeting.
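
To make the question concrete, here is a minimal sketch of how a covariance matrix can be turned into artificial systematics and placed next to an explicit breakdown. It is only one possible arrangement, not something confirmed in this thread: it assumes every column is fully correlated across bins and that different columns are mutually independent. All names and numbers are hypothetical.

```python
import numpy as np


def covmat_to_artificial_sys(covmat):
    """Decompose a symmetric, positive semi-definite covariance matrix
    into columns A[:, k] such that A @ A.T reproduces the matrix."""
    covmat = np.asarray(covmat, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(covmat)
    # Clip tiny negative eigenvalues that can appear from rounding.
    eigvals = np.clip(eigvals, 0.0, None)
    # Scale each eigenvector (column) by the square root of its eigenvalue.
    return eigvecs * np.sqrt(eigvals)


# Hypothetical 3-bin statistical covariance matrix.
stat_cov = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]])
art_sys = covmat_to_artificial_sys(stat_cov)

# Hypothetical breakdown of two systematic sources (one column per source).
sys_breakdown = np.array([[0.3, 0.1], [0.2, 0.1], [0.1, 0.4]])

# Under the assumptions above the columns can sit side by side:
# the total covariance is the sum of the two contributions.
all_columns = np.hstack([art_sys, sys_breakdown])
total_cov = all_columns @ all_columns.T
```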

jacoterh · 2024-11-01T16:14:06Z

+        data_central = get_data_values(yaml_content_data, bin_index=range(NB_POINTS - 1), indx=0)
+        uncertainties = get_errors(yaml_sys_sources, bin_index=range(NB_POINTS - 1))
+
+        # TODO: do we multiply relative uncertainties by the shifted central value or the unshifted one?


In order to obtain absolute uncertainties, do we multiply the relative uncertainties by the shifted central value or the unshifted one? This is relevant in the presence of asymmetric uncertainties.

jacoterh · 2024-11-01T16:16:23Z

The implementation is done and the theory/data comparison and x-Q2 plots can be found here for legacy and here for the new implementation. The covariance matrices were compared with

import numpy as np
from validphys.api import API

observables = ["T-Y-NORM", "TBAR-Y-NORM", "TCHANNEL-XSEC"]

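for obs in observables:
    # Both implementations share the dataset name; the legacy one is
    # selected below through the "variant" key.
    new_implementation = f"ATLAS_SINGLETOP_7TEV_{obs}"
    old_implementation = f"ATLAS_SINGLETOP_7TEV_{obs}"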

    inp1 = {
        "dataset_input": {"dataset": f"{new_implementation}"},
        "theoryid": 40_000_000,
        "use_cuts": "internal",
        "t0pdfset": "NNPDF40_nnlo_as_01180",
        "use_t0": True,
    }
    inp2 = {
        "dataset_input": {"dataset": f"{old_implementation}", "variant": "legacy"},
        "theoryid": 40_000_000,
        "use_cuts": "internal",
        "t0pdfset": "NNPDF40_nnlo_as_01180",
        "use_t0": True,
    }

    covmat1 = API.covmat_from_systematics(**inp1)
    covmat2 = API.covmat_from_systematics(**inp2)

    t0_covmat1 = API.t0_covmat_from_systematics(**inp1)
    t0_covmat2 = API.t0_covmat_from_systematics(**inp2)

    print(f"Comparison for {new_implementation}")
    print(np.all(np.isclose(covmat1, covmat2)))
    print(np.all(np.isclose(t0_covmat1, t0_covmat2)))

with output

Comparison for ATLAS_SINGLETOP_7TEV_T-Y-NORM
False
False
Comparison for ATLAS_SINGLETOP_7TEV_TBAR-Y-NORM
False
False
Comparison for ATLAS_SINGLETOP_7TEV_TCHANNEL-XSEC
False
False

The reasons they don't match are:

ATLAS_SINGLETOP_7TEV_T-Y-NORM: treatment of asymmetric uncertainties differs from the legacy implementation as discussed above.
ATLAS_SINGLETOP_7TEV_TBAR-Y-NORM: ditto.
ATLAS_SINGLETOP_7TEV_TCHANNEL-XSEC: The legacy implementation discards systematics below 1%. Uncertainties contributing less than 1.0% are marked with "" in the paper, but on HEPData these uncertainties are approximated with 0.5%. I also implemented these. The ones that contribute more than 1% agree with the legacy implementation.
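
Since `np.isclose` only gives a yes/no answer, a small diagnostic can help to see how large the differences are. The sketch below uses NumPy only; the helper name and the numbers are hypothetical, and the example merely mimics the effect of adding a small 0.5% systematic on top of an existing covariance matrix.

```python
import numpy as np


def compare_covmats(cov_a, cov_b):
    """Summarise how two covariance matrices of the same shape differ."""
    cov_a = np.asarray(cov_a, dtype=float)
    cov_b = np.asarray(cov_b, dtype=float)
    sig_a = np.sqrt(np.diag(cov_a))
    sig_b = np.sqrt(np.diag(cov_b))
    # Correlation matrices isolate changes in the off-diagonal structure.
    corr_a = cov_a / np.outer(sig_a, sig_a)
    corr_b = cov_b / np.outer(sig_b, sig_b)
    return {
        "sigma_ratio": sig_a / sig_b,
        "max_corr_diff": float(np.max(np.abs(corr_a - corr_b))),
    }


# Hypothetical 2-bin example: the second matrix has an extra 0.5% systematic
# (fully correlated across bins) on central values of 10 and 20.
cov_old = np.array([[1.0, 0.3], [0.3, 4.0]])
extra = 0.005 * np.array([10.0, 20.0])
cov_new = cov_old + np.outer(extra, extra)
summary = compare_covmats(cov_new, cov_old)
print(summary)
```

A ratio of total uncertainties close to one together with a small change in the correlations would suggest that the mismatch comes from minor sources rather than from a different overall treatment.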

Some questions that came up:

Do we multiply relative uncertainties by the shifted central value or the unshifted one? See inline comment.
I'm really not sure how to find out the correct type of uncertainties for some of the systematics. The luminosity is clearly multiplicative, but for e.g. lepton uncertainties, scale variations, Parton shower, etc... I'm not at all sure.
Which value of m_t, m_W, m_Z, etc.. should we use for the scales? I noticed this is not consistent for all datasets, i.e. sometimes m_t = 172.5, sometimes 173.3

jacoterh · 2024-11-11T11:22:47Z

Following up on @scarlehoff's suggestion in PR#2185 to compare the overall experimental chi2 with all datasets combined, I'm attaching here the reports:

Regarding the ordering, legacy always appears first and is followed by the new commondata implementation, e.g.

- { dataset: ATLAS_SINGLETOP_8TEV_T-RAP-NORM, variant: legacy }
- { dataset: ATLAS_SINGLETOP_8TEV_T-RAP-NORM}
- { dataset: ATLAS_SINGLETOP_8TEV_TBAR-RAP-NORM, variant: legacy }
- { dataset: ATLAS_SINGLETOP_8TEV_TBAR-RAP-NORM}
- { dataset: ATLAS_SINGLETOP_8TEV_TCHANNEL-XSEC, variant: legacy }
- { dataset: ATLAS_SINGLETOP_8TEV_TCHANNEL-XSEC}

scarlehoff

Hi @jacoterh, looks good. The only thing: could you please bump the version number and change the comment that currently says "port of old commondata", which happily is no longer true :)

RE the comparison you need to separate the legacy and the new (so that you can check whether the correlations between datasets are captured in the same way).
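
As an illustration of why the two variants should be compared as complete sets, the sketch below builds a joint covariance matrix for two toy datasets that share one systematic. It is a NumPy-only toy, not how validphys builds its covariance matrix; the dataset contents, the systematic names and the assumption that equal names mean full correlation are all hypothetical.

```python
import numpy as np

# Hypothetical absolute uncertainties for two datasets (rows: data points).
# Columns with the same name are assumed fully correlated across datasets.
dataset_a = {"LUMI": np.array([0.2, 0.3]), "SYS_A": np.array([0.1, 0.1])}
dataset_b = {"LUMI": np.array([0.4]), "SYS_B": np.array([0.2])}


def joint_covmat(datasets):
    """Build a joint covariance matrix from named systematic columns."""
    names = sorted({name for ds in datasets for name in ds})
    sizes = [len(next(iter(ds.values()))) for ds in datasets]
    columns = np.zeros((sum(sizes), len(names)))
    start = 0
    for ds, size in zip(datasets, sizes):
        for name, values in ds.items():
            columns[start : start + size, names.index(name)] = values
        start += size
    return columns @ columns.T


cov = joint_covmat([dataset_a, dataset_b])
# Off-diagonal block between the two datasets, generated only by "LUMI".
cross_block = cov[:2, 2:]
```

A dataset-by-dataset comparison only ever sees the diagonal blocks, so a change in how a shared source is named or typed would go unnoticed unless the combined chi2 is checked for each variant separately.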

For reference, this is the runcard I'm using to test:

vp runcard

meta:
  title: Data vs Th
  keywords: comparison
  author: juacrumar

pdfs:
  - id: NNPDF40_nnlo_as_01180
    label: NNPDF4.0 NNLO
pdf: NNPDF40_nnlo_as_01180

#theoryid: 40_000_000
theoryid: 200

use_cuts: "internal"
marker_by: "dataset"

old_and_new:
  - temporal: "OLD data"
    dataset_inputs:
    - { dataset: ATLAS_SINGLETOP_7TEV_T-Y-NORM, variant: legacy }
    - { dataset: ATLAS_SINGLETOP_7TEV_TBAR-Y-NORM, variant: legacy }
    - { dataset: ATLAS_SINGLETOP_7TEV_TCHANNEL-XSEC, variant: legacy }
    - { dataset: ATLAS_SINGLETOP_8TEV_T-RAP-NORM, variant: legacy }
    - { dataset: ATLAS_SINGLETOP_8TEV_TBAR-RAP-NORM, variant: legacy }
    - { dataset: ATLAS_SINGLETOP_8TEV_TCHANNEL-XSEC, variant: legacy }
    - { dataset: ATLAS_SINGLETOP_13TEV_TCHANNEL-XSEC, variant: legacy }
  - temporal: "NEW data"
    dataset_inputs:
    - { dataset: ATLAS_SINGLETOP_7TEV_T-Y-NORM}
    - { dataset: ATLAS_SINGLETOP_7TEV_TBAR-Y-NORM}
    - { dataset: ATLAS_SINGLETOP_7TEV_TCHANNEL-XSEC}
    - { dataset: ATLAS_SINGLETOP_8TEV_T-RAP-NORM}
    - { dataset: ATLAS_SINGLETOP_8TEV_TBAR-RAP-NORM}
    - { dataset: ATLAS_SINGLETOP_8TEV_TCHANNEL-XSEC}
    - { dataset: ATLAS_SINGLETOP_13TEV_TCHANNEL-XSEC}

template_text: |
  {@ with old_and_new @}
  {@ temporal @}
  ==============

  Data-TH comparison
  ------------------
  {@ dataset_inputs plot_fancy @}

  Normalized
  ----------
  {@ dataset_inputs::pdfs plot_fancy(normalize_to=data)@}

  xq mapping
  ----------
  {@ plot_xq2 @}

  Chi Data
  --------
  {@ experiments_chi2_table @}
  {@ total_chi2_data @}
  {@ endwith @}

actions_:
  - report(main=true)

scarlehoff · 2024-11-12T11:44:50Z

Btw, before the merge, please rebase on top of master (you can do that from GitHub by pressing the arrow where it says update branch and selecting update with rebase)

scarlehoff · 2024-11-12T11:50:02Z

RE the questions (but @enocera is the right person to answer)

Do we multiply relative uncertainties by the shifted central value or the unshifted one? See inline comment.

I think we have decided to keep all uncertainties as multiplicative?

I'm really not sure how to find out the correct type of uncertainties for some of the systematics. The luminosity is clearly multiplicative, but for e.g. lepton uncertainties, scale variations, Parton shower, etc... I'm not at all sure.

See above.

Which value of m_t, m_W, m_Z, etc.. should we use for the scales? I noticed this is not consistent for all datasets, i.e. sometimes m_t = 172.5, sometimes 173.3

It's not really important for the data since the scale has an effect only on the theory. Eventually this might come from the PineAPPL grid as @felixhekhorn has been advocating for (since that's the right place, a theory might be using q = m_t and another one q = Et). So whatever the dataset says (if it says something) is ok.

enocera · 2024-11-12T13:05:22Z

* Do we multiply relative uncertainties by the shifted central value or the unshifted one? See inline comment.

I understand that we store only absolute uncertainties in the uncertainties.yaml file, don't we? If so, when experimentalists give relative uncertainties, I would first transform all of them to absolute values and then symmetrise them and shift the central value.

* I'm really not sure how to find out the correct type of uncertainties for some of the systematics. The luminosity is clearly multiplicative, but for e.g. lepton uncertainties, scale variations, Parton shower, etc... I'm not at all sure.

If it is unclear whether an uncertainty is additive or multiplicative, which happens all the time, please set it to multiplicative. The rationale is that if you treat a multiplicative uncertainty as additive, then you may incur the D'Agostini bias; if you treat an additive uncertainty as multiplicative, you do not incur the bias. In other words: it is less harmful to treat an additive uncertainty as multiplicative than a multiplicative uncertainty as additive.
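
The bias can be illustrated with a toy fit of a single constant to two measurements that share a normalisation uncertainty. This is a sketch with illustrative numbers, not validphys code: the "reference" rescaling below is only in the spirit of the t0 prescription, and using the plain average as the reference is an assumption made for the example.

```python
import numpy as np

# Two measurements of the same quantity (illustrative numbers),
# each with a 2% uncorrelated uncertainty and a common 10% normalisation.
data = np.array([8.0, 8.5])
uncorr = 0.02 * data
norm = 0.10


def best_constant(values, covmat):
    """Least-squares estimate of a single constant fitted to `values`."""
    inv = np.linalg.inv(covmat)
    ones = np.ones_like(values)
    return float(ones @ inv @ values / (ones @ inv @ ones))


# Normalisation frozen into absolute numbers taken from the measured values.
cov_data = np.diag(uncorr**2) + norm**2 * np.outer(data, data)
biased = best_constant(data, cov_data)

# Normalisation rescaled by a fixed reference prediction instead.
reference = np.full_like(data, data.mean())
cov_ref = np.diag(uncorr**2) + norm**2 * np.outer(reference, reference)
unbiased = best_constant(data, cov_ref)

print(f"data-rescaled: {biased:.2f}, reference-rescaled: {unbiased:.2f}")
```

In the first case the fitted constant falls below both measurements, which is the hallmark of the bias; in the second it lies between them.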

* Which value of m_t, m_W, m_Z, etc.. should we use for the scales? I noticed this is not consistent for all datasets, i.e. sometimes m_t = 172.5, sometimes 173.3

As @scarlehoff says, physical parameters are mostly relevant for theoretical computations. But there are cases in which these are also needed for data implementation, e.g. if the data comes rescaled by some variable which is a function of the physical parameters. In general, we should avoid using these parameters to manipulate the data, and instead re-define the theory accordingly, in such a way that the parameters are called and controlled only at the level of theoretical predictions. If this is unavoidable, please be consistent with the parameters used for theory predictions (m_t must be equal to 172.5 GeV).

scarlehoff · 2024-11-21T21:56:16Z

@jacoterh is this final now? Could you rebase (preferably) on top of master or merge master into this branch? (so that the tests and the bot that @Radonirinaunimi prepared can run on this data)

thanks

scarlehoff · 2024-11-22T17:41:53Z

I think something went wrong in the merge because you also removed some EIC data ^^U

RoyStegeman · 2024-11-22T17:48:06Z

We're aware; it's the data with the capitalisation issue. macOS isn't sensitive to capitalisation in filenames, so this should be resolved on a Linux cluster.
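
For reference, a minimal sketch of how such clashes could be spotted from any machine: it groups a list of paths (for instance the output of `git ls-files`) by their case-folded form and reports the groups with more than one spelling. The helper name and the file names are hypothetical.

```python
from collections import defaultdict


def case_collisions(paths):
    """Group paths that differ only by letter case."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    return [sorted(group) for group in groups.values() if len(set(group)) > 1]


# Hypothetical file names, for illustration only.
tracked = [
    "commondata/EXAMPLE_SET/data.yaml",
    "commondata/EXAMPLE_SET/Data.yaml",
    "commondata/OTHER_SET/metadata.yaml",
]
print(case_collisions(tracked))
```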

scarlehoff · 2024-11-22T18:05:18Z

I see, I can fix it then.

I'll merge/rebase from aee1454 if you give me the ok.

RoyStegeman · 2024-11-22T18:12:20Z

Up to @jacoterh

I think we should also remove the commit that changes these EIC files so we don't get new changes to them in master and keep having to deal with it when rebasing

scarlehoff · 2024-11-22T18:15:13Z

If he only has a mac I don't think he can fix it?

I will squash and rebase, that should minimize the changes and make the rebase easy-ish.

RoyStegeman · 2024-11-22T18:19:40Z

He has a cluster

I will squash and rebase, that should minimize the changes and make the rebase easy-ish.

Thanks

scarlehoff · 2024-11-22T18:49:56Z

He has a cluster

Then let me know what you prefer @jacoterh

scarlehoff

Thanks. I just left a few comments about things that should be removed (repeated data files).

For the point about the kinematic variables (removing the unused ones and changing mt2 -> mt) do as you prefer.

jacoterh · 2024-11-25T12:18:55Z

This should be ready for merging. Final report can be found at https://vp.nnpdf.science/nfGUvdTyQhCX_EV_C0BLkQ==/

jacoterh requested a review from scarlehoff October 29, 2024 11:46

scarlehoff added the data toolchain label Nov 1, 2024

jacoterh commented Nov 1, 2024

Comment thread nnpdf_data/nnpdf_data/commondata/ATLAS_SINGLETOP_7TEV/metadata.yaml

jacoterh marked this pull request as ready for review November 8, 2024 11:33

scarlehoff approved these changes Nov 12, 2024

scarlehoff added the Done PRs that are done but waiting on something else to merge/approve label Nov 12, 2024

jacoterh force-pushed the ATLAS-single-top-production branch from 317850c to aee1454 Compare November 22, 2024 17:12

jacoterh force-pushed the ATLAS-single-top-production branch from b72bf39 to 9a4f8b2 Compare November 25, 2024 10:55

re-implementing ATLAS single top production in new commondata format

0de5f63

jacoterh force-pushed the ATLAS-single-top-production branch from 9a4f8b2 to 0de5f63 Compare November 25, 2024 11:02

scarlehoff approved these changes Nov 25, 2024

implementing last comments

51792f1

scarlehoff merged commit be0ed18 into master Nov 25, 2024

scarlehoff deleted the ATLAS-single-top-production branch November 25, 2024 12:47
