Re-implementation of ATLAS single top #2189

Merged
scarlehoff merged 2 commits into master from ATLAS-single-top-production
Nov 25, 2024

@jacoterh
Collaborator

@jacoterh jacoterh commented Oct 29, 2024

I am opening this PR as draft to make discussion easier - it is not ready to merge yet. I have a question regarding the old implementation of the systematics in the ATLAS_SINGLETOP_7TEV_T-Y-NORM dataset.

Looking at the legacy implementation, I find 17 systematics per datapoint, while the experimental paper reports only 7; see Table XV in [1406.7844]. Three of these correspond to correlated statistical uncertainties, which leaves 14 systematics: twice the number of real systematics! I went back to the old commondata implementation (the C++ one) and found that both the upper and lower bounds are stored, which is relevant e.g. for asymmetric uncertainties. See line 208 in the C++ script.

However, my understanding was that symmetrisation was performed before writing the commondata! So I am not sure why we should keep both bounds for each source.

@jacoterh jacoterh requested a review from scarlehoff October 29, 2024 11:46
@scarlehoff
Member

If you get the same covmat (and t0 covmat) after symmetrizing, I'd say we can keep the symmetrized version and drop the other

but @enocera might know better

@enocera
Contributor

enocera commented Oct 29, 2024

Dear @jacoterh @scarlehoff , let me say three things.

  1. Operational consideration: symmetrise the data uncertainties and forget about the previous implementation.
  2. Historical consideration: the reason why we used to store both the left and right uncertainties is that, at some point, the CMS collaboration provided us with a recipe to account for asymmetric uncertainties. The recipe is found in Eq. (6) of https://arxiv.org/pdf/1703.01630. This recipe is implemented for all the (single) top data.
  3. Personal consideration: although the recipe was recommended by the CMS collaboration, and was indeed implemented in NNPDF4.0, with hindsight I consider it wiser to treat asymmetric uncertainties consistently across all data sets, specifically by symmetrising them according to the D'Agostini recipe. Therefore I recommend symmetrising the uncertainties, as @jacoterh suggests. The legacy version will remain there and we will be able to check later whether the different treatment of asymmetric uncertainties has any effect on the determination of PDFs.
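
For reference, a sketch of the symmetrisation as it is usually written, assuming an uncertainty quoted as $+\Delta_+$ / $-\Delta_-$ with both magnitudes positive. The symbols are introduced here for illustration only and do not appear in the discussion above; the exact form adopted in the new commondata filters should be checked against the code.

```latex
\delta = \frac{\Delta_+ - \Delta_-}{2}, \qquad
\bar{\Delta} = \frac{\Delta_+ + \Delta_-}{2}, \qquad
x \to x + \delta, \qquad
\sigma = \sqrt{\bar{\Delta}^2 + 2\,\delta^2}
```

For a symmetric error ($\Delta_+ = \Delta_-$) the shift $\delta$ vanishes and $\sigma$ reduces to the quoted error, so symmetric sources are left untouched.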

@jacoterh
Collaborator Author

jacoterh commented Oct 30, 2024

Many thanks @enocera for clarifying. For this dataset we have both a stat covmat and a breakdown of the separate sources of systematics. Usually I would convert the covmat to artificial systematics and add the statistics to the systematic entry even though they're not strictly systematics. But how can we have both artificial systematics and a breakdown of systematics in the commondata? I'm not sure how to make these two compatible. I hope it's clear, otherwise I can clarify in the code-meeting.
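
One way the two could coexist, sketched under assumptions: the statistical covmat is decomposed into artificial systematics via its eigenvectors, and those columns are simply placed next to the columns of the systematic breakdown. The function name, the toy numbers and the assumption that every column is fully correlated across points and independent of the other columns are illustrative, not taken from the PR.

```python
import numpy as np


def covmat_to_artificial_sys(covmat):
    """Return a matrix A (ndata x ndata) such that A @ A.T reproduces covmat."""
    eigvals, eigvecs = np.linalg.eigh(covmat)
    # Guard against tiny negative eigenvalues caused by rounding.
    eigvals = np.clip(eigvals, 0.0, None)
    # Scale eigenvector k (column k) by sqrt(lambda_k).
    return eigvecs * np.sqrt(eigvals)


# Toy statistical covariance matrix for 3 data points (hypothetical numbers).
stat_cov = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]])
# Toy breakdown: 2 systematic sources, one column per source (hypothetical numbers).
sys_breakdown = np.array([[0.3, 0.1], [0.2, 0.4], [0.1, 0.2]])

artificial = covmat_to_artificial_sys(stat_cov)
# Artificial and real sources sit side by side as independent columns.
all_sources = np.hstack([artificial, sys_breakdown])
total_cov = all_sources @ all_sources.T
```

With this layout the total covariance is the statistical covmat plus the sum of the outer products of the systematic columns, so nothing is double counted as long as the diagonal statistical error is not also stored separately.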

data_central = get_data_values(yaml_content_data, bin_index=range(NB_POINTS - 1), indx=0)
uncertainties = get_errors(yaml_sys_sources, bin_index=range(NB_POINTS - 1))

# TODO: do we multiply relative uncertainties by the shifted central value or the unshifted one?
Collaborator Author

In order to obtain absolute uncertainties, do we multiply the relative uncertainties by the shifted central value or the unshifted one? This is relevant in the presence of asymmetric uncertainties.

@jacoterh
Collaborator Author

jacoterh commented Nov 1, 2024

The implementation is done, and the theory/data comparison and x-Q2 plots can be found here for legacy and here for the new implementation. The covariance matrices were compared with:

import numpy as np
from validphys.api import API

observables = ["T-Y-NORM", "TBAR-Y-NORM", "TCHANNEL-XSEC"]

for obs in observables:

    new_implementation = f"ATLAS_SINGLETOP_7TEV_{obs}"
    old_implementation = f"ATLAS_SINGLETOP_7TEV_{obs}"

    inp1 = {
        "dataset_input": {"dataset": f"{new_implementation}"},
        "theoryid": 40_000_000,
        "use_cuts": "internal",
        "t0pdfset": "NNPDF40_nnlo_as_01180",
        "use_t0": True,
    }
    inp2 = {
        "dataset_input": {"dataset": f"{old_implementation}", "variant": "legacy"},
        "theoryid": 40_000_000,
        "use_cuts": "internal",
        "t0pdfset": "NNPDF40_nnlo_as_01180",
        "use_t0": True,
    }

    covmat1 = API.covmat_from_systematics(**inp1)
    covmat2 = API.covmat_from_systematics(**inp2)

    t0_covmat1 = API.t0_covmat_from_systematics(**inp1)
    t0_covmat2 = API.t0_covmat_from_systematics(**inp2)

    print(f"Comparison for {new_implementation}")
    print(np.all(np.isclose(covmat1, covmat2)))
    print(np.all(np.isclose(t0_covmat1, t0_covmat2)))

with output

Comparison for ATLAS_SINGLETOP_7TEV_T-Y-NORM
False
False
Comparison for ATLAS_SINGLETOP_7TEV_TBAR-Y-NORM
False
False
Comparison for ATLAS_SINGLETOP_7TEV_TCHANNEL-XSEC
False
False

The reasons they don't match are:

  • ATLAS_SINGLETOP_7TEV_T-Y-NORM: treatment of asymmetric uncertainties differs from the legacy implementation as discussed above.
  • ATLAS_SINGLETOP_7TEV_TBAR-Y-NORM: ditto.
  • ATLAS_SINGLETOP_7TEV_TCHANNEL-XSEC: The legacy implementation discards systematics below 1%. Uncertainties contributing less than 1.0% are marked with "" in the paper, but on HEPData these uncertainties are approximated with 0.5%. I also implemented these. The ones that contribute more than 1% agree with the legacy implementation.
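
A boolean `np.isclose` check says that the matrices differ but not by how much. A minimal sketch of a more informative comparison follows; the helper name and the toy matrices are hypothetical, and in practice the inputs would be the arrays returned by `API.covmat_from_systematics` above.

```python
import numpy as np


def compare_covmats(cov_a, cov_b):
    """Summarise how two covariance matrices differ (cov_b is the reference)."""
    cov_a = np.asarray(cov_a, dtype=float)
    cov_b = np.asarray(cov_b, dtype=float)
    sig_a = np.sqrt(np.diag(cov_a))
    sig_b = np.sqrt(np.diag(cov_b))
    # Correlation matrices separate changes in size from changes in shape.
    corr_a = cov_a / np.outer(sig_a, sig_a)
    corr_b = cov_b / np.outer(sig_b, sig_b)
    return {
        "allclose": bool(np.allclose(cov_a, cov_b)),
        "max_rel_diff_sigma": float(np.max(np.abs(sig_a - sig_b) / sig_b)),
        "max_abs_diff_corr": float(np.max(np.abs(corr_a - corr_b))),
    }


# Toy matrices (hypothetical numbers), standing in for covmat2 and covmat1.
legacy = np.array([[1.00, 0.30], [0.30, 2.00]])
new = np.array([[1.05, 0.30], [0.30, 2.00]])
report = compare_covmats(new, legacy)
print(report)
```

A small `max_rel_diff_sigma` with an almost unchanged correlation would be consistent with extra sub-percent sources, whereas a large change in correlations would point to a different treatment of the correlated systematics.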

Some questions that came up:

  • Do we multiply relative uncertainties by the shifted central value or the unshifted one? See inline comment.
  • I'm really not sure how to find out the correct type of uncertainty for some of the systematics. The luminosity is clearly multiplicative, but for e.g. lepton uncertainties, scale variations, parton shower, etc. I'm not at all sure.
  • Which value of m_t, m_W, m_Z, etc. should we use for the scales? I noticed this is not consistent across datasets, i.e. sometimes m_t = 172.5, sometimes 173.3

Comment thread nnpdf_data/nnpdf_data/commondata/ATLAS_SINGLETOP_7TEV/metadata.yaml
@jacoterh jacoterh marked this pull request as ready for review November 8, 2024 11:33
@jacoterh
Collaborator Author

Following up on @scarlehoff's suggestion in PR #2185 to compare the overall experimental chi2 with all datasets combined, I'm attaching the reports here:

Regarding the ordering, legacy always appears first, followed by the new commondata implementation, e.g.

- { dataset: ATLAS_SINGLETOP_8TEV_T-RAP-NORM, variant: legacy }
- { dataset: ATLAS_SINGLETOP_8TEV_T-RAP-NORM}
- { dataset: ATLAS_SINGLETOP_8TEV_TBAR-RAP-NORM, variant: legacy }
- { dataset: ATLAS_SINGLETOP_8TEV_TBAR-RAP-NORM}
- { dataset: ATLAS_SINGLETOP_8TEV_TCHANNEL-XSEC, variant: legacy }
- { dataset: ATLAS_SINGLETOP_8TEV_TCHANNEL-XSEC}

Member

@scarlehoff scarlehoff left a comment

Hi @jacoterh, looks good. The only thing: could you please bump the version number and change the comment that currently says "port of old commondata", which happily is no longer true :)

RE the comparison: you need to separate the legacy and the new (so that you can check whether the correlations between datasets are captured in the same way).

For reference, this is the runcard I'm using to test:

vp runcard
meta:
  title: Data vs Th
  keywords: comparison
  author: juacrumar

pdfs:
  - id: NNPDF40_nnlo_as_01180
    label: NNPDF4.0 NNLO
pdf: NNPDF40_nnlo_as_01180

#theoryid: 40_000_000
theoryid: 200

use_cuts: "internal"
marker_by: "dataset"

old_and_new:
  - temporal: "OLD data"
    dataset_inputs:
    - { dataset: ATLAS_SINGLETOP_7TEV_T-Y-NORM, variant: legacy }
    - { dataset: ATLAS_SINGLETOP_7TEV_TBAR-Y-NORM, variant: legacy }
    - { dataset: ATLAS_SINGLETOP_7TEV_TCHANNEL-XSEC, variant: legacy }
    - { dataset: ATLAS_SINGLETOP_8TEV_T-RAP-NORM, variant: legacy }
    - { dataset: ATLAS_SINGLETOP_8TEV_TBAR-RAP-NORM, variant: legacy }
    - { dataset: ATLAS_SINGLETOP_8TEV_TCHANNEL-XSEC, variant: legacy }
    - { dataset: ATLAS_SINGLETOP_13TEV_TCHANNEL-XSEC, variant: legacy }
  - temporal: "NEW data"
    dataset_inputs:
    - { dataset: ATLAS_SINGLETOP_7TEV_T-Y-NORM}
    - { dataset: ATLAS_SINGLETOP_7TEV_TBAR-Y-NORM}
    - { dataset: ATLAS_SINGLETOP_7TEV_TCHANNEL-XSEC}
    - { dataset: ATLAS_SINGLETOP_8TEV_T-RAP-NORM}
    - { dataset: ATLAS_SINGLETOP_8TEV_TBAR-RAP-NORM}
    - { dataset: ATLAS_SINGLETOP_8TEV_TCHANNEL-XSEC}
    - { dataset: ATLAS_SINGLETOP_13TEV_TCHANNEL-XSEC}

template_text: |
  {@ with old_and_new @}
  {@ temporal @}
  ==============

  Data-TH comparison
  ------------------
  {@ dataset_inputs plot_fancy @}

  Normalized
  ----------
  {@ dataset_inputs::pdfs plot_fancy(normalize_to=data)@}

  xq mapping
  ----------
  {@ plot_xq2 @}

  Chi Data
  --------
  {@ experiments_chi2_table @}
  {@ total_chi2_data @}
  {@ endwith @}

actions_:
  - report(main=true)

@scarlehoff scarlehoff added the Done PRs that are done but waiting on something else to merge/approve label Nov 12, 2024
@scarlehoff
Member

Btw, before the merge, please rebase on top of master (you can do that from GitHub by pressing the arrow where it says "update branch" and selecting "update with rebase").

@scarlehoff
Member

RE the questions (but @enocera is the right person to answer)

Do we multiply relative uncertainties by the shifted central value or the unshifted one? See inline comment.

I think we have decided to keep all uncertainties as multiplicative?

I'm really not sure how to find out the correct type of uncertainty for some of the systematics. The luminosity is clearly multiplicative, but for e.g. lepton uncertainties, scale variations, parton shower, etc. I'm not at all sure.

See above.

Which value of m_t, m_W, m_Z, etc.. should we use for the scales? I noticed this is not consistent for all datasets, i.e. sometimes m_t = 172.5, sometimes 173.3

It's not really important for the data, since the scale has an effect only on the theory. Eventually this might come from the PineAPPL grid, as @felixhekhorn has been advocating (that's the right place, since one theory might be using q = m_t and another q = Et). So whatever the dataset says (if it says something) is ok.

@enocera
Contributor

enocera commented Nov 12, 2024

* Do we multiply relative uncertainties by the shifted central value or the unshifted one? See inline comment.

I understand that we store only absolute uncertainties in the uncertainties.yaml file, don't we? If so, when experimentalists give relative uncertainties, I would first transform all of them to absolute values and then symmetrise them and shift the central value.
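
The order of operations described here can be sketched as follows. This is a minimal illustration, assuming relative uncertainties are quoted in percent of the published (unshifted) central value and assuming a D'Agostini-style symmetrisation; the function names and numbers are hypothetical.

```python
import math


def symmetrise(delta_plus, delta_minus):
    """Symmetrise an asymmetric error given as two positive magnitudes.

    Returns (shift, sigma). Assumed D'Agostini-style form.
    """
    semi_diff = (delta_plus - delta_minus) / 2.0
    average = (delta_plus + delta_minus) / 2.0
    return semi_diff, math.sqrt(average**2 + 2.0 * semi_diff**2)


def process_point(central, rel_sources_percent):
    """rel_sources_percent: list of (plus_percent, minus_percent) magnitudes."""
    shifted = central
    sigmas = []
    for plus_pct, minus_pct in rel_sources_percent:
        # Step 1: relative -> absolute, using the unshifted central value.
        abs_plus = central * plus_pct / 100.0
        abs_minus = central * minus_pct / 100.0
        # Step 2: symmetrise the absolute errors.
        shift, sigma = symmetrise(abs_plus, abs_minus)
        # Step 3: accumulate the shift of the central value.
        shifted += shift
        sigmas.append(sigma)
    return shifted, sigmas


# One asymmetric source (+4% / -2%) and one symmetric source (+-1%).
shifted, sigmas = process_point(100.0, [(4.0, 2.0), (1.0, 1.0)])
print(shifted, sigmas)
```

Because the conversion to absolute values happens before any shift is applied, the result does not depend on the order in which the sources are processed.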

* I'm really not sure how to find out the correct type of uncertainty for some of the systematics. The luminosity is clearly multiplicative, but for e.g. lepton uncertainties, scale variations, parton shower, etc. I'm not at all sure.

If it is unclear whether an uncertainty is additive or multiplicative, which happens all the time, please set it to multiplicative. The rationale is that if you treat a multiplicative uncertainty as additive, you may incur the D'Agostini bias; if you treat an additive uncertainty as multiplicative, you do not incur the bias. In other words: it is less harmful to treat an additive uncertainty as multiplicative than a multiplicative uncertainty as additive.
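
The bias can be illustrated with the textbook two-point example: two measurements of the same quantity with 2% uncorrelated errors and a common 10% normalisation uncertainty. This is a sketch with illustrative numbers; the fixed reference value used for the second case is hypothetical and only mimics the spirit of a t0-like prescription.

```python
def weighted_average(x1, x2, cov):
    """Best-fit constant for two correlated measurements (2x2 covariance)."""
    (a, b), (_, d) = cov
    return ((d - b) * x1 + (a - b) * x2) / (a - 2.0 * b + d)


x1, x2 = 8.0, 8.5
stat1, stat2 = 0.02 * x1, 0.02 * x2  # 2% uncorrelated errors
norm = 0.10  # 10% common normalisation uncertainty


def covariance(scale1, scale2):
    """Covariance with the normalisation error sized by the given scales."""
    off_diag = norm**2 * scale1 * scale2
    return [
        [stat1**2 + (norm * scale1) ** 2, off_diag],
        [off_diag, stat2**2 + (norm * scale2) ** 2],
    ]


# Normalisation error sized by the measured values: the fit is biased low.
biased = weighted_average(x1, x2, covariance(x1, x2))

# Normalisation error sized by one fixed reference value (hypothetical choice).
reference = 8.25
unbiased = weighted_average(x1, x2, covariance(reference, reference))

print(f"biased: {biased:.2f}, unbiased: {unbiased:.2f}")
```

In the first case the best-fit value lies below both measurements, which is the signature of the bias; in the second it lies between them.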

* Which value of m_t, m_W, m_Z, etc.. should we use for the scales? I noticed this is not consistent for all datasets, i.e. sometimes m_t = 172.5, sometimes 173.3

As @scarlehoff says, physical parameters are mostly relevant for theoretical computations. But there are cases in which these are also needed for the data implementation, e.g. if the data comes rescaled by some variable which is a function of the physical parameters. In general, we should avoid using these parameters to manipulate the data, and instead re-define the theory accordingly, so that the parameters are called and controlled only at the level of theoretical predictions. If this is unavoidable, please be consistent with the parameters used for theory predictions (m_t must be equal to 172.5 GeV).

@scarlehoff
Member

@jacoterh is this final now? Could you rebase (preferably) on top of master or merge master into this branch? (so that the tests and the bot that @Radonirinaunimi prepared can run on this data)

thanks

@jacoterh jacoterh force-pushed the ATLAS-single-top-production branch from 317850c to aee1454 Compare November 22, 2024 17:12
@scarlehoff
Member

I think something went wrong in the merge because you removed also some EICC data ^^U

@RoyStegeman
Member

We're aware. It's the data with the capitalisation issue: macOS isn't sensitive to capitalisation in filenames, so it should be resolved on a Linux cluster.
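
A quick way to spot such files, sketched under the assumption that a list of tracked paths is available (for instance from `git ls-files`). The helper name and the example paths are hypothetical.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Group paths that differ only by letter case.

    Such paths are distinct on a case-sensitive filesystem but map to the
    same file on a case-insensitive one.
    """
    groups = defaultdict(set)
    for path in paths:
        groups[path.casefold()].add(path)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Hypothetical tracked paths; real ones would come from the repository.
tracked = [
    "commondata/EXAMPLE_SET/data.yaml",
    "commondata/Example_Set/data.yaml",
    "commondata/OTHER/data.yaml",
]
collisions = find_case_collisions(tracked)
print(collisions)
```

Any group reported here cannot be checked out faithfully on a case-insensitive filesystem, which is why a rebase done there can appear to delete or modify one of the two files.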

@scarlehoff
Member

scarlehoff commented Nov 22, 2024

I see, I can fix it then.

I'll merge/rebase from aee1454 if you give me the ok.

@RoyStegeman
Member

Up to @jacoterh

I think we should also remove the commit that changes these EIC files so we don't get new changes to them in master and keep having to deal with it when rebasing

@scarlehoff
Member

If he only has a mac I don't think he can fix it?

I will squash and rebase, that should minimize the changes and make the rebase easy-ish.

@RoyStegeman
Member

He has a cluster

I will squash and rebase, that should minimize the changes and make the rebase easy-ish.

Thanks

@scarlehoff
Member

He has a cluster

Then let me know what you prefer @jacoterh

@jacoterh jacoterh force-pushed the ATLAS-single-top-production branch from b72bf39 to 9a4f8b2 Compare November 25, 2024 10:55
@jacoterh jacoterh force-pushed the ATLAS-single-top-production branch from 9a4f8b2 to 0de5f63 Compare November 25, 2024 11:02
Member

@scarlehoff scarlehoff left a comment

Thanks. I just left a few comments about things that should be removed (repeated data files).

For the point about the kinematic variables (removing the unused ones and changing mt2 -> mt) do as you prefer.

Comment thread nnpdf_data/nnpdf_data/commondata/ATLAS_SINGLETOP_13TEV/metadata.yaml Outdated
Comment thread nnpdf_data/nnpdf_data/commondata/ATLAS_SINGLETOP_13TEV/metadata.yaml Outdated
Comment thread nnpdf_data/nnpdf_data/commondata/ATLAS_SINGLETOP_7TEV/metadata.yaml Outdated
Comment thread nnpdf_data/nnpdf_data/commondata/ATLAS_SINGLETOP_8TEV/metadata.yaml Outdated
@jacoterh
Collaborator Author

This should be ready for merging. Final report can be found at https://vp.nnpdf.science/nfGUvdTyQhCX_EV_C0BLkQ==/

@scarlehoff scarlehoff merged commit be0ed18 into master Nov 25, 2024
@scarlehoff scarlehoff deleted the ATLAS-single-top-production branch November 25, 2024 12:47
Labels: data toolchain, Done (PRs that are done but waiting on something else to merge/approve)