Re-implementation of ATLAS single top#2189
Conversation
|
if you get the same covmat (and t0 covmat) synmetrizing I'd say we can keep the symmetrized version and drop the other but @enocera might know better |
|
Dear @jacoterh @scarlehoff , let me say three things.
|
|
Many thanks @enocera for clarifying. For this dataset we have both a stat covmat and a breakdown of the separate sources of systematics. Usually I would convert the covmat to artificial systematics and add the statistics to the systematic entry even though they're not strictly systematics. But how can we have both artificial systematics and a breakdown of systematics in the commondata? I'm not sure how to make these two compatible. I hope it's clear, otherwise I can clarify in the code-meeting. |
| data_central = get_data_values(yaml_content_data, bin_index=range(NB_POINTS - 1), indx=0) | ||
| uncertainties = get_errors(yaml_sys_sources, bin_index=range(NB_POINTS - 1)) | ||
|
|
||
| # TODO: do we multiply relative uncertainties by the shifted central value or the unshifted one? |
There was a problem hiding this comment.
In order to obtain absolute uncertainties, do we multiply the relative uncertainties by the shifted central value or the unshifted one? This is relevant in the presence of asymmetric uncertainties.
|
The implementation is done and the theory/data comparison and x-Q2 plots can be found here for legacy and here for the new implementation . The covariance matrices were compared with with output The reasons they don't match are:
Some questions that came up:
|
|
Following up on @scarlehoff suggestion in PR#2185 to compare the overall experimental chi2 with all datasets combined, I'm attaching here the reports: Regarding the ordering, legacy always appears first and is followed by the new commondata implementation, e.g |
scarlehoff
left a comment
There was a problem hiding this comment.
Hi @jacoterh, looks good. The only thing, could you please bump the verison number and change the comment that currently says "port of old commondata", which happily is no longer true :)
RE the comparison you need to separate the legacy and the new (so that you can check whether the correlations between datasets are captured in the same way).
For reference, this is the runcard I'm using to test:
vp runcard
meta:
title: Data vs Th
keywords: comparison
author: juacrumar
pdfs:
- id: NNPDF40_nnlo_as_01180
label: NNPDF4.0 NNLO
pdf: NNPDF40_nnlo_as_01180
#theoryid: 40_000_000
theoryid: 200
use_cuts: "internal"
marker_by: "dataset"
old_and_new:
- temporal: "OLD data"
dataset_inputs:
- { dataset: ATLAS_SINGLETOP_7TEV_T-Y-NORM, variant: legacy }
- { dataset: ATLAS_SINGLETOP_7TEV_TBAR-Y-NORM, variant: legacy }
- { dataset: ATLAS_SINGLETOP_7TEV_TCHANNEL-XSEC, variant: legacy }
- { dataset: ATLAS_SINGLETOP_8TEV_T-RAP-NORM, variant: legacy }
- { dataset: ATLAS_SINGLETOP_8TEV_TBAR-RAP-NORM, variant: legacy }
- { dataset: ATLAS_SINGLETOP_8TEV_TCHANNEL-XSEC, variant: legacy }
- { dataset: ATLAS_SINGLETOP_13TEV_TCHANNEL-XSEC, variant: legacy }
- temporal: "NEW data"
dataset_inputs:
- { dataset: ATLAS_SINGLETOP_7TEV_T-Y-NORM}
- { dataset: ATLAS_SINGLETOP_7TEV_TBAR-Y-NORM}
- { dataset: ATLAS_SINGLETOP_7TEV_TCHANNEL-XSEC}
- { dataset: ATLAS_SINGLETOP_8TEV_T-RAP-NORM}
- { dataset: ATLAS_SINGLETOP_8TEV_TBAR-RAP-NORM}
- { dataset: ATLAS_SINGLETOP_8TEV_TCHANNEL-XSEC}
- { dataset: ATLAS_SINGLETOP_13TEV_TCHANNEL-XSEC}
template_text: |
{@ with old_and_new @}
{@ temporal @}
==============
Data-TH comparison
------------------
{@ dataset_inputs plot_fancy @}
Normalized
----------
{@ dataset_inputs::pdfs plot_fancy(normalize_to=data)@}
xq mapping
----------
{@ plot_xq2 @}
Chi Data
--------
{@ experiments_chi2_table @}
{@ total_chi2_data @}
{@ endwith @}
actions_:
- report(main=true)
|
Btw, before the merge, please rebase on top of master (you can do that from github by pressing the arrow where it says update branch and selecting update with rebase) |
|
RE the questions (but @enocera is the right person to answer)
I think we have decided to keep al uncertainties are multiplicative?
See above.
It's not really important for the data since the scale has an effect only on the theory. Eventually this might come from the PineaAPPL grid as @felixhekhorn has been advocating for (since that's the right place, a theory might be using q = m_t and another one q = Et). So whatever the dataset says (if it says something) is ok. |
I understand that we store only absolute uncertainties in the
If it is unclear whether an uncertainty is additive or multiplicative, which happens all the times, please set it to multiplicative. The rationale is that if you treat a multiplicative uncertainty as additive, then you may incur in the D'Agostini bias; if you treat an additive uncertainty as multiplicative, you do not incur in the bias. In other words: it is less harmful to treat an additive uncertainty as multiplicative than a multiplicative uncertainty as additive.
As @scarlehoff says, physical parameters are mostly relevant for theoretical computations. But there are cases in which these are also needed for data implementation, e.g. if the data comes rescaled by some variable which is a function of the physical parameters. In general, we should avoid to use these parameters to manipulate the data, and re-define the theory accordingly, in such a way that the parameters are called and controlled only at the level of theoretical predictions. If this is unavoidable, please be consistent with the parameters used for theory predictions (m_t must be equal to 172.5 GeV). |
|
@jacoterh is this final now? Could you reabase (preferably) on top of master or merge master into this branch? (so that the tests and the bot that @Radonirinaunimi prepared can run on this data) thanks |
317850c to
aee1454
Compare
|
I think something went wrong in the merge because you removed also some EICC data ^^U |
|
We're aware, it's the data with the capitalisation issue and macos isn't sensitive to capitalisation in filenames so should be resolved on a linux cluster. |
|
I see, I can fix it then. I'll merge/rebase from aee1454 if you give me the ok. |
|
Up to @jacoterh I think we should also remove the commit that changes these EIC files so we don't get new changes to them in master and keep having to deal with it when rebasing |
|
If he only has a mac I don't think he can fix it? I will squash and rebase, that should minimize the changes and make the rebase easy-ish. |
|
He has a cluster
Thanks |
Then let me know what you prefer @jacoterh |
b72bf39 to
9a4f8b2
Compare
9a4f8b2 to
0de5f63
Compare
scarlehoff
left a comment
There was a problem hiding this comment.
Thanks. I just left a few comments about things that should be removed (repeated data files).
For the point about the kinematic variables (removing the unused ones and changing mt2 -> mt) do as you prefer.
|
This should be ready for merging. Final report can be found at https://vp.nnpdf.science/nfGUvdTyQhCX_EV_C0BLkQ==/ |
I am opening this PR as draft to make discussion easier - it is not ready to merge yet. I have a question regarding the old implementation of the systematics in the ATLAS_SINGLETOP_7TEV_T-Y-NORM dataset.
Looking at the legacy implementation, I find 17 systematics per datapoint while the experimental paper only reports 7, see Table XV in [1406.7844]. Now, 3 of these correspond to correlated statistical uncertainties, which leaves 14 systematics, which is twice the number of real systematics! I went back to the old common data implementation (the c++ implementation) and found that both the upper and lower bounds are stored, e.g. relevant in case of asymmetric uncertainties. See line 208 in the c++ script.
However, my understanding was that symmetrising was performed before writing the commondata! So I am not sure why should keep both bounds for each source.