[WIP] Added NLO and NNLO candidate runcards for NNPDF40 #675
|
Hi @enocera. One very minor comment: I noticed that this includes the 8 TeV single top data (for which I am currently working on implementing the correlation info) rather than the 7 TeV data (for which we have the full correlation info already implemented). Is there a reason for this? |
|
Thanks for noticing @voisey . That was a typo. This is now corrected. (BTW I was not aware of the fact that we have correlations for the 8 TeV data now?) |
|
No problem! Yes, we do. It's available here: https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/TOPQ-2015-05/ and it has both the statistical correlation matrices and the systematic breakdowns. We got an email from Wolfgang Wagner on 8 Jan 2020 pointing us to the info. He also said he would put some of the tables into a machine readable format for us by the end of January but I've heard nothing from him since, so I decided to do it myself. There isn't that much info to convert so it shouldn't take me too long |
|
Thanks, let me look at the candidate NNPDF4.0 runcard and I will let you know if I have any comments. |
|
Great, I'll start running the NLO one now. Let me know if the runcard changes! One question: is there by any chance an nnfit baseline fit I can compare with? |
|
@scarlehoff Please wait one second - I still have to add the off-peak and forward W/Z data to the runcards. There's no nnfit baseline, but I can easily produce it if need be. |
|
Wait a sec, let me check the runcard first. There is no nnfit baseline, so if required we would need to run it ex professo. I guess we probably want this baseline, right @enocera? But first let us get the n3fit running, which is much faster and would allow us to identify any potential problem. |
|
Ok! I'll be in stand-by ! wrt the baseline, no problem, I was just wondering whether it existed. |
|
Please note that I've updated the wiki correspondingly: |
|
Hi @enocera the run cards look good to me with a couple of small modifications:
Other than that the runcards seem ready to go to me. @scarlehoff before running any fit, could you please send around a list of the chi2s at NLO and NNLO to try to identify if there is any obvious problem? |
|
Also @scarlehoff for the k-foldings, I am not sure in which format you need the info, but I guess that it should be possible to adapt my list and proposed partitions so that it is in one-to-one correspondence with the runcard that @enocera has produced? |
There's in principle double counting with the single differential CMS top pair at 8 TeV (CMSTOPDIFF8TEVTTRAPNORM) which is not commented. It's up to us to decide whether we want to replace it with one of the 2D differential distributions.
As we have shown in our LH proceedings, including or not the pT distribution is inconsequential. Maybe @stefanoforte has a strong opinion on this? I don't.
I of course checked all these numbers before opening the PR. Please have a look at these which were computed with the iterated fit to dijet data (no PDF uncertainty included).
I've already adapted the list at this link. |
I seem to recall that the 2D distributions correspond to the dilepton dataset while the single-differential ones are for lepton+jets, so to the best of my understanding there is no double counting in such a case?
Me neither, but since the chi2 is not optimal perhaps we can facilitate the life of the optimiser? Again, this is a minute thing.
Ok thanks I will check this and comment back.
Amazing, many thanks ;) |
I had the same recollection, but that's not the case. I've checked the papers earlier today while preparing the table at https://www.wiki.ed.ac.uk/display/nnpdfwiki/NNPDF4.0+dataset. But please feel free to double check. |
|
Well, in https://arxiv.org/abs/1703.01630 they say "The measurement is performed in the dilepton e±µ∓ final state", so indeed there is no double counting? And then they quote the older paper https://arxiv.org/abs/1505.04480 which is lepton+jets. Or am I missing something obvious? |
|
@juanrojochacon I detest to say that you're right. I've corrected the runcards and all files/numbers linked to https://www.wiki.ed.ac.uk/display/nnpdfwiki/NNPDF4.0+dataset. |
|
Thanks, now it looks good to me |
|
Great, I'll start running with these runcards. |
|
@scarlehoff Please note that there was still a bug in the NNLO runcard: the EWK K-factors were missing for the ATLASPHT12 and ATLASPHT15 data sets (they were not propagated to theory 53). You have to download theory 53 again to get these. |
|
Just out of curiosity, is the DISonly runcard going to remain as the one that's in nnpdf/n3fit/runcards/PN3_DIS_example.yml (lines 1 to 56 in f9bbe6f)? |
|
no this is an old card, for NNPDF3.1-like fits |
|
@siranipour you should take the NNPDF4.0 card and remove all the non-DIS experiments |
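A minimal sketch of the filtering step suggested above, assuming the runcard has already been loaded into a Python dictionary whose `experiments` key holds a list of entries with `experiment` and `datasets` fields. The layout, the toy runcard, and the `DIS_EXPERIMENTS` set are illustrative assumptions, not the actual NNPDF4.0 list:

```python
import copy

# Hypothetical set of DIS experiment labels; the real list must be read
# off the NNPDF4.0 runcard itself.
DIS_EXPERIMENTS = {"NMC", "SLAC", "BCDMS", "HERACOMB"}


def keep_dis_only(runcard, dis_experiments=DIS_EXPERIMENTS):
    """Return a copy of the runcard with all non-DIS experiments removed."""
    filtered = copy.deepcopy(runcard)
    filtered["experiments"] = [
        exp for exp in filtered.get("experiments", [])
        if exp["experiment"] in dis_experiments
    ]
    return filtered


# Toy runcard with an assumed layout; fractions are placeholders.
toy_runcard = {
    "description": "toy global runcard",
    "experiments": [
        {"experiment": "NMC", "datasets": [{"dataset": "NMCPD", "frac": 0.5}]},
        {"experiment": "ATLAS", "datasets": [{"dataset": "ATLASTTBARTOT", "frac": 1.0}]},
    ],
}

dis_runcard = keep_dis_only(toy_runcard)
print([exp["experiment"] for exp in dis_runcard["experiments"]])
```

Working on a deep copy leaves the global runcard untouched, so the same dictionary can still be used for the global fit.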
|
Hi @enocera (cc @scarrazza @juanrojochacon @stefanoforte) I've tried doing a global fit; "current" is the fit after hyperoptimization and "reference" the one before. [figure: nnpdf40-fit] As you can see, the fits after some hyperoptimization "work better" (they are more stable from one run to the next) but the chi2 of the central replica is quite bad. I'm looking through https://www.wiki.ed.ac.uk/download/attachments/431524064/NNLO_chi2.txt?version=2&modificationDate=1584395434672&api=v2 and it seems that the experiments these fits fail to describe well (such as ATLASTPTNORM, ATLAS_1JET_8TEV_R06, and even D0WEASY or ATLASTTBARTOT) were already more problematic in that document. I was wondering whether there could be some problem with these datasets. A fit with some datasets removed seems to do much better (note that I also removed positivity here; this was my first test and it didn't really make a difference, I just forgot to put it back; example here). Any insights? Let me know if you'd like any other fits to complete the info (bear in mind they take 3-6 hours to be ready in the best case). Some extra info: Before fitting I removed the […]. All in all, this is my runcard: |
|
@scarlehoff I think that the somewhat high numbers you find are expected, and consistent with our recent top study (https://inspirehep.net/literature/1783782) for ATLASTPTNORM and ATLASTTBARTOT, with our recent jet study (https://inspirehep.net/literature/1797633) for ATLAS_1JET_8TEV_R06 and with NNPDF3.1 for D0WEASY. We have indication that the experimental covariance matrix is problematic in these cases. Our claim (to be checked) is that the chi2 can be improved by regularizing/decorrelating the covariance matrix, but that this leaves the PDFs unchanged. |
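To illustrate why decorrelating a covariance matrix can lower the chi2, here is a toy two-point sketch. It is not the regularization procedure used in the studies cited above: the numbers are invented, and the `decorrelate` helper (which simply rescales the off-diagonal entries) is a hypothetical stand-in. It also says nothing about whether the PDFs change.

```python
def chi2(residuals, cov):
    """chi2 = d^T C^{-1} d for two data points, inverting the 2x2 matrix by hand."""
    (a, b), (c, e) = cov
    det = a * e - b * c
    inv = [[e / det, -b / det], [-c / det, a / det]]
    return sum(
        residuals[i] * inv[i][j] * residuals[j]
        for i in range(2) for j in range(2)
    )


def decorrelate(cov, factor):
    """Scale the off-diagonal entries by `factor`, leaving the variances untouched."""
    return [
        [cov[i][j] if i == j else factor * cov[i][j] for j in range(2)]
        for i in range(2)
    ]


# Toy numbers (not taken from any real dataset): unit variances,
# correlation 0.99, residuals pulling in opposite directions.
cov = [[1.0, 0.99], [0.99, 1.0]]
residuals = [1.0, -1.0]

chi2_full = chi2(residuals, cov)
chi2_decorrelated = chi2(residuals, decorrelate(cov, 0.9))
print(chi2_full, chi2_decorrelated)
```

With these inputs the chi2 equals 2/(1 - rho), so reducing the correlation from 0.99 to 0.891 brings it from about 200 down to about 18 even though the data and theory values are unchanged.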
|
Thanks. Then I have a collateral question (for the record): why were the data sets on F2c and F2b not removed from the LO NNPDF3.1 fit? |
|
I think technically F2c is not zero at LO but rather reduced to the massless calculation. But this is terrible for low Q. So it is a bit ambiguous but I think it is safer to remove them |
|
@juanrojochacon Thanks for the clarification. I apologise for having overlooked W+jet. |
|
Perfect! Thanks all for your quick replies. |
|
Agreed, let's see what happens now. If the problem persists, we should store the chi2 logs dataset by dataset to pin down the problem @scarlehoff |
|
That's easier said than done - the training validation split is done at the level of the experiments and we never calculate per dataset chi2 at present. I wonder if we should open a new discussion of this study, the conversation is rather long here and nothing to do with the title (but of course very important). |
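For reference, a per-dataset chi2 breakdown can be sketched as below. This is a toy that assumes fully uncorrelated uncertainties and a flat list of `(dataset, data, theory, sigma)` tuples; it is not how n3fit organizes its data, and with correlated systematics the per-dataset values would not simply add up to the experiment-level chi2. The dataset names are placeholders.

```python
from collections import defaultdict


def chi2_per_dataset(points):
    """Group an uncorrelated chi2 by dataset name.

    `points` is an iterable of (dataset, data, theory, sigma) tuples.
    Returns {dataset: (chi2, number_of_points)}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name, data, theory, sigma in points:
        totals[name] += ((data - theory) / sigma) ** 2
        counts[name] += 1
    return {name: (totals[name], counts[name]) for name in totals}


# Invented numbers, for illustration only.
toy_points = [
    ("DATASET_A", 1.0, 1.5, 0.5),
    ("DATASET_A", 2.0, 2.0, 0.5),
    ("DATASET_B", 3.0, 1.0, 1.0),
]

breakdown = chi2_per_dataset(toy_points)
for name, (value, npoints) in breakdown.items():
    print(name, value / npoints)
```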
|
I'm also opening a separate issue wrt the PDG plots for a similar reason - the discussion will likely be forgotten about or duplicated as is. |
|
There is already an issue for the PDG plots: https://github.com/NNPDF/papers/issues/27 |
|
Cheers @RoyStegeman I still opened the issue in case we want that code in this repo - if not then somebody can have the endorphin rush of closing the issue without the usual associated pain. I also opened an issue about fitting at LO; if the new runcard solves the issues then perhaps this will also be closed relatively soon. I wonder if at some point we should merge this PR? It seems somehow useful to have the current runcards on the main branch even if a PR must be made in the future to update them. I guess at some point this was discussed before but rejected for some reason, but if not then I'll re-propose the idea. |
It's immaterial, insofar as I'm concerned. |
|
CMS_WCHARM_DIFF_UNNORM_13TEV does not have any frac or cfac set. Is that intentional or should it be changed? |
|
@RoyStegeman Good spot! The missing training fraction is a mistake - fixed now through all the LO and NLO runcards; the missing cfac should be fine, W+c data are excluded from the computation of similarity cuts. |
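A check of this kind can be automated. The sketch below assumes the runcard's `experiments` section has been loaded into a list of dictionaries, each with a `datasets` list whose entries carry `dataset` and optionally `frac` and `cfac`; the helper name `datasets_missing_key` and the toy values are hypothetical.

```python
def datasets_missing_key(experiments, key):
    """Return the names of all datasets that do not define `key`."""
    missing = []
    for exp in experiments:
        for ds in exp.get("datasets", []):
            if key not in ds:
                missing.append(ds["dataset"])
    return missing


# Toy input with an assumed layout; the frac and cfac values are placeholders.
toy_experiments = [
    {
        "experiment": "CMS",
        "datasets": [
            {"dataset": "CMS_WCHARM_DIFF_UNNORM_13TEV"},
            {"dataset": "CMSTOPDIFF8TEVTTRAPNORM", "frac": 0.5, "cfac": ["QCD"]},
        ],
    },
]

print(datasets_missing_key(toy_experiments, "frac"))
print(datasets_missing_key(toy_experiments, "cfac"))
```

A missing `cfac` is not necessarily an error, as noted above for the W+c data, so the output is a list to review rather than a list of failures.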
|
I noticed you also removed the posf2c constraint from the regular (non pch) LO runcard. Is there a reason for this? |
|
@RoyStegeman It's the same reason for which Juan suggested to remove F2c for HERA. |
|
@enocera I think the LO runcards can be replaced by the ones below (make sure to change the extension to .yml). LO pch is iterated wrt the current version in the repo while the fitted charm is slightly changed to be positive definite in the flavour basis. Further things to note are:
|
…factor for DYE886R
|
@RoyStegeman Thanks a lot. The LO runcards should now read like the ones you pasted above and they should also include the missing NNLO K-factors. Tomorrow I'll update them with the missing SeaQuest dataset. |
scarlehoff
left a comment
Approving by explicit request from @Zaharid
This PR contains two runcards (NLO and NNLO) for a candidate NNPDF40 fit with the currently available data set. @scarlehoff: you might consider using these as a baseline for the n3fit requested by @juanrojochacon at the PC today.