Polish legacy jet datasets #2056
Radonirinaunimi
left a comment
Thanks @comane for this! I will have a detailed look soon, but when skimming through I think there are at least two main issues with the CMS_1JET_8TEV_PTY dataset:
- the variable in the kinematics has changed from p_T to p_T2 while in the metadata it's still kept as p_T
- the file uncertainties_bugged.yaml no longer exists but it is still referenced in the metadata
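Both kinds of mismatch could be caught automatically. What follows is a minimal sketch, assuming the metadata variable names and the kinematics bins have already been loaded into plain Python objects. The function name `check_consistency`, its arguments, and the assumed layout (one mapping per bin, keyed by variable name) are hypothetical and not part of nnpdf_data.

```python
from pathlib import Path


def check_consistency(metadata_vars, kinematics_bins, referenced_files, dataset_dir):
    """Return a list of human-readable problems; an empty list means consistent.

    metadata_vars:    variable names declared in the metadata (assumed layout)
    kinematics_bins:  one mapping per bin, keyed by variable name (assumed layout)
    referenced_files: file names that the metadata points to
    dataset_dir:      folder in which the referenced files are expected
    """
    problems = []
    declared = set(metadata_vars)

    # Every bin must use exactly the variables declared in the metadata.
    for index, kin_bin in enumerate(kinematics_bins):
        used = set(kin_bin)
        if used != declared:
            problems.append(
                f"bin {index}: kinematics uses {sorted(used)}, "
                f"metadata declares {sorted(declared)}"
            )

    # Every file referenced by the metadata must exist on disk.
    for name in referenced_files:
        if not (Path(dataset_dir) / name).is_file():
            problems.append(f"referenced file not found: {name}")

    return problems
```

Called with the variable names from the metadata, the bins from kinematics.yaml, and the list of uncertainty files, a non-empty return value would flag a renamed variable or a dangling file reference of the kind described above.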
346e661 to
f66e959
This was not done in this PR, but I changed it since I don't think we should start getting in the habit of suppressing warnings if there is another option
Thanks @RoyStegeman, just to be sure I understand: removing the read "r" from
No, removing "r" is just because that's the default option anyway. The warnings are fixed around line 900 in legacy_jets_utils.py. A similar error, with similar solution, that I didn't fix still exists for ATLAS_2JET_7TEV_R06. I didn't fix it now because that
Since filter files will now share utilities, and we may need to change/fix those shared utilities at some point - do you think it's feasible to run all filters in the CI to test if the output remains unchanged?
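One way such a check could work is to hash the generated files before and after re-running the filters and fail if anything differs. This is a minimal sketch, assuming the filter outputs are YAML files under a single directory. `snapshot` and `changed_files` are hypothetical helper names, and how the filters themselves are invoked is deliberately left out.

```python
import hashlib
from pathlib import Path


def snapshot(directory, pattern="*.yaml"):
    """Map each matching file (path relative to `directory`) to its SHA-256 digest."""
    root = Path(directory)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob(pattern))
    }


def changed_files(before, after):
    """Names of files that were added, removed, or modified between two snapshots."""
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

A CI job could take a snapshot of the committed files, run every filter, take a second snapshot, and assert that `changed_files` returns an empty list. Note that this is a byte-level comparison, so it would also flag harmless differences such as changed float formatting or key ordering; comparing the parsed content would be a more lenient alternative.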
@scarlehoff what should be done in this PR?
7138ebe to
517b046
Change it to
@comane I see you addressed the CI fails (as pointed out by @Radonirinaunimi), but could you rename p_T2 to pT2? |
Sure, I am not sure why though? So maybe I can simply remove that comment? |
I assume @scarlehoff wrote it. Either way I agree with it for consistency with pT and pT_t |
1276509 to
8cf8687
label: $|y|$
units: ''
pT:
pT2:
Why was this change made (I see the corresponding change was also made in kinematics.yaml)?
I am not completely sure, but I think this was discussed with @scarlehoff at some point.
The reason is probably that tanishq started using pT2 instead of pT, so this change was just to be consistent with that.
I set it back to pT. I think this makes more sense, as it is the kinematic variable for the JET process.
return {"min": min, "mid": mid, "max": max}

# ==================================================================== CMS_2JET_7TEV ====================================================================#
If these are only used for a single dataset, why are they not inside the folder of the corresponding dataset?
Yes, this is how it was before, where I had a filter_utils.py module in each of the datasets.
I guess that the idea is that in the dataset folders we only have a filter.py that produces the dataset, all of the utils instead go in nnpdf_data/filter_utils.
Otherwise we would have some utils in nnpdf_data/filter_utils and other within the dataset folder, which would be more confusing.
scarlehoff
left a comment
Please add an __init__.py file to filter_utils as well.
In all honesty, I'm not entirely sure I see the point of having a legacy_jets_utils.py module where the functions inside are actually specific to each dataset.
However, I'm ok with concentrating all that stuff into filter_utils so that it can be easily skipped by the installation with the right rules in pyproject.toml.
…d filter_utils.py to filter_utils.legacy_jets_utils.py
…d filter_utils.py to filter_utils.legacy_jets_utils.py
I only address the 1JET case (though solution for 2JET is similar), because the 2JET filter.py is broken anyway
Co-authored-by: Roy Stegeman <roystegeman@live.nl>
Co-authored-by: Roy Stegeman <roystegeman@live.nl>
f7dd15a to
88fd513
Radonirinaunimi
left a comment
This LGTM now! Thanks @comane.
@scarlehoff, are you fine with merging this now? Such that I can continue with #2100 and #2099.
Yes. I think it should be ok.
The scope of this PR is to (try to) polish the new implementation of the legacy jet datasets, namely, CMS_2JET_7TEV, CMS_1JET_8TEV, ATLAS_1JET_8TEV_R06, ATLAS_2JET_7TEV_R06.
What the PR does:
- filter_utils.utils.py module. This module should collect the functions that are generalisable and could, in principle, be used for all datasets.
- filter_utils.legacy_jets_utils.py module that collects all of the utils functions for the filter files. Polishing these is going to be much harder given the nature of the raw data.