Docs for new commondata format #1708
How many versions and version comments can there be? Is it a complete changelog or do we forget about everything but the last one?
Organization following the naming convention
--------------------------------------------
All datasets in the new data format follow the exact same naming convention::
We should probably find a word different from "new" to refer to this format.
<experiment>_<process>_<energy>{_<extras>}_<observable>
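To make the convention concrete, here is a minimal sketch that splits a name into its components. It assumes that no single component contains an underscore (so every `_` is a separator) and treats everything between the energy and the last component as extras. `DatasetName` and `parse_dataset_name` are hypothetical helpers, not part of vp, and the dataset names are only illustrative.

```python
from typing import NamedTuple, Tuple


class DatasetName(NamedTuple):
    """Components of a dataset name (hypothetical helper, not part of vp)."""

    experiment: str
    process: str
    energy: str
    extras: Tuple[str, ...]
    observable: str


def parse_dataset_name(name: str) -> DatasetName:
    """Split <experiment>_<process>_<energy>{_<extras>}_<observable>.

    Assumes no single component contains an underscore, so every underscore
    is a separator; whatever sits between the energy and the last component
    is returned as the (possibly empty) tuple of extras.
    """
    parts = name.split("_")
    if len(parts) < 4:
        raise ValueError(f"{name!r} does not follow the naming convention")
    experiment, process, energy = parts[:3]
    return DatasetName(experiment, process, energy, tuple(parts[3:-1]), parts[-1])


# Illustrative names only.
print(parse_dataset_name("ATLAS_Z0J_8TEV_PT-Y"))
print(parse_dataset_name("CMS_TTBAR_13TEV_2L_DIF_MTTBAR").extras)
```

The extras are optional, which is why at least four components are required but any number beyond that is accepted.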
The data is contained in folders, each containing one single hepdata publication.
I assume we can do other things that are not hepdata?
Of course vp doesn't know where the data come from, but I think the idea is that when you open a folder, its contents come from one single hepdata publication.
In the odd case in which they don't, we hope the person implementing it adds comments or something explaining what happened there.
The comment was more in the direction that these docs should be rather general.
Organization following the naming convention
Is the convention enforced mechanically? Is it needed to process the names?
> Is it needed to process the names?

Yes. The folder containing the data will always be `name.rsplit("_", 1)[0]` and the observable will be selected by `name.rsplit("_", 1)[1]`.
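A minimal sketch of how that split could be used, for instance to collect all observables that live in the same folder (i.e. the same publication). The helper names are hypothetical and not part of vp; the dataset names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_dataset_name(name: str) -> Tuple[str, str]:
    """Return (folder, observable) for a dataset name.

    Everything before the last underscore is the folder (one publication);
    the last component selects the observable inside that folder.
    """
    if "_" not in name:
        raise ValueError(f"{name!r} has no observable component")
    folder, observable = name.rsplit("_", 1)
    return folder, observable


def group_by_folder(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group dataset names by folder, keeping the input order of observables."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        folder, observable = split_dataset_name(name)
        groups[folder].append(observable)
    return dict(groups)


print(group_by_folder(["ATLAS_Z0J_8TEV_PT-Y", "ATLAS_Z0J_8TEV_PT-M", "CMS_Z0_7TEV_Y"]))
```

Because only the last underscore is special, the folder part can itself contain any number of underscores (experiment, process, energy and extras).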
We should discuss things like versions vs variants. I remember being persuaded that we needed both, but I have completely forgotten why.
If I remember correctly now, we were supposed to keep all the old version metadata files and add a full new one when needed.
The versions are for actual fixes and the comment should address whatever the change was. Since we have version control I think we can forget about everything but the last one.
A version is a fix of something in the dataset. A variant is a variation of the dataset (for instance, a variation in which a particular uncertainty is also included). I'll add some lines about that.
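One way to picture the difference: a new version replaces the dataset (only the latest is kept, git holds the history), whereas variants coexist and are opted into explicitly. The sketch below assumes, for illustration only, that an observable's metadata holds a `variants` mapping whose entries list keys (such as a hypothetical `data_uncertainties` list of files) that override the default ones. `apply_variant` is a hypothetical helper, not the vp implementation.

```python
import copy
from typing import Any, Dict


def apply_variant(observable: Dict[str, Any], variant: str) -> Dict[str, Any]:
    """Return a copy of an observable's metadata with `variant` applied.

    Assumed (illustrative) layout: the observable may contain a "variants"
    mapping whose entries list the keys that replace the observable's own
    keys, e.g. a longer list of uncertainty files.
    """
    variants = observable.get("variants", {})
    if variant not in variants:
        raise KeyError(f"unknown variant {variant!r}; available: {sorted(variants)}")
    resolved = copy.deepcopy(observable)
    resolved.pop("variants", None)
    # Keys defined by the variant override the default ones.
    resolved.update(copy.deepcopy(variants[variant]))
    return resolved


observable = {
    "observable_name": "OBS",
    "data_central": "data.yaml",
    "data_uncertainties": ["uncertainties.yaml"],
    "variants": {
        "extra_unc": {"data_uncertainties": ["uncertainties.yaml", "extra_unc.yaml"]},
    },
}
print(apply_variant(observable, "extra_unc")["data_uncertainties"])
```

The original metadata is left untouched, so the default dataset and any of its variants can be loaded side by side.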
I've added some information on … Btw, please (@ everyone) feel free to modify this branch to complete this document.
@enocera @scarlehoff At some point, one of the main goals of this effort was that the format should be understandable, and hopefully implementable, by external people. I think it works pretty well for that (in principle; in practice it's less clear to me), with the exception of the theory section. It requires external names which index into external toolchains. The keys themselves ("fktables") are NNPDF jargon. (And there are the ugly double lists...) I believe that section would look intimidating to an external person trying to understand what is going on. With that, I think that if the strategic goal is to publicize the format, we'd be better off if this section was in a separate
I guess in practice this is already the case with the … On the flip side, people wanting to implement new data for use in a fit will have to deal with that section anyway, so I'm not sure whether having it separate is really an advantage at all.
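For anyone puzzling over the "double lists" in the theory section, here is a sketch of how we read them, under stated assumptions: tables within an inner list are concatenated (they cover different bins of the same observable), and the resulting outer-list operands are combined bin by bin with an operation. The operation names and formulas below are assumptions for illustration; check the actual implementation before relying on them. Predictions are plain per-bin lists here rather than real FK-table convolutions.

```python
from typing import Callable, Dict, List, Sequence

# Assumed operation names and meanings (illustrative, not authoritative).
OPERATIONS: Dict[str, Callable[..., float]] = {
    "null": lambda a: a,
    "add": lambda a, b: a + b,
    "ratio": lambda a, b: a / b,
    "asy": lambda a, b: (a - b) / (a + b),
}


def combine_predictions(
    fk_predictions: Sequence[Sequence[Sequence[float]]], operation: str = "null"
) -> List[float]:
    """Combine per-table predictions following a double list of FK tables.

    fk_predictions[i][j] holds the per-bin predictions of the j-th table in
    the i-th inner list. Tables in an inner list are concatenated; the
    resulting operands are then combined bin by bin with `operation`.
    """
    operands = [[value for table in group for value in table] for group in fk_predictions]
    if len({len(op) for op in operands}) != 1:
        raise ValueError("all operands must have the same number of bins")
    op = OPERATIONS[operation]
    return [op(*bin_values) for bin_values in zip(*operands)]


# One operand split over two tables, then a ratio of two single-table operands.
print(combine_predictions([[[1.0, 2.0], [3.0]]]))
print(combine_predictions([[[2.0, 6.0]], [[1.0, 3.0]]], "ratio"))
```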
felixhekhorn left a comment
I added some comments
Data
----
The data is stored in a ``yaml`` file with an entry ``data_central``, which is a list of the central values of all bins.
I guess we should stress more that this is the crucial part.
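Since ``data_central`` is the crucial part, a loader would likely want to check it. A minimal sketch, assuming the data file has already been parsed into a mapping (for example with PyYAML's `yaml.safe_load`) and that the expected number of bins is known from the metadata (the `ndata` argument is an assumption). `load_central_values` is a hypothetical helper.

```python
from typing import Any, List, Mapping


def load_central_values(data: Mapping[str, Any], ndata: int) -> List[float]:
    """Extract and check the central values from a parsed data file.

    `data` is the mapping obtained by loading the data yaml file; `ndata`
    is the expected number of bins (assumed to come from the metadata).
    """
    if "data_central" not in data:
        raise KeyError("data file has no 'data_central' entry")
    values = [float(v) for v in data["data_central"]]
    # One central value per bin, no more and no less.
    if len(values) != ndata:
        raise ValueError(f"expected {ndata} central values, found {len(values)}")
    return values


print(load_central_values({"data_central": [1, 2.5, 3]}, ndata=3))
```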
I've applied the suggestions. Feel free to add the other changes to the docs directly!
Adding here a few notes RE the latest changes to the commondata format (/cc @Radonirinaunimi). I will also update the documentation, but I'm leaving them here just in case I don't finish today.

Plotting:

Kinematics:

Datasets_names.yaml
cc @t7phy @Radonirinaunimi
In the old plotting file you will see the variable … And to simplify the part of the cuts that uses the process type, I'd add a …
Thanks for the heads up @scarlehoff!
For the people implementing new commondata: if you are implementing a new dataset, and so you don't have a reference, the best thing you can do is look at the list and use the entry that looks closest. This is used for some internal vp operations (for instance, for internal cuts), so it is a required key.
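Since the key is required, a loader could refuse metadata without it and, for unknown values, point to the closest known entries. A minimal sketch: the key name `process_type` and the list of known types are placeholders for illustration (not the actual vp list); the suggestions use Python's `difflib.get_close_matches`.

```python
import difflib
from typing import Any, Iterable, Mapping


def check_process_type(observable: Mapping[str, Any], known_types: Iterable[str]) -> str:
    """Validate the required process type of an observable.

    The key name "process_type" is a placeholder. If the value is not in
    `known_types`, the error lists the closest known names as suggestions.
    """
    known = list(known_types)
    if "process_type" not in observable:
        raise KeyError("'process_type' is a required key")
    value = observable["process_type"]
    if value not in known:
        close = difflib.get_close_matches(value, known, n=3)
        raise ValueError(f"unknown process type {value!r}; closest matches: {close}")
    return value


# Placeholder list of known process types, for illustration only.
KNOWN = ["DIS_NC", "DIS_CC", "DY_Z_Y", "JET"]
print(check_process_type({"process_type": "JET"}, KNOWN))
```

Choosing the closest entry remains a judgment call by whoever implements the dataset; the helper only narrows down the candidates.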
implemented_observables:
- observable_name: "OBS"
I think that there should be an indent here.
first draft of the documented new commondata format
add definition of the version key
add explanation for the variants and the theory
Apply suggestions from code review
Co-authored-by: Felix Hekhorn <felixhekhorn@users.noreply.github.com>
Update new-commondata.rst
update docs
Update doc/sphinx/source/data/new-commondata.rst
Update doc/sphinx/source/data/new-commondata.rst
Update doc/sphinx/source/data/new-commondata.rst
Update new-commondata.rst
update docs with the definition of the old:new mapping
Update doc/sphinx/source/data/new-commondata.rst
…n for the old format
This is not perfect, but since we are merging the new commondata format today, I wanted to have the documentation in a state in which it can be merged. I've removed some stuff that was plainly wrong or referred to the old format. I've added a (more or less detailed) explanation of every entry except plotting. The plotting metadata hasn't really changed that much, so I've left a reference to the old plotting metadata with a warning that this information is outdated. The changes in the last commit were needed for the docs to compile correctly on my laptop with py3.12.
Radonirinaunimi left a comment
Thanks a lot @scarlehoff for this extensive documentation. I haven't read it fully, but I have two main comments regarding:
- `filter`: so far, there is no mention of how the commondata are generated from the input hepdata. A (couple of) sentence(s) should suffice.
- `positivity` and `integrability`, which mainly affect the Observable specific information section.
Co-authored-by: Felix Hekhorn <felixhekhorn@users.noreply.github.com>
I agree, but I haven't added any filters, so this should be added by the people who did. RE positivity and integrability, that should be added by whoever ends up in charge of updating the commondata docs (we can assign tasks in the next code meeting).
I've added a first draft of the documentation for the new commondata format. This addresses #1691.
At the moment I've only added a description that follows the implementation of @t7phy in #1684. I believe the format is now more or less final (we are at the point of deciding whether it should be written `definition` or `definitions`, so really just the details). Since I was at it, I've converted what @cschwan had added about the naming convention to rst. I've added the following sentence to the `new-commondata.rst` file:
which is what we discussed yesterday.
Please feel free to add new documents to this PR; there are many things missing (see the TODO below, and feel free to add more items by editing this comment).
TODO: