Analysis JSON schema #878
…b.com/HEPData/hepdata-validator/blob/91b182772eac3a6d01451b98e4e24a9e7a865887/hepdata_validator/schemas/1.1.1/additional_resources_schema.json\#L12-L21): limit number of characters in license, add 'description' field
|
Hello, The current json: would then become, to a minimum: If this is valid json according to the schema, I think we are fine, and it is not difficult to change from our side. |
|
Hi @lenzip, Thanks for the quick feedback! Indeed, that would be a valid JSON (apart from a superfluous comma in the "main_url" line). The "path" field in the "implementations" wouldn't even have to contain the "{name}" bit. So you could either use or, as the "path" field is not mandatory, Similarly, not all fields you mentioned would have to be used (see here for the fields that are required). A minimal version could be if that makes more sense for you. But overall: Great to hear that this would work for you! The "implementations" field would be a bit of a misnomer though if the format should also work for Combine. Don't really have a better name at the moment though. "codes" maybe? |
|
Hello @mhabedan , |
|
Hi @lenzip! Very reasonable question. The use mostly comes from MadAnalysis (and LLPrecasting). They have two recasting approaches for detector emulation which are treated completely differently. Some analyses have therefore been recast by different teams, or more generally have two implementations with two DOIs. Hopefully, allowing for multiple implementations in the JSON schema gives enough flexibility for all relevant needs without adding too much complication to the syntax. |
|
I added a required "tool_type" field as per @GraemeWatt's suggestion so HEPData knows whether the tool is a "Simplified analysis" or "Statistical model" (given that the schema seems to work for CMS Combine as well). Names are obviously open for discussion but I'd suggest using an enum so the "tool_type" values are categorised. So the example above would now be |
|
I've merged and deployed a PR #881 that adds support for HackAnalysis and also adds a Lines 322 to 368 in 3a4eae4 The The analyses schema should be compatible with the existing additional_resources_schema.json. In particular, I'm not sure how to impose that the |
|
Hi @GraemeWatt, Re.
Re. |
|
The analysis links are stored in the database as For the first HEPData record with a Combine link (https://www.hepdata.net/record/ins2796231?version=2), CMS added the link manually by uploading a revised additional_resources:
- description: Statistical models
  location: https://doi.org/10.17181/bp9fx-6qs64
  license:
    name: cc-by-4.0
    url: https://creativecommons.org/licenses/by/4.0
When automating the procedure (PR #847 to address issue #846) I added We can either leave |
Hi sorry for the late reply, am implementing things on SModelS, but I think it all fits us. Just quick question, Wolfgang |
|
Hi @GraemeWatt, Thanks for the detailed explanation! I understand now why you want to match the additional_resources_schema.json. We could try to limit the I wasn't fully aware how the Hi @WolfgangWaltenberger, |
|
Summarising a couple of points from my previous message:
|
|
Alright attached is what our current version would look like. Tell me in case you want something changed. Wolfgang |
|
Hi @WolfgangWaltenberger,
For future reference, the example I gave above would be as of 503ff8f. |
Ha, eluded me, thanks!
Right, will take out.
Right, would leave in for our own "gain". That is not a problem, right? Wolfgang
|
|
Hi! After thinking about it and discussing with Andy, I'm convinced that adding the So the example from above would now be as of 0d972a9. |
|
| from datetime import datetime, timezone
I assume you don't care much, but I am oscillating between UTC and the timezone of the JSON file production :) Wolfgang |
|
I checked and dateutils was happy to parse any variant, including a pure date with the time info discarded... so I don't think we need to overspecify that. The main use-case will be to know roughly what era a given file dates from, so we know easily if it's ancient and needs to be updated/ignored! |
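For illustration, the two variants discussed here (a full timestamp with a UTC offset and a pure date) can also be handled with the Python standard library alone. This is only a sketch: the helper names `parse_date_created` and `make_date_created` are hypothetical, and treating values without timezone information as UTC is an assumption, not something the standard prescribes.

```python
from datetime import datetime, timezone


def parse_date_created(value: str) -> datetime:
    """Parse an ISO 8601 'date_created' string (hypothetical helper).

    Values without timezone information, such as a pure date, are
    interpreted as UTC. That choice is an assumption of this sketch.
    """
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def make_date_created() -> str:
    """Return the current time in UTC as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


# A full timestamp with offset and a pure date both parse.
full = parse_date_created("2018-11-13T20:20:39+00:00")
date_only = parse_date_created("2018-11-13")

# A freshly generated timestamp round-trips through the parser.
round_trip = parse_date_created(make_date_created())
```

Writing the timestamp in UTC sidesteps the question of which local timezone the file was produced in, while the parser stays tolerant of either form.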
|
As per @GraemeWatt's suggestion, the standard includes a new
{
  "schema_version" : "1.0.0",
  "tool": "SModelS",
  "version": "3.0.0",
  "date_created": "2018-11-13T20:20:39+00:00",
  "implementations_description": "SModelS analysis",
  "url_templates": {
    "main_url": "https://github.com/SModelS/smodels-database-release/tree/main/{name}"
  },
  "analyses" : [
    {
      "inspire_id": 1795076,
      "implementations": [
        {
          "name" : "ATLAS-EXOT-2018-48"
        }
      ]
    }
  ]
}
Furthermore, by popular demand, I've added a readme which describes in detail all fields that are required or defined, gives usage examples and instructions on how to test against the standard. From my point of view, the standard is finalised now. |
mhabedan
left a comment
Following @GraemeWatt's suggestion, I've pulled the logic to parse the new analyses JSON files from #886 to this PR. I've also added the schema validation. At the moment, on failure, the old JSON file handling is invoked. That should be changed to a harder enforcement of the new JSON format once all tools have done the transition.
I've added a couple more discussion points directly in the code. Let me know what you think!
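The fallback behaviour described in this comment could be sketched roughly as follows. This is not the code from the PR: `validate_new_format` is a hypothetical stand-in for the real schema validation, and the old layout (a dictionary keyed by Inspire ID with a list of analysis names as values) is an assumption made for illustration only.

```python
import json


def validate_new_format(data) -> None:
    """Stand-in for schema validation (hypothetical, not the real schema)."""
    if not isinstance(data, dict) or not isinstance(data.get("analyses"), list):
        raise ValueError("not in the new analyses JSON format")


def parse_new(data) -> dict:
    """Collect implementation names per Inspire ID from the new format."""
    result = {}
    for analysis in data["analyses"]:
        names = [impl["name"] for impl in analysis["implementations"]]
        result.setdefault(analysis["inspire_id"], []).extend(names)
    return result


def parse_old(data) -> dict:
    """Assumed old layout: {"<inspire id>": ["<analysis name>", ...]}."""
    return {int(key): list(value) for key, value in data.items() if key.isdigit()}


def parse_analyses(text: str) -> dict:
    """Try the new format first; on validation failure use the old handling."""
    data = json.loads(text)
    try:
        validate_new_format(data)
    except ValueError:
        return parse_old(data)
    return parse_new(data)


new_text = json.dumps({
    "tool": "SModelS",
    "analyses": [
        {"inspire_id": 1795076,
         "implementations": [{"name": "ATLAS-EXOT-2018-48"}]}
    ],
})
old_text = json.dumps({"1795076": ["ATLAS-EXOT-2018-48"]})

from_new = parse_analyses(new_text)
from_old = parse_analyses(old_text)
```

Once all tools have moved to the new format, the `except` branch would become an error rather than a fallback, which matches the harder enforcement mentioned above.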
I didn't actually suggest moving the parsing code to this PR. My comment (in yesterday's email) was:
I just wanted #878 containing the JSON schema to be merged first, so that #886 could use it for validation. But it doesn't matter too much which PR contains the parsing code, so keep it here if you prefer. I think you can now check for the presence of |
|
Closing this PR without merging as it has now been replaced by a new PR #906. |
|
As of yesterday, #906 is merged and the new analyses JSON schema is live! 🎉 All tools are now encouraged to move to the new schema. (Apart from SModelS who thankfully already did that so we could use them as a test case.) |
Following up on a discussion of the OpenMAPP project, this PR adds a JSON schema to HEPData that defines the format of the input "analysis" JSONs, currently used by Rivet, MadAnalysis5, SModelS, CheckMATE, and Combine.
An example JSON file for a tool would then look like this:
{
  "tool": "SModelS",
  "version": "3.0.0",
  "url_templates": {
    "main_url": "https://github.com/SModelS/smodels-database-release/tree/main/{path}",
    "val_url": "https://smodels.github.io/docs/Validation#{name}_ul"
  },
  "analyses" : [
    {
      "inspire_id": 1795076,
      "signature_type": "prompt",
      "pretty_name": "di-top resonance",
      "implementations": [
        {
          "name" : "ATLAS-EXOT-2018-48",
          "path": "13TeV/ATLAS/{name}/"
        }
      ]
    }
  ],
  "implementations_license": {
    "name": "cc-by-4.0",
    "url": "https://creativecommons.org/licenses/by/4.0"
  }
}
Advantages of this JSON schema over the currently used format.
Changes to version discussed here.
templates -> url_templates and used snake case throughout.
1795076 is the inspire ID instead of an unexplained dictionary key.
Using a JSON schema then also has the advantage that everyone can validate their JSON files against the schema following steps similar to this script.
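To make the role of the "url_templates" and "path" fields in the example JSON of this description more concrete, here is a minimal sketch of how the placeholders might be resolved. It assumes plain Python `str.format` substitution, with "{name}" filled into "path" first and both values then filled into each URL template. The function name `expand_urls` is hypothetical, and how HEPData actually expands the templates may differ.

```python
# Trimmed copy of the example from this description.
example = {
    "tool": "SModelS",
    "url_templates": {
        "main_url": "https://github.com/SModelS/smodels-database-release/tree/main/{path}",
        "val_url": "https://smodels.github.io/docs/Validation#{name}_ul",
    },
    "analyses": [
        {
            "inspire_id": 1795076,
            "implementations": [
                {"name": "ATLAS-EXOT-2018-48", "path": "13TeV/ATLAS/{name}/"}
            ],
        }
    ],
}


def expand_urls(doc: dict) -> dict:
    """Map each implementation name to its expanded URLs (hypothetical helper)."""
    urls = {}
    for analysis in doc["analyses"]:
        for impl in analysis["implementations"]:
            name = impl["name"]
            # "path" is optional; an empty string is assumed when it is absent.
            path = impl.get("path", "").format(name=name)
            fields = {"name": name, "path": path}
            urls[name] = {
                key: template.format(**fields)
                for key, template in doc["url_templates"].items()
            }
    return urls


expanded = expand_urls(example)
```

Resolving "path" before the URL templates lets a tool keep its directory layout in one place while the implementation name is written only once.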
Dear authors of Rivet, MadAnalysis5, SModelS, CheckMATE, and Combine: Does this JSON format work for your tools?
@GraemeWatt: Any further comments from HEPData's perspective?