Add support for HS3 and SimpleAnalysis resource files #925
* Remove identification of HistFactory by trigger words in description.
* Repurpose add_histfactory_analyses.py to find SimpleAnalysis files.
Pull Request Overview
Adds explicit support for HS3 and SimpleAnalysis resource files and removes implicit HistFactory detection by description keywords.
- Require explicit type: HistFactory or HS3 for those formats; no longer infer HistFactory from description/filename.
- Add SimpleAnalysis detection via description term "SimpleAnalysis" or explicit type, update indexing to surface HS3 and SimpleAnalysis in analyses, and repurpose the fix script to tag existing SimpleAnalysis resources.
- Update tests and search help documentation accordingly.
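A minimal sketch of the detection rules listed above, not the code in `hepdata/modules/records/utils/common.py`. The function name `infer_analysis_type`, the constant value, and the case-insensitive match are assumptions made for illustration; only the constant name `SIMPLEANALYSIS_FILE_TYPE` appears in this PR.

```python
from typing import Optional

# Assumed value; the PR introduces this constant in hepdata/config.py.
SIMPLEANALYSIS_FILE_TYPE = "SimpleAnalysis"


def infer_analysis_type(explicit_type: Optional[str], description: str) -> Optional[str]:
    """Hypothetical helper: decide the analysis type of a resource file."""
    # An explicit type (e.g. "HistFactory" or "HS3") always wins.
    if explicit_type:
        return explicit_type
    # SimpleAnalysis may also be detected from the description text
    # (case-insensitive matching is an assumption of this sketch).
    if SIMPLEANALYSIS_FILE_TYPE.lower() in description.lower():
        return SIMPLEANALYSIS_FILE_TYPE
    # HistFactory and HS3 are no longer inferred from the description.
    return None
```

With these rules, a description that merely mentions "likelihoods" or "HistFactory" no longer yields a type unless `type` is given explicitly.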
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/submission_test.py | Updates tests to validate SimpleAnalysis detection and explicit HS3/HistFactory typing. |
| tests/search_test.py | Adjusts search tests to include SimpleAnalysis resource type and uses SITE_URL for analysis URLs. |
| hepdata/version.py | Bumps version string. |
| hepdata/modules/search/templates/hepdata_search/modals/search_help.html | Adds HS3 and SimpleAnalysis examples to search help. |
| hepdata/modules/records/utils/common.py | Removes HistFactory term-based detection; adds SimpleAnalysis detection and explicit handling for HS3. |
| hepdata/ext/opensearch/document_enhancers.py | Indexes HS3 and SimpleAnalysis in the analyses field similarly to HistFactory. |
| hepdata/config.py | Introduces HS3_FILE_TYPE and SIMPLEANALYSIS_FILE_TYPE constants. |
| fixes/add_simpleanalysis_analyses.py | Repurposes the fix command to tag SimpleAnalysis resources and reindex. |
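
As a rough illustration of the indexing change in `document_enhancers.py`, the sketch below collects locally hosted analysis files into an `analyses` list. The function name `build_analyses`, the dictionary keys, and the landing-page URL pattern are assumptions for this example, not the actual HEPData implementation.

```python
from typing import Dict, List

# File types surfaced in the analyses field after this PR (values assumed).
ANALYSIS_FILE_TYPES = {"HistFactory", "HS3", "SimpleAnalysis"}


def build_analyses(resources: List[Dict], site_url: str) -> List[Dict]:
    """Hypothetical helper: build analyses entries for locally hosted files."""
    analyses = []
    for resource in resources:
        # Skip resources that are not one of the highlighted analysis types.
        if resource["file_type"] not in ANALYSIS_FILE_TYPES:
            continue
        analyses.append({
            "type": resource["file_type"],
            # Assumed URL pattern pointing at the resource landing page.
            "analysis": f"{site_url}/record/resource/{resource['id']}?landing_page=true",
            "filename": resource["filename"],
        })
    return analyses
```

Passing the site URL as a parameter mirrors the test change that uses `SITE_URL` for analysis URLs rather than a hard-coded host.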
Hi @GraemeWatt, thanks a lot for this, looks good to me! I just have one question regarding the SimpleAnalysis files: this PR's mechanism relies on the SimpleAnalysis snippets that have been uploaded to HEPData as part of a record. I assume that's not the complete list of public SimpleAnalysis codes though. Would it make sense to have SimpleAnalysis expose an analysis JSON file similar to Rivet, MadAnalysis etc.? How would the overlaps between HEPData files and the external files from the SimpleAnalysis server be treated?
* Follow suggestion made by Copilot review.
@mhabedan : good point, this should be clarified before this PR is merged. Currently, the highlighted analyses are either hosted externally and specified in an analyses JSON file (e.g. SModelS) or hosted directly on HEPData (e.g. HistFactory). It would be difficult to combine the two approaches without duplication (or we'd need to write more code to filter out duplicates), so I would prefer to choose only one approach for SimpleAnalysis. There are a significant number of SimpleAnalysis code snippets already hosted on HEPData as found by a search query. By the way, I realised I can generalise the
If it's an either-or to simplify the bookkeeping that HEPData has to do, I think I personally would prefer to link another JSON file directly from SimpleAnalysis. That would hopefully allow better coverage of what is actually available without relying on analysers to upload their code to HEPData (evidently difficult as 52/79 codes are still missing) as well as being able to always refer to the latest version. I've nonetheless contacted the SimpleAnalysis authors to ask their opinion as well. Nice idea about extending the search-description to HS3. I just hope that keyword will prove less ambiguous than it turned out to be for pyHF.
* Rename as add_analyses.py and modify to work also for HS3.
* Rename as is_analysis and modify to work also for HS3.
* Allow case where some analyses included locally and some via endpoint.
@mhabedan : I can see that it would be useful if both local files and external links (defined in a JSON file) were tagged as I can update
These few exceptions could be cleaned up by either re-uploading in a better format or by manually editing the relevant database fields, but it probably does not matter too much if there are a few imperfections.
@mhabedan : although the mixed case of local files and external links (both tagged as
Hi @GraemeWatt! Fantastic, thanks for all this! SimpleAnalysis will indeed expose a JSON file to HEPData soon, I'll create a PR as soon as that happens. No reason to delay this PR though. Re. the accuracy of the keyword tagging: thanks for having a look at this so quickly! I'll get in touch with the HEPData record creators as soon as this PR is merged and ask them to improve the labels/use the new
This PR is now deployed in production on hepdata.net and I ran the commands to find existing SimpleAnalysis and HS3 files. Now a search query Actually, https://www.hepdata.net/record/ins2845789 does have a I'll update
I made some manual database updates to address the anomalous three records mentioned in my previous comment:
Thanks a lot! Two comments:
It's only individual resources that have a
OK, I've changed the type from
Agreed, I think it's a mistake rather than a feature. I can also reach out to the submitters internally, if you prefer.
Thanks for the clarification and for updating the type of the files!
From the list of HEPData records with HS3 files I found that https://www.hepdata.net/record/ins2689657 and https://www.hepdata.net/record/ins2616326 had attached HS3 files misidentified as HistFactory because the descriptions contained the trigger word "likelihoods" via the URL
Now updated in https://www.hepdata.net/record/ins2829504?version=2
@cburgard : as mentioned at the end of my email yesterday, the three JSON files attached to https://www.hepdata.net/record/ins2905253 have
- [...] `type: HistFactory`.
- [...] `type: HS3` or by the string `HS3` in the description (closes Highlight HS3 similar to pyHF #921).
- [...] `SimpleAnalysis` in the description or an explicit `type: SimpleAnalysis` (closes Adding “analysis:SimpleAnalysis” search query #864).
- [...] `add_histfactory_analyses.py` to find existing SimpleAnalysis and HS3 files by running `hepdata fix add-analyses -a SimpleAnalysis` and `hepdata fix add-analyses -a HS3` in the production environment.