Skip to content

Validphys script to download hepdata tables#1717

Closed
Radonirinaunimi wants to merge 3 commits into
masterfrom
download_hepdata
Closed

Validphys script to download hepdata tables#1717
Radonirinaunimi wants to merge 3 commits into
masterfrom
download_hepdata

Conversation

@Radonirinaunimi
Copy link
Copy Markdown
Member

As far as I understood from the report of the last CM, the script to download the HepData tables should be implemented in a separate PR and not part of #1693 as mentioned in #1699.

The following is a slight improvement on the script initially introduced in #1500. It is working fine for some of the datasets I tried in #1684 and #1699 but people should try it.

@Radonirinaunimi Radonirinaunimi added the enhancement New feature or request label Apr 19, 2023
@Zaharid
Copy link
Copy Markdown
Contributor

Zaharid commented Apr 19, 2023

FWIW a minimum non-scripting usage of requests involves

response = requests.get(...)
response.raise_for_status()

Otherwise the request may fail with a status code and write the wrong thing to the file.

In practice we'd have some reasonable error handling for that as well.

"""
self.folder.mkdir(exist_ok=True)
ins_id = self.url.split("/")[-1]
filename = f"HEPData-{ins_id}-v{self.version}-Table_{table_id}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like table_id=None works particularly well here. Why is it in the function signature?

Comment thread validphys2/setup.py Outdated
'vp-nextfitruncard = validphys.scripts.vp_nextfitruncard:main',
'vp-hyperoptplot = validphys.scripts.vp_hyperoptplot:main',
'vp-deltachi2 = validphys.scripts.vp_deltachi2:main',
'get_hepdata = validphys.scripts.get_hepdata:main',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I love this name. Certainly not the underscore.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right! For sure people can propose a better name, but it is correct that at least in order be consistent with the previous ones the underscore should be -.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vp-data or vp-hepdata seem quite natural

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a reasonable suggestion. But perhaps for more clarity we should go for vp-getdata (see 1ded03f)?

if "YAML" in url_tab["description"]:
get_reads_yaml = requests.get(url_tab["contentUrl"])
self.write_tables(get_reads_yaml.content, table_id)
log.info(f"Table {table_id} downloaded and stored properly.")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this: Surely the table is only written when self.write_tables is called?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! The indentation of the log was misplaced, this should be fixed now.

@Radonirinaunimi
Copy link
Copy Markdown
Member Author

FWIW a minimum non-scripting usage of requests involves

response = requests.get(...)
response.raise_for_status()

Otherwise the request may fail with a status code and write the wrong thing to the file.

In practice we'd have some reasonable error handling for that as well.

Agree! The error handling is the main part that is still missing here. I was hoping that people will try, document the errors, and we can address them all.

@Radonirinaunimi
Copy link
Copy Markdown
Member Author

Closing as no longer required.

@scarlehoff scarlehoff deleted the download_hepdata branch March 5, 2025 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants