Add sample data for cloud scatterplots #30
Conversation
/regenerate

PR comment handling Details: https://github.com/Climate-REF/ref-sample-data/actions/runs/14730933211

/regenerate

PR comment handling Details: https://github.com/Climate-REF/ref-sample-data/actions/runs/14736606473
It looks like there is an issue with regridding the data.

Commits:
- Time slice before regridding
- Make regridding work with data on model levels
- Add log messages
…-scatterplots-data
@lewisjared Is there an easy way to test if Climate-REF/climate-ref#261 works with the data added in this pull request without first merging this?

@nocollier I tried to use the streaming feature in the fetch_test_data.py script, but I get an assertion error coming from intake-esgf if I do that. Would you have time to take a look?
Sigh... so I run this:

```python
import intake_esgf

intake_esgf.conf.set(all_indices=True)
cat = intake_esgf.ESGFCatalog().search(
    project="obs4MIPs", variable_id="ta", source_id="ERA-5"
)
dpd = cat.to_path_dict(prefer_streaming=True, minimal_keys=False)
```

and 1 out of 3 times I get the 44 links. The assertion you are seeing is a check I make to ensure that when I partition the file info based on how we access/transfer the data, we didn't lose a file somewhere in the logic. If a user prefers streaming, I added a select-link function that returns the fastest link, i.e. the first to return a response. If it fails to find one, the logic should break and the file will be queued for HTTPS download. I am not sure what is happening, but it looks like checking the OPeNDAP server for the status of 44 links may make it fail.

In the short term we could try something that should work but is a little messy. You could:

```python
import intake_esgf

intake_esgf.conf.set(all_indices=True)
cat = intake_esgf.ESGFCatalog().search(
    project="obs4MIPs", variable_id="ta", source_id="ERA-5"
)
infos = cat._get_file_info()
```

This will return a list of dictionaries which look like:

```python
{
    "key": "obs4MIPs.ECMWF.ERA-5.mon.ta.gn",
    "dataset_id": "obs4MIPs.ECMWF.ERA-5.mon.ta.gn.v20250220|esgf-data2.llnl.gov",
    "checksum_type": "SHA256",
    "checksum": "d3c14ba6fb16a49cef03ea622bae954f29555c1fe7df4af6d4a272cf160a2eaa",
    "size": 494866132,
    "HTTPServer": [
        "https://esgf-data2.llnl.gov/thredds/fileServer/user_pub_work/obs4MIPs/ECMWF/ERA-5/mon/ta/gn/v20250220/ta_mon_ERA-5_PCMDI_gn_197901-197912.nc",
        "https://esgf-data2.llnl.gov/thredds/fileServer/user_pub_work/obs4MIPs/ECMWF/ERA-5/mon/ta/gn/v20250220/ta_mon_ERA-5_PCMDI_gn_197901-197912.nc",
    ],
    "OPENDAP": [
        "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/obs4MIPs/ECMWF/ERA-5/mon/ta/gn/v20250220/ta_mon_ERA-5_PCMDI_gn_197901-197912.nc",
        "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/obs4MIPs/ECMWF/ERA-5/mon/ta/gn/v20250220/ta_mon_ERA-5_PCMDI_gn_197901-197912.nc",
    ],
    "Globus": [
        "globus:1889ea03-25ad-4f9f-8110-1ce8833a9d7e/user_pub_work/obs4MIPs/ECMWF/ERA-5/mon/ta/gn/v20250220/ta_mon_ERA-5_PCMDI_gn_197901-197912.nc",
        "globus:1889ea03-25ad-4f9f-8110-1ce8833a9d7e/user_pub_work/obs4MIPs/ECMWF/ERA-5/mon/ta/gn/v20250220/ta_mon_ERA-5_PCMDI_gn_197901-197912.nc",
    ],
    "path": PosixPath(
        "obs4MIPs/ECMWF/ERA-5/mon/ta/gn/v20250220/ta_mon_ERA-5_PCMDI_gn_197901-197912.nc"
    ),
}
```

Longer term I may need to think of a way a user can get detailed information that I haven't tried to reduce for them. OPeNDAP may not be around much longer, but I have a feeling users will break things no matter what, and sometimes all the information is the best we can do.
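The fallback described above (probe each access link, take the first one that responds, otherwise queue the file for a plain HTTPS download) can be sketched like this. `select_link` and `probe` are hypothetical names for illustration, not the actual intake-esgf API:

```python
from typing import Callable, Iterable, Optional


def select_link(links: Iterable[str], probe: Callable[[str], bool]) -> Optional[str]:
    """Return the first link that responds to `probe`, or None.

    A None result means no server answered, so the caller should queue
    the file for a plain HTTPS download instead of streaming it.
    """
    for link in links:
        try:
            if probe(link):
                return link
        except Exception:
            # an unreachable or misbehaving server must not abort the search
            continue
    return None
```

In the real library the probe would presumably be something like an HTTP request with a short timeout; with 44 OPeNDAP links, even a few seconds per link adds up quickly, which could explain the intermittent failures.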
I need to generate some large datasets also (
The timestamp thing would definitely help a bit here. By the way, are you aware of NetCDF Byterange Support? Most servers on ESGF support this type of access, so that could be a way to support streaming without using OPeNDAP.
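A hedged sketch of that idea: netCDF-C builds with byte-range support enabled can open a remote file over plain HTTPS by appending `#mode=bytes` to the URL, with no OPeNDAP involved. The helper name below is made up for illustration; the URL is the HTTPServer link from the file info shown earlier.

```python
def byterange_url(http_link: str) -> str:
    """Turn an ESGF HTTPServer link into a byte-range netCDF URL."""
    return http_link + "#mode=bytes"


url = byterange_url(
    "https://esgf-data2.llnl.gov/thredds/fileServer/user_pub_work/obs4MIPs/"
    "ECMWF/ERA-5/mon/ta/gn/v20250220/ta_mon_ERA-5_PCMDI_gn_197901-197912.nc"
)
# Opening it would then be (requires network access and a netCDF build
# with byte-range support):
#   import xarray as xr
#   ds = xr.open_dataset(url)
```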
You can use a different directory for the test data by specifying the setting documented at https://climate-ref.readthedocs.io/en/latest/configuration/#ref_test_output You might want to merge main back into this so you can use the
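For example (the environment variable name is inferred from the docs anchor above and should be verified there):

```shell
# Point the REF test data output at a scratch directory
# (variable name inferred from the linked configuration docs)
export REF_TEST_OUTPUT=/tmp/ref-test-output
```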
@lewisjared Do you understand why installing

Unfortunately that didn't work. Making a change like this worked, though it still downloads the existing sample data too. Is that something you would like to have included in the ref package, or is it too specific to create yet another environment variable?
Apologies. I must have had that functionality in another branch.

I figured it out by removing it; it's used to get the obs4REF data.
Force-pushed from 12a19fe to 3e25d55
It looks like this is a bug in pixi; the way it generates the URL for pip does not seem valid with recent versions.
@lewisjared Files uploaded and checksums added in Climate-REF/climate-ref#334. I guess I would need that merged before I can use the files in this PR?
@nocollier Thanks a lot for your patience and all your help! It looks like the information I needed was already provided by intake-esgf, but I'm just not familiar enough with it. To make it even more obvious, maybe a slightly different exception could be raised if files have been found but all fileservers providing them are offline, e.g. something like
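A sketch of that suggestion follows; the class name and constructor are purely hypothetical, not an existing intake-esgf exception:

```python
class AllFileserversOfflineError(Exception):
    """Raised when the index lists files but no hosting server responds.

    Distinguishes "nothing matched your search" from "your data exists
    but is temporarily unreachable", so users know retrying may help.
    """

    def __init__(self, dataset_id: str, links_checked: int):
        super().__init__(
            f"{dataset_id}: {links_checked} access link(s) found, "
            "but all fileservers are currently offline; try again later."
        )
        self.dataset_id = dataset_id
        self.links_checked = links_checked
```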
/regenerate

PR comment handling You can find the workflow here:
…gh obs4REF" This reverts commit 80b3b53.
@bouweandela Do you think the Dask processing will be more performant than just naively processing each dataset in parallel? I'm happy to merge main back into this for you.
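For context, the "naive" alternative discussed here is just mapping a per-dataset function over a worker pool; Dask's potential advantage is that it can also parallelise *within* a dataset via chunked arrays. A toy sketch, where `process_dataset` and the paths are placeholders, not names from this repository:

```python
from concurrent.futures import ThreadPoolExecutor


def process_dataset(path: str) -> str:
    # stand-in for the real per-dataset work (slicing, regridding, ...)
    return path.replace(".nc", "-processed.nc")


paths = ["ta_mon.nc", "pr_mon.nc", "tas_mon.nc"]

# Naive parallelism: one independent task per dataset.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(process_dataset, paths))

# The Dask equivalent expresses the same work as delayed tasks, letting the
# scheduler additionally split work inside each dataset (chunked arrays):
#   import dask
#   results = dask.compute(*(dask.delayed(process_dataset)(p) for p in paths))
```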
I'll start a new pull request with the updated |
Description

Checklist

Please confirm that this pull request has done the following:
- Regenerated the data (/regenerate)
- Added a changelog/ entry