Description
There is a large and growing number of publicly available datasets that can be loaded into xarray from cloud storage buckets. Currently, however, there is no effective way to discover these datasets.
Using standards like the OGC Catalog Service for the Web (CSW) and OpenSearch, it would be possible to discover these xarray datasets via sites like data.gov (and data.gov.uk, data.gov.au, etc.), but this requires producing the ISO metadata that these sites consume.
It would also be possible to discover xarray datasets via sites like Google's dataset search, but it would be necessary to produce the json-ld metadata that these sites consume.
Since xarray preserves the content of datasets that follow the CF and ACDD metadata conventions, it should be possible to generate both types of metadata in a straightforward way from the xarray dataset object, using metadata tools that have already been developed for datasets that adhere to the CF conventions. The ncISO tool already generates ISO records from netCDF files or OPeNDAP endpoints, so its mapping from CF/ACDD attributes to ISO could be reused for records from xarray. Similarly, work has already been done to create nco-json metadata from netCDF files, a complete metadata representation from which the json-ld content could be extracted.
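As a rough illustration of the attribute mapping described above, the following sketch converts a handful of ACDD global attributes into a schema.org `Dataset` record in json-ld. The function name `acdd_to_jsonld` and the choice of which attributes to map are assumptions made for this example; this is not the ncISO mapping or an existing xarray feature, and a real crosswalk would cover many more attributes.

```python
import json


def acdd_to_jsonld(attrs):
    """Map a few ACDD global attributes to a schema.org Dataset record.

    `attrs` is any mapping of global attributes (hypothetical helper,
    partial mapping only).
    """
    record = {"@context": "https://schema.org/", "@type": "Dataset"}

    # One-to-one mappings: ACDD attribute name -> schema.org property.
    simple = {"title": "name", "summary": "description", "license": "license"}
    for acdd_key, schema_key in simple.items():
        if acdd_key in attrs:
            record[schema_key] = attrs[acdd_key]

    # ACDD keywords are a comma-separated string; emit a list.
    if "keywords" in attrs:
        record["keywords"] = [
            k.strip() for k in attrs["keywords"].split(",") if k.strip()
        ]

    # ACDD creator_type defaults to "person" when absent.
    if "creator_name" in attrs:
        creator_type = str(attrs.get("creator_type", "person")).lower()
        is_org = creator_type in ("institution", "group")
        record["creator"] = {
            "@type": "Organization" if is_org else "Person",
            "name": attrs["creator_name"],
        }

    # Temporal coverage as an ISO 8601 interval "start/end".
    if "time_coverage_start" in attrs and "time_coverage_end" in attrs:
        record["temporalCoverage"] = (
            f"{attrs['time_coverage_start']}/{attrs['time_coverage_end']}"
        )

    # Bounding box as "south west north east" when all four bounds exist.
    bounds = ("geospatial_lat_min", "geospatial_lon_min",
              "geospatial_lat_max", "geospatial_lon_max")
    if all(b in attrs for b in bounds):
        box = " ".join(str(attrs[b]) for b in bounds)
        record["spatialCoverage"] = {
            "@type": "Place",
            "geo": {"@type": "GeoShape", "box": box},
        }

    return record
```

With an xarray dataset `ds`, the global attributes are available as `ds.attrs`, so `json.dumps(acdd_to_jsonld(ds.attrs), indent=2)` would produce the json-ld text, provided the mapped attribute values are plain strings and numbers.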
Proposed Work:
- Develop code that integrates the `nco-json` spec into the `xarray` package, so that it represents the complete metadata of the `xarray` object.
- Develop code that, from the complete `nco-json` metadata associated with `xarray` objects, generates the more restrictive `ISO` and `json-ld` metadata formats.
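A minimal sketch of the first step might look like the following. It collects dimensions, per-variable metadata, and global attributes into a JSON-serializable dictionary whose layout is loosely modeled on nco-json. The function name `dataset_metadata`, the key names (`dimensions`, `variables`, `shape`, `type`, `attributes`), and the use of the NumPy dtype string as the type are assumptions for illustration and should be checked against the nco-json spec; no data values are included.

```python
def _plain(value):
    """Convert NumPy scalars/arrays to plain Python types for JSON output."""
    return value.tolist() if hasattr(value, "tolist") else value


def dataset_metadata(ds):
    """Build a metadata-only dictionary from an xarray.Dataset-like object.

    Hypothetical helper: it relies only on the `sizes`, `variables`, and
    `attrs` properties of the dataset, and on `dims`, `dtype`, and `attrs`
    of each variable.
    """
    return {
        # Dimension name -> length.
        "dimensions": {str(name): int(size) for name, size in ds.sizes.items()},
        # Every variable (coordinates included) with its dimension names,
        # element type, and attributes.
        "variables": {
            str(name): {
                "shape": [str(dim) for dim in var.dims],
                "type": str(var.dtype),
                "attributes": {k: _plain(v) for k, v in var.attrs.items()},
            }
            for name, var in ds.variables.items()
        },
        # Global (dataset-level) attributes, e.g. the CF/ACDD metadata.
        "attributes": {k: _plain(v) for k, v in ds.attrs.items()},
    }
```

The result can be passed to `json.dumps`, and its `"attributes"` entry is the natural input for a second step that derives the more restrictive formats. Note that xarray's existing `Dataset.to_dict(data=False)` already returns a comparable metadata-only dictionary in xarray's own layout, which could serve as a starting point.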