Add new guide on netCDF file format #24
Changes from all commits
b9b5354
61ce8ed
55d0556
4eb38ec
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| File Formats | ||
| ============ | ||
|
|
||
|
|
||
| .. toctree:: | ||
| :maxdepth: 1 | ||
|
|
||
| netcdf.md |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,112 @@ | ||
| # NetCDF | ||
| >**Warning** | ||
| > This guide needs additional information | ||
|
|
||
| NetCDF (Network Common Data Form) is a file format that stores scientific data in arrays. Array values may be accessed | ||
| directly, without knowing how the data are stored, and metadata information may be stored with the data. | ||
|
|
||
| * Binary file format commonly used for scientific data | ||
| * Self-describing, includes metadata | ||
| * Multi-dimensional array data model | ||
|
|
||
| The [netCDF data model](https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html) consists of the following: | ||
| * variable | ||
|   * Multi-dimensional array | ||
|   * Column-oriented: each variable is a separate entity | ||
| * dimension | ||
|   * Usually temporal, spatial, spectral, ... | ||
|   * Can have unlimited length. At most one unlimited dimension is recommended, typically for a growing time dimension | ||
| * attribute | ||
|   * Metadata: global and variable level | ||
| * group | ||
|   * Akin to directories | ||
|   * Avoid unless you really need the complex structure | ||
|
|
||
|
|
||
| ## Why use NetCDF | ||
| NetCDF is a file format commonly used at LASP as it is the "highly preferred" format for NASA Earth Observing System | ||
| Data and Information System data products, per their Data Product Development Guide for Data Producers. | ||
| This affects all NASA Earth Science missions. | ||
|
|
||
| NetCDF features: | ||
| * Self-describing | ||
|   * structure captures coordinate system (functional relationship) | ||
|   * includes metadata | ||
| * Efficient storage | ||
|   * packing | ||
|   * compression | ||
| * Efficient access | ||
|   * chunking | ||
|   * HTTP byte-range requests | ||
|   * parallel IO | ||
| * Open specification (unlike IDL save files) | ||
|
|
||
| ## Options available | ||
| There are two netCDF data models: | ||
| * NetCDF-3 classic | ||
| * NetCDF-4 built on HDF5 | ||
|   * recommended, but prefer classic constructs | ||
|
|
||
| ## How to use this data format | ||
|
|
||
| #### NetCDF Files | ||
| * Binary format with open specification | ||
| * Requires software libraries to read and write (C, Fortran, Java, Python, IDL, ...) | ||
| * Internal compression; don't bother compressing NetCDF files externally | ||
| * HTTP byte range requests | ||
| * Parallel IO | ||
| * `.nc` file extension | ||
| * Don't be afraid of big files | ||
|
|
||
| #### Coordinate System | ||
| * Dimensions should be used to define a coordinate system | ||
|   * e.g. temporal, spatial, spectral | ||
| * Avoid using dimensions to group data | ||
| * Think "functional relationship". Each independent variable should represent a dimension. | ||
| * coordinate variable | ||
|   * 1D variable with dimension of the same name | ||
|   * strictly monotonic (ordered) | ||
|   * no missing values | ||
|   * Independent variable of functional relationship | ||
|   * Every dimension should have one | ||
| * shared dimensions | ||
|   * Variables should reuse dimensions to indicate that they share the same coordinates (domain set) | ||
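
The coordinate-variable rules above (strictly monotonic, no missing values) can be checked with plain Python before a file is written. This is only a minimal sketch: the helper name `is_valid_coordinate` is hypothetical, and representing missing values as `None` or NaN is an assumption made for illustration, not something a netCDF library prescribes.

```python
import math


def is_valid_coordinate(values):
    """Return True if values are strictly monotonic and contain no missing entries.

    Hypothetical helper: missing values are assumed to appear as None or NaN.
    """
    # A coordinate variable must be fully populated.
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            return False
    # Zero or one value is trivially ordered.
    if len(values) < 2:
        return True
    # Strictly monotonic means strictly increasing or strictly decreasing.
    pairs = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    return increasing or decreasing
```

A repeated value such as `[0, 1, 1]` fails the check because the ordering is not strict, which is the case most likely to slip through when time stamps are rounded.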
|
|
||
| #### Time as Coordinate Variable | ||
| * If the data are a function of a single time dimension then there should be a single time variable | ||
|   * avoid breaking time up by date and time of day | ||
| * Prefer numeric time units | ||
|   * time unit since an epoch | ||
|   * e.g. "seconds since 1970-01-01", "microseconds since 1980-01-06" | ||
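
As a rough illustration of "time unit since an epoch", the sketch below converts between numeric time values and Python datetimes using only the standard library. The helper names `decode_time` and `encode_time` are hypothetical, the epochs are assumed to be UTC because the example unit strings carry no time zone, and leap seconds are ignored.

```python
from datetime import datetime, timedelta, timezone

# Epochs taken from the example unit strings; UTC is an assumption.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
EPOCH_1980 = datetime(1980, 1, 6, tzinfo=timezone.utc)

# One time unit expressed as a timedelta.
SECOND = timedelta(seconds=1)
MICROSECOND = timedelta(microseconds=1)


def decode_time(value, unit, epoch):
    """Turn a numeric time value into a timezone-aware datetime (hypothetical helper)."""
    return epoch + value * unit


def encode_time(moment, unit, epoch):
    """Turn a timezone-aware datetime into a count of units since the epoch."""
    return (moment - epoch) / unit
```

Keeping one numeric variable like this, rather than separate date and time-of-day fields, means the time coordinate stays a single ordered 1D array.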
|
|
||
| #### Metadata | ||
| * Optional but useful to make NetCDF file self-describing | ||
| * attribute | ||
|   * global (dataset level) | ||
|     * title | ||
|     * history (provenance) | ||
|   * variable | ||
|     * long_name | ||
|     * units | ||
| * Conventions | ||
|   * [Climate and Forecast (CF)](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html) | ||
|
It would be useful to include information for compliance checkers, especially since this is the convention being pushed by the NASA Earth Science Data Systems and that changes what's required vs. useful attributes |
||
|   * [Attribute Convention for Data Discovery (ACDD)](https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3) | ||
|   * [udunits](https://www.unidata.ucar.edu/software/udunits/): standard units | ||
|
|
||
| #### Other useful variable attributes | ||
| * _FillValue | ||
|   * missing_value is considered deprecated and is not recommended by the NetCDF Users Group. | ||
|   * NaN is another option; however, NaNs in files are handled differently across languages, so it may | ||
|     be better to pick an explicit fill value for official data products that many users will rely on | ||
| * valid_range, valid_min, valid_max | ||
| * scale_factor, add_offset (packed values) | ||
| * [cell_methods](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#_data_representative_of_cells): standards for representing data cells (bins) | ||
|   * e.g. daily average, wavelength bins | ||
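
To make the `scale_factor`, `add_offset`, and `_FillValue` attributes more concrete, here is a minimal sketch of packing floating-point values into signed 16-bit integers. It relies on the usual unpacking rule `unpacked = packed * scale_factor + add_offset`. The function names, the choice of 16 bits, the use of the most negative integer as the fill value, and the use of `None` for missing data are all assumptions for illustration.

```python
FILL_VALUE = -32768  # assumed _FillValue: most negative signed 16-bit integer


def compute_packing(data_min, data_max, n_bits=16):
    """Return (scale_factor, add_offset) for signed n-bit packing (hypothetical helper).

    One integer code is reserved for the fill value, leaving 2**n_bits - 2 steps.
    """
    if data_max <= data_min:
        raise ValueError("data_max must be greater than data_min")
    scale_factor = (data_max - data_min) / (2 ** n_bits - 2)
    add_offset = (data_max + data_min) / 2  # the data midpoint maps to packed 0
    return scale_factor, add_offset


def pack(value, scale_factor, add_offset, fill_value=FILL_VALUE):
    """Pack one value into an integer; None (missing) becomes the fill value."""
    if value is None:
        return fill_value
    return round((value - add_offset) / scale_factor)


def unpack(packed, scale_factor, add_offset, fill_value=FILL_VALUE):
    """Unpack one integer; the fill value is returned as None (missing)."""
    if packed == fill_value:
        return None
    return packed * scale_factor + add_offset
```

Packing is lossy: a round trip reproduces each value only to within half a `scale_factor`, so the data range and bit width should be chosen with the required precision in mind.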
|
|
||
| ## Useful Links | ||
| * [NetCDF User's Guide](https://docs.unidata.ucar.edu/nug/current/) | ||
| * [NetCDF ToolsUI](https://docs.unidata.ucar.edu/netcdf-java/current/userguide/toolsui_ref.html) | ||
| * [NetCDF Workshop Materials](https://www.unidata.ucar.edu/software/netcdf/workshops/2011/index.html) | ||
|
|
||
|
Collaborator
Author
I'll add this url to the list of links, thanks! |
||
|
|
||
| Credit: Content taken from a Confluence guide written by Doug Lindholm | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| Data Management | ||
| =============== | ||
|
|
||
|
|
||
| .. toctree:: | ||
| :maxdepth: 1 | ||
|
|
||
| file_formats/index |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -7,3 +7,4 @@ Welcome to the LASP Developer's Guide! | |
| :maxdepth: 1 | ||
|
|
||
| licensing | ||
| data_management/index | ||
This could be a rabbit hole, depending on how much you want to get into things like Dask