From b9b53546fc7b0a916addc8c6d31d68025a44121a Mon Sep 17 00:00:00 2001 From: Veronica Martinez Date: Thu, 25 Jul 2024 16:50:46 -0600 Subject: [PATCH 1/4] New guide on netCDF file format. Additional content needed --- .../data_management/file_formats/netcdf.md | 107 ++++++++++++++++++ 1 file changed, 107 insertions(+) create mode 100644 docs/source/_static/data_management/file_formats/netcdf.md diff --git a/docs/source/_static/data_management/file_formats/netcdf.md b/docs/source/_static/data_management/file_formats/netcdf.md new file mode 100644 index 0000000..172d314 --- /dev/null +++ b/docs/source/_static/data_management/file_formats/netcdf.md @@ -0,0 +1,107 @@ +# NetCDF +>**Warning** +> This guide needs additional information + +NetCDF (Network Common Data Form), is a file format that stores data in arrays. Array values may be accessed directly, +without knowing how the data are stored, and metadata information may be stored with the data. + +* Binary file format commonly used for scientific data +* Self-describing, includes metadata +* Multi-dimensional array data model + +#### Data Model (Essentials) +* variable + * Multi-dimensional array + * Column-oriented: each variable as a separate entity +* dimension + * Usually temporal, spatial, spectral, ... + * Can be unlimited length. One, at most, is recommended for a growing time dimension +* attribute + * Metadata: global and variable level +* group + * Akin to directories + * Avoid unless you really need the complex structure + + +## Purpose for this guideline + +#### Why Use NetCDF? 
+* Self-describing + * structure captures coordinate system (functional relationship) + * includes metadata +* Efficient storage + * packing + * compression +* Efficient access + * chunking + * http byte range + * parallel IO +* Open specification (unlike IDL save files) + +## Options for this guideline + +* NetCDF-3 classic +* NetCDF-4 built on HDF5 + * recommended but prefer classic constructs + +## How to apply this guideline + +#### NetCDF Files +* Binary format with open specification +* Requires software libraries to read and write: C, Fortran, Java, Python, IDL, ... +* Internal compression; don't bother compressing NetCDF files externally +* HTTP byte range requests +* Parallel IO +* `.nc` file extension +* Don't be afraid of big files + +#### Coordinate System +* Dimensions should be used to define a coordinate system + * e.g. temporal, spatial, spectral + * Avoid using dimensions to group data + * Think "functional relationship". Each independent variable should represent a dimension. +* coordinate variable + * 1D variable with dimension of the same name + * strictly monotonic (ordered) + * no missing values + * Independent variable of functional relationship + * Every dimension should have one +* shared dimensions + * Variables should reuse dimensions to indicate that they share the same coordinates (domain set) + +#### Time as Coordinate Variable +* If the data are a function of a single time dimension, then there should be a single time variable + * avoid breaking time up by date and time of day +* Prefer numeric time units + * time unit since an epoch + * e.g. 
"seconds since 1970-01-01", "microseconds since 1980-01-06" + +#### Metadata +* Optional but useful to make NetCDF file self-describing +* attribute + * global (dataset level) + * title + * history (provenance) + * variable + * long_name + * units +* Conventions + * Climate and Forecast (CF) + * Attribute Convention for Data Discovery (ACDD) + * udunits: standard units + +#### Other useful variable attributes +* missing_value + * prefer over _FillValue + * NaN is a good option +* valid_range, valid_min, valid_max +* scale_factor, add_offset (packed values) +* cell_methods: standards for representing data cells (bins) + * e.g. daily average, wavelength bins + +## Useful Links +* [NetCDF User's Guide](https://docs.unidata.ucar.edu/nug/current/) +* [NetCDF ToolsUI](https://docs.unidata.ucar.edu/netcdf-java/current/userguide/toolsui_ref.html) + + +Credit: Content taken from a Confluence guide written by Doug Lindholm \ No newline at end of file From 61ce8eded5b8ebc6de91028586ee1ce0aa9b3f01 Mon Sep 17 00:00:00 2001 From: Veronica Martinez Date: Thu, 25 Jul 2024 17:05:54 -0600 Subject: [PATCH 2/4] Add hyperlinks --- .../source/_static/data_management/file_formats/netcdf.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/source/_static/data_management/file_formats/netcdf.md b/docs/source/_static/data_management/file_formats/netcdf.md index 172d314..d006111 100644 --- a/docs/source/_static/data_management/file_formats/netcdf.md +++ b/docs/source/_static/data_management/file_formats/netcdf.md @@ -86,9 +86,9 @@ without knowing how the data are stored, and metadata information may be stored * long_name * units * Conventions - * Climate and Forecast (CF) - * Attribute Convention for Data Discovery (ACDD) - * udunits: standard units + * [Climate and Forecast (CF)](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html) + * [Attribute Convention for Data Discovery 
(ACDD)](https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3) + * [udunits](https://www.unidata.ucar.edu/software/udunits/): standard units #### Other useful variable attributes * missing_value @@ -96,7 +96,7 @@ without knowing how the data are stored, and metadata information may be stored * NaN is a good option * valid_range, valid_min, valid_max * scale_factor, add_offset (packed values) -* cell_methods: standards for representing data cells (bins) +* [cell_methods](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#_data_representative_of_cells): standards for representing data cells (bins) * e.g. daily average, wavelength bins ## Useful Links From 55d0556bea07dbf21acc23ae637b7214b8cae09e Mon Sep 17 00:00:00 2001 From: Veronica Martinez Date: Mon, 29 Jul 2024 10:18:16 -0600 Subject: [PATCH 3/4] Minor edits to provide some clarity. Still needs an introduction for each section --- .../_static/data_management/file_formats/netcdf.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/docs/source/_static/data_management/file_formats/netcdf.md b/docs/source/_static/data_management/file_formats/netcdf.md index d006111..8a90efb 100644 --- a/docs/source/_static/data_management/file_formats/netcdf.md +++ b/docs/source/_static/data_management/file_formats/netcdf.md @@ -2,14 +2,14 @@ >**Warning** > This guide needs additional information -NetCDF (Network Common Data Form), is a file format that stores data in arrays. Array values may be accessed directly, -without knowing how the data are stored, and metadata information may be stored with the data. +NetCDF (Network Common Data Form) is a file format that stores scientific data in arrays. Array values may be accessed +directly, without knowing how the data are stored, and metadata information may be stored with the data. 
* Binary file format commonly used for scientific data * Self-describing, includes metadata * Multi-dimensional array data model -#### Data Model (Essentials) +The [netCDF data model](https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html) consists of the following: * variable * Multi-dimensional array * Column-oriented: each variable as a separate entity @@ -24,8 +24,9 @@ without knowing how the data are stored, and metadata information may be stored ## Purpose for this guideline +NetCDF is a file format commonly used at LASP... -#### Why Use NetCDF? +Benefits of using netCDF: * Self-describing * structure captures coordinate system (functional relationship) * includes metadata @@ -39,7 +40,7 @@ without knowing how the data are stored, and metadata information may be stored * Open specification (unlike IDL save files) ## Options for this guideline - +There are two netCDF data models: * NetCDF-3 classic * NetCDF-4 built on HDF5 * recommended but prefer classic constructs From 4eb38ec8b3a31595e20db542cd440a1b5562931f Mon Sep 17 00:00:00 2001 From: Veronica Martinez Date: Thu, 29 Aug 2024 11:16:10 -0600 Subject: [PATCH 4/4] Add guide to read the docs. Incorporate feedback from PR --- .../data_management/file_formats/index.rst | 8 ++++++++ .../data_management/file_formats/netcdf.md | 20 +++++++++++-------- docs/source/_static/data_management/index.rst | 8 ++++++++ docs/source/index.rst | 1 + 4 files changed, 29 insertions(+), 8 deletions(-) create mode 100644 docs/source/_static/data_management/file_formats/index.rst create mode 100644 docs/source/_static/data_management/index.rst diff --git a/docs/source/_static/data_management/file_formats/index.rst b/docs/source/_static/data_management/file_formats/index.rst new file mode 100644 index 0000000..bde309f --- /dev/null +++ b/docs/source/_static/data_management/file_formats/index.rst @@ -0,0 +1,8 @@ +File Formats +============ + + +.. 
toctree:: + :maxdepth: 1 + + netcdf.md \ No newline at end of file diff --git a/docs/source/_static/data_management/file_formats/netcdf.md b/docs/source/_static/data_management/file_formats/netcdf.md index 8a90efb..23a6165 100644 --- a/docs/source/_static/data_management/file_formats/netcdf.md +++ b/docs/source/_static/data_management/file_formats/netcdf.md @@ -23,10 +23,12 @@ The [netCDF data model](https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_da * Avoid unless you really need the complex structure -## Purpose for this guideline -NetCDF is a file format commonly used at LASP... +## Why use NetCDF +NetCDF is a file format commonly used at LASP because it is the "highly preferred" format for NASA Earth Observing System +Data and Information System data products, per their Data Product Development Guide for Data Producers. +This affects all NASA Earth Science missions. -Benefits of using netCDF: +NetCDF features: * Self-describing * structure captures coordinate system (functional relationship) * includes metadata @@ -39,13 +41,13 @@ Benefits of using netCDF: * parallel IO * Open specification (unlike IDL save files) -## Options for this guideline +## Options available There are two netCDF data models: * NetCDF-3 classic * NetCDF-4 built on HDF5 * recommended but prefer classic constructs -## How to apply this guideline +## How to use this data format #### NetCDF Files * Binary format with open specification @@ -92,9 +94,10 @@ There are two netCDF data models: * [udunits](https://www.unidata.ucar.edu/software/udunits/): standard units #### Other useful variable attributes -* missing_value - * prefer over _FillValue - * NaN is a good option +* _FillValue + * missing_value is considered deprecated by the NetCDF User's Guide and is not recommended. 
+ * NaN is another option; however, NaNs in files are handled differently across languages, so it may + be better to pick a fixed value for official data products that many users will rely on * valid_range, valid_min, valid_max * scale_factor, add_offset (packed values) * [cell_methods](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#_data_representative_of_cells): standards for representing data cells (bins) @@ -103,6 +106,7 @@ There are two netCDF data models: ## Useful Links * [NetCDF User's Guide](https://docs.unidata.ucar.edu/nug/current/) * [NetCDF ToolsUI](https://docs.unidata.ucar.edu/netcdf-java/current/userguide/toolsui_ref.html) +* [NetCDF Workshop Materials](https://www.unidata.ucar.edu/software/netcdf/workshops/2011/index.html) Credit: Content taken from a Confluence guide written by Doug Lindholm \ No newline at end of file diff --git a/docs/source/_static/data_management/index.rst b/docs/source/_static/data_management/index.rst new file mode 100644 index 0000000..aa55ee4 --- /dev/null +++ b/docs/source/_static/data_management/index.rst @@ -0,0 +1,8 @@ +Data Management +=============== + + +.. toctree:: + :maxdepth: 1 + + file_formats/index \ No newline at end of file diff --git a/docs/source/index.rst b/docs/source/index.rst index 0203a47..6593bea 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -7,3 +7,4 @@ Welcome to the LASP Developer's Guide! :maxdepth: 1 licensing + data_management/index
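
Note on the netcdf.md guide added by this series: two of its recommendations, numeric time units such as "seconds since 1970-01-01" and packed values via `scale_factor` / `add_offset`, can be sketched with plain arithmetic. The following is a minimal illustration using only the Python standard library. The function names and sample values are hypothetical and are not part of the patch; a real data product would normally be written with a netCDF library rather than by hand.

```python
from datetime import datetime, timezone

# Epoch matching the units string "seconds since 1970-01-01" (assumed to be UTC here).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_seconds_since_epoch(moment: datetime) -> float:
    """Encode a timezone-aware datetime as a single numeric time value."""
    return (moment - EPOCH).total_seconds()


def pack(value: float, scale_factor: float, add_offset: float) -> int:
    """Pack a physical value into an integer: round((value - add_offset) / scale_factor)."""
    return round((value - add_offset) / scale_factor)


def unpack(packed: int, scale_factor: float, add_offset: float) -> float:
    """Recover the approximate physical value: packed * scale_factor + add_offset."""
    return packed * scale_factor + add_offset


# Hypothetical sample values, chosen only for illustration.
one_day = to_seconds_since_epoch(datetime(1970, 1, 2, tzinfo=timezone.utc))
packed_temperature = pack(293.15, scale_factor=0.01, add_offset=273.15)
restored_temperature = unpack(packed_temperature, scale_factor=0.01, add_offset=273.15)
```

Keeping time as one numeric variable (rather than separate date and time-of-day fields) leaves the coordinate strictly monotonic, and packing trades a small, bounded precision loss (at most half of `scale_factor`) for smaller stored integers.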