lasp · vmartinez-cu · Jan 3, 2025 · Dec 20, 2024 · Dec 20, 2024 · Dec 20, 2024
diff --git a/docs/source/data_management/fair_principles.md b/docs/source/data_management/fair_principles.md
@@ -0,0 +1,129 @@
+# The FAIR Principles
+
+The FAIR Principles are a *"set of standards that connects researchers, publishers,
+and data repositories in Earth, space, and environmental sciences to ... accelerate
+scientific discovery and enhance the integrity, transparency, and reproducibility of
+scientific data on a large scale"* ([COPDESS](https://copdess.org/enabling-fair-data-project/)).
+Essentially, scientific data should be **Findable, Accessible, Interoperable, and Reusable.**
+
+## Why FAIR Principles Matter
+
+More important than knowing *what* the FAIR goals are is understanding *why* they matter.
+The way science is performed is changing, including shifts in scientific publication
+and peer review. Key changes include:
+
+- Scientific analyses are being encoded in repeatable, shareable workflows.
+- Publications are moving away from static print documentation to interactive demonstrations online.
+
+These capabilities rely completely on the availability of the data underpinning the research.
+
+The FAIR principles were originally articulated by [FORCE11](https://www.force11.org/about),
+an organization founded on the belief that *"semantically enhanced, media-rich digital
+publishing will be more powerful than traditional print media or electronic copies of printed works."*
+
+FAIR principles have been adopted by researchers, publishers, and data repositories affiliated
+with [COPDESS](https://copdess.org/enabling-fair-data-project/), the Coalition on Publishing Data
+in the Earth and Space Sciences. Partners include:
+
+- Publications such as *Nature* and *Science*
+- Funding agencies like NASA, USGS, NOAA, and NIH
+- Professional groups like AGU
+
+For a full list of FAIR partners, see the [COPDESS FAIR Data Project](https://copdess.org/enabling-fair-data-project/).
+To view the list of signatories committed to FAIR data, visit the [Statement of Commitment](https://copdess.org/statement-of-commitment/).
+
+---
+
+## FAIR Principles
+
+The following are summarized descriptions of the FAIR principles,
+adapted from [GO FAIR](https://www.go-fair.org/fair-principles/).
+
+### **Findable**
+
+The first step in (re)using data is to find them. [Metadata](metadata.md) and data should be easy
+to find for both humans and computers. Machine-readable metadata are essential for
+automatic discovery of datasets and services.
+
+**Findable characteristics include:**
+
+- Data and metadata are assigned globally unique and persistent identifiers.
+- Data are described by rich metadata that clearly include the identifier of the data they describe.
+- Data and metadata are registered or indexed in a searchable resource.
+
+### **Accessible**
+
+Once the user finds the required data, they need to know how to access them, including details about authentication and authorization.
+
+**Characteristics of being accessible include:**
+
+- Data and metadata are retrievable using common protocols.
+- The protocol is open, free, and universally implementable.
+- Authentication and authorization procedures are applied where necessary.
+- Metadata remain accessible even when data are no longer available.
+
+### **Interoperable**
+
+Data usually need to be integrated with other data. Additionally, data must
+interoperate with applications or workflows for analysis, storage, and processing.
+
+**Interoperable data and metadata:**
+
+- Use a formal, accessible, shared, and broadly applicable language for knowledge representation.
+- Use vocabularies that follow FAIR principles.
+- Include qualified references to other data and metadata.
+
+### **Reusable**
+
+The ultimate goal of FAIR is to optimize data reuse. To achieve this, metadata and data
+must be well-described to enable replication and/or combination in different settings.
+
+**Reusable data and metadata:**
+
+- Have clear, accessible data usage licenses.
+- Are associated with detailed provenance.
+- Meet domain-relevant community standards.
+
+---
+
+## How to Apply FAIR Principles
+
+1. **Adopt FAIR-Compliant Practices**:
+   - Assign persistent identifiers to datasets and metadata.
+   - Use rich metadata that describe datasets thoroughly.
+
+2. **Register Metadata**:
+   - Index metadata in searchable repositories to enhance discoverability.
+
+3. **Implement Standards for Access and Interoperability**:
+   - Ensure retrieval protocols are open and free.
+   - Use FAIR-aligned vocabularies and knowledge representation languages.
+
+4. **Provide Reuse Guidance**:
+   - Include detailed provenance information.
+   - Apply clear licenses for data usage.
+
+5. **Collaborate with FAIR Partners**:
+   - Follow practices adopted by FORCE11, COPDESS, and similar organizations.
+
+---
+
+## Useful Links
+
+- [Metadata Overview](metadata.md)
+- [COPDESS FAIR Data Project](https://copdess.org/enabling-fair-data-project/)
+- [Statement of Commitment to FAIR Data](https://copdess.org/statement-of-commitment/)
+- [GO FAIR Principles](https://www.go-fair.org/fair-principles/)
+- [FORCE11](https://www.force11.org/about)
+
+## Acronyms
+
+- **FAIR** = Findable, Accessible, Interoperable, Reusable
+- **FORCE11** = The Future of Research Communication and e-Scholarship
+- **COPDESS** = Coalition on Publishing Data in the Earth and Space Sciences
+- **AGU** = American Geophysical Union
+- **NASA** = National Aeronautics and Space Administration
+- **NIH** = National Institutes of Health
+- **NOAA** = National Oceanic and Atmospheric Administration
+
+Credit: Content taken from a Confluence guide written by Anne Wilson and modified by Shawn Polson in 2019.
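
The checklist under "How to Apply FAIR Principles" in fair_principles.md could be illustrated with a small Python check that reports which fields a metadata record is still missing for each principle. This is only a sketch under loud assumptions: the field names (`identifier`, `index_url`, `access_protocol`, `vocabulary`, and so on) and the example record are hypothetical, and no FAIR specification mandates these exact keys.

```python
# Hypothetical mapping from each FAIR principle to illustrative field names.
# These keys are assumptions for the sketch, not part of any FAIR standard.
FAIR_CHECKS = {
    "findable": ("identifier", "title", "index_url"),
    "accessible": ("access_url", "access_protocol"),
    "interoperable": ("format", "vocabulary"),
    "reusable": ("license", "provenance"),
}


def missing_fair_fields(record):
    """Return {principle: [missing field names]} for a metadata record."""
    gaps = {}
    for principle, fields in FAIR_CHECKS.items():
        # A field counts as missing if it is absent or empty.
        missing = [name for name in fields if not record.get(name)]
        if missing:
            gaps[principle] = missing
    return gaps


# Placeholder record invented for illustration only.
example_record = {
    "identifier": "doi:10.0000/example",  # placeholder, not a real DOI
    "title": "Example dataset",
    "access_url": "https://example.org/data/example.nc",
    "access_protocol": "HTTPS",
    "format": "NetCDF",
    "license": "CC-BY-4.0",
}

print(missing_fair_fields(example_record))
```

A check like this only flags absent fields; it says nothing about whether the identifier is persistent or the provenance is adequate, which still needs human review.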
diff --git a/docs/source/data_management/index.rst b/docs/source/data_management/index.rst
@@ -5,4 +5,6 @@ Data Management
 .. toctree::
    :maxdepth: 1
 
-   file_formats/index
+   file_formats/index
+   metadata.md
+   fair_principles.md
diff --git a/docs/source/data_management/metadata.md b/docs/source/data_management/metadata.md
@@ -0,0 +1,169 @@
+# Metadata
+
+## Purpose
+
+Metadata supports data science workflows by:
+
+- Ensuring datasets are discoverable and usable by both humans and machines.
+- Meeting internal and external policies for data accessibility and preservation.
+- Enhancing collaboration by providing clear and standardized metadata practices.
+- Contributing to the overall success of projects by enabling proper data usage and interoperability.
+
+## What is Metadata
+
+A dataset generally consists of sets of measured or modeled values. However, the values alone are
+insufficient to understand and use that dataset. Consider this example of a very small dataset:
+
+**Temperature: 31.5**
+
+The data point “Temperature: 31.5” raises many questions:
+
+- Temperature of what?
+- According to whom or what?
+- Collected when/where?
+- Measured or calculated?
+- If calculated, how?
+- What units?
+- To what precision?
+
+To make this dataset FAIR (Findable, Accessible, Interoperable, and Reusable), additional information is needed.
+
+Metadata is information (data) about a dataset. It includes:
+
+- Temporal and spatial coverage and cadence
+- Units
+- Processing level
+- Data quality
+- Instrument details
+- Principal Investigator
+- Provenance
+- Special alerts, etc.
+
+Ideally, metadata provides all the information necessary to find, understand,
+and use the dataset correctly. Good quality metadata is critical for data to be FAIR.
+
+## Benefits of Good Quality Metadata
+
+Good quality, searchable metadata enables people to find data that fits their needs:
+
+- **Good quality**: Sufficient information is provided.
+- **Searchable**: Users can find data by various facets like spatial or temporal coverage.
+
+## Metadata Storage, Formats, and Access
+
+### Storage Options
+
+The best practices for metadata storage include:
+
+1. **Machine-readable metadata** consumable by common tools.
+2. **Publicly accessible metadata** readable by humans.
+3. **Avoid private, inaccessible formats** like personal notebooks or sticky notes.
+
+#### Examples of Metadata Storage
+
+- **Prose embedded in HTML**: Readable by humans but not easily consumable by tools.
+- **Public spreadsheets**: Readable by tools that understand the structure but not widely accessible otherwise.
+- **Self-describing formats**: Examples include:
+  - **NetCDF, HDF, FITS**: Include specific metadata properties like variables, geospatial coverage, and time coverage.
+  - **Header information** in CSV or ASCII tables:
+    - Simple but less machine-readable.
+
+Machine readability often depends on established metadata conventions, such as
+**Climate and Forecast (CF) conventions** used widely in atmospheric science ([More details here](https://www.unidata.ucar.edu/software/netcdf/workshops/most-recent/cf/index.html)).
+
+### LASP Metadata Repository
+
+LASP is developing the **LASP Extended Metadata Repository (LEMR)** to store and access dataset metadata:
+
+- It provides automated, dynamic access to essential metadata properties for data services.
+- It is planned to extend metadata management capabilities for LASP scientists.
+
+## Metadata Formats
+
+Metadata formats refer to schemas describing the metadata structure. Examples include:
+
+- **[ISO 19115](https://www.fgdc.gov/metadata/iso-standards)**: Geographic information and services.
+- **[SPASE](https://spdf.gsfc.nasa.gov/spdf-documents/SPASE_and_SPDF.html)**: Used in Heliophysics.
+
+At LASP, the **laspds schema** is used for applications serving data, with plans to
+integrate with standard schemas like SPASE and ISO 19115.
+
+## What Metadata to Save
+
+### Key Considerations
+
+At project inception:
+
+- Identify essential metadata for understanding and using the dataset.
+- Create a plan to preserve this information.
+
+### Balancing Minimal and Comprehensive Metadata
+
+Repositories often strike a balance between requiring minimal metadata (to lower the barrier
+to participation) and requiring enough metadata to fully understand a dataset. Repositories
+recognize that providing quality metadata takes resources.
+
+- Example: **CU Scholar** requires:
+  - Landing page URL
+  - Names of dataset creators
+  - Title
+  - Publishing organization
+  - Resource type
+
+This information alone would not be sufficient to use a dataset, but it is sufficient
+to allow CU Scholar to serve the dataset. CU Scholar expects additional details
+(e.g., coverages, units, quality indicators) to be available on the landing page or
+via self-describing formats.
+
+## Provenance
+
+The **provenance** of a dataset describes its history and is critical for using datasets correctly:
+
+- Origin of the data
+- Processing methods
+- Calibration and validation details
+- Software versions used
+
+Data producers should record:
+
+- Dataset inputs
+- Processing steps
+- Configuration, calibration, and validation details
+
+Provenance is often provided as descriptive prose, so storing it as plain text that tools can read alongside the data is a reasonable option.
+
+**Learn More**: [The Importance of Data Set Provenance for Science](https://eos.org/opinions/the-importance-of-data-set-provenance-for-science).
+
+## Summary of Metadata Workflow
+
+1. **Identify Necessary Metadata**:
+   - At project inception, determine what metadata is essential for understanding and using the dataset.
+2. **Choose the Appropriate Storage Option**:
+   - Use machine-readable formats like NetCDF or HDF where possible.
+   - For simpler use cases, include metadata in file headers or spreadsheets, ensuring structure is clear.
+3. **Follow Metadata Conventions**:
+   - Adhere to standards for machine-readability.
+   - Consult metadata experts when encoding complex datasets.
+4. **Leverage LASP’s Tools**:
+   - Use the **LASP Extended Metadata Repository (LEMR)** for automated and dynamic metadata management if applicable.
+   - Work with LASP administrators to input metadata into LEMR.
+5. **Maintain Provenance**:
+   - Record dataset inputs, processing, calibration, and validation details.
+   - Provide descriptive prose or structured metadata to ensure provenance is clear and traceable.
+
+## Useful Links
+
+- [CF Conventions for NetCDF](https://www.unidata.ucar.edu/software/netcdf/workshops/most-recent/cf/index.html)
+- [The Importance of Data Set Provenance for Science](https://eos.org/opinions/the-importance-of-data-set-provenance-for-science)
+- [NASA DOI Landing Page Requirements](https://wiki.earthdata.nasa.gov/display/DOIsforEOSDIS/DOI+Landing+Page)
+- [CU Scholar Metadata Requirements](https://scholar.colorado.edu/faq)
+
+## Acronyms
+
+- **CF** = Climate and Forecast
+- **FAIR** = Findable, Accessible, Interoperable, and Reusable
+- **ISO** = International Organization for Standardization
+- **LEMR** = LASP Extended Metadata Repository
+- **SPASE** = Space Physics Archive Search and Extract
+
+Credit: Content taken from a Confluence guide written by Anne Wilson and Shawn Polson.
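
The "Temperature: 31.5" example in metadata.md can be made concrete with a short Python sketch that attaches answers to the listed questions (units, precision, when/where, measured or calculated, provenance) and serializes them in a machine-readable form. Everything here is an assumption for illustration: the `Measurement` class, its field names, and all values are invented placeholders, not the laspds schema or any LEMR interface.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Measurement:
    """A bare value plus the metadata needed to interpret it (illustrative fields)."""

    name: str          # temperature of what?
    value: float
    units: str         # what units?
    precision: float   # to what precision?
    measured_at: str   # collected when?
    location: str      # collected where?
    source: str        # according to whom or what?
    method: str        # measured or calculated?
    provenance: list = field(default_factory=list)  # processing history


# All values below are placeholders invented for illustration.
record = Measurement(
    name="air_temperature",
    value=31.5,
    units="degC",
    precision=0.1,
    measured_at="2019-01-01T00:00:00Z",
    location="example site",
    source="example thermometer",
    method="measured",
    provenance=["raw reading", "calibration applied (hypothetical step)"],
)

# JSON is one simple machine-readable encoding; self-describing formats
# such as NetCDF carry the same kind of information as file attributes.
as_json = json.dumps(asdict(record), indent=2, sort_keys=True)
print(as_json)
```

JSON is used only because it needs no extra dependencies; for real datasets, a self-describing format that follows an established convention would usually be the better choice.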