From f17a32d7a4272c8e8be968e6ebffb067cc7d916d Mon Sep 17 00:00:00 2001 From: Veronica Martinez Date: Fri, 20 Dec 2024 11:01:26 -0700 Subject: [PATCH 1/7] Add guides for metadata and FAIR principles --- .../source/data_management/fair_principles.md | 131 ++++++++++++++ docs/source/data_management/metadata.md | 169 ++++++++++++++++++ 2 files changed, 300 insertions(+) create mode 100644 docs/source/data_management/fair_principles.md create mode 100644 docs/source/data_management/metadata.md diff --git a/docs/source/data_management/fair_principles.md b/docs/source/data_management/fair_principles.md new file mode 100644 index 0000000..be0c76a --- /dev/null +++ b/docs/source/data_management/fair_principles.md @@ -0,0 +1,131 @@ +# The FAIR Principles + +The FAIR Principles are a *"set of standards that connects researchers, publishers, +and data repositories in Earth, space, and environmental sciences to ... accelerate +scientific discovery and enhance the integrity, transparency, and reproducibility of +scientific data on a large scale"* ([COPDESS](https://copdess.org/enabling-fair-data-project/)). +Essentially, scientific data should be **Findable, Accessible, Interoperable, and Reusable** + +## Why FAIR Principles Matter + +More important than knowing *what* the FAIR goals are is understanding *why* they matter. +The nature of performing science is changing, including shifts in scientific publication +and peer review. Key changes include: + +- Scientific analyses are being encoded in repeatable, shareable workflows. +- Publications are moving away from static print documentation to interactive demonstrations online. + +These capabilities rely completely on the availability of the data underpinning the research. 
+ +The FAIR principles were originally articulated by [FORCE11](https://www.force11.org/about), +an organization founded on the belief that *"semantically enhanced, media-rich digital +publishing will be more powerful than traditional print media or electronic copies of printed works."* + +FAIR principles have been adopted by researchers, publishers, and data repositories affiliated +with [COPDESS](https://copdess.org/enabling-fair-data-project/), the Coalition on Publishing Data +in the Earth and Space Sciences. Partners include: + +- Publications such as *Nature* and *Science* +- Funding agencies like NASA, USGS, NOAA, and NIH +- Professional groups like AGU + +For a full list of FAIR partners, see the [COPDESS FAIR Data Project](https://copdess.org/enabling-fair-data-project/). +To view the list of signatories committed to FAIR data, visit the [Statement of Commitment](https://copdess.org/statement-of-commitment/). + +--- + +## FAIR Principles + +The following are synopsized descriptions of the FAIR principles, +adopted from [GO FAIR](https://www.go-fair.org/fair-principles/). + +### **Findable** + +The first step in (re)using data is to find them. Metadata and data should be easy +to find for both humans and computers. Machine-readable metadata are essential for +automatic discovery of datasets and services. + +**Findable characteristics include:** + +- Data and metadata are assigned globally unique and persistent identifiers. +- Data are described by rich metadata that clearly includes the identifier of the data they describe. +- Data and metadata are registered or indexed in a searchable resource. + +--- + +### **Accessible** + +Once the user finds the required data, they need to know how to access them, including details about authentication and authorization. + +**Characteristics of being accessible include:** + +- Data and metadata are retrievable using common protocols. +- The protocol is open and free. 
+- Authentication and authorization procedures are applied where necessary. +- Metadata remain accessible even when data are no longer available. + +--- + +### **Interoperable** + +Data usually need to be integrated with other data. Additionally, data must +interoperate with applications or workflows for analysis, storage, and processing. + +**Interoperable data and metadata:** + +- Use a formal, accessible, broadly applicable language for knowledge representation. +- Use vocabularies that follow FAIR principles. +- Include qualified references to other data and metadata. + +--- + +### **Reusable** + +The ultimate goal of FAIR is to optimize data reuse. To achieve this, metadata and data +must be well-described to enable replication and/or combination in different settings. + +**Reusable data and metadata:** + +- Have clear, accessible data usage licenses. +- Are associated with detailed provenance. +- Meet domain-relevant community standards. + +--- + +## How to Apply FAIR Principles + +1. **Adopt FAIR-Compliant Practices**: + - Assign persistent identifiers to datasets and metadata. + - Use rich metadata that describe datasets thoroughly. + +2. **Register Metadata**: + - Index metadata in searchable repositories to enhance discoverability. + +3. **Implement Standards for Access and Interoperability**: + - Ensure retrieval protocols are open and free. + - Use FAIR-aligned vocabularies and knowledge representation languages. + +4. **Provide Reuse Guidance**: + - Include detailed provenance information. + - Apply clear licenses for data usage. + +5. **Collaborate with FAIR Partners**: + - Follow practices adopted by FORCE11, COPDESS, and similar organizations. 
+ +## Useful Links + +- [COPDESS FAIR Data Project](https://copdess.org/enabling-fair-data-project/) +- [Statement of Commitment to FAIR Data](https://copdess.org/statement-of-commitment/) +- [GO FAIR Principles](https://www.go-fair.org/fair-principles/) + +## Acronyms + +- **FAIR** = Findable, Accessible, Interoperable, Reusable +- **FORCE11** = The Future of Research Communication and e-Scholarship +- **COPDESS** = Coalition on Publishing Data in the Earth and Space Sciences +- **AGU** = American Geophysical Union +- **NASA** = National Aeronautics and Space Administration +- **NIH** = National Institutes of Health +- **NOAA** = National Oceanic and Atmospheric Administration + +Credit: Content taken from a Confluence guide written by Anne Wilson, and modified by Shawn Polson in 2019 \ No newline at end of file diff --git a/docs/source/data_management/metadata.md b/docs/source/data_management/metadata.md new file mode 100644 index 0000000..cbe6c5f --- /dev/null +++ b/docs/source/data_management/metadata.md @@ -0,0 +1,169 @@ +# Metadata + +## Purpose + +Metadata supports data science workflows by: + +- Ensuring datasets are discoverable and usable by both humans and machines. +- Meeting internal and external policies for data accessibility and preservation. +- Enhancing collaboration by providing clear and standardized metadata practices. +- Contributing to the overall success of projects by enabling proper data usage and interoperability. + +## What is Metadata + +A dataset generally consists of sets of measured or modeled values. However, the values alone are +insufficient to understand and use that dataset. Consider this example of a very small dataset: + +**Temperature: 31.5** + +The data point “Temperature: 31.5” raises many questions: + +- Temperature of what? +- According to whom or what? +- Collected when/where? +- Measured or calculated? +- If calculated, how? +- What units? +- To what precision? 
+ +To make this dataset FAIR (Findable, Accessible, Interoperable, and Reusable), additional information is needed. + +Metadata is information (data) about a dataset. It includes: + +- Time and spatial coverages and cadences +- Units +- Processing level +- Data quality +- Instrument details +- Principal Investigator +- Provenance +- Special alerts, etc. + +Ideally, metadata provides all the information necessary to find, understand, +and use the dataset correctly. Good quality metadata is critical for data to be FAIR. + +## Benefits of Good Quality Metadata + +Good quality, searchable metadata enables people to find data that fits their needs: + +- **Good quality**: Sufficient information is provided. +- **Searchable**: Users can find data by various facets like spatial or temporal coverage. + +## Metadata Storage, Formats, and Access + +### Storage Options + +The best practices for metadata storage include: + +1. **Machine-readable metadata** consumable by common tools +2. **Publicly accessible metadata** readable by humans. +3. Avoid private, inaccessible formats like personal notebooks or sticky notes. + +#### Examples of Metadata Storage + +- **Prose embedded in HTML**: Readable by humans but not easily consumable by tools. +- **Public spreadsheets**: Readable by tools that understand the structure but not widely accessible otherwise. +- **Self-describing formats**: Examples include: + - **NetCDF, HDF, FITS**: Include specific metadata properties like variables, geospatial coverage, and time coverage. + - **Header information** in CSV or ASCII tables: + - Simple but less machine-readable. + +Machine readability often depends on established metadata conventions, such as +**Climate and Forecast (CF) conventions** used widely in atmospheric science ([More details here](https://www.unidata.ucar.edu/software/netcdf/workshops/most-recent/cf/index.html)). 
+ +### LASP Metadata Repository + +LASP is developing the **LASP Extended Metadata Repository (LEMR)** to store and access dataset metadata: + +- Automates and dynamically accesses essential properties for data services. +- Plans to extend metadata management capabilities for LASP scientists. + +## Metadata Formats + +Metadata formats refer to schemas describing the metadata structure. Examples include: + +- **ISO 19115**: Geographic information and services. +- **SPASE**: Used in Heliophysics. + +At LASP, the **laspds schema** is used for applications serving data, with plans to +integrate with standard schemas like SPASE and ISO 19115. + +## What Metadata to Save + +### Key Considerations + +At project inception: + +- Identify essential metadata for understanding and using the dataset. +- Create a plan to preserve this information. + +### Balancing Minimal and Comprehensive Metadata + +Repositories often balance between minimal metadata (to lower barriers for participation) +and sufficient metadata for full dataset understanding. Repositories recognize that providing +quality metadata takes resources. + +- Example: **CU Scholar** requires: + - Landing page URL + - Names of dataset creators + - Title + - Publishing organization + - Resource type + +This information alone would not be sufficient to use a dataset, but it is sufficient +to allow CU Scholar to serve the dataset. CU Scholar expects additional details +(e.g., coverages, units, quality indicators) to be available on the landing page or +via self-describing formats. 
+ +## Provenance + +The **provenance** of a dataset describes its history and is critical for using datasets correctly: + +- Origin of the data +- Processing methods +- Calibration and validation details +- Software versions used + +Data producers should record: + +- Dataset inputs +- Processing steps +- Configuration, calibration, and validation details + +Provenance is often provided as descriptive prose, making machine-readable text a reasonable option. + +**Learn More**: [The Importance of Data Set Provenance for Science](https://eos.org/opinions/the-importance-of-data-set-provenance-for-science). + +## Summary of Metadata Workflow + +1. **Identify Necessary Metadata**: + - At project inception, determine what metadata is essential for understanding and using the dataset. +2. **Choose the Appropriate Storage Option**: + - Use machine-readable formats like NetCDF or HDF where possible. + - For simpler use cases, include metadata in file headers or spreadsheets, ensuring structure is clear. +3. **Follow Metadata Conventions**: + - Adhere to standards for machine-readability. + - Consult metadata experts when encoding complex datasets. +4. **Leverage LASP’s Tools**: + - Use the **LASP Extended Metadata Repository (LEMR)** for automated and dynamic metadata management if applicable. + - Work with LASP administrators to input metadata into LEMR. +5. **Maintain Provenance**: + - Record dataset inputs, processing, calibration, and validation details. + - Provide descriptive prose or structured metadata to ensure provenance is clear and traceable. 
+ +## Useful Links + +- [CF Conventions for NetCDF](https://www.unidata.ucar.edu/software/netcdf/workshops/most-recent/cf/index.html) +- [The Importance of Dataset Provenance for Science](https://eos.org/opinions/the-importance-of-data-set-provenance-for-science) +- [NASA DOI Landing Page Requirements](https://wiki.earthdata.nasa.gov/display/DOIsforEOSDIS/DOI+Landing+Page) +- [CU Scholar Metadata Requirements](https://scholar.colorado.edu/faq) + +## Acronyms + +- **CF** = Climate and Forecast +- **FAIR** = Findable, Accessible, Interoperable, and Reusable +- **ISO** = International Organization for Standardization +- **LEMR** = LASP Extended Metadata Repository +- **SPASE** = Space Physics Archive Search and Extract + +Credit: Content taken from a Confluence guide written by Anne Wilson and Shawn Polson. \ No newline at end of file From 876d25aa897852f1f2277c5792eff6a0568e130a Mon Sep 17 00:00:00 2001 From: Veronica Martinez Date: Fri, 20 Dec 2024 11:04:24 -0700 Subject: [PATCH 2/7] add new guidelines to index.rst --- docs/source/data_management/index.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/source/data_management/index.rst b/docs/source/data_management/index.rst index aa55ee4..e0a7f04 100644 --- a/docs/source/data_management/index.rst +++ b/docs/source/data_management/index.rst @@ -5,4 +5,6 @@ Data Management .. 
toctree:: :maxdepth: 1 - file_formats/index \ No newline at end of file + file_formats/index + metadata.md + fair_principles.md \ No newline at end of file From 27dc634a72362f2567c3cef682f2085a7422d75b Mon Sep 17 00:00:00 2001 From: Veronica Martinez Date: Fri, 20 Dec 2024 11:36:11 -0700 Subject: [PATCH 3/7] Address markdown lint errors identified in PR code checks --- .../workflows/open_source/lasp_github_org.md | 60 +++++++++++++------ 1 file changed, 41 insertions(+), 19 deletions(-) diff --git a/docs/source/workflows/open_source/lasp_github_org.md b/docs/source/workflows/open_source/lasp_github_org.md index a7a6e3d..1ba1546 100644 --- a/docs/source/workflows/open_source/lasp_github_org.md +++ b/docs/source/workflows/open_source/lasp_github_org.md @@ -1,8 +1,13 @@ # LASP GitHub Organization -LASP has a public-facing Github organization, which can be used as an umbrella source for any open source repositories under LASP. We prefer to keep open source projects under a general LASP umbrella where it makes sense, so this project is not restricted to specific projects or maturity. However, projects under this heading do need to follow a few rules. +LASP has a public-facing GitHub organization, which can be used as an umbrella source +for any open source repositories under LASP. We prefer to keep open source projects +under a general LASP umbrella where it makes sense, so this project is not restricted +to specific projects or maturity. However, projects under this heading do need to +follow a few rules. + +## General Guidelines -### General Guidelines All repositories and maintainers should abide by the following rules: 1. Repositories must have at least one admin maintainer who actively works at LASP @@ -10,50 +15,67 @@ All repositories and maintainers should abide by the following rules: 3. Repositories must include a license and a README.md 4. 
Repository names should be clear, specific, and unique within the organization -### GitHub Organization Maintainers -The Github at LASP Open Source Stewards (GLOSS) is a committee of volunteers committed to helping those at LASP with open source project management. This committee consists of volunteers that have agreed to answer questions, set up Github repositories within the LASP organization, and support open source projects within LASP. +## GitHub Organization Maintainers + +The GitHub at LASP Open Source Stewards (GLOSS) is a committee of volunteers committed +to helping those at LASP with open source project management. This committee consists +of volunteers that have agreed to answer questions, set up GitHub repositories within +the LASP organization, and support open source projects within LASP. + +### GLOSS Requirements + +Assist with answering questions in slack or via email about open source project management, +open source tools, and the LASP GitHub organization + +* Members should be experienced with GitHub project management +* Create open source projects under the [LASP GitHub Organization](https://github.com/lasp) +* Help ensure all projects follow the guidelines laid out in the LASP GitHub Organization rules -#### GLOSS Requirements -* Assist with answering questions in slack or via email about open source project management, open source tools, and the LASP Github organization -* Members should be experienced with Github project management -* Create open source projects under the [LASP Github Organization](https://github.com/lasp) -* Help ensure all projects follow the guidelines laid out in the LASP Github Organization rules -* Help ensure all projects and maintainers follow the LASP open source [Code of Conduct](https://github.com/lasp/repository-template/blob/main/CODE_OF_CONDUCT.md) +* Help ensure all projects and maintainers follow the LASP open source + [Code of 
Conduct](https://github.com/lasp/repository-template/blob/main/CODE_OF_CONDUCT.md) +### Current Members -#### Current Members: * Maxine Hartnett * Keira Brooks * Matthew Bourque * Rita Borelli -If you are interested in joining GLOSS, please reach out to the #open-source slack channel or email a current maintainer. +If you are interested in joining GLOSS, please reach out to the #open-source slack channel +or email a current maintainer. ## How to create a repository under the LASP GitHub Organization -To create a repository within the organization, you must send a message to the slack channel #open-source or email one of the maintainers with the following information: + +To create a repository within the organization, you must send a message to the slack channel #open-source +or email one of the maintainers with the following information: 1. Repository name 2. Initial maintainer(s) 3. Repository description -By default, repositories are expected to be public. If you want to create a private repo, please reach out for an exception. Repositories will be created with the LASP [template](https://github.com/lasp/repository-template). +By default, repositories are expected to be public. If you want to create a private repo, +please reach out for an exception. Repositories will be created with the LASP [template](https://github.com/lasp/repository-template). ### Repository Requirements + All repositories are required to contain the following files: * License * This license must be an MIT license with a copyright to the University of Colorado as follows: * Copyright (c) 202x The Regents of the University of Colorado * README.md - * This should include a basic description of your project at minimum, although we also encourage installation notes, a user guide, contributor guide, and any other information that someone using your project would want to know. - -Beyond those requirements, repository management is up to the individual. 
However, the #open-source channel is available for anyone with questions about best practices, repository management, releasing projects, or anything related to open source development. + * This should include a basic description of your project at minimum, although we also encourage + installation notes, a user guide, contributor guide, and any other information that someone using + your project would want to know. +Beyond those requirements, repository management is up to the individual. However, the #open-source +channel is available for anyone with questions about best practices, repository management, releasing +projects, or anything related to open source development. ## Acronyms -List of acronyms used in the guide -* **GLOSS** = Github at LASP Open Source Stewards +List of acronyms used in the guide +* **GLOSS** = GitHub at LASP Open Source Stewards Credit: Content taken from a Confluence guide written by Maxine Hartnett From 5ceebb7d36dc724ef6660dda23e3e553ef2bf7be Mon Sep 17 00:00:00 2001 From: Veronica Martinez <39746325+vmartinez-cu@users.noreply.github.com> Date: Thu, 2 Jan 2025 16:02:57 -0700 Subject: [PATCH 4/7] add period to end of sentence Co-authored-by: Matthew Bourque --- docs/source/data_management/fair_principles.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/data_management/fair_principles.md b/docs/source/data_management/fair_principles.md index be0c76a..838e431 100644 --- a/docs/source/data_management/fair_principles.md +++ b/docs/source/data_management/fair_principles.md @@ -4,7 +4,7 @@ The FAIR Principles are a *"set of standards that connects researchers, publishe and data repositories in Earth, space, and environmental sciences to ... accelerate scientific discovery and enhance the integrity, transparency, and reproducibility of scientific data on a large scale"* ([COPDESS](https://copdess.org/enabling-fair-data-project/)). 
-Essentially, scientific data should be **Findable, Accessible, Interoperable, and Reusable** +Essentially, scientific data should be **Findable, Accessible, Interoperable, and Reusable.** ## Why FAIR Principles Matter From c3b9b44fdf5f9b9267a090d21e56d28eb084e9fd Mon Sep 17 00:00:00 2001 From: Veronica Martinez <39746325+vmartinez-cu@users.noreply.github.com> Date: Thu, 2 Jan 2025 16:06:33 -0700 Subject: [PATCH 5/7] bold font to match formatting of other items in list Co-authored-by: Matthew Bourque --- docs/source/data_management/metadata.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/data_management/metadata.md b/docs/source/data_management/metadata.md index cbe6c5f..3b8396a 100644 --- a/docs/source/data_management/metadata.md +++ b/docs/source/data_management/metadata.md @@ -57,7 +57,7 @@ The best practices for metadata storage include: 1. **Machine-readable metadata** consumable by common tools 2. **Publicly accessible metadata** readable by humans. -3. Avoid private, inaccessible formats like personal notebooks or sticky notes. +3. **Avoid private, inaccessible formats** like personal notebooks or sticky notes. #### Examples of Metadata Storage From 15fc7d0bbaaeccc0e95769cef841d923a2ec387a Mon Sep 17 00:00:00 2001 From: Veronica Martinez Date: Thu, 2 Jan 2025 16:25:48 -0700 Subject: [PATCH 6/7] Minor formatting updates. Link to metadata guide --- docs/source/data_management/fair_principles.md | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/docs/source/data_management/fair_principles.md b/docs/source/data_management/fair_principles.md index 838e431..6110199 100644 --- a/docs/source/data_management/fair_principles.md +++ b/docs/source/data_management/fair_principles.md @@ -51,8 +51,6 @@ automatic discovery of datasets and services. - Data are described by rich metadata that clearly includes the identifier of the data they describe. 
- Data and metadata are registered or indexed in a searchable resource. ---- - ### **Accessible** Once the user finds the required data, they need to know how to access them, including details about authentication and authorization. @@ -64,8 +62,6 @@ Once the user finds the required data, they need to know how to access them, inc - Authentication and authorization procedures are applied where necessary. - Metadata remain accessible even when data are no longer available. ---- - ### **Interoperable** Data usually need to be integrated with other data. Additionally, data must @@ -77,8 +73,6 @@ interoperate with applications or workflows for analysis, storage, and processin - Use vocabularies that follow FAIR principles. - Include qualified references to other data and metadata. ---- - ### **Reusable** The ultimate goal of FAIR is to optimize data reuse. To achieve this, metadata and data @@ -112,11 +106,15 @@ must be well-described to enable replication and/or combination in different set 5. **Collaborate with FAIR Partners**: - Follow practices adopted by FORCE11, COPDESS, and similar organizations. +--- + ## Useful Links +- [Metadata Overview](metadata.md) - [COPDESS FAIR Data Project](https://copdess.org/enabling-fair-data-project/) - [Statement of Commitment to FAIR Data](https://copdess.org/statement-of-commitment/) - [GO FAIR Principles](https://www.go-fair.org/fair-principles/) +- [FORCE11](https://www.force11.org/about) ## Acronyms From 6a603057798d97c41ed55271fc91156e62dba6a7 Mon Sep 17 00:00:00 2001 From: Veronica Martinez Date: Thu, 2 Jan 2025 16:59:14 -0700 Subject: [PATCH 7/7] reference metadata doc in FAIR principles guide.
Add links to SPASE and ISO 19115 --- docs/source/data_management/fair_principles.md | 2 +- docs/source/data_management/metadata.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/data_management/fair_principles.md b/docs/source/data_management/fair_principles.md index 6110199..61b1b27 100644 --- a/docs/source/data_management/fair_principles.md +++ b/docs/source/data_management/fair_principles.md @@ -41,7 +41,7 @@ adopted from [GO FAIR](https://www.go-fair.org/fair-principles/). ### **Findable** -The first step in (re)using data is to find them. Metadata and data should be easy +The first step in (re)using data is to find them. [Metadata](metadata.md) and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services. diff --git a/docs/source/data_management/metadata.md b/docs/source/data_management/metadata.md index 3b8396a..be82b40 100644 --- a/docs/source/data_management/metadata.md +++ b/docs/source/data_management/metadata.md @@ -82,8 +82,8 @@ LASP is developing the **LASP Extended Metadata Repository (LEMR)** to store and Metadata formats refer to schemas describing the metadata structure. Examples include: -- **ISO 19115**: Geographic information and services. -- **SPASE**: Used in Heliophysics. +- **[ISO 19115](https://www.fgdc.gov/metadata/iso-standards)**: Geographic information and services. +- **[SPASE](https://spdf.gsfc.nasa.gov/spdf-documents/SPASE_and_SPDF.html)**: Used in Heliophysics. At LASP, the **laspds schema** is used for applications serving data, with plans to integrate with standard schemas like SPASE and ISO 19115.