Merged
Binary file added docs/source/_static/metadata_assessment.png
147 changes: 147 additions & 0 deletions docs/source/data_management/data_stewardship.md
@@ -0,0 +1,147 @@
# Data Stewardship at LASP

LASP has a hard-earned, world-class reputation for creating valuable, high-quality
datasets. These datasets, often paid for by U.S. taxpayers, are our legacy.
Comment on lines +3 to +4 (Member): This could be LASP's mission statement 😂

## Purpose

The world of science is rapidly evolving, particularly in how research data is managed
and how science is conducted. Expectations from funding agencies, the OSTP, and publishers
around open and accessible research data have significant implications for:

- Scientific conduct and integrity
- Funding for science
- Scientific publication
- Attribution and rewards

By following best practices in data stewardship, LASP ensures its datasets remain
accessible, reproducible, and valuable for the scientific community.

## Key principles

Several key principles guide data stewardship and open science at LASP:

### Open Data

Funding agencies now expect data—and, in some cases, the software that created it—to
be made available beyond the project that produced it. Agencies such as USGS, NASA,
NOAA, and NSF require data management plans in proposals, and funded projects must
meet these responsibilities. This generally includes:

- Making data and metadata publicly accessible.
- Ensuring machine-readability for automated tools.
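
As a minimal illustration of machine-readability, the sketch below encodes dataset
metadata as schema.org-style JSON. The dataset name and field values are hypothetical,
and the schema.org vocabulary is one option among several, not a LASP requirement:

```python
import json

# A minimal, hypothetical machine-readable metadata record for a dataset.
# The keys follow the schema.org "Dataset" vocabulary; the values are
# illustrative, not a real LASP product.
metadata = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example Irradiance Time Series",
    "description": "Daily averaged example measurements.",
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
    "keywords": ["solar irradiance", "time series"],
    "variableMeasured": "irradiance",
}

# Serializing to JSON yields a record that is readable by both humans
# and automated harvesting tools.
print(json.dumps(metadata, indent=2))
```

A record like this can be published alongside the data files so that search
engines and catalog harvesters can index the dataset automatically.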

Organizations such as ESIP, RDA, ESDSWG, and CODATA work to enhance data management and
sharing nationally and internationally.

### Enabling a Science Ecosystem

Scientific artifacts should be [preserved as a lasting world heritage](https://www.agu.org/share-and-advocate/share/policymakers/position-statements/position_data).
As scientific data grows in complexity and scale, interoperable tools, standards, and conventions play a crucial role
in simplifying and automating data processing, analysis, and metadata collection. Data that is
[FAIR](fair_principles.md)—findable, accessible, interoperable, and reusable—is essential to supporting these efforts.

### Reproducibility

A key issue in modern science is the [reproducibility crisis](https://en.wikipedia.org/wiki/Replication_crisis),
in which researchers often find it difficult to replicate the results of published studies. While complete
reproducibility may not always be feasible, best practices in data management ensure:

- Experiments can be rerun, validated, or verified.
- Data and documentation are identifiable and machine-readable.
- Automated tools can process and analyze data.
- Workflow and notebook tools enable shareable, reproducible workflows.
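
As a minimal illustration of making a processing step traceable, the sketch below
builds a small provenance record capturing when a step ran, on what inputs, and with
which software environment. The helper function and field names are hypothetical, not
a LASP convention:

```python
import json
import platform
import sys
from datetime import datetime, timezone

def provenance_record(input_files, processing_step):
    """Build a minimal provenance record for one processing step.

    Illustrative sketch only: a real pipeline would also record
    package versions, processing parameters, and file checksums.
    """
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "inputs": list(input_files),
        "step": processing_step,
    }

# Record provenance for a hypothetical daily-averaging step.
record = provenance_record(["raw/day001.csv"], "daily_average")
print(json.dumps(record, indent=2))
```

Writing such a record next to each output file gives later users enough context to
rerun or verify the step.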

For more on improving reproducibility in Earth science research, see: [EOS article](https://eos.org/opinions/improving-reproducibility-in-earth-science-research).

### Return on Investment

Good data management maximizes the impact of a dataset, allowing for wider use now and in the future. This
benefits both science and society by increasing the return on investment in data collection and analysis.

### Scientific Publication

Publishing is evolving, with journals increasingly requiring:

- Data contributions alongside publications.
- Linking publications with datasets and executable code.

Organizations like [FORCE11](https://www.force11.org/about) advocate for semantically enhanced, media-rich
digital publishing, which is more powerful than traditional print media.

### Attribution for Datasets and Science Software

Funding agencies and publishers are promoting cultural shifts in how datasets and software are credited. High-quality
datasets and software are now recognized as independent scientific contributions.

Since datasets are often generated by software, proper software management is a critical component of data stewardship.
The [Software Sustainability Institute](https://www.software.ac.uk/) supports sustainable research software.

## How to apply these principles

To meet the expectations outlined above:

1. **Develop a Data Management Plan (DMP)**
- Address how data will be stored, accessed, and shared.
- Align with agency requirements (e.g., NASA, NSF).

2. **Ensure Open Access**
- Make datasets and metadata publicly available.
- Use machine-readable formats when possible.

3. **Use Standardized Metadata and Formats**
- Follow best practices for documentation and accessibility.
- Use tools that support automated metadata generation.

4. **Maintain Provenance and Reproducibility**
- Record data lineage, processing steps, and software versions.
- Use reproducible workflows and notebook tools.

5. **Publish Data Alongside Research**
- Link datasets with publications.
- Provide clear attribution for datasets and software.

6. **Engage with Data Stewardship Communities**
- Participate in organizations like ESIP, RDA, and CODATA.
- Follow emerging best practices in open science.
Comment on lines +84 to +106 (Member): I imagine an ideal world in which we have a lot of time and we could possibly create a guide for each of these! 🤞🏻

### Assessing data management maturity

To assess data stewardship maturity, consider the following rubrics:

![Data Management Assessment](../_static/data_management_assessment.png)
![Metadata Assessment](../_static/metadata_assessment.png)

Efforts to define levels of data stewardship maturity have produced rubrics that help
repositories and projects evaluate and improve their practices. In particular, a maturity matrix for data
stewardship was developed, similar in spirit to the CMMI used for software process improvement. A high-level
view of that matrix is presented here. Note that Level 3 was determined to be the recommended level of attainment
for operational digital products stewarded by national data centers.

![Data Stewardship Maturity Matrix](../_static/data_stewardship_maturity_matrix.png)

## Useful Links

- [FAIR Principles](fair_principles.md)
- [AGU Position Statement on Data](https://www.agu.org/share-and-advocate/share/policymakers/position-statements/position_data)
- [Improving Reproducibility in Earth Science Research](https://eos.org/opinions/improving-reproducibility-in-earth-science-research)
- [FORCE11 on Digital Publishing](https://www.force11.org/about)
- [Software Sustainability Institute](https://www.software.ac.uk/)

## Acronyms

- **AGU** = American Geophysical Union
- **CMMI** = Capability Maturity Model Integration
- **CODATA** = Committee on Data for Science and Technology
- **DMP** = Data Management Plan
- **ESDSWG** = Earth Science Data Systems Working Group
- **ESIP** = Earth Science Information Partners
- **FAIR** = Findable, Accessible, Interoperable, and Reusable
- **FORCE11** = Future of Research Communication and e-Scholarship
- **NASA** = National Aeronautics and Space Administration
- **NOAA** = National Oceanic and Atmospheric Administration
- **NSF** = National Science Foundation
- **OSTP** = Office of Science and Technology Policy
- **RDA** = Research Data Alliance
- **USGS** = United States Geological Survey

Credit: Content adapted from a Confluence guide written by Anne Wilson and Shawn Polson.
3 changes: 2 additions & 1 deletion docs/source/data_management/index.rst
@@ -7,4 +7,5 @@ Data Management

   file_formats/index
   metadata.md
-  fair_principles.md
+  fair_principles.md
+  data_stewardship.md