
mdeagen/nmcuration


NanoMine Curation

This repository is dedicated to the curation of published data from the polymer nanocomposites literature into a structured format for NanoMine.

Links and Resources

  • Data Science Productivity Tools: A free-to-audit course on edX covering RStudio, Git/GitHub, and Unix/Linux.
  • WebPlotDigitizer: Tool for semi-automated extraction of raw data from plots.
  • NanoMine Schema: XML Schema Definition (.xsd) files containing the full list of terms (use the most recent schema; filenames contain the date in MMDDYY format)
  • NanoMine Excel Template: Microsoft Excel workbook into which data for a given sample are assembled
  • Tidy Data: Principles of tidy data as laid out in R for Data Science by Hadley Wickham
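Because the schema filenames embed their date as MMDDYY, sorting them alphabetically orders them by month first rather than chronologically. The following is a minimal sketch for picking the most recent schema. It assumes the date is the first run of exactly six digits in the filename. The filenames in the usage example are hypothetical.

```python
import re
from datetime import date, datetime
from typing import Iterable, Optional

# Assumption: the schema date is the first run of exactly six digits (MMDDYY).
DATE_RE = re.compile(r"(?<!\d)(\d{6})(?!\d)")


def schema_date(filename: str) -> Optional[date]:
    """Return the MMDDYY date embedded in a schema filename, or None."""
    match = DATE_RE.search(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%m%d%y").date()
    except ValueError:
        # Six digits that do not form a valid calendar date.
        return None


def most_recent_schema(filenames: Iterable[str]) -> Optional[str]:
    """Return the .xsd filename with the latest embedded date, or None."""
    dated = []
    for name in filenames:
        if not name.lower().endswith(".xsd"):
            continue
        stamp = schema_date(name)
        if stamp is not None:
            dated.append((stamp, name))
    # Compare parsed dates, not strings: MMDDYY does not sort chronologically.
    return max(dated)[1] if dated else None
```

For example, `most_recent_schema(["PNC_schema_120117.xsd", "PNC_schema_081318.xsd"])` returns `"PNC_schema_081318.xsd"`. A plain string sort would have picked the December 2017 file instead.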

Supplementary links related to the Knowledge Graph side of NanoMine

File and Folder Organization

File organization within a curation job's sub-directory is left to the curator's judgment. However, the sub-directory should contain a "Traveler" (README.md) with the DOI and any other information relevant to the curation process. Because data in NanoMine are uploaded on a per-sample basis, each "sample" should ideally get its own child sub-directory containing the completed Excel template along with any supplemental data files (.csv, .jpg, etc.).
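The per-sample layout suggested above can be set up and checked with a small helper. This is a minimal sketch. The function names and the sample IDs are hypothetical, and it assumes that a completed template is any `.xlsx` file inside the sample's directory.

```python
from pathlib import Path
from typing import Iterable, List


def create_sample_dirs(job_dir: Path, sample_ids: Iterable[str]) -> List[Path]:
    """Create one child sub-directory per sample inside a curation job.

    Each sample directory is meant to hold that sample's completed Excel
    template plus supplemental files (.csv, .jpg, ...).
    """
    job_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for sample_id in sample_ids:
        sample_dir = job_dir / sample_id
        sample_dir.mkdir(exist_ok=True)
        created.append(sample_dir)
    return created


def samples_missing_template(job_dir: Path) -> List[str]:
    """List sample sub-directories that do not yet contain an .xlsx file."""
    return sorted(
        child.name
        for child in job_dir.iterdir()
        if child.is_dir() and not any(child.glob("*.xlsx"))
    )
```

Running `samples_missing_template` before moving a job out of "In-Progress" is one way to spot samples whose Excel template has not been filled in yet.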

Curation Workflow

There are five directories in this repository that can be considered "stages" of the curation process:

At the "Wishlist" stage, a curation job is prepared by creating a sub-directory, initializing a "Traveler" (README.md file) in the sub-directory, and identifying figures/data of interest. Once the raw data have been retrieved (either provided by the original authors or extracted with a digitization tool), the curation job can be moved to the "In-Progress" stage.
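Initializing the Traveler can be scripted so that every new job starts with the same fields. This is a minimal sketch. The README only requires that the Traveler record the DOI and curation-relevant information, so the section headings here are hypothetical, as are the `Wishlist/example_job` path and the placeholder DOI in the usage note.

```python
from pathlib import Path
from typing import Iterable


def init_traveler(job_dir: Path, doi: str, figures: Iterable[str]) -> Path:
    """Write a starter Traveler (README.md) for a new curation job.

    Refuses to overwrite an existing Traveler. Headings are illustrative only.
    """
    job_dir.mkdir(parents=True, exist_ok=True)
    traveler = job_dir / "README.md"
    if traveler.exists():
        raise FileExistsError(f"{traveler} already exists")
    lines = [
        f"# Traveler: {job_dir.name}",
        "",
        f"DOI: {doi}",
        "",
        "## Figures/data of interest",
        *[f"- {fig}" for fig in figures],
        "",
        "## Notes",
        "",
    ]
    traveler.write_text("\n".join(lines), encoding="utf-8")
    return traveler
```

For example, `init_traveler(Path("Wishlist/example_job"), "10.0000/example", ["Fig. 2a", "Table 1"])` creates the job directory and a Traveler listing the two items of interest.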

The "In-Progress" stage should be kept as uncluttered as possible and should contain only those curation jobs that are actively in progress. Curation jobs should spend as little time as is reasonable in this directory and should then be moved to either "Completed" or "Stalled."

The "Completed" stage keeps track of curation jobs that have already been uploaded to NanoMine QA. This is ideally the final location for the curation job, unless there are modifications or updates to make, in which case the sub-directory should be moved to the "Revisited" directory.

If a significant roadblock is encountered, the curation job can be moved to the "Stalled" stage. Documenting the issue as clearly as possible will help the team make the necessary improvements or updates to the system. Once a solution has been identified, the job can be returned to the "In-Progress" directory.

If a curation job already in the NanoMine system ("Completed") requires revision, the sub-directory should be moved to the "Revisited" stage. At this point, the issue should be clearly described before the curation job is moved back to the "In-Progress" stage.
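The allowed moves between stages described above form a small state machine. The following sketch encodes them and refuses any move that is not part of the workflow. It assumes the stage directories sit at the repository root and are named exactly after the stages, with the last stage's directory taken to be "Revisited". Adjust the names if the repository uses different ones. It uses a plain filesystem move. In a Git checkout, `git mv` is an equivalent alternative.

```python
import shutil
from pathlib import Path

# Allowed moves between stage directories, as described in the README.
# Assumption: directory names match these stage names exactly.
TRANSITIONS = {
    "Wishlist": {"In-Progress"},
    "In-Progress": {"Completed", "Stalled"},
    "Stalled": {"In-Progress"},
    "Completed": {"Revisited"},
    "Revisited": {"In-Progress"},
}


def move_job(repo_root: Path, job_name: str, src: str, dst: str) -> Path:
    """Move a curation job sub-directory from stage `src` to stage `dst`."""
    if dst not in TRANSITIONS.get(src, set()):
        raise ValueError(f"moving a job from {src!r} to {dst!r} is not part of the workflow")
    source = repo_root / src / job_name
    target = repo_root / dst / job_name
    if not source.is_dir():
        raise FileNotFoundError(source)
    if target.exists():
        raise FileExistsError(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(target))
    return target
```

Keeping the transitions in one table makes the workflow explicit. For instance, a job in "Stalled" cannot jump straight to "Completed" without passing back through "In-Progress".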

The overall workflow is illustrated in the diagram below.

[Figure: Illustration of the NanoMine curation process]

To collaboratively manage and track changes to curation-related files, a Git workflow is used. Raw data tables and the code used to prepare them should be stored in a shared location (e.g., Dropbox, Google Drive). This GitHub repository is not designed to host raw data, so curation jobs in the "Completed" stage should be configured to ignore the raw data files and track only the Traveler and other small files (such as the master Excel template and any R code).
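One way to make a job track only small files is a whitelist-style `.gitignore` in the job's sub-directory. It ignores everything first, then re-includes directories (so Git can descend into sample sub-directories) and the files worth keeping. This is a minimal sketch. The kept extensions (`.xlsx`, `.R`) are assumptions based on the files named above. Extend the list if, for example, R scripts use a lowercase `.r`.

```python
from pathlib import Path

# Whitelist-style .gitignore: ignore everything, then re-include only the
# small files to be tracked. The kept extensions are assumptions.
GITIGNORE_LINES = [
    "# Ignore everything in this curation job by default (raw data live elsewhere)",
    "*",
    "# Re-include directories so Git can descend into sample sub-directories",
    "!*/",
    "# Track the Traveler, this file, Excel templates, and R code",
    "!README.md",
    "!.gitignore",
    "!*.xlsx",
    "!*.R",
]


def write_job_gitignore(job_dir: Path) -> Path:
    """Write a .gitignore in job_dir that tracks only small curation files."""
    path = job_dir / ".gitignore"
    path.write_text("\n".join(GITIGNORE_LINES) + "\n", encoding="utf-8")
    return path
```

After writing the file, `git check-ignore -v <file>` reports which pattern, if any, ignores a given file. This is a quick way to confirm that raw `.csv` or image files are excluded.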

About

Collaborative curation of data into NanoMine
