NanoMine Curation

This repository is dedicated to the curation of published data from the polymer nanocomposites literature into a structured format for NanoMine.

Links and Resources

Data Science Productivity Tools: A free-to-audit course on edX covering RStudio, Git/GitHub, and Unix/Linux.
WebPlotDigitizer: Tool for semi-automated extraction of raw data from plots.
NanoMine Schema: XML Schema Definition (.xsd) files containing the full list of terms. Refer to the most recent schema; filenames contain the date in MMDDYY format.
NanoMine Excel Template: Microsoft Excel workbook into which data for a given sample are assembled
Tidy Data: Principles for data tidiness laid out in R for Data Science by Hadley Wickham
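
Because the schema filenames carry their date as MMDDYY, sorting them alphabetically does not put the most recent one last (for example, a December 2018 file would sort after a June 2019 file). The sketch below shows one way to pick the newest schema by parsing the date. It assumes the six digits sit immediately before the `.xsd` extension; the filenames in the usage note are hypothetical and only illustrate the pattern.

```python
import re
from datetime import datetime


def schema_date(filename):
    """Return the date encoded as MMDDYY just before '.xsd', or None."""
    match = re.search(r"(\d{6})(?=\.xsd$)", filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%m%d%y").date()
    except ValueError:
        # Six digits that do not form a valid MMDDYY date.
        return None


def most_recent_schema(filenames):
    """Return the filename with the latest embedded date, or None."""
    dated = [(schema_date(name), name) for name in filenames]
    dated = [(date, name) for date, name in dated if date is not None]
    if not dated:
        return None
    return max(dated)[1]
```

For instance, `most_recent_schema(["schema_112918.xsd", "schema_060319.xsd"])` selects the June 2019 file even though it sorts first alphabetically.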

Supplementary links related to the Knowledge Graph side of NanoMine

Tetherless World repo for NanoMine: Ontology and Knowledge Graph approach to data storage (see nanomine.ttl), using the RDF data model
RDF Primer: W3C introduction to the RDF data model
NanoMine SPARQL Endpoint: Direct querying of RDF data in the NanoMine Knowledge Graph, using the SPARQL query language
Semantic Data Dictionaries: Developed by our collaborators at RPI, a specification and method for mapping tabular data into RDF format
NanoMine Ontology spreadsheet: Google Sheet used to collaboratively develop the NanoMine ontology

File and Folder Organization

Within a curation job's sub-directory, the file organization depends on what makes the most sense for the curator. However, the sub-directory should contain a "Traveler" (README.md) with the DOI and other information relevant to the curation process. Because data in NanoMine are uploaded on a per-sample basis, it is suggested to give each "sample" its own child sub-directory containing the completed Excel template along with any supplemental data files (.csv, .jpg, etc.).
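
As a minimal sketch of the layout described above, the following helper creates a job sub-directory with a Traveler and one child sub-directory per sample. The function name, the Traveler's contents, and the sample names are illustrative assumptions, not a required convention; curators remain free to organize files differently.

```python
from pathlib import Path


def scaffold_job(stage_dir, job_name, doi, sample_names):
    """Create <stage_dir>/<job_name>/ with a Traveler and per-sample folders."""
    job_dir = Path(stage_dir) / job_name
    job_dir.mkdir(parents=True, exist_ok=True)

    # The Traveler is a README.md holding the DOI and curation notes.
    traveler = job_dir / "README.md"
    if not traveler.exists():
        traveler.write_text(
            f"# {job_name}\n\nDOI: {doi}\n\n## Notes\n\n",
            encoding="utf-8",
        )

    # One child sub-directory per sample, to hold the completed Excel
    # template and any supplemental files for that sample.
    for sample in sample_names:
        (job_dir / sample).mkdir(exist_ok=True)

    return job_dir
```

A call such as `scaffold_job("wishlist", "example-job", "10.0000/example", ["sample-1", "sample-2"])` would produce `wishlist/example-job/README.md` plus two empty sample folders; the DOI shown here is a placeholder.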

Curation Workflow

There are five directories in this repository that can be considered "stages" of the curation process:

Wishlist

At the "Wishlist" stage, a curation job is prepared by creating a sub-directory, initializing a "Traveler" (README.md file) in the sub-directory, and identifying figures/data of interest. Once the raw data have been retrieved (either provided by the original authors or through a digital extraction tool), the curation job can be moved to the "In-Progress" directory.

In-Progress

The "In-Progress" stage should be kept as uncluttered as possible, with only those curation jobs that are actively in progress. Curation jobs should spend as little time as reasonable in this directory and should be moved to either "Completed" or "Stalled."

Completed

The "Completed" stage keeps track of curation jobs that have already been uploaded to NanoMine QA. This is ideally the final location for a curation job. If modifications or updates are needed, the sub-directory should be moved to the "Revisited" directory.

Stalled

If a significant roadblock is encountered, the curation job can be moved to the "Stalled" stage. Documenting the issue as clearly as possible will help the team make the necessary improvements or updates to the system. Once a solution has been identified, the job can be returned to the "In-Progress" directory.

Revisited

If a curation job in the NanoMine system ("Completed") requires some revision, the sub-directory should be moved to the "Revisited" stage. At this point, the issue should be clearly described before moving the curation job to the "In-Progress" stage.
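
The stage transitions described in the sections above can be summarized as a small lookup table. The sketch below is one possible reading of that workflow, not a tool shipped with this repository: the lower-case directory names, the `move_job` helper, and the move from "Wishlist" to "In-Progress" are assumptions made for illustration.

```python
import shutil
from pathlib import Path

# Allowed moves between stage directories, as read from the stage descriptions.
ALLOWED_MOVES = {
    "wishlist": {"in-progress"},   # assumed: jobs start work once raw data exist
    "in-progress": {"completed", "stalled"},
    "stalled": {"in-progress"},
    "completed": {"revisited"},
    "revisited": {"in-progress"},
}


def move_job(repo_root, job_name, source, target):
    """Move a job sub-directory between stages if the workflow allows it."""
    if target not in ALLOWED_MOVES.get(source, set()):
        raise ValueError(f"move from {source!r} to {target!r} is not part of the workflow")

    src = Path(repo_root) / source / job_name
    if not src.is_dir():
        raise FileNotFoundError(f"no job directory at {src}")

    dst_parent = Path(repo_root) / target
    dst_parent.mkdir(parents=True, exist_ok=True)
    dst = dst_parent / job_name
    if dst.exists():
        raise FileExistsError(f"{dst} already exists")

    shutil.move(str(src), str(dst))
    return dst
```

Inside a Git working tree, running `git mv` on the sub-directory achieves the same move while staging the rename; the plain file-system move here only keeps the sketch self-contained.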

The overall workflow is illustrated in the diagram below.

To collaboratively manage and keep track of changes to curation-related files, a Git workflow is used. Raw data tables and the code used to prepare the raw data should be kept in shared storage (e.g. Dropbox, Google Drive). This GitHub repository is not designed to host raw data, so any curation jobs in the "Completed" stage should be configured to ignore the raw data files and track only the Traveler and other small files (such as the master Excel template and any R code).
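
One way to configure this is with `.gitignore` patterns scoped to the "Completed" directory. The sketch below generates such patterns; which extensions count as raw data is an assumption here (`csv` and `jpg` are used only as examples), as are the lower-case `completed` directory name and the helper name.

```python
# Assumed set of raw-data extensions; adjust to match the files actually used.
RAW_DATA_EXTENSIONS = ("csv", "jpg")


def gitignore_lines(stage="completed", extensions=RAW_DATA_EXTENSIONS):
    """Build .gitignore lines that ignore raw data at any depth under a stage."""
    lines = [f"# Raw data under {stage}/ is kept in shared storage, not in Git"]
    # '<stage>/**/*.<ext>' matches files with that extension in the stage
    # directory itself and in any nested job or sample sub-directory.
    lines += [f"{stage}/**/*.{ext}" for ext in extensions]
    return lines


if __name__ == "__main__":
    print("\n".join(gitignore_lines()))
```

The printed lines can be appended to the repository's `.gitignore`. The Traveler, Excel template, and R scripts are left untracked by none of these patterns, so they continue to be versioned as usual.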
