Spike: Inventory and prioritize all existing Harvesting related issues #24

@mreekie

Description

This is in support of:

The first step is to figure out what the Dataverse team and the community have already done towards this aim and what still remains to be done.

For example:

And then to prioritize which issues are to be fixed.

Def of done

As completely as is reasonably possible in a 2 week period (sprint):

  • Search out previous related issues that are problems with the current implementation. Take an inventory.
  • Search out previous work done within the Dataverse community as well.
  • Prioritize which of the issues/PRs should be moved forward.

We need to keep in mind that harvesting from a particular source requires that source to be bug free. Identify which sources have which bugs so that the bugs for a particular source can be targeted. ICPSR is one example; Zenodo is another.

More information:

There is a lot packaged into Aim 4:

  1. Improved Harvesting via the OAI-PMH standard
  2. Improved support for BagIt
  3. Improved support for Signposting

The scope for this issue is harvesting via the OAI-PMH standard.
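For anyone reproducing source-specific harvesting bugs during the inventory, here is a minimal Python sketch of one OAI-PMH `ListRecords` exchange. The base URL, set name, record identifiers, and sample response are hypothetical placeholders, not taken from any real source. Only the verb, argument names, and XML namespace come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iter(OAI_NS + "identifier")]
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Hypothetical response page, trimmed to the parts the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>doi:10.5072/FK2/EXAMPLE1</identifier></header></record>
    <record><header><identifier>doi:10.5072/FK2/EXAMPLE2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("https://repo.example.org/oai", set_spec="nih")
ids, token = parse_page(SAMPLE)
next_url = list_records_url("https://repo.example.org/oai", resumption_token=token)
```

A harvester repeats the request with each returned token until a page arrives with no token (or an empty one). Running that loop against one source at a time is one way to attribute a failure to a specific source.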

Aim 4:

Improve harvesting and packaging standards to share metadata and data across repositories

Our proposed project will significantly improve the widely-used Harvard Dataverse repository to better support NIH-funded research.

A critical measure of the GREI program’s success is to standardize the discoverability across generalist repositories.

To help with this, **we propose to improve the existing harvesting functionality in the Dataverse software based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) standard, and coordinate with other repository packaging standards to share or move metadata and data.**

Dataverse already supports Bags as defined by the Research Data Alliance (RDA) Research Data Repository Interoperability Working Group. Here we propose to improve the support for Bags, test it for NIH-funded datasets, and explore and define the appropriate standard for moving metadata and data across generalist repositories. This will help with sustainability and succession planning: if one repository can no longer support a specific dataset, the dataset can easily be moved to another repository without losing any information about it.

Additionally, we propose to implement Signposting in the Dataverse software. By adding HTTP Link headers throughout the application, we can more easily support automated metadata and data discovery in the repository, and allow other applications and services to represent the content in the Harvard Dataverse repository more accurately and completely.
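To illustrate what a client gains from such headers, here is a minimal Python sketch that reads typed links from an HTTP `Link` header value. The header value and URLs are hypothetical, and this does not describe how Dataverse emits or will emit these headers. The parser is deliberately simplified and is not a full RFC 8288 implementation (for example, it ignores escaped quotes). `cite-as` and `describedby` are link relation types used by Signposting.

```python
import re

# One link: <target> followed by zero or more ;name=value parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value):
    """Parse a Link header value into a list of dicts (simplified)."""
    links = []
    for target, params in LINK_RE.findall(value):
        attrs = {"target": target}
        for name, quoted, bare in PARAM_RE.findall(params):
            attrs[name.lower()] = quoted or bare
        links.append(attrs)
    return links


def targets_for(links, rel):
    """Return the targets of all links carrying the given relation type."""
    # A rel attribute may hold several space-separated relation types.
    return [l["target"] for l in links if rel in l.get("rel", "").split()]


# Hypothetical header value for a dataset landing page.
HEADER = (
    '<https://doi.org/10.5072/FK2/EXAMPLE>; rel="cite-as", '
    '<https://repo.example.org/dataset/1/metadata.json>; '
    'rel="describedby"; type="application/ld+json"'
)

links = parse_link_header(HEADER)
persistent_id = targets_for(links, "cite-as")
metadata_urls = targets_for(links, "describedby")
```

With links like these, a crawler can find a dataset's persistent identifier and machine-readable metadata from the response headers alone, without scraping the landing page HTML.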

Related documents

Labels

pm.GREI-d-1.4.1: NIH, yr1, aim4, task1: Resolve OAI-PMH harvesting issues
pm.GREI-d-1.4.2: NIH, yr1, aim4, task2: Create working group on packaging standards
