Spike: Inventory and prioritize all existing Harvesting related issues

This is in support of:
- an NIH grant, ["The Harvard Dataverse repository: A generalist repository integrated with a Data Commons"](https://docs.google.com/document/d/1cFK8pdwMKIRZxNs8EZXL1cXmW0ho6Rx9ulCewJ1ZCFI/edit?usp=sharing)
  - Aim 4: **Improve harvesting and packaging standards to share metadata and data across repositories**

The first step is to figure out what the Dataverse team and the community have already done toward this aim and what remains to be done.

For example:
- [8139 Can't harvest when Dublin core field language is set](https://github.com/IQSS/dataverse/issues/8139#)

The next step is to prioritize which issues should be fixed.

### Def of done
As completely as is reasonably possible in a 2-week period (one sprint):
- [ ] Search out previous related issues that describe problems with the current implementation. Take an inventory.
- [ ] Search out previous work done within the Dataverse community as well.
- [ ] Prioritize which of the issues/PRs should be moved forward.

We need to keep in mind that harvesting from a particular source requires that source to be bug-free. Identify which sources have which bugs so that the bugs for a particular source can be targeted. ICPSR is one example; Zenodo is another.


### More information:

There is a lot packaged into Aim 4:
 1. Improved harvesting via the OAI-PMH standard
 1. Improved support for BagIt
 1. Improved support for Signposting

The scope of this issue is harvesting via the OAI-PMH standard.
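
For anyone triaging harvesting issues, it may help to see roughly what a harvest request looks like. The following is a minimal sketch, not the Dataverse harvester itself: it builds an OAI-PMH `ListRecords` URL and pulls record identifiers and the resumption token out of a response. The base URL, the sample XML, and the function names are illustrative assumptions; only the `verb`, `metadataPrefix`, `set`, and `resumptionToken` arguments and the OAI-PMH 2.0 namespace come from the protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           set_spec=None, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: send it alone with the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (identifiers, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        raise ValueError(f"OAI-PMH error {error.get('code')}: {error.text}")
    identifiers = [
        el.text
        for el in root.findall(
            "oai:ListRecords/oai:record/oai:header/oai:identifier", OAI_NS
        )
    ]
    token_el = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    # An empty or missing token means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Made-up response used only to exercise the parser.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>doi:10.1234/EXAMPLE1</identifier></header></record>
    <record><header><identifier>doi:10.1234/EXAMPLE2</identifier></header></record>
    <resumptionToken>token-1</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_list_records_url("https://example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A harvester would repeat the request with the returned token until no token comes back. Many of the source-specific bugs to inventory are likely to surface as an `error` element or as metadata that fails to parse at this step.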

### Aim 4: 

**Improve harvesting and packaging standards to share metadata and data across repositories**

Our proposed project will significantly improve the widely-used Harvard Dataverse repository to better support NIH-funded research. 

**A critical measure of the [GREI program](https://www.iq.harvard.edu/news/dataverse-joins-nih-data-repository-initiative)’s success is to standardize the discoverability across generalist repositories**.

To help with this, **we propose to improve the existing harvesting functionality in the Dataverse software based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) standard, and coordinate with other repository packaging standards to share or move metadata and data.**

Dataverse already supports Bags as defined by the Research Data Alliance (RDA) Research Data Repository Interoperability Working Group. Here we propose to improve the support for **Bags**, test it for NIH-funded datasets, and explore and define the appropriate standard for moving metadata and data across generalist repositories. This will support a sustainability and succession plan: if one repository can no longer support a specific dataset, the dataset can easily be moved to another repository without losing any information about it.

Additionally, we propose to implement **Signposting** in the Dataverse software. By adding HTTP Link headers throughout the application, we can more easily support automated metadata and data discovery in the repository, and allow other applications and services to represent the content in the Harvard Dataverse repository more accurately and completely.
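
As an illustration of what a Signposting client does with those headers, here is a minimal sketch that parses an HTTP `Link` header value into (target, rel, type) entries. The parser, the function name, and the sample header are assumptions made for illustration and do not reflect any existing Dataverse output; `cite-as`, `describedby`, and `item` are relation types used by Signposting. The regular expressions cover only the simple, common header shape, not every case the `Link` header syntax allows.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)'
)
# A single ;name=value parameter, quoted or bare.
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value):
    """Parse a Link header value into a list of dicts (hypothetical helper)."""
    links = []
    for target, params in LINK_RE.findall(value):
        attrs = {}
        for name, quoted, bare in PARAM_RE.findall(params):
            attrs[name.lower()] = quoted or bare
        # A rel attribute may carry several space-separated relation types.
        for rel in attrs.get("rel", "").split():
            links.append(
                {"target": target, "rel": rel, "type": attrs.get("type")}
            )
    return links


# Made-up header value for a dataset landing page.
SAMPLE_LINK_HEADER = (
    '<https://doi.org/10.1234/EXAMPLE>; rel="cite-as", '
    '<https://example.org/export?format=jsonld>; rel="describedby"; '
    'type="application/ld+json", '
    '<https://example.org/file/1>; rel="item"; type="text/csv"'
)

if __name__ == "__main__":
    for link in parse_link_header(SAMPLE_LINK_HEADER):
        print(link)
```

A crawler could follow the `describedby` target to fetch machine-readable metadata and the `item` targets to reach the data files, without scraping the landing page HTML.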



### Related documents
- [Notes on Dataverse Deliverables for NIH OTA](https://docs.google.com/document/d/1N9xgubVcHb2mQxnCrmusa0M7qVGiVvyvmODc4j4HvQ8/edit?usp=sharing)
- [NIH OTA Progress Notes](https://docs.google.com/document/d/1k0XLOqYGCbV1O4eqUtOz67Hk4kXqhv4dKg_juabNVw0/edit?usp=sharing)
- [NIH OTA](https://docs.google.com/document/d/1cFK8pdwMKIRZxNs8EZXL1cXmW0ho6Rx9ulCewJ1ZCFI/edit?usp=sharing)
- [Exposing and harvesting metadata using the OAI metadata harvesting protocol: A tutorial (2001)](https://arxiv.org/pdf/cs/0106057.pdf)
- [Getting Started with BagIt in 2018](https://patchbay.tech/blog/2018/03/14/getting-started-with-bagit-in-2018/)
- [BagIt video from the Library of Congress](https://www.youtube.com/watch?time_continue=11&v=l3p3ao_JSfo&feature=emb_logo)
