Can't harvest when the Dublin Core language field is set #8139

@tcoupin


I am trying to harvest a record from an OAI-PMH server. The record is formatted in the oai_dc schema and has its language field set to fr (oai_dc specifies that language must be a two-letter ISO 639-1 code).

<OAI-PMH xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
	<responseDate>
		2021-10-12T15:14:19+00:00
	</responseDate>
	<request verb="GetRecord" identifier="https://doi.org/10.23708/herbier-guyane-ird" metadataPrefix="oai_dc">
		http://doi2pmh.ird.fr/oai/
	</request>
	<GetRecord>
		<record>
			<header>
				<identifier>
					https://doi.org/10.23708/herbier-guyane-ird
				</identifier>
				<datestamp>
					2021-10-12T20:21:00+00:00
				</datestamp>
				<setSpec>
					Doi2Pmh
				</setSpec>
				<setSpec>
					UMR-AMAP
				</setSpec>
			</header>
			<metadata>
				<dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
					<identifier>
						https://doi.org/10.23708/herbier-guyane-ird
					</identifier>
					<publisher>
						UMR AMAP. CIRAD, CNRS, INRAE, IRD, Univ. Montpellier (France)
					</publisher>
					<title>
						L'herbier IRD de Guyane
					</title>
					<creator>
						Gonzalez, Sophie
					</creator>
					<creator>
						Bilot-Guérin, Véronique
					</creator>
					<creator>
						Delprete, Piero
					</creator>
					<creator>
						Geniez, Chantal
					</creator>
					<creator>
						Molino, Jean-François
					</creator>
					<creator>
						Smock, Jean-Louis
					</creator>
					<creator>
						Théveny, Frédéric
					</creator>
					<creator>
						IRD
					</creator>
					<creator>
						CIRAD
					</creator>
					<creator>
						INRAE
					</creator>
					<creator>
						Université de Montpellier
					</creator>
					<creator>
						Herbier de Guyane, Cayenne, Guyane française
					</creator>
					<creator>
						CNRS
					</creator>
					<description>
						L’Herbier IRD de Guyane (CAY), joue un rôle central dans l’acquisition et la diffusion des connaissances sur la flore de la Guyane française, et plus largement du Bouclier Guyanais et de l'Amazonie. Il a été créé en 1965 par R.A.A. Oldeman, et abrite aujourd’hui près de 200 000 spécimens collectés pour la plupart en Guyane française, mais aussi au Surinam, au Guyana, au Brésil (notamment dans l’État de l’Amapá) et au Vénézuela (État d'Amazonas).
					</description>
					<subject>
						FOS: Biological sciences
					</subject>
					<language>
						fr
					</language>
					<type>
						article
					</type>
				</dc>
			</metadata>
		</record>
	</GetRecord>
</OAI-PMH>

But the harvest fails with the following error:

Exception processing getRecord(), oaiUrl=https://doi2pmh.ird.fr/oai/, identifier=https://doi.org/10.23708/herbier-guyane-ird, edu.harvard.iq.dataverse.api.imports.ImportException, Failed to import harvested dataset: class edu.harvard.iq.dataverse.util.json.ControlledVocabularyException (Value 'fr' does not exist in type 'language')

Language is a controlled-vocabulary field whose accepted values are human-readable language names rather than codes: see https://github.com/IQSS/dataverse/blob/develop/scripts/api/data/metadatablocks/citation.tsv#L186

I think the controlled vocabulary should refer to ISO 639-1 codes, and the human-readable display values should be supplied by translation files.
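
To make the mismatch concrete, here is a minimal sketch of the kind of normalization the importer could apply. It is not Dataverse code: the names `ISO_639_1_TO_NAME`, `CONTROLLED_VOCABULARY` and `normalize_language` are hypothetical, and the three-entry table is only an assumed excerpt of the vocabulary in citation.tsv.

```python
# Hypothetical excerpt: ISO 639-1 code -> display name assumed to be present
# in the "language" controlled vocabulary. Not the real Dataverse table.
ISO_639_1_TO_NAME = {
    "en": "English",
    "fr": "French",
    "de": "German",
}
CONTROLLED_VOCABULARY = set(ISO_639_1_TO_NAME.values())


def normalize_language(value):
    """Return the controlled-vocabulary name for a harvested value, or None."""
    cleaned = value.strip()  # harvested element text may carry whitespace
    if cleaned in CONTROLLED_VOCABULARY:
        return cleaned  # already a display name such as "French"
    # Otherwise treat the value as an ISO 639-1 code (case-insensitive).
    return ISO_639_1_TO_NAME.get(cleaned.lower())


print(normalize_language("fr"))
```

With a lookup like this, an unknown value yields `None`, so the importer could skip the field instead of rejecting the whole record.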

Removing the language field from the record fixes the harvesting.
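
For anyone who needs the same workaround on the providing side, the removal can be sketched as below. This is only an illustration using Python's standard `xml.etree.ElementTree`. The function name `strip_language` is hypothetical, and it assumes the record is available as an XML string before it is served.

```python
import xml.etree.ElementTree as ET


def strip_language(xml_text):
    """Return the XML with every <language> element removed."""
    root = ET.fromstring(xml_text)
    # Snapshot the traversal first so removing children does not disturb it.
    for parent in list(root.iter()):
        for child in list(parent):
            # Compare local names so namespaced tags ({uri}language) match too.
            if child.tag.rsplit("}", 1)[-1] == "language":
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


record = "<dc><title>T</title><language>fr</language><type>article</type></dc>"
print(strip_language(record))
```

This drops information from the record, so it is a stopgap rather than a fix for the vocabulary mismatch itself.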


    Labels

    Feature: HarvestingNIH OTA DCGrant: The Harvard Dataverse repository: A generalist repository integrated with a Data CommonsNIH OTA: 1.4.14 | 1.4.1 | Resolve OAI-PMH harvesting issues | 5 prdOwnThis is an item synched from the product ...pm.GREI-d-1.4.1NIH, yr1, aim4, task1: Resolve OAI-PMH harvesting issuespm.epic.nih_harvesting
