Publish Dataset - Fails when metadata contains HTML entities w/special characters such as &nbsp; #3328

@rmo-cdsp

I added datasets to a Dataverse installation that uses DataCite for DOIs. The datasets were added with Python scripts that use the Dataverse Python API to first create a "simple" dataset (required elements only), then update its metadata with JSON built by extracting data from xlsx documents.

The problem appears when I try to publish a dataset. Publishing fails with the error message: "Error – This dataset may not be published because the DataCite Service is currently inaccessible. Please try again. Does the issue continue to persist? Please contact Dataverse Support for assistance."

Logs are here: dataverse_event_published_error_log.txt

The log contains this message: [xml] xml error: The entity "nbsp" was referenced, but not declared.]
So I called the DataCite API directly with "custom" XML files and found the following: if the XML contains an &nbsp; entity reference, I get the same error. When I remove it, everything is fine. When I instead added an entity declaration to the DOCTYPE (<!ENTITY nbsp "&#160;">, that is, &# followed by 160 between the quotes), the nbsp was replaced and everything worked like a charm. But that only works when calling the API directly. The problem is that Dataverse does not handle the nbsp entity (which may come from my imports).
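
The parser behaviour described above can be reproduced locally without contacting DataCite. The following is a minimal sketch using Python's standard xml.etree.ElementTree. The element name resource and the sample text are placeholders, not the actual DataCite payload that Dataverse generates, and try_parse is a helper made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder documents; the real DataCite metadata XML is much larger.
UNDECLARED = "<resource>Some&nbsp;title</resource>"
DECLARED = (
    '<!DOCTYPE resource [<!ENTITY nbsp "&#160;">]>'
    "<resource>Some&nbsp;title</resource>"
)
NUMERIC = "<resource>Some&#160;title</resource>"


def try_parse(xml_text):
    """Return the root element's text, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


# nbsp is an HTML entity, not one of the five predefined XML entities,
# so the first document is rejected by the parser.
print(try_parse(UNDECLARED))
# Declaring the entity in the DOCTYPE internal subset makes it resolvable.
print(repr(try_parse(DECLARED)))
# A numeric character reference needs no declaration at all.
print(repr(try_parse(NUMERIC)))
```

The third variant suggests that emitting &#160; (or the raw U+00A0 character) instead of &nbsp; would avoid the error without needing a DOCTYPE.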

Here is a JSON example, as requested, for testing on a test server that uses DataCite:
json.txt
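
If the entities do come from the import step, one possible client-side workaround is to decode HTML entities in the metadata JSON before sending it to Dataverse, so that no named entity such as &nbsp; reaches the generated XML. This is a minimal sketch, assuming the metadata is an ordinary nested JSON structure. decode_entities is a hypothetical helper, not part of any Dataverse client library, and the sample field below is made up rather than taken from json.txt.

```python
import html
import json


def decode_entities(value):
    """Recursively replace HTML entities in every string of a JSON-like value."""
    if isinstance(value, str):
        # html.unescape turns &nbsp; into U+00A0, &amp; into &, and so on.
        return html.unescape(value)
    if isinstance(value, list):
        return [decode_entities(item) for item in value]
    if isinstance(value, dict):
        return {key: decode_entities(item) for key, item in value.items()}
    # Numbers, booleans and None are returned unchanged.
    return value


# Made-up metadata field for illustration only.
sample = json.loads('{"typeName": "title", "value": "Survey&nbsp;2016 &amp; annex"}')
cleaned = decode_entities(sample)
print(json.dumps(cleaned, ensure_ascii=False))
```

This only sidesteps the problem for data imported this way. Metadata containing such entities that reaches Dataverse by other means would presumably still fail at publish time.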
