13 changes: 13 additions & 0 deletions doc/release-notes/11087-codemeta-block-improvement.md
@@ -0,0 +1,13 @@
### CodeMeta v3.0

The experimental CodeMeta metadata block has been improved by:

* Adding subfields for size, unit and type to memoryRequirements and subfields for size and unit to storageRequirements of software to improve the machine actionability of these metadata fields and enable external tools like Jupyter Lab to run the software in an appropriate environment.
* Adding a new subfield InfoUrl to softwareSuggestions and softwareRequirements to distinguish between the download URL of a dependency (URL) and an information page of a dependency (InfoUrl).
* Adjusting the termURI of the contIntegration metadata field to the changes with CodeMeta v3.0.

Please note that existing metadata contents of the fields memoryRequirements and storageRequirements have to be manually migrated to the new subfields. The following SQL query can help you identify these fields:

`select dvo.identifier, dt.name as name, dfv.value as val from datasetfield as df, datasetfieldtype as dt, datasetfieldvalue as dfv, dvobject as dvo, datasetversion as dv where df.id = dfv.datasetfield_id and df.datasetfieldtype_id = dt.id and dvo.id = dv.dataset_id and df.datasetversion_id = dv.id and name IN ('memoryRequirements', 'storageRequirements');`

You can download the updated CodeMeta block from the [Experimental Metadata](https://dataverse-guide--11087.org.readthedocs.build/en/11087/user/appendix.html#experimental-metadata) section of the User Guide. See also #10859 and #11087.
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/user/appendix.rst
@@ -37,7 +37,7 @@ Experimental Metadata

Unlike supported metadata, experimental metadata is not enabled by default in a new Dataverse installation. Feedback via any `channel <https://dataverse.org/contact>`_ is welcome!

-- `CodeMeta Software Metadata <https://docs.google.com/spreadsheets/d/e/2PACX-1vTE-aSW0J7UQ0prYq8rP_P_AWVtqhyv46aJu9uPszpa9_UuOWRsyFjbWFDnCd7us7PSIpW7Qg2KwZ8v/pub>`__: based on the `CodeMeta Software Metadata Schema, version 2.0 <https://codemeta.github.io/terms/>`__ (`see .tsv version <https://github.com/IQSS/dataverse/blob/master/scripts/api/data/metadatablocks/codemeta.tsv>`__)
+- `CodeMeta Software Metadata <https://docs.google.com/spreadsheets/d/e/2PACX-1vTE-aSW0J7UQ0prYq8rP_P_AWVtqhyv46aJu9uPszpa9_UuOWRsyFjbWFDnCd7us7PSIpW7Qg2KwZ8v/pub>`__: based on the `CodeMeta Software Metadata Schema, version 3.0 <https://codemeta.github.io/terms/>`__ (`see .tsv version <https://github.com/IQSS/dataverse/blob/master/scripts/api/data/metadatablocks/codemeta.tsv>`__)
- Computational Workflow Metadata (`see .tsv <https://github.com/IQSS/dataverse/blob/master/scripts/api/data/metadatablocks/computational_workflow.tsv>`__): adapted from `Bioschemas Computational Workflow Profile, version 1.0 <https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE>`__ and `Codemeta <https://codemeta.github.io/terms/>`__.
- Archival Metadata (`see .tsv <https://github.com/IQSS/dataverse/blob/master/scripts/api/data/metadatablocks/archival.tsv>`__): Enables repositories to register metadata relating to the potential archiving of the dataset at a depositor archive, whether that be your own institutional archive or an external archive, i.e. a historical archive.

45 changes: 32 additions & 13 deletions scripts/api/data/metadatablocks/codemeta.tsv
Member

These are just some screenshots as of 28ff8ed for my reference or others interested in the PR. Overall, the fields look good to me.

displayOnCreate fields: [screenshot, 2025-01-30: "Add New Dataset - Root"]

all fields: [screenshot, 2025-01-30: "pyDataverse - Root"]

@@ -1,5 +1,5 @@
#metadataBlock name dataverseAlias displayName blockURI
Member

I'm just going to add this comment here at the top, but yes, a release note should be added. Please see https://guides.dataverse.org/en/6.5/developers/version-control.html#writing-release-note-snippets

@doigl I noticed you wrote this: "As it introduces new subfields for the metadata fields storageRequirements and memoryRequirements (that were simple fields and no compound fields before), existing metadata in these fields have to be migrated (manually?)."

Do you plan to provide an SQL upgrade script? If not, perhaps tell people they are on their own since the block is experimental? 🤔 🤷

Contributor Author

@pdurbin I added a release note, but so far without an upgrade script. This SQL statement

`select dvo.identifier, dt.name as name, dfv.value as val from datasetfield as df, datasetfieldtype as dt, datasetfieldvalue as dfv, dvobject as dvo, datasetversion as dv where df.id = dfv.datasetfield_id and df.datasetfieldtype_id = dt.id and dvo.id = dv.dataset_id and df.datasetversion_id = dv.id and name IN ('memoryRequirements', 'storageRequirements');`

identifies the datasets with values in memoryRequirements and storageRequirements, and the following one extracts the information for the subfields:

`select upper(substring(dfv.value from '[kmgtpKMGTP][Bb]')) as unit, substring(dfv.value from '\d{1,4}') as numb_val, upper(substring(dfv.value from 'RAM|Ram|ram|GPU|Gpu|gpu|NPU|Npu|npu')) as ramtype, dvo.identifier, dt.name as name, dfv.value as val from datasetfield as df, datasetfieldtype as dt, datasetfieldvalue as dfv, dvobject as dvo, datasetversion as dv where df.id = dfv.datasetfield_id and df.datasetfieldtype_id = dt.id and dvo.id = dv.dataset_id and df.datasetversion_id = dv.id and name IN ('memoryRequirements', 'storageRequirements');`

But I'm a bit hesitant (and perhaps just not experienced enough to really dare) to try to generate the insert statements for the subfields automatically. For our installation I would probably rather add these with a script using the API.
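
The extraction performed by the second query above can also be prototyped outside the database before anything is written back. The following Python sketch mirrors its three regular expressions. `Requirement` and `parse_requirement` are hypothetical names, not part of Dataverse, and the case-insensitive matching is slightly more permissive than the SQL alternation. It shares the query's limits: only the first one to four digits are captured, and an unrelated word such as "program" would match `ram`.

```python
import re
from typing import NamedTuple, Optional


class Requirement(NamedTuple):
    """Hypothetical container for the three proposed subfield values."""
    size: Optional[int]
    unit: Optional[str]
    mem_type: Optional[str]


# Same patterns as the SQL query, compiled case-insensitively (an assumption:
# the SQL lists only RAM/Ram/ram etc., this also accepts mixed case like "rAM").
_UNIT = re.compile(r"[kmgtp]b", re.IGNORECASE)
_SIZE = re.compile(r"\d{1,4}")
_TYPE = re.compile(r"ram|gpu|npu", re.IGNORECASE)


def parse_requirement(value: str) -> Requirement:
    """Split a free-text value such as '16 GB RAM' into size, unit and type.

    Each part is None when the corresponding pattern does not match,
    which flags the value for manual review.
    """
    size = _SIZE.search(value)
    unit = _UNIT.search(value)
    mem_type = _TYPE.search(value)
    return Requirement(
        size=int(size.group()) if size else None,
        unit=unit.group().upper() if unit else None,
        mem_type=mem_type.group().upper() if mem_type else None,
    )
```

For example, `parse_requirement("16 GB RAM")` yields size 16, unit GB and type RAM, while a value with no recognizable parts yields `None` in every position and would need to be migrated by hand.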

What do you think? Have you had similar changes to metadata blocks before, and a good way to handle them?

Member

I can understand your hesitancy. I just made this PR to suggest some changes to the release note:

I like the idea of at least showing people which datasets are affected, so I copied that part of the SQL into the note. Actually, I just realized there are two. Maybe you can add that?

Member

@pdurbin pdurbin Feb 12, 2025

@doigl I see you incorporated ideas from PR 25 into 415be06, so I'll resolve this and move the PR into "ready for QA". Thanks!

-codeMeta20 Software Metadata (CodeMeta v2.0) https://codemeta.github.io/terms/
+codeMeta20 Software Metadata (CodeMeta v3.0) https://codemeta.github.io/terms/
#datasetField name title description watermark fieldType displayOrder displayFormat advancedSearchField allowControlledVocabulary allowmultiples facetable displayoncreate required parent metadatablock_id termURI
codeVersion Software Version Version of the software instance, usually following some convention like SemVer etc. e.g. 0.2.1 or 1.3 or 2021.1 etc text 0 #VALUE TRUE FALSE FALSE TRUE TRUE FALSE codeMeta20 https://schema.org/softwareVersion
developmentStatus Development Status Description of development status, e.g. work in progress (wip), active, etc. See repostatus.org for more information. text 1 <a href='https://www.repostatus.org/##VALUE'><img src='https://www.repostatus.org/badges/latest/#VALUE.svg' alt='#VALUE '/></a> TRUE TRUE FALSE TRUE FALSE FALSE codeMeta20 https://www.repostatus.org
@@ -12,20 +12,27 @@
targetProduct Target Product Target Operating System / Product to which the code applies. If applies to several versions, just the product name can be used. text 8 #VALUE TRUE FALSE TRUE TRUE FALSE FALSE codeMeta20 https://schema.org/targetProduct
buildInstructions Build Instructions Link to installation instructions/documentation e.g. https://github.com/user/project/blob/main/BUILD.md url 9 <a href="#VALUE" target="_blank" rel="noopener">#VALUE</a> FALSE FALSE TRUE FALSE FALSE FALSE codeMeta20 https://codemeta.github.io/terms/buildInstructions
softwareRequirementsItem Software Requirements Required software dependencies none 10 FALSE FALSE TRUE FALSE TRUE FALSE codeMeta20
-softwareRequirements Name & Version Name and version of the required software/library dependency e.g. Pandas 1.4.3 text 0 #VALUE TRUE FALSE FALSE FALSE TRUE FALSE softwareRequirementsItem codeMeta20 https://schema.org/softwareRequirements
+softwareRequirements Name & Version Name and version of the required software/library dependency e.g. Pandas 1.4.3 text 0 #VALUE TRUE FALSE FALSE FALSE TRUE TRUE softwareRequirementsItem codeMeta20 https://schema.org/softwareRequirements
softwareRequirementsInfoUrl Info URL Link to required software/library homepage or documentation (ideally also versioned) e.g. https://pandas.pydata.org/pandas-docs/version/1.4.3 url 1 <a href="#VALUE" target="_blank" rel="noopener">#VALUE</a> FALSE FALSE FALSE FALSE TRUE FALSE softwareRequirementsItem codeMeta20 https://dataverse.org/schema/codeMeta20/softwareRequirementsInfoUrl
+softwareRequirementsUrl Download URL Link to required software/library https://... url 2 <a href="#VALUE" target="_blank" rel="noopener">#VALUE</a> FALSE FALSE FALSE FALSE FALSE FALSE softwareRequirementsItem codeMeta20
softwareSuggestionsItem Software Suggestions Optional dependencies, e.g. for optional features, code development, etc. none 11 FALSE FALSE TRUE FALSE FALSE FALSE codeMeta20
-softwareSuggestions Name & Version Name and version of the optional software/library dependency e.g. Sphinx 5.0.2 text 0 #VALUE TRUE FALSE FALSE TRUE FALSE FALSE softwareSuggestionsItem codeMeta20 https://codemeta.github.io/terms/softwareSuggestions
-softwareSuggestionsInfoUrl Info URL Link to optional software/library homepage or documentation (ideally also versioned) e.g. https://www.sphinx-doc.org url 1 <a href="#VALUE" target="_blank" rel="noopener">#VALUE</a> FALSE FALSE FALSE FALSE FALSE FALSE softwareSuggestionsItem codeMeta20 https://dataverse.org/schema/codeMeta20/softwareSuggestionsInfoUrl
-memoryRequirements Memory Requirements Minimum memory requirements. text 12 #VALUE TRUE FALSE FALSE FALSE FALSE FALSE codeMeta20 https://schema.org/memoryRequirements
-processorRequirements Processor Requirements Processor architecture or other CPU requirements to run the application (e.g. IA64). text 13 #VALUE TRUE FALSE TRUE FALSE FALSE FALSE codeMeta20 https://schema.org/processorRequirements
-storageRequirements Storage Requirements Minimum storage requirements (e.g. free space required). text 14 #VALUE TRUE FALSE FALSE FALSE FALSE FALSE codeMeta20 https://schema.org/storageRequirements
-permissions Permissions Permission(s) required to run the code (for example, a mobile app may require full internet access or may run only on wifi). text 15 #VALUE TRUE FALSE TRUE FALSE FALSE FALSE codeMeta20 https://schema.org/permissions
-softwareHelp Software Help/Documentation Link to help texts or documentation e.g. https://user.github.io/project/docs url 16 <a href="#VALUE" target="_blank" rel="noopener">#VALUE</a> FALSE FALSE TRUE FALSE TRUE FALSE codeMeta20 https://schema.org/softwareHelp
-readme Readme Link to the README of the project e.g. https://github.com/user/project/blob/main/README.md url 17 <a href="#VALUE" target="_blank" rel="noopener">#VALUE</a> FALSE FALSE FALSE FALSE FALSE FALSE codeMeta20 https://codemeta.github.io/terms/readme
-releaseNotes Release Notes Link to release notes e.g. https://github.com/user/project/blob/main/docs/release-0.1.md url 18 <a href="#VALUE" target="_blank" rel="noopener">#VALUE</a> FALSE FALSE FALSE FALSE FALSE FALSE codeMeta20 https://schema.org/releaseNotes
-contIntegration Continuous Integration Link to continuous integration service e.g. https://github.com/user/project/actions url 19 <a href="#VALUE" target="_blank" rel="noopener">#VALUE</a> FALSE FALSE TRUE FALSE FALSE FALSE codeMeta20 https://codemeta.github.io/terms/contIntegration
-issueTracker Issue Tracker Link to software bug reporting or issue tracking system e.g. https://github.com/user/project/issues url 20 <a href="#VALUE" target="_blank" rel="noopener">#VALUE</a> FALSE FALSE FALSE FALSE FALSE FALSE codeMeta20 https://codemeta.github.io/terms/issueTracker
+softwareSuggestions Name & Version Name and version of the optional software/library dependency e.g. Sphinx 5.0.2 text 0 #VALUE TRUE FALSE FALSE TRUE FALSE TRUE softwareSuggestionsItem codeMeta20
+softwareSuggestionsInfoUrl Info URL Link to optional software/library homepage or documentation (ideally also versioned) e.g. https://www.sphinx-doc.org url 1 <a href="#VALUE" target="_blank" rel="noopener">#VALUE</a> FALSE FALSE FALSE FALSE FALSE FALSE softwareSuggestionsItem codeMeta20
+softwareSuggestionsUrl Download URL Link to optional software/library https://... url 2 <a href="#VALUE" target="_blank" rel="noopener">#VALUE</a> FALSE FALSE FALSE FALSE FALSE FALSE softwareSuggestionsItem codeMeta20
+memoryRequirements Memory Requirements Minimum memory requirements text 12 TRUE FALSE TRUE FALSE FALSE FALSE codeMeta20 https://schema.org/memoryRequirements
+memoryRequirementsSize Memory Size Minimum memory requirements size int 13 #VALUE TRUE FALSE FALSE FALSE FALSE FALSE memoryRequirements codeMeta20
+memoryRequirementsUnit Memory Unit Memory Unit (KB, MB, GB, TB) text 14 #VALUE TRUE TRUE FALSE FALSE FALSE FALSE memoryRequirements codeMeta20
+memoryRequirementsType Memory Type Type of memory (GPU or RAM) text 15 (#VALUE) TRUE TRUE FALSE FALSE FALSE FALSE memoryRequirements codeMeta20
+processorRequirements Processor Requirements Processor architecture or other CPU requirements to run the application (e.g. IA64). text 16 #VALUE TRUE FALSE TRUE FALSE FALSE FALSE codeMeta20 https://schema.org/processorRequirements
+storageRequirements Storage Requirements Minimum storage requirements (e.g. free space required). text 17 TRUE FALSE FALSE FALSE FALSE FALSE codeMeta20 https://schema.org/storageRequirements
+storageRequirementsSize Storage Size Minimum storage requirements size text 18 #VALUE TRUE FALSE FALSE FALSE FALSE FALSE storageRequirements codeMeta20
+storageRequirementsUnit Storage Unit Storage Unit (MB, GB, TB) text 19 #VALUE TRUE TRUE FALSE FALSE FALSE FALSE storageRequirements codeMeta20
+permissions Permissions Permission(s) required to run the code (for example, a mobile app may require full internet access or may run only on wifi). text 20 #VALUE TRUE FALSE TRUE FALSE FALSE FALSE codeMeta20 https://schema.org/permissions
+softwareHelp Software Help/Documentation Link to help texts or documentation e.g. https://user.github.io/project/docs url 21 <a href="#VALUE" target="_blank" rel="noopener">#VALUE</a> FALSE FALSE TRUE FALSE TRUE FALSE codeMeta20 https://schema.org/softwareHelp
+readme Readme Link to the README of the project e.g. https://github.com/user/project/blob/main/README.md url 22 <a href="#VALUE" target="_blank" rel="noopener">#VALUE</a> FALSE FALSE FALSE FALSE FALSE FALSE codeMeta20 https://codemeta.github.io/terms/readme
+releaseNotes Release Notes Link to release notes e.g. https://github.com/user/project/blob/main/docs/release-0.1.md url 23 <a href="#VALUE" target="_blank" rel="noopener">#VALUE</a> FALSE FALSE FALSE FALSE FALSE FALSE codeMeta20 https://schema.org/releaseNotes
+contIntegration Continuous Integration Link to continuous integration service e.g. https://github.com/user/project/actions url 24 <a href="#VALUE" target="_blank" rel="noopener">#VALUE</a> FALSE FALSE TRUE FALSE FALSE FALSE codeMeta20 https://codemeta.github.io/terms/continuousIntegration
+issueTracker Issue Tracker Link to software bug reporting or issue tracking system e.g. https://github.com/user/project/issues url 25 <a href="#VALUE" target="_blank" rel="noopener">#VALUE</a> FALSE FALSE FALSE FALSE FALSE FALSE codeMeta20 https://codemeta.github.io/terms/issueTracker
#controlledVocabulary DatasetField Value identifier displayOrder
developmentStatus Concept concept 0
developmentStatus WIP wip 1
@@ -35,3 +42,15 @@
developmentStatus Moved moved 5
developmentStatus Suspended suspended 6
developmentStatus Abandoned abandoned 7
+memoryRequirementsType RAM ram 1
+memoryRequirementsType GPU gpu 2
+memoryRequirementsType NPU npu 3
+memoryRequirementsUnit KB kb 1
+memoryRequirementsUnit MB mb 2
+memoryRequirementsUnit GB gb 3
+memoryRequirementsUnit TB tb 4
+storageRequirementsUnit KB kb 1
+storageRequirementsUnit MB mb 2
+storageRequirementsUnit GB gb 3
+storageRequirementsUnit TB tb 4
+storageRequirementsUnit PB pb 5
39 changes: 36 additions & 3 deletions src/main/java/propertyFiles/codeMeta20.properties
@@ -1,12 +1,12 @@
metadatablock.name=codeMeta20
-metadatablock.displayName=Software Metadata (CodeMeta v2.0)
+metadatablock.displayName=Software Metadata (CodeMeta v3.0)
metadatablock.displayFacet=Software
datasetfieldtype.codeVersion.title=Software Version
datasetfieldtype.codeVersion.description=Version of the software instance, usually following some convention like SemVer etc.
datasetfieldtype.codeVersion.watermark=e.g. 0.2.1 or 1.3 or 2021.1 etc
datasetfieldtype.developmentStatus.title=Development Status
datasetfieldtype.developmentStatus.description=Description of development status, e.g. work in progress (wip), active, etc. See repostatus.org for more information.
-datasetfieldtype.developmentStatus.watermark= Development Status
+datasetfieldtype.developmentStatus.watermark=
datasetfieldtype.codeRepository.title=Code Repository
datasetfieldtype.codeRepository.description=Link to the repository where the un-compiled, human-readable code and related code is located (SVN, GitHub, CodePlex, institutional GitLab instance, Gitea, etc.).
datasetfieldtype.codeRepository.watermark=e.g. https://github.com/user/project
@@ -40,6 +40,9 @@ datasetfieldtype.softwareRequirements.watermark=e.g. Pandas 1.4.3
datasetfieldtype.softwareRequirementsInfoUrl.title=Info URL
datasetfieldtype.softwareRequirementsInfoUrl.description=Link to required software/library homepage or documentation (ideally also versioned)
datasetfieldtype.softwareRequirementsInfoUrl.watermark=e.g. https://pandas.pydata.org/pandas-docs/version/1.4.3
+datasetfieldtype.softwareRequirementsUrl.title=Download URL
+datasetfieldtype.softwareRequirementsUrl.description=Link to required software/library
+datasetfieldtype.softwareRequirementsUrl.watermark=https://...
datasetfieldtype.softwareSuggestionsItem.title=Software Suggestions
datasetfieldtype.softwareSuggestionsItem.description=Optional dependencies, e.g. for optional features, code development, etc.
datasetfieldtype.softwareSuggestionsItem.watermark=
@@ -49,15 +52,33 @@ datasetfieldtype.softwareSuggestions.watermark=e.g. Sphinx 5.0.2
datasetfieldtype.softwareSuggestionsInfoUrl.title=Info URL
datasetfieldtype.softwareSuggestionsInfoUrl.description=Link to optional software/library homepage or documentation (ideally also versioned)
datasetfieldtype.softwareSuggestionsInfoUrl.watermark=e.g. https://www.sphinx-doc.org
+datasetfieldtype.softwareSuggestionsUrl.title=Download URL
+datasetfieldtype.softwareSuggestionsUrl.description=Link to optional software/library
+datasetfieldtype.softwareSuggestionsUrl.watermark=https://...
datasetfieldtype.memoryRequirements.title=Memory Requirements
-datasetfieldtype.memoryRequirements.description=Minimum memory requirements.
+datasetfieldtype.memoryRequirements.description=Minimum memory requirements
datasetfieldtype.memoryRequirements.watermark=
+datasetfieldtype.memoryRequirementsSize.title=Memory Size
+datasetfieldtype.memoryRequirementsSize.description=Minimum memory requirements size
+datasetfieldtype.memoryRequirementsSize.watermark=
+datasetfieldtype.memoryRequirementsUnit.title=Memory Unit
+datasetfieldtype.memoryRequirementsUnit.description=Memory Unit (KB, MB, GB, TB)
+datasetfieldtype.memoryRequirementsUnit.watermark=
+datasetfieldtype.memoryRequirementsType.title=Memory Type
+datasetfieldtype.memoryRequirementsType.description=Type of memory (GPU or RAM)
+datasetfieldtype.memoryRequirementsType.watermark=
datasetfieldtype.processorRequirements.title=Processor Requirements
datasetfieldtype.processorRequirements.description=Processor architecture or other CPU requirements to run the application (e.g. IA64).
datasetfieldtype.processorRequirements.watermark=
datasetfieldtype.storageRequirements.title=Storage Requirements
datasetfieldtype.storageRequirements.description=Minimum storage requirements (e.g. free space required).
datasetfieldtype.storageRequirements.watermark=
+datasetfieldtype.storageRequirementsSize.title=Storage Size
+datasetfieldtype.storageRequirementsSize.description=Minimum storage requirements size
+datasetfieldtype.storageRequirementsSize.watermark=
+datasetfieldtype.storageRequirementsUnit.title=Storage Unit
+datasetfieldtype.storageRequirementsUnit.description=Storage Unit (MB, GB, TB)
+datasetfieldtype.storageRequirementsUnit.watermark=
datasetfieldtype.permissions.title=Permissions
datasetfieldtype.permissions.description=Permission(s) required to run the code (for example, a mobile app may require full internet access or may run only on wifi).
datasetfieldtype.permissions.watermark=
@@ -84,3 +105,15 @@ controlledvocabulary.developmentStatus.unsupported=Unsupported
controlledvocabulary.developmentStatus.moved=Moved
controlledvocabulary.developmentStatus.suspended=Suspended
controlledvocabulary.developmentStatus.abandoned=Abandoned
+controlledvocabulary.memoryRequirementsType.ram=RAM
+controlledvocabulary.memoryRequirementsType.gpu=GPU
+controlledvocabulary.memoryRequirementsType.npu=NPU
+controlledvocabulary.memoryRequirementsUnit.kb=KB
+controlledvocabulary.memoryRequirementsUnit.mb=MB
+controlledvocabulary.memoryRequirementsUnit.gb=GB
+controlledvocabulary.memoryRequirementsUnit.tb=TB
+controlledvocabulary.storageRequirementsUnit.kb=KB
+controlledvocabulary.storageRequirementsUnit.mb=MB
+controlledvocabulary.storageRequirementsUnit.gb=GB
+controlledvocabulary.storageRequirementsUnit.tb=TB
+controlledvocabulary.storageRequirementsUnit.pb=PB