Update the codemeta metadata block to add some more structure for machine actionability #11087
doigl wants to merge 10 commits into IQSS:develop
…d storageRequirements, added new subfield for softwareRequirements and softwareSuggestions to distinguish between InfoUrl (documentation page) and URL (DownloadURL) for dependencies, adjusted termURI of contIntegration to codemeta v3.0 in codemeta.tsv
| @@ -1,5 +1,5 @@ | |||
| #metadataBlock name dataverseAlias displayName blockURI | |||
| codeMeta20 Software Metadata (CodeMeta v2.0) https://codemeta.github.io/terms/ | |||
| codeMeta20 Software Metadata (CodeMeta 2.0) https://codemeta.github.io/terms/ | |||
I'm confused. Should this be 3.0 instead of 2.0? Or are we not there yet?
Also, is it important to remove the "v"? In c5a6a8f @poikilotherm added a "v" to src/main/java/propertyFiles/codeMeta20.properties.
This is addressed, thanks for pointing it out. The name of the metadata block is still codeMeta20, but the displayName is now "Software Metadata (CodeMeta v3.0)".
Thanks for fixing the display name.
But what about the name? Is the plan to stick with "codeMeta20" forever? Or will we someday switch to "codeMeta30" or "codeMeta40"? Should we consider shortening it to just "codeMeta" (or "codemeta")?
I would be in favour of renaming the block to codemeta, but then we really need detailed upgrade instructions to avoid a duplication of the metadata block.
@poikilotherm what was the original reason to include the version in the block name?
These are just some screenshots as of 28ff8ed for my reference or others interested in the PR. Overall, the fields look good to me.
displayOnCreate fields
all fields
I'm just going to add this comment here at the top, but yes, a release note should be added. Please see https://guides.dataverse.org/en/6.5/developers/version-control.html#writing-release-note-snippets
@doigl I noticed you wrote this: "As it introduces new subfields for the metadata fields storageRequirements and memoryRequirements (that were simple fields and no compound fields before), existing metadata in these fields have to be migrated (manually?)."
Do you plan to provide an SQL upgrade script? If not, perhaps tell people they are on their own since the block is experimental? 🤔 🤷
@pdurbin I added a release note, but so far without an upgrade script. This SQL statement identifies the datasets with values in memoryRequirements and storageRequirements:
    select dvo.identifier, dt.name as name, dfv.value as val
    from datasetfield as df, datasetfieldtype as dt, datasetfieldvalue as dfv, dvobject as dvo, datasetversion as dv
    where df.id = dfv.datasetfield_id
      and df.datasetfieldtype_id = dt.id
      and dvo.id = dv.dataset_id
      and df.datasetversion_id = dv.id
      and dt.name IN ('memoryRequirements', 'storageRequirements');
The following statement extracts the information for the subfields:
    select upper(substring(dfv.value from '[kmgtpKMGTP][Bb]')) as unit,
           substring(dfv.value from '\d{1,4}') as numb_val,
           upper(substring(dfv.value from 'RAM|Ram|ram|GPU|Gpu|gpu|NPU|Npu|npu')) as ramtype,
           dvo.identifier, dt.name as name, dfv.value as val
    from datasetfield as df, datasetfieldtype as dt, datasetfieldvalue as dfv, dvobject as dvo, datasetversion as dv
    where df.id = dfv.datasetfield_id
      and df.datasetfieldtype_id = dt.id
      and dvo.id = dv.dataset_id
      and df.datasetversion_id = dv.id
      and dt.name IN ('memoryRequirements', 'storageRequirements');
But I'm a bit hesitant (and perhaps just not experienced enough to really dare) to try to automatically generate the insert statements for the subfields. For our installation I would perhaps rather try to add these with a script using the API.
What do you think? Have you had similar changes in metadata blocks before, and is there a good way to handle them?
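For an installation that prefers a script over SQL, the same extraction could be sketched in Python. This is only an illustration of the regular expressions quoted above, not part of the PR: the names `ParsedRequirement` and `parse_requirement` are made up here, and the type pattern is matched case-insensitively, which is slightly broader than the SQL alternation.

```python
import re
from typing import NamedTuple, Optional


class ParsedRequirement(NamedTuple):
    """Hypothetical container for the three pieces pulled out of the free text."""
    amount: Optional[int]
    unit: Optional[str]
    kind: Optional[str]


# Same patterns as in the SQL statement, compiled case-insensitively.
UNIT_RE = re.compile(r"[kmgtp]b", re.IGNORECASE)
AMOUNT_RE = re.compile(r"\d{1,4}")
KIND_RE = re.compile(r"ram|gpu|npu", re.IGNORECASE)


def parse_requirement(text: str) -> ParsedRequirement:
    """Take the first match of each pattern; anything not found stays None."""
    unit = UNIT_RE.search(text)
    amount = AMOUNT_RE.search(text)
    kind = KIND_RE.search(text)
    return ParsedRequirement(
        amount=int(amount.group()) if amount else None,
        unit=unit.group().upper() if unit else None,
        kind=kind.group().upper() if kind else None,
    )


if __name__ == "__main__":
    for sample in ["16 GB RAM", "at least 512mb", "unknown"]:
        print(sample, "->", parse_requirement(sample))
```

Like the SQL, this takes the first match of each pattern, so a value such as "2 x 16 GB" would yield 2 rather than 16, and a word like "number" would be read as the unit MB. The parsed rows would still need a manual check before anything is written back.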
I can understand your hesitancy. I just made this PR to suggest some changes to the release note:
I like the idea of at least showing people which datasets are affected, so I copied that part of the SQL into the note. Actually, I just realized there are two. Maybe you can add that?
…erse into codemeta_structure
…dded improvements from PR#25
pdurbin left a comment
I haven't done much testing but I think this is ready for QA. Approved.
@doigl this got bumped to 6.7. Sorry.
Looks like continuous-integration is failing on this. Please advise.
@doigl can you please merge the latest from develop into this branch? I tried but I don't have permissions. Thanks!
@doigl thanks for merging the latest! ❤️
Just a few comments:
We should talk someplace else about indexFormat/displayFormat; there was an issue from @doigl about that.
Sounds fine to me. But I'm in a mindset that people aren't using the 2.0 version (since it's experimental). Or would it be better to remove the number entirely? 🤔 Should we pull this out of QA? There's no rush to get this merged.
@doigl said she was hesitant at #11087 (comment), which I can certainly understand as a non-expert SQL hacker myself. 😅
@poikilotherm
Do you know how, or can you give me a hint where this is documented?
You mean #7856?
FWIW: Field names are globally unique and, if you reuse those names in a new block, you will just shift them into the new block.
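The consequence of globally unique field names can be pictured with a toy model. The sketch below is not Dataverse code: it assumes a plain dictionary from field name to owning block, and the block name "codemeta" is only the rename floated earlier in this thread.

```python
from typing import Dict, List


def load_block(registry: Dict[str, str], block_name: str, field_names: List[str]) -> Dict[str, str]:
    """Return a new registry in which every listed field belongs to block_name.

    Because a field name can exist only once, loading a block that reuses
    a name does not create a second field; it reassigns the existing one.
    """
    updated = dict(registry)
    for name in field_names:
        updated[name] = block_name
    return updated


def fields_in_block(registry: Dict[str, str], block_name: str) -> List[str]:
    """List the field names currently owned by block_name, sorted for stable output."""
    return sorted(name for name, block in registry.items() if block == block_name)


# Toy data: two fields first loaded under the old block name, then under a new one.
registry = load_block({}, "codeMeta20", ["memoryRequirements", "storageRequirements"])
registry = load_block(registry, "codemeta", ["memoryRequirements", "storageRequirements"])

if __name__ == "__main__":
    print(fields_in_block(registry, "codeMeta20"))
    print(fields_in_block(registry, "codemeta"))
```

In this model the old block ends up with no fields and the new block owns both, which is the "shift" described in the comment above.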
I spoke with @poikilotherm about this yesterday and he'd like to help @doigl write an SQL (Flyway) update script. He's busy at the moment but hopes to get to it sometime soon. As we'll probably want to review the script, I'll move this PR back to "in progress".
@doigl @poikilotherm with 6.7 rapidly approaching I removed that milestone from this PR since I doubt we'll make it. Also, I have a topic going in Zulip if you want to talk there: https://dataverse.zulipchat.com/#narrow/channel/375707-community/topic/CodeMeta/near/520695208 In related news, for the geospatial block, we're planning on using one PR to advertise an updated block and another PR to actually change it. See #11507 (comment)


What this PR does / why we need it:
Currently, the fields memoryRequirements, processorRequirements and storageRequirements are just free-text fields, which makes it difficult to use them in an automated process that provides the right resources for running a Jupyter notebook or a container. Adding subfields with controlled vocabularies to these fields would make it easier to differentiate between different types and to identify the right amount of resources, such as memory.
In addition to these changes, this pull request also adds new subfields for softwareRequirements and softwareSuggestions to distinguish between InfoUrl (documentation page) and URL (DownloadURL) for dependencies, and adjusts the termURI of contIntegration to continuousIntegration (codemeta v3.0).
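To illustrate why structured subfields help automation, here is a minimal sketch of how a provisioning script might consume such a value. Everything in it is an assumption made for illustration: the function names, the decimal unit factors and the list of offered memory sizes do not come from the PR.

```python
from typing import List, Optional

# Assumed decimal factors; an installation might prefer binary (1024-based) ones.
UNIT_FACTORS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}


def to_bytes(amount: int, unit: str) -> int:
    """Convert an amount plus a controlled-vocabulary unit into bytes."""
    return amount * UNIT_FACTORS[unit.upper()]


def smallest_sufficient(required_bytes: int, offered_sizes: List[int]) -> Optional[int]:
    """Pick the smallest offered memory size that covers the requirement, if any."""
    candidates = [size for size in offered_sizes if size >= required_bytes]
    return min(candidates) if candidates else None


if __name__ == "__main__":
    # Hypothetical container sizes offered by a notebook service.
    offered = [8 * 10**9, 32 * 10**9, 64 * 10**9]
    print(smallest_sufficient(to_bytes(16, "GB"), offered))
```

With a free-text value such as "about 16 gigs of RAM" the same decision would need fragile text parsing first, which is the difficulty described above.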
Which issue(s) this PR closes:
Special notes for your reviewer:
This pull request only changes the codemeta.tsv file. As it introduces new subfields for the metadata fields storageRequirements and memoryRequirements (which were simple fields, not compound fields, before), existing metadata in these fields has to be migrated (manually?).
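If the migration is done with a script rather than SQL, the new compound value has to be expressed as JSON. The sketch below only builds such a payload and sends nothing. It assumes the usual Dataverse dataset JSON shape for compound fields; the subfield names (`memoryRequirementsValue`, `memoryRequirementsUnit`, `memoryRequirementsType`) are hypothetical placeholders, as are the assumptions that the field allows multiple values and that the subfields are primitive. The real names and type classes are defined in codemeta.tsv.

```python
import json
from typing import Any, Dict


def primitive(type_name: str, value: str) -> Dict[str, Any]:
    """Wrap a single string value in the JSON shape used for a primitive field."""
    return {"typeName": type_name, "typeClass": "primitive", "multiple": False, "value": value}


def compound_memory_requirement(amount: str, unit: str, kind: str) -> Dict[str, Any]:
    """Build one compound memoryRequirements entry from its three parts.

    The subfield names are placeholders for illustration only.
    """
    return {
        "typeName": "memoryRequirements",
        "typeClass": "compound",
        "multiple": True,  # assumption: the field accepts multiple values
        "value": [
            {
                "memoryRequirementsValue": primitive("memoryRequirementsValue", amount),
                "memoryRequirementsUnit": primitive("memoryRequirementsUnit", unit),
                "memoryRequirementsType": primitive("memoryRequirementsType", kind),
            }
        ],
    }


# Body shaped like the {"fields": [...]} documents used for metadata edits.
payload = {"fields": [compound_memory_requirement("16", "GB", "RAM")]}

if __name__ == "__main__":
    print(json.dumps(payload, indent=2))
```

A payload like this could then be submitted per dataset through the native API's metadata editing endpoint, after checking the field definitions against the updated block and testing on a non-production installation.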
Suggestions on how to test this:
Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Metadata form:
Rendered metadata:

Is there a release notes update needed for this change?:
Yes. If applied, this should be mentioned in the release notes, together with a description of how to migrate existing metadata in the changed fields.
Additional documentation: