From 2c64bada2d75d9acdaee7b0413dcc21f52122ffd Mon Sep 17 00:00:00 2001 From: qqmyers Date: Thu, 4 Aug 2022 15:22:29 -0400 Subject: [PATCH 01/10] release notes related to HDC 1 and 3A/3B --- .../8611-DataCommons-related-notes.md | 77 +++++++++++++++++++ 1 file changed, 77 insertions(+) create mode 100644 doc/release-notes/8611-DataCommons-related-notes.md diff --git a/doc/release-notes/8611-DataCommons-related-notes.md b/doc/release-notes/8611-DataCommons-related-notes.md new file mode 100644 index 00000000000..73d9b8b5db4 --- /dev/null +++ b/doc/release-notes/8611-DataCommons-related-notes.md @@ -0,0 +1,77 @@ +# Dataverse Software 5.12 + +This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project. + +## Release Highlights + +### Harvard Data Commons Additions + +As reported at the 2022 Dataverse Community Meeting, the Harvard Data Commons project has supported a wide range of additions to the Dataverse software that improve support for Big Data, Workflows, Archiving, and Interaction with other repositories. In many cases, these additions build upon features developed within the Dataverse Community = by Borealis, DANS, QDR, and TDL and others. Highlights from this work include: + +- Initial support for Globus file transfer to upload/download to/from a Dataverse managed S3 store. The current implementation disables file restriction and embargo on Globus-enabled stores. +- Initial support for Remote File Storage. This capability, enabled via a new RemoteOverlay store type, allows a file stored in a remote system to be added to a dataset (currently only via API) with download requests redirected to the remote system. Use cases include referencing public files hosted on external web servers as well as support for controlled access managed by Dataverse (e.g. 
via restricted and embargoed status) and/or by the remote store. +- Workflow (add Aday's notes here or reword to separate the Objective 2 work) +- Support for Archiving to any S3 store using Dataverse's RDA-conformant BagIT file format (a BagPack). +- Improved error handling and performance in archival bag creation and new options such as only supporting archiving of one dataset version. +- Additions to the OAI-ORE metadata format (which is included in archival bags) such as the inclusion of metadata about the parent collection(s) of an archived dataset version and use of the URL form of PIDs. +- Display of archival status within the dataset page versions table, richer status options including success, pending, and failure states, with a complete API for managing archival status. +- Support for batch archiving via API as an alternative to the current options of configuring archiving upon publication or archiving each dataset version manually. +- Initial support for sending and receiving Linked Data Notification messages indicating relationships between a dataset and external resources (e.g. papers or other dataset) that can be used to trigger additional actions, such as the creation of a back-link to provide, for example, bi-dreictional linking between a published paper and a Dataverse dataset. + + + +## Major Use Cases and Infrastructure Enhancements + +Changes and fixes in this release include: + +- Administrators can configure an S3 store used in Dataverse to support users uploading/downloading files via Globus File Transfer (PR #8891) +- Administrators can configure a RemoteOverlay store to allow files that remain hosted by a remote system to be added to a dataset. (PR #7325) +- Administrators can configure dataverse to send archival Bag copies of published dataset versions to any S3-compatible service. (PR #8751) +- Users can see information about a dataset's parent collection(s) in the OAI-ORE metadata export. 
(PR #8770) +- Archiving via RDA-conformant Bags is more robust and is more configurable (PR #8773, 8747, 8699, 8609, 8606, 8610) +- Users and administrators can see the archival status of the versions of the datasets they manage in the dataset page version table (PR #8748, #8696) +- Administrators can configure messaging between Dataverse and other repositories that may hold related resources or services intersted in activity within Dataverse (PR #8775) + +## Notes for Dataverse Installation Administrators + +### Enabling experimental capabilities + +Several of the capabilities introduced in v5.12 are 'experimental' in the sense that further changes and enhancements to these capabilities should be expected and that these changes may involve additional work, for those who use the initial implementations, when upgrading to newer versions of Dataverse. Administrators wishing to use them are encouraged to stay in touch, e.g. via the Dataverse Community Slack space, to understand the limits of current capabilties and to plan for future upgrades. + +## New JVM Options and DB Settings + +The following DB settings have been added: + +- `:LDNMessageHosts` +- `: BasicGlobusToken` +- `:GlobusEndpoint` +- `:GlobusStores` +- `:GlobusAppUrl` +- `:S3ArchiverConfig` +- `:S3ArchiverProfile` +- `:DRSArchivalConfig` + +See the [Database Settings](https://guides.dataverse.org/en/5.11/installation/config.html#database-settings) section of the Guides for more information. + +## Notes for Developers and Integrators + +See the "Backward Incompatibilities" section below. + +## Backward Incompatibilities + +### OAI-ORE and Archival Bag Changes + +It is likely that capabilities in development (i.e. as part of the (Dataverse Uplaoder)[https://github/org/GlobalDataverseCommunityConsortium/dataverse-uploader]) to allow re-creation of a dataset version from an archival bag will only be fully compatible with archival bags generated by a Dataverse instance at a release >v5.12. 
Administrators should be aware that re-creating archival bags, i.e. via the new batch archiving API, will be recommended at some point in the future. + +## Complete List of Changes + + + +## Installation + +If this is a new installation, please see our [Installation Guide](https://guides.dataverse.org/en/5.11/installation/). Please also contact us to get added to the [Dataverse Project Map](https://guides.dataverse.org/en/5.11/installation/config.html#putting-your-dataverse-installation-on-the-map-at-dataverse-org) if you have not done so already. + +## Upgrade Instructions + +8\. Re-export metadata files (OAI_ORE is affected by the PRs in this release note file) + From 68706cd3b7f12b9c8cfb469a0c81bee34c3229ae Mon Sep 17 00:00:00 2001 From: qqmyers Date: Sat, 6 Aug 2022 13:49:11 -0400 Subject: [PATCH 02/10] update for 8901 --- doc/release-notes/8611-DataCommons-related-notes.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/doc/release-notes/8611-DataCommons-related-notes.md b/doc/release-notes/8611-DataCommons-related-notes.md index 73d9b8b5db4..fb324ddb51b 100644 --- a/doc/release-notes/8611-DataCommons-related-notes.md +++ b/doc/release-notes/8611-DataCommons-related-notes.md @@ -13,7 +13,7 @@ As reported at the 2022 Dataverse Community Meeting, the Harvard Data Commons pr - Workflow (add Aday's notes here or reword to separate the Objective 2 work) - Support for Archiving to any S3 store using Dataverse's RDA-conformant BagIT file format (a BagPack). - Improved error handling and performance in archival bag creation and new options such as only supporting archiving of one dataset version. -- Additions to the OAI-ORE metadata format (which is included in archival bags) such as the inclusion of metadata about the parent collection(s) of an archived dataset version and use of the URL form of PIDs. 
+- Additions/corrections to the OAI-ORE metadata format (which is included in archival bags) such as referenciong the name/mimeType/size/checksum/download URL or the original file for ingested files, the inclusion of metadata about the parent collection(s) of an archived dataset version and use of the URL form of PIDs. - Display of archival status within the dataset page versions table, richer status options including success, pending, and failure states, with a complete API for managing archival status. - Support for batch archiving via API as an alternative to the current options of configuring archiving upon publication or archiving each dataset version manually. - Initial support for sending and receiving Linked Data Notification messages indicating relationships between a dataset and external resources (e.g. papers or other dataset) that can be used to trigger additional actions, such as the creation of a back-link to provide, for example, bi-dreictional linking between a published paper and a Dataverse dataset. @@ -28,6 +28,7 @@ Changes and fixes in this release include: - Administrators can configure a RemoteOverlay store to allow files that remain hosted by a remote system to be added to a dataset. (PR #7325) - Administrators can configure dataverse to send archival Bag copies of published dataset versions to any S3-compatible service. (PR #8751) - Users can see information about a dataset's parent collection(s) in the OAI-ORE metadata export. (PR #8770) +- Users and Administrators can use the OAI-ORE metadata export to retrieve and validate the checksum of the original file for ingested tabuar files. 
(PR #8901) - Archiving via RDA-conformant Bags is more robust and is more configurable (PR #8773, 8747, 8699, 8609, 8606, 8610) - Users and administrators can see the archival status of the versions of the datasets they manage in the dataset page version table (PR #8748, #8696) - Administrators can configure messaging between Dataverse and other repositories that may hold related resources or services intersted in activity within Dataverse (PR #8775) @@ -61,7 +62,7 @@ See the "Backward Incompatibilities" section below. ### OAI-ORE and Archival Bag Changes -It is likely that capabilities in development (i.e. as part of the (Dataverse Uplaoder)[https://github/org/GlobalDataverseCommunityConsortium/dataverse-uploader]) to allow re-creation of a dataset version from an archival bag will only be fully compatible with archival bags generated by a Dataverse instance at a release >v5.12. Administrators should be aware that re-creating archival bags, i.e. via the new batch archiving API, will be recommended at some point in the future. +Earlier versions of the archival bags included the ingested (tab-separated-value) version of tabular files while providing the checksum of the original file (Issue #8849). This release fixes that by including the original file and its metadata in the archival bag. This means that archival bags created prior to this version do not include a way to validate ingested files. Further, it is likely that capabilities in development (i.e. as part of the (Dataverse Uplaoder)[https://github/org/GlobalDataverseCommunityConsortium/dataverse-uploader]) to allow re-creation of a dataset version from an archival bag will only be fully compatible with archival bags generated by a Dataverse instance at a release > v5.12. (Specifically, at a minimum, since only the ingested file is included in earlier archival bags, an upload via DVUploader would not result in the same original file/ingested version as in the original dataset.) 
Administrators should be aware that re-creating archival bags, i.e. via the new batch archiving API, may be advisable now and will be recommended at some point in the future (i.e. there will be a point where we will start versioning archival bags and will start maintaining backward compatibility for older versions as part of transitioning this from being an experimental capability). ## Complete List of Changes @@ -73,5 +74,5 @@ If this is a new installation, please see our [Installation Guide](https://guide ## Upgrade Instructions -8\. Re-export metadata files (OAI_ORE is affected by the PRs in this release note file) +8\. Re-export metadata files (OAI_ORE is affected by the PRs in this release note file) Optionally, for those using Dataverse's BagIt-based archiving, re-archive datasetversions archived using prior Dataverse versions. This will be recommended/required in a future release. From 268c38d745fc00ae43b13806ee0c3455ad7e8da7 Mon Sep 17 00:00:00 2001 From: qqmyers Date: Tue, 9 Aug 2022 14:43:23 -0400 Subject: [PATCH 03/10] Apply suggestions from code review Co-authored-by: Philip Durbin --- .../8611-DataCommons-related-notes.md | 27 ++++++++++--------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/doc/release-notes/8611-DataCommons-related-notes.md b/doc/release-notes/8611-DataCommons-related-notes.md index fb324ddb51b..0adaca1590d 100644 --- a/doc/release-notes/8611-DataCommons-related-notes.md +++ b/doc/release-notes/8611-DataCommons-related-notes.md @@ -6,17 +6,18 @@ This release brings new features, enhancements, and bug fixes to the Dataverse S ### Harvard Data Commons Additions -As reported at the 2022 Dataverse Community Meeting, the Harvard Data Commons project has supported a wide range of additions to the Dataverse software that improve support for Big Data, Workflows, Archiving, and Interaction with other repositories. 
In many cases, these additions build upon features developed within the Dataverse Community = by Borealis, DANS, QDR, and TDL and others. Highlights from this work include: +As reported at the 2022 Dataverse Community Meeting, the Harvard Data Commons project has supported a wide range of additions to the Dataverse software that improve support for Big Data, Workflows, Archiving, and Interaction with other repositories. In many cases, these additions build upon features developed within the Dataverse community by Borealis, DANS, QDR, and TDL and others. Highlights from this work include: -- Initial support for Globus file transfer to upload/download to/from a Dataverse managed S3 store. The current implementation disables file restriction and embargo on Globus-enabled stores. +- Initial support for Globus file transfer to upload to and download from a Dataverse managed S3 store. The current implementation disables file restriction and embargo on Globus-enabled stores. + - ``` - Initial support for Remote File Storage. This capability, enabled via a new RemoteOverlay store type, allows a file stored in a remote system to be added to a dataset (currently only via API) with download requests redirected to the remote system. Use cases include referencing public files hosted on external web servers as well as support for controlled access managed by Dataverse (e.g. via restricted and embargoed status) and/or by the remote store. - Workflow (add Aday's notes here or reword to separate the Objective 2 work) - Support for Archiving to any S3 store using Dataverse's RDA-conformant BagIT file format (a BagPack). - Improved error handling and performance in archival bag creation and new options such as only supporting archiving of one dataset version. 
-- Additions/corrections to the OAI-ORE metadata format (which is included in archival bags) such as referenciong the name/mimeType/size/checksum/download URL or the original file for ingested files, the inclusion of metadata about the parent collection(s) of an archived dataset version and use of the URL form of PIDs. +- Additions/corrections to the OAI-ORE metadata format (which is included in archival bags) such as referencing the name/mimeType/size/checksum/download URL or the original file for ingested files, the inclusion of metadata about the parent collection(s) of an archived dataset version and use of the URL form of PIDs. - Display of archival status within the dataset page versions table, richer status options including success, pending, and failure states, with a complete API for managing archival status. - Support for batch archiving via API as an alternative to the current options of configuring archiving upon publication or archiving each dataset version manually. -- Initial support for sending and receiving Linked Data Notification messages indicating relationships between a dataset and external resources (e.g. papers or other dataset) that can be used to trigger additional actions, such as the creation of a back-link to provide, for example, bi-dreictional linking between a published paper and a Dataverse dataset. +- Initial support for sending and receiving Linked Data Notification messages indicating relationships between a dataset and external resources (e.g. papers or other dataset) that can be used to trigger additional actions, such as the creation of a back-link to provide, for example, bi-directional linking between a published paper and a Dataverse dataset. 
@@ -26,25 +27,25 @@ Changes and fixes in this release include: - Administrators can configure an S3 store used in Dataverse to support users uploading/downloading files via Globus File Transfer (PR #8891) - Administrators can configure a RemoteOverlay store to allow files that remain hosted by a remote system to be added to a dataset. (PR #7325) -- Administrators can configure dataverse to send archival Bag copies of published dataset versions to any S3-compatible service. (PR #8751) +- Administrators can configure the Dataverse software to send archival Bag copies of published dataset versions to any S3-compatible service. (PR #8751) - Users can see information about a dataset's parent collection(s) in the OAI-ORE metadata export. (PR #8770) - Users and Administrators can use the OAI-ORE metadata export to retrieve and validate the checksum of the original file for ingested tabuar files. (PR #8901) -- Archiving via RDA-conformant Bags is more robust and is more configurable (PR #8773, 8747, 8699, 8609, 8606, 8610) +- Archiving via RDA-conformant Bags is more robust and is more configurable (PR #8773, #8747, #8699, #8609, #8606, #8610) - Users and administrators can see the archival status of the versions of the datasets they manage in the dataset page version table (PR #8748, #8696) -- Administrators can configure messaging between Dataverse and other repositories that may hold related resources or services intersted in activity within Dataverse (PR #8775) +- Administrators can configure messaging between their Dataverse installation and other repositories that may hold related resources or services interested in activity within that installation (PR #8775) ## Notes for Dataverse Installation Administrators ### Enabling experimental capabilities -Several of the capabilities introduced in v5.12 are 'experimental' in the sense that further changes and enhancements to these capabilities should be expected and that these changes may involve additional work, for those 
who use the initial implementations, when upgrading to newer versions of Dataverse. Administrators wishing to use them are encouraged to stay in touch, e.g. via the Dataverse Community Slack space, to understand the limits of current capabilties and to plan for future upgrades. +Several of the capabilities introduced in v5.12 are "experimental" in the sense that further changes and enhancements to these capabilities should be expected and that these changes may involve additional work, for those who use the initial implementations, when upgrading to newer versions of the Dataverse software. Administrators wishing to use them are encouraged to stay in touch, e.g. via the Dataverse Community Slack space, to understand the limits of current capabilities and to plan for future upgrades. ## New JVM Options and DB Settings The following DB settings have been added: - `:LDNMessageHosts` -- `: BasicGlobusToken` +- `:BasicGlobusToken` - `:GlobusEndpoint` - `:GlobusStores` - `:GlobusAppUrl` @@ -52,7 +53,7 @@ The following DB settings have been added: - `:S3ArchiverConfig` - `:S3ArchiverProfile` - `:DRSArchivalConfig` -See the [Database Settings](https://guides.dataverse.org/en/5.11/installation/config.html#database-settings) section of the Guides for more information. +See the [Database Settings](https://guides.dataverse.org/en/5.12/installation/config.html#database-settings) section of the Guides for more information. ## Notes for Developers and Integrators @@ -62,7 +63,7 @@ See the "Backward Incompatibilities" section below. ### OAI-ORE and Archival Bag Changes -Earlier versions of the archival bags included the ingested (tab-separated-value) version of tabular files while providing the checksum of the original file (Issue #8849). This release fixes that by including the original file and its metadata in the archival bag. This means that archival bags created prior to this version do not include a way to validate ingested files. Further, it is likely that capabilities in development (i.e. 
as part of the (Dataverse Uplaoder)[https://github/org/GlobalDataverseCommunityConsortium/dataverse-uploader]) to allow re-creation of a dataset version from an archival bag will only be fully compatible with archival bags generated by a Dataverse instance at a release > v5.12. (Specifically, at a minimum, since only the ingested file is included in earlier archival bags, an upload via DVUploader would not result in the same original file/ingested version as in the original dataset.) Administrators should be aware that re-creating archival bags, i.e. via the new batch archiving API, may be advisable now and will be recommended at some point in the future (i.e. there will be a point where we will start versioning archival bags and will start maintaining backward compatibility for older versions as part of transitioning this from being an experimental capability). +Earlier versions of the archival bags included the ingested (tab-separated-value) version of tabular files while providing the checksum of the original file (Issue #8449). This release fixes that by including the original file and its metadata in the archival bag. This means that archival bags created prior to this version do not include a way to validate ingested files. Further, it is likely that capabilities in development (i.e. as part of the [Dataverse Uploader](https://github/org/GlobalDataverseCommunityConsortium/dataverse-uploader) to allow re-creation of a dataset version from an archival bag will only be fully compatible with archival bags generated by a Dataverse instance at a release > v5.12. (Specifically, at a minimum, since only the ingested file is included in earlier archival bags, an upload via DVUploader would not result in the same original file/ingested version as in the original dataset.) Administrators should be aware that re-creating archival bags, i.e. via the new batch archiving API, may be advisable now and will be recommended at some point in the future (i.e. 
there will be a point where we will start versioning archival bags and will start maintaining backward compatibility for older versions as part of transitioning this from being an experimental capability). ## Complete List of Changes @@ -70,9 +71,9 @@ Earlier versions of the archival bags included the ingested (tab-separated-value ## Installation -If this is a new installation, please see our [Installation Guide](https://guides.dataverse.org/en/5.11/installation/). Please also contact us to get added to the [Dataverse Project Map](https://guides.dataverse.org/en/5.11/installation/config.html#putting-your-dataverse-installation-on-the-map-at-dataverse-org) if you have not done so already. +If this is a new installation, please see our [Installation Guide](https://guides.dataverse.org/en/5.12/installation/). Please also contact us to get added to the [Dataverse Project Map](https://guides.dataverse.org/en/5.12/installation/config.html#putting-your-dataverse-installation-on-the-map-at-dataverse-org) if you have not done so already. ## Upgrade Instructions -8\. Re-export metadata files (OAI_ORE is affected by the PRs in this release note file) Optionally, for those using Dataverse's BagIt-based archiving, re-archive datasetversions archived using prior Dataverse versions. This will be recommended/required in a future release. +8\. Re-export metadata files (OAI_ORE is affected by the PRs in these release notes). Optionally, for those using the Dataverse software's BagIt-based archiving, re-archive dataset versions archived using prior versions of the Dataverse software. This will be recommended/required in a future release. 
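The bulk re-export called for in upgrade step 8 can be triggered through the admin API. A minimal sketch follows; the base URL is a placeholder for your own installation, and the `reExportAll` endpoint name is taken from the Dataverse Admin Guide rather than from these patches, so verify it against your installed version:

```shell
# Placeholder base URL; adjust for your installation.
SERVER_URL="http://localhost:8080"

# Re-export all metadata export formats (OAI_ORE among them) for all
# published datasets. The command is printed via echo so this sketch has
# no side effects; drop the echo to run it against a live server.
echo curl "$SERVER_URL/api/admin/metadata/reExportAll"
```

Re-exporting every dataset may take some time on large installations, so it is worth scheduling outside peak hours.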
From ca11cd245dfe0bcff7a86377cb4bacfb2ac295ff Mon Sep 17 00:00:00 2001 From: qqmyers Date: Tue, 9 Aug 2022 14:47:51 -0400 Subject: [PATCH 04/10] improve wording --- doc/release-notes/8611-DataCommons-related-notes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/release-notes/8611-DataCommons-related-notes.md b/doc/release-notes/8611-DataCommons-related-notes.md index fb324ddb51b..c62df83b03e 100644 --- a/doc/release-notes/8611-DataCommons-related-notes.md +++ b/doc/release-notes/8611-DataCommons-related-notes.md @@ -28,7 +28,7 @@ Changes and fixes in this release include: - Administrators can configure a RemoteOverlay store to allow files that remain hosted by a remote system to be added to a dataset. (PR #7325) - Administrators can configure dataverse to send archival Bag copies of published dataset versions to any S3-compatible service. (PR #8751) - Users can see information about a dataset's parent collection(s) in the OAI-ORE metadata export. (PR #8770) -- Users and Administrators can use the OAI-ORE metadata export to retrieve and validate the checksum of the original file for ingested tabuar files. (PR #8901) +- Users and Administrators can now use the OAI-ORE metadata export to retrieve and assess the fixity of the original file (for ingested tabular files) via the included checksum. 
(PR #8901) - Archiving via RDA-conformant Bags is more robust and is more configurable (PR #8773, 8747, 8699, 8609, 8606, 8610) - Users and administrators can see the archival status of the versions of the datasets they manage in the dataset page version table (PR #8748, #8696) - Administrators can configure messaging between Dataverse and other repositories that may hold related resources or services intersted in activity within Dataverse (PR #8775) From 1c817489896f99a54229f119bbafde5299bb396c Mon Sep 17 00:00:00 2001 From: qqmyers Date: Tue, 9 Aug 2022 15:51:59 -0400 Subject: [PATCH 05/10] add backward compatibility note re: submit to archive --- doc/release-notes/8611-DataCommons-related-notes.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/doc/release-notes/8611-DataCommons-related-notes.md b/doc/release-notes/8611-DataCommons-related-notes.md index 59ceb3770b2..6b5df39fed2 100644 --- a/doc/release-notes/8611-DataCommons-related-notes.md +++ b/doc/release-notes/8611-DataCommons-related-notes.md @@ -61,7 +61,9 @@ See the "Backward Incompatibilities" section below. ## Backward Incompatibilities -### OAI-ORE and Archival Bag Changes +### OAI-ORE and Archiving Changes + +The Admin API call to manually submit a dataset version for archiving has changed to require POST instead of GET and to have a name making it clearer that archiving is being done for a given dataset version: /api/admin/submitDatasetVersionToArchive. Earlier versions of the archival bags included the ingested (tab-separated-value) version of tabular files while providing the checksum of the original file (Issue #8449). This release fixes that by including the original file and its metadata in the archival bag. This means that archival bags created prior to this version do not include a way to validate ingested files. Further, it is likely that capabilities in development (i.e. 
as part of the [Dataverse Uploader](https://github.com/GlobalDataverseCommunityConsortium/dataverse-uploader)) to allow re-creation of a dataset version from an archival bag will only be fully compatible with archival bags generated by a Dataverse instance at a release > v5.12. (Specifically, at a minimum, since only the ingested file is included in earlier archival bags, an upload via DVUploader would not result in the same original file/ingested version as in the original dataset.) Administrators should be aware that re-creating archival bags, i.e. via the new batch archiving API, may be advisable now and will be recommended at some point in the future (i.e. there will be a point where we will start versioning archival bags and will start maintaining backward compatibility for older versions as part of transitioning this from being an experimental capability). From 4c4a6f7fc68de7ed36c7bf87463948f2e95f95ff Mon Sep 17 00:00:00 2001 From: qqmyers Date: Fri, 12 Aug 2022 15:21:55 -0400 Subject: [PATCH 06/10] 3B template/citation block notes --- doc/release-notes/8611-DataCommons-related-notes.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/doc/release-notes/8611-DataCommons-related-notes.md b/doc/release-notes/8611-DataCommons-related-notes.md index 6b5df39fed2..0e4633ab6e2 100644 --- a/doc/release-notes/8611-DataCommons-related-notes.md +++ b/doc/release-notes/8611-DataCommons-related-notes.md @@ -18,6 +18,7 @@ As reported at the 2022 Dataverse Community Meeting, the Harvard Data Commons pr - Display of archival status within the dataset page versions table, richer status options including success, pending, and failure states, with a complete API for managing archival status. - Support for batch archiving via API as an alternative to the current options of configuring archiving upon publication or archiving each dataset version manually. 
- Initial support for sending and receiving Linked Data Notification messages indicating relationships between a dataset and external resources (e.g. papers or other dataset) that can be used to trigger additional actions, such as the creation of a back-link to provide, for example, bi-directional linking between a published paper and a Dataverse dataset. +- A new capability to provide custom per-field instructions in dataset templates @@ -33,6 +34,7 @@ Changes and fixes in this release include: - Archiving via RDA-conformant Bags is more robust and is more configurable (PR #8773, #8747, #8699, #8609, #8606, #8610) - Users and administrators can see the archival status of the versions of the datasets they manage in the dataset page version table (PR #8748, #8696) - Administrators can configure messaging between their Dataverse installation and other repositories that may hold related resources or services interested in activity within that installation (PR #8775) +- Collection managers can create templates that include custom instructions on how to fill out specific metadata fields. ## Notes for Dataverse Installation Administrators @@ -79,3 +81,4 @@ If this is a new installation, please see our [Installation Guide](https://guide 8\. Re-export metadata files (OAI_ORE is affected by the PRs in these release notes). Optionally, for those using the Dataverse software's BagIt-based archiving, re-archive dataset versions archived using prior versions of the Dataverse software. This will be recommended/required in a future release. +9\. Standard instructions for reinstalling the citation metadatablock. There are no new fields so Solr changes/reindex aren't needed. 
This PR just adds an option to the list of publicationIdTypes From a21684c0acac06f6ccdd19f90397e85cf1075440 Mon Sep 17 00:00:00 2001 From: qqmyers Date: Thu, 18 Aug 2022 17:59:16 -0400 Subject: [PATCH 07/10] typo --- doc/release-notes/8611-DataCommons-related-notes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/release-notes/8611-DataCommons-related-notes.md b/doc/release-notes/8611-DataCommons-related-notes.md index 0e4633ab6e2..0e4290b595f 100644 --- a/doc/release-notes/8611-DataCommons-related-notes.md +++ b/doc/release-notes/8611-DataCommons-related-notes.md @@ -14,7 +14,7 @@ As reported at the 2022 Dataverse Community Meeting, the Harvard Data Commons pr - Workflow (add Aday's notes here or reword to separate the Objective 2 work) - Support for Archiving to any S3 store using Dataverse's RDA-conformant BagIT file format (a BagPack). - Improved error handling and performance in archival bag creation and new options such as only supporting archiving of one dataset version. -- Additions/corrections to the OAI-ORE metadata format (which is included in archival bags) such as referencing the name/mimeType/size/checksum/download URL or the original file for ingested files, the inclusion of metadata about the parent collection(s) of an archived dataset version and use of the URL form of PIDs. +- Additions/corrections to the OAI-ORE metadata format (which is included in archival bags) such as referencing the name/mimeType/size/checksum/download URL of the original file for ingested files, the inclusion of metadata about the parent collection(s) of an archived dataset version, and use of the URL form of PIDs. - Display of archival status within the dataset page versions table, richer status options including success, pending, and failure states, with a complete API for managing archival status. 
- Support for batch archiving via API as an alternative to the current options of configuring archiving upon publication or archiving each dataset version manually. - Initial support for sending and receiving Linked Data Notification messages indicating relationships between a dataset and external resources (e.g. papers or other datasets) that can be used to trigger additional actions, such as the creation of a back-link to provide, for example, bi-directional linking between a published paper and a Dataverse dataset. From f44bebd6cb5f97a2352df4b0d62d73d59f36005c Mon Sep 17 00:00:00 2001 From: qqmyers Date: Mon, 19 Sep 2022 13:24:32 -0400 Subject: [PATCH 08/10] Update release notes --- doc/release-notes/8611-DataCommons-related-notes.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/doc/release-notes/8611-DataCommons-related-notes.md b/doc/release-notes/8611-DataCommons-related-notes.md index 0e4290b595f..d19e9b741f1 100644 --- a/doc/release-notes/8611-DataCommons-related-notes.md +++ b/doc/release-notes/8611-DataCommons-related-notes.md @@ -47,13 +47,14 @@ Several of the capabilities introduced in v5.12 are "experimental" in the sense The following DB settings have been added: - `:LDNMessageHosts` -- `:BasicGlobusToken` +- `:GlobusBasicToken` - `:GlobusEndpoint` - `:GlobusStores` - `:GlobusAppUrl` +- `:GlobusPollingInterval` - `:S3ArchiverConfig` - `:S3ArchiverProfile` -- `:DRSArchivalConfig` +- `:DRSArchiverConfig` See the [Database Settings](https://guides.dataverse.org/en/5.12/installation/config.html#database-settings) section of the Guides for more information.
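These database settings are managed at runtime through the standard Dataverse admin settings API rather than a configuration file. As a minimal sketch (the setting value shown is a placeholder, not a default, and the admin API is assumed to be reachable on localhost):

```shell
# Set one of the new settings (value is an illustrative placeholder)
curl -X PUT -d 'example-globus-endpoint-id' \
  http://localhost:8080/api/admin/settings/:GlobusEndpoint

# Read the current value of a setting
curl http://localhost:8080/api/admin/settings/:GlobusEndpoint

# Remove a setting entirely
curl -X DELETE http://localhost:8080/api/admin/settings/:GlobusEndpoint
```

The Database Settings section of the guides linked above documents the expected format of each setting.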
From a6875f34f57f67fcc753c7579e4b49f1e0e38aca Mon Sep 17 00:00:00 2001 From: qqmyers Date: Mon, 19 Sep 2022 13:27:12 -0400 Subject: [PATCH 09/10] update release notes with recent changes --- doc/release-notes/8611-DataCommons-related-notes.md | 1 + 1 file changed, 1 insertion(+) diff --git a/doc/release-notes/8611-DataCommons-related-notes.md b/doc/release-notes/8611-DataCommons-related-notes.md index d19e9b741f1..2a9648c6fd1 100644 --- a/doc/release-notes/8611-DataCommons-related-notes.md +++ b/doc/release-notes/8611-DataCommons-related-notes.md @@ -52,6 +52,7 @@ The following DB settings have been added: - `:GlobusStores` - `:GlobusAppUrl` - `:GlobusPollingInterval` +- `:GlobusSingleFileTransfer` - `:S3ArchiverConfig` - `:S3ArchiverProfile` - `:DRSArchiverConfig` From cd6edb4e5922d4e73da72156b68782a0a5edf8c7 Mon Sep 17 00:00:00 2001 From: Philip Durbin Date: Mon, 19 Sep 2022 15:04:33 -0400 Subject: [PATCH 10/10] tweaks for HDC release notes #8611 --- .../8611-DataCommons-related-notes.md | 29 ++++++++----------- .../8639-computational-workflow.md | 4 ++- ...59-add-computational-worflow-file-types.md | 2 ++ 3 files changed, 17 insertions(+), 18 deletions(-) diff --git a/doc/release-notes/8611-DataCommons-related-notes.md b/doc/release-notes/8611-DataCommons-related-notes.md index 2a9648c6fd1..af222db5b9f 100644 --- a/doc/release-notes/8611-DataCommons-related-notes.md +++ b/doc/release-notes/8611-DataCommons-related-notes.md @@ -6,41 +6,38 @@ This release brings new features, enhancements, and bug fixes to the Dataverse S ### Harvard Data Commons Additions -As reported at the 2022 Dataverse Community Meeting, the Harvard Data Commons project has supported a wide range of additions to the Dataverse software that improve support for Big Data, Workflows, Archiving, and Interaction with other repositories. In many cases, these additions build upon features developed within the Dataverse community by Borealis, DANS, QDR, and TDL and others. 
Highlights from this work include: +As reported at the 2022 Dataverse Community Meeting, the [Harvard Data Commons](https://sites.harvard.edu/harvard-data-commons/) project has supported a wide range of additions to the Dataverse software that improve support for Big Data, Workflows, Archiving, and interaction with other repositories. In many cases, these additions build upon features developed within the Dataverse community by Borealis, DANS, QDR, TDL, and others. Highlights from this work include: - Initial support for Globus file transfer to upload to and download from a Dataverse managed S3 store. The current implementation disables file restriction and embargo on Globus-enabled stores. - - ``` - Initial support for Remote File Storage. This capability, enabled via a new RemoteOverlay store type, allows a file stored in a remote system to be added to a dataset (currently only via API) with download requests redirected to the remote system. Use cases include referencing public files hosted on external web servers as well as support for controlled access managed by Dataverse (e.g. via restricted and embargoed status) and/or by the remote store. -- Workflow (add Aday's notes here or reword to separate the Objective 2 work) -- Support for Archiving to any S3 store using Dataverse's RDA-conformant BagIT file format (a BagPack). +- Initial support for computational workflows, including a new metadata block and detected filetypes. +- Support for archiving to any S3 store using Dataverse's RDA-conformant BagIT file format (a BagPack). - Improved error handling and performance in archival bag creation and new options such as only supporting archiving of one dataset version. 
-- Additions/corrections to the OAI-ORE metadata format (which is included in archival bags) such as referencing the name/mimeType/size/checksum/download URL of the original file for ingested files, the inclusion of metadata about the parent collection(s) of an archived dataset version, and use of the URL form of PIDs. +- Additions/corrections to the OAI-ORE metadata format (which is included in archival bags) such as referencing the name/mimetype/size/checksum/download URL of the original file for ingested files, the inclusion of metadata about the parent collection(s) of an archived dataset version, and use of the URL form of PIDs. - Display of archival status within the dataset page versions table, richer status options including success, pending, and failure states, with a complete API for managing archival status. - Support for batch archiving via API as an alternative to the current options of configuring archiving upon publication or archiving each dataset version manually. - Initial support for sending and receiving Linked Data Notification messages indicating relationships between a dataset and external resources (e.g. papers or other datasets) that can be used to trigger additional actions, such as the creation of a back-link to provide, for example, bi-directional linking between a published paper and a Dataverse dataset. - A new capability to provide custom per-field instructions in dataset templates ## Major Use Cases and Infrastructure Enhancements Changes and fixes in this release include: - Administrators can configure an S3 store used in Dataverse to support users uploading/downloading files via Globus File Transfer. (PR #8891) - Administrators can configure a RemoteOverlay store to allow files that remain hosted by a remote system to be added to a dataset.
(PR #7325) - Administrators can configure the Dataverse software to send archival Bag copies of published dataset versions to any S3-compatible service. (PR #8751) - Users can see information about a dataset's parent collection(s) in the OAI-ORE metadata export. (PR #8770) -- Users and Administrators can now use the OAI-ORE metadata export to retrieve and assess the fixity of the the original file (for ingested tabular files) via the included checksum. (PR #8901) -- Archiving via RDA-conformant Bags is more robust and is more configurable (PR #8773, #8747, #8699, #8609, #8606, #8610) -- Users and administrators can see the archival status of the versions of the datasets they manage in the dataset page version table (PR #8748, #8696) -- Administrators can configure messaging between their Dataverse installation and other repositories that may hold related resources or services interested in activity within that installation (PR #8775) +- Users and administrators can now use the OAI-ORE metadata export to retrieve and assess the fixity of the original file (for ingested tabular files) via the included checksum. (PR #8901) +- Archiving via RDA-conformant Bags is more robust and is more configurable. (PR #8773, #8747, #8699, #8609, #8606, #8610) +- Users and administrators can see the archival status of the versions of the datasets they manage in the dataset page version table. (PR #8748, #8696) +- Administrators can configure messaging between their Dataverse installation and other repositories that may hold related resources or services interested in activity within that installation. (PR #8775) - Collection managers can create templates that include custom instructions on how to fill out specific metadata fields. 
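For the batch archiving and archival-status items above, the v5.12 guides describe superuser admin API calls along these lines (a sketch only; the exact endpoint paths should be checked against the published guides, and the DOI and API token are placeholders):

```shell
# Dry run: list dataset versions that have not yet been archived,
# without submitting anything to the archiver
curl -X POST -H "X-Dataverse-key: $API_TOKEN" \
  "http://localhost:8080/api/admin/archiveAllUnarchivedDatasetVersions?listonly=true"

# Check the archival status of version 1.0 of a dataset (DOI is a placeholder)
curl -H "X-Dataverse-key: $API_TOKEN" \
  "http://localhost:8080/api/datasets/:persistentId/1.0/archivalStatus?persistentId=doi:10.5072/FK2/EXAMPLE"
```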
## Notes for Dataverse Installation Administrators -### Enabling experimental capabilities +### Enabling Experimental Capabilities -Several of the capabilities introduced in v5.12 are "experimental" in the sense that further changes and enhancements to these capabilities should be expected and that these changes may involve additional work, for those who use the initial implementations, when upgrading to newer versions of the Dataverse software. Administrators wishing to use them are encouraged to stay in touch, e.g. via the Dataverse Community Slack space, to understand the limits of current capabilties and to plan for future upgrades. +Several of the capabilities introduced in v5.12 are "experimental" in the sense that further changes and enhancements to these capabilities should be expected and that these changes may involve additional work, for those who use the initial implementations, when upgrading to newer versions of the Dataverse software. Administrators wishing to use them are encouraged to stay in touch, e.g. via the Dataverse Community Slack space, to understand the limits of current capabilities and to plan for future upgrades. ## New JVM Options and DB Settings @@ -73,8 +70,6 @@ Earlier versions of the archival bags included the ingested (tab-separated-value ## Complete List of Changes - - ## Installation If this is a new installation, please see our [Installation Guide](https://guides.dataverse.org/en/5.12/installation/). Please also contact us to get added to the [Dataverse Project Map](https://guides.dataverse.org/en/5.12/installation/config.html#putting-your-dataverse-installation-on-the-map-at-dataverse-org) if you have not done so already. @@ -83,4 +78,4 @@ If this is a new installation, please see our [Installation Guide](https://guide 8\. Re-export metadata files (OAI_ORE is affected by the PRs in these release notes). 
Optionally, for those using the Dataverse software's BagIt-based archiving, re-archive dataset versions archived using prior versions of the Dataverse software. This will be recommended/required in a future release. -9. Standard instructions for reinstalling the citation metadatablock. There are no new fields so solr changes/reindex aren't needed. This PR just adds an option to the list of publicationIdTypes +9\. Standard instructions for reinstalling the citation metadatablock. There are no new fields so Solr changes/reindex aren't needed. This PR just adds an option to the list of publicationIdTypes diff --git a/doc/release-notes/8639-computational-workflow.md b/doc/release-notes/8639-computational-workflow.md index d1f014e4af3..efd5b26e538 100644 --- a/doc/release-notes/8639-computational-workflow.md +++ b/doc/release-notes/8639-computational-workflow.md @@ -1,6 +1,8 @@ +NOTE: These "workflow" changes should be folded into "Harvard Data Commons Additions" in 8611-DataCommons-related-notes.md + ## Adding Computational Workflow Metadata The new Computational Workflow metadata block will allow depositors to effectively tag datasets as computational workflows. 
To add the new metadata block, follow the instructions in the user guide: -The location of the new metadata block tsv file is: `dataverse/scripts/api/data/metadatablocks/computational_workflow.tsv` \ No newline at end of file +The location of the new metadata block tsv file is: `dataverse/scripts/api/data/metadatablocks/computational_workflow.tsv` diff --git a/doc/release-notes/8759-add-computational-worflow-file-types.md b/doc/release-notes/8759-add-computational-worflow-file-types.md index fa2fd3d001c..d2db860fe5f 100644 --- a/doc/release-notes/8759-add-computational-worflow-file-types.md +++ b/doc/release-notes/8759-add-computational-worflow-file-types.md @@ -1,3 +1,5 @@ +NOTE: These "workflow" changes should be folded into "Harvard Data Commons Additions" in 8611-DataCommons-related-notes.md + The following file extensions are now detected: wdl=text/x-workflow-description-language
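The metadata block instructions referenced above amount to loading the TSV via the standard admin API (no Solr schema changes are needed for these notes since no new search fields are introduced). A sketch, assuming the command is run from a checkout of the Dataverse source tree against a local installation:

```shell
# Load (or reload) the Computational Workflow metadata block from its TSV definition
curl http://localhost:8080/api/admin/datasetfield/load \
  -X POST -H "Content-type: text/tab-separated-values" \
  --upload-file scripts/api/data/metadatablocks/computational_workflow.tsv
```

This mirrors the generic metadata block loading procedure in the Admin Guide; the block must also be enabled on a collection before depositors see its fields.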