diff --git a/.github/workflows/maven_unit_test.yml b/.github/workflows/maven_unit_test.yml
index 464d60c2db6..e2048f73431 100644
--- a/.github/workflows/maven_unit_test.yml
+++ b/.github/workflows/maven_unit_test.yml
@@ -4,22 +4,34 @@ on:
push:
paths:
- "**.java"
+ - "pom.xml"
+ - "modules/**/pom.xml"
pull_request:
paths:
- "**.java"
+ - "pom.xml"
+ - "modules/**/pom.xml"
jobs:
unittest:
- name: (JDK ${{ matrix.jdk }} / ${{ matrix.os }}) Unit Tests
+ name: (${{ matrix.status }} / JDK ${{ matrix.jdk }}) Unit Tests
strategy:
fail-fast: false
matrix:
- os: [ ubuntu-latest ]
jdk: [ '11' ]
+ experimental: [false]
+ status: ["Stable"]
+ #
+ # JDK 17 builds disabled due to non-essential failures marking CI jobs as completely failed within
+ # GitHub Projects, PR lists, etc. This was the consensus on Slack #dv-tech. See issue #8094.
+ # (This is a limitation of how GitHub currently handles these things.)
+ #
#include:
- # - os: ubuntu-latest
- # jdk: '16'
- runs-on: ${{ matrix.os }}
+ # - jdk: '17'
+ # experimental: true
+ # status: "Experimental"
+ continue-on-error: ${{ matrix.experimental }}
+ runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Set up JDK ${{ matrix.jdk }}
@@ -34,7 +46,7 @@ jobs:
key: ${{ runner.os }}-m2-${{ hashFiles('**/pom.xml') }}
restore-keys: ${{ runner.os }}-m2
- name: Build with Maven
- run: mvn -DcompilerArgument=-Xlint:unchecked -P all-unit-tests clean test
+ run: mvn -DcompilerArgument=-Xlint:unchecked -Dtarget.java.version=${{ matrix.jdk }} -P all-unit-tests clean test
- name: Maven Code Coverage
env:
CI_NAME: github
diff --git a/.gitignore b/.gitignore
index 7be8263f483..83671abf43e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -70,3 +70,6 @@ venv
scripts/search/data/binary/trees.png.thumb140
src/main/webapp/resources/images/cc0.png.thumb140
src/main/webapp/resources/images/dataverseproject.png.thumb140
+
+# apache-maven is downloaded by docker-aio
+apache-maven*
diff --git a/README.md b/README.md
index 6fd11374353..d40e5f228f7 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-Dataverse®
+Dataverse®
===============
Dataverse is an [open source][] software platform for sharing, finding, citing, and preserving research data (developed by the [Data Science and Products team](http://www.iq.harvard.edu/people/people/data-science-products) at the [Institute for Quantitative Social Science](http://iq.harvard.edu/) and the [Dataverse community][]).
diff --git a/checkstyle.xml b/checkstyle.xml
index 99185e15e97..c00fa3a8c0c 100644
--- a/checkstyle.xml
+++ b/checkstyle.xml
@@ -98,7 +98,7 @@
-->
-
+
-
@@ -421,7 +421,7 @@
maxRamMB - the maximum amount of RAM (in MB) that this cache is allowed
to occupy
-->
-
@@ -432,14 +432,14 @@
document). Since Lucene internal document ids are transient,
this cache will not be autowarmed.
-->
- LoadBalancer
-Clients --> rApache
LoadBalancer --> Apache1
LoadBalancer --> Apache2
diff --git a/doc/release-notes/5.10-release-notes.md b/doc/release-notes/5.10-release-notes.md
new file mode 100644
index 00000000000..0da42a7b527
--- /dev/null
+++ b/doc/release-notes/5.10-release-notes.md
@@ -0,0 +1,344 @@
+# Dataverse Software 5.10
+
+This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
+
+## Release Highlights
+
+### Multiple License Support
+
+Users can now select from a set of configured licenses in addition to or instead of the previous Creative Commons CC0 choice or provide custom terms of use (if configured) for their datasets. Administrators can configure their Dataverse instance via API to allow any desired license as a choice and can enable or disable the option to allow custom terms. Administrators can also mark licenses as "inactive" to disallow future use while keeping that license for existing datasets. For upgrades, only the CC0 license will be preinstalled. New installations will have both CC0 and CC BY preinstalled. The [Configuring Licenses](https://guides.dataverse.org/en/5.10/installation/config.html#configuring-licenses) section of the Installation Guide shows how to add or remove licenses.
+
+**Note: Datasets in existing installations will automatically be updated to conform to new requirements that custom terms cannot be used with a standard license and that custom terms cannot be empty. Administrators may wish to manually update datasets with these conditions if they do not like the automated migration choices. See the "Notes for Dataverse Installation Administrators" section below for details.**
+
+This release also makes the license selection and/or custom terms more prominent when publishing and viewing a dataset and when downloading files.
+
+### Ingest and File Upload Messaging Improvements
+
+Messaging around ingest failure has been softened to prevent support tickets. In addition, messaging during file upload has been improved, especially with regard to showing size limits and providing links to the guides about tabular ingest. For screenshots and additional details see PR #8271.
+
+### Downloading of Guestbook Responses with Fewer Clicks
+
+A download button has been added to the page that lists guestbooks. This saves a click but you can still download responses from the "View Responses" page, as before.
+
+Also, links to the guides about guestbooks have been added in additional places.
+
+### Dynamically Request Arbitrary Metadata Fields from Search API
+
+The Search API now allows arbitrary metadata fields to be requested when displaying results from datasets. You can request all fields from metadata blocks or pick and choose certain fields.
+
+The new parameter is called `metadata_fields`; the Search API section of the API Guide contains details and examples.
+
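A sketch of how the new parameter can be used (the server URL, query, and field names here are illustrative, not from this release):

```shell
# Request all citation-block fields plus one specific geospatial field
# for each dataset in the results. Repeating metadata_fields requests
# several fields; "block:*" requests every field in a block.
curl "https://demo.dataverse.org/api/search?q=finch&type=dataset&metadata_fields=citation:*&metadata_fields=geospatial:country"
```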
+### Solr 8 Upgrade
+
+The Dataverse Software now runs on Solr 8.11.1, the latest available stable release in the Solr 8.x series.
+
+### PostgreSQL Upgrade
+
+A PostgreSQL upgrade is not required for this release but is planned for the next release. See below for details.
+
+## Major Use Cases and Infrastructure Enhancements
+
+Changes and fixes in this release include:
+
+- When creating or updating datasets, users can select from a set of licenses configured by the administrator (CC0, CC BY, custom licenses, etc.) or provide custom terms (if the installation is configured to allow them). (Issue #7440, PR #7920)
+- Users can get better feedback on tabular ingest errors and more information about size limits when uploading files. (Issue #8205, PR #8271)
+- Users can more easily download guestbook responses and learn how guestbooks work. (Issue #8244, PR #8402)
+- Search API users can specify additional metadata fields to be returned in the search results. (Issue #7863, PR #7942)
+- The "Preview" tab on the file page can now show restricted files. (Issue #8258, PR #8265)
+- Users wanting to upload files from GitHub to Dataverse can learn about a new GitHub Action called "Dataverse Uploader". (PR #8416)
+- Users requesting access to files now get feedback that it was successful. (Issue #7469, PR #8341)
+- Users may notice various accessibility improvements. (Issue #8321, PR #8322)
+- Users of the Social Science metadata block can now add multiples of the "Collection Mode" field. (Issue #8452, PR #8473)
+- Guestbooks now support multi-line text area fields. (Issue #8288, PR #8291)
+- Guestbooks can better handle commas in responses. (Issue #8193, PR #8343)
+- Dataset editors can now deselect a guestbook. (Issue #2257, PR #8403)
+- Administrators with a large `actionlogrecord` table can read docs on archiving and then trimming it. (Issue #5916, PR #8292)
+- Administrators can list locks across all datasets. (PR #8445)
+- Administrators can run a version of Solr that doesn't include a version of log4j2 with serious known vulnerabilities. We trust that you have already patched the version of Solr you are running, following the instructions that were sent out. An upgrade to the latest version is recommended for extra peace of mind. (PR #8415)
+- Administrators can run a version of Dataverse that doesn't include a version of log4j with known vulnerabilities. (PR #8377)
+
+## Notes for Dataverse Installation Administrators
+
+### Updating for Multiple License Support
+
+#### Adding and Removing Licenses and How Existing Datasets Will Be Automatically Updated
+
+As part of installing or upgrading an existing installation, administrators may wish to add additional license choices and/or configure Dataverse to allow custom terms. Adding additional licenses is managed via API, as explained in the [Configuring Licenses](https://guides.dataverse.org/en/5.10/installation/config.html#configuring-licenses) section of the Installation Guide. Licenses are described via a JSON structure providing a name, URL, short description, and optional icon URL. Additionally licenses may be marked as active (selectable for new or updated datasets) or inactive (only allowed on existing datasets) and one license can be marked as the default. Custom Terms are allowed by default (backward compatible with the current option to select "No" to using CC0) and can be disabled by setting `:AllowCustomTermsOfUse` to false.
+
+Further, administrators should review the following automated migration of existing licenses and terms into the new license framework and, if desired, should manually find and update any datasets for which the automated update is problematic.
+To understand the migration process, it is useful to understand how the multiple license feature works in this release:
+
+"Custom Terms", aka a custom license, are defined through entries in the following fields of the dataset "Terms" tab:
+
+- Terms of Use
+- Confidentiality Declaration
+- Special Permissions
+- Restrictions
+- Citation Requirements
+- Depositor Requirements
+- Conditions
+- Disclaimer
+
+"Custom Terms" require, at a minimum, a non-blank entry in the "Terms of Use" field. Entries in other fields are optional.
+
+Since these fields are intended for terms/conditions that would potentially conflict with or modify the terms in a standard license, they are no longer shown when a standard license is selected.
+
+In earlier Dataverse releases, it was possible to select the CC0 license and have entries in the fields above. It was also possible to say "No" to using CC0 and leave all of these terms fields blank.
+
+The automated process will update existing datasets as follows.
+
+- "CC0 Waiver" and no entries in the fields above -> CC0 License (no change)
+- No CC0 Waiver and an entry in the "Terms of Use" field and possibly other fields listed above -> "Custom Terms" with the same entries in these fields (no change)
+- CC0 Waiver and an entry in some of the fields listed -> "Custom Terms" with the following text prepended in the "Terms of Use" field: "This dataset is made available under a Creative Commons CC0 license with the following additional/modified terms and conditions:"
+- No CC0 Waiver and entries in one or more fields other than the "Terms of Use" field -> "Custom Terms" with the following "Terms of Use" added: "This dataset is made available with limited information on how it can be used. You may wish to communicate with the Contact(s) specified before use."
+- No CC0 Waiver and no entry in any of the listed fields -> "Custom Terms" with the following "Terms of Use" added: "This dataset is made available without information on how it can be used. You should communicate with the Contact(s) specified before use."
+
+Administrators who have datasets where CC0 has been selected along with additional terms, or datasets where the Terms of Use field is empty, may wish to modify those datasets prior to upgrading to avoid the automated changes above. This is discussed next.
+
+#### Handling Datasets that No Longer Comply With Licensing Rules
+
+In most Dataverse installations, one would expect the vast majority of datasets to either use the CC0 Waiver or have non-empty Terms of Use. As noted above, these will be migrated without any issue. Administrators may however wish to find and manually update datasets that specified a CC0 license but also had terms (no longer allowed) or had no license and no terms of use (also no longer allowed) rather than accept the default migrations for these datasets listed above.
+
+##### Finding and Modifying Datasets with a CC0 License and Non-Empty Terms
+
+To find datasets with a CC0 license and non-empty terms:
+
+```
+select CONCAT('doi:', dvo.authority, '/', dvo.identifier) as datasetdoi, v.alias as dataverse_alias, case when versionstate='RELEASED' then concat(dv.versionnumber, '.', dv.minorversionnumber) else versionstate END as version, dv.id as datasetversion_id, t.id as termsofuseandaccess_id, t.termsofuse, t.confidentialitydeclaration, t.specialpermissions, t.restrictions, t.citationrequirements, t.depositorrequirements, t.conditions, t.disclaimer from dvobject dvo, termsofuseandaccess t, datasetversion dv, dataverse v where dv.dataset_id=dvo.id and dv.termsofuseandaccess_id=t.id and dvo.owner_id=v.id and t.license='CC0' and not (t.termsofuse is null and t.confidentialitydeclaration is null and t.specialpermissions is null and t.restrictions is null and t.citationrequirements is null and t.depositorrequirements is null and t.conditions is null and t.disclaimer is null);
+```
+
+The `datasetdoi` column will let you find and view the affected dataset in the Dataverse web interface. The `version` column will indicate which version(s) are relevant. The `dataverse_alias` will tell you which Dataverse collection the dataset is in (and may be useful if you want to adjust all datasets in a given collection). The `termsofuseandaccess_id` column indicates which specific entry in that table is associated with the dataset/version. The remaining columns show the values of any terms fields.
+
+There are two options to migrate such datasets:
+
+Option 1: Set all terms fields to null:
+
+```
+update termsofuseandaccess set termsofuse=null, confidentialitydeclaration=null, specialpermissions=null, restrictions=null, citationrequirements=null, depositorrequirements=null, conditions=null, disclaimer=null where id=<id>;
+```
+
+or to change several at once:
+
+```
+update termsofuseandaccess set termsofuse=null, confidentialitydeclaration=null, specialpermissions=null, restrictions=null, citationrequirements=null, depositorrequirements=null, conditions=null, disclaimer=null where id in (<id list>);
+```
+
+Option 2: Change the dataset version(s) to not use the CC0 waiver and modify the Terms of Use (and/or other fields) as you wish to indicate that the CC0 waiver was previously selected:
+
+```
+update termsofuseandaccess set license='NONE', termsofuse=concat('New text. ', termsofuse) where id=<id>;
+```
+
+or
+
+```
+update termsofuseandaccess set license='NONE', termsofuse=concat('New text. ', termsofuse) where id in (<id list>);
+```
+
+##### Finding and Modifying Datasets without a CC0 License and with Empty Terms
+
+To find datasets without a CC0 license and with empty terms:
+
+```
+select CONCAT('doi:', dvo.authority, '/', dvo.identifier) as datasetdoi, v.alias as dataverse_alias, case when versionstate='RELEASED' then concat(dv.versionnumber, '.', dv.minorversionnumber) else versionstate END as version, dv.id as datasetversion_id, t.id as termsofuseandaccess_id, t.termsofuse, t.confidentialitydeclaration, t.specialpermissions, t.restrictions, t.citationrequirements, t.depositorrequirements, t.conditions, t.disclaimer from dvobject dvo, termsofuseandaccess t, datasetversion dv, dataverse v where dv.dataset_id=dvo.id and dv.termsofuseandaccess_id=t.id and dvo.owner_id=v.id and t.license='NONE' and t.termsofuse is null;
+```
+
+As before, there are a couple options.
+
+Option 1: These datasets could be updated to use CC0:
+
+```
+update termsofuseandaccess set license='CC0', confidentialitydeclaration=null, specialpermissions=null, restrictions=null, citationrequirements=null, depositorrequirements=null, conditions=null, disclaimer=null where id=<id>;
+```
+
+Option 2: Terms of Use could be added:
+
+```
+update termsofuseandaccess set termsofuse='New text. ' where id=<id>;
+```
+
+In both cases, the same `where id in (<id list>);` ending could be used to change multiple datasets/versions at once.
+
+#### Standardizing Custom Licenses
+
+If many datasets use the same set of Custom Terms, it may make sense to create and register a standard license including those terms. Doing this would include:
+
+- Creating and posting an external document that includes the custom terms, e.g. an HTML document with sections corresponding to the terms fields that are used.
+- Defining a name, short description, URL (where it is posted), and optionally an icon URL for this license
+- Using the Dataverse API to register the new license as one of the options available in your installation
+- Using the API to make sure the license is active and deciding whether the license should also be the default
+- Once the license is registered with Dataverse, making an SQL update to change datasets/versions using that license to reference it instead of having their own copy of those custom terms.
+
+The benefits of this approach are:
+
+- usability: the license can be selected for new datasets without allowing custom terms and without users having to cut/paste terms or collection administrators having to configure templates with those terms
+- efficiency: custom terms are stored per dataset, whereas a license is registered once and all uses of it refer to the same object and external URL
+- security: with the license terms maintained external to Dataverse, users cannot edit specific terms and curators do not need to check for edits
+
+Once a standardized version of your Custom Terms is registered as a license, an SQL update like the following can be used to have datasets use it:
+
+```
+UPDATE termsofuseandaccess
+SET license_id = (SELECT license.id FROM license WHERE license.name = '<license name>'), termsofuse=null, confidentialitydeclaration=null, specialpermissions=null, restrictions=null, citationrequirements=null, depositorrequirements=null, conditions=null, disclaimer=null
+WHERE termsofuseandaccess.termsofuse LIKE '%<unique text from your custom terms>%';
+```
+
+Note that this information is also available in the [Configuring Licenses](https://guides.dataverse.org/en/5.10/installation/config.html#configuring-licenses) section of the Installation Guide. Look for "Standardizing Custom Licenses".
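The registration steps above can be sketched with the licenses API (a sketch: `add-license.json` is the JSON structure described in the guides, `$API_TOKEN` is a superuser API token, and the license id `2` is a placeholder; check the Configuring Licenses documentation for the exact endpoints):

```shell
# Register a new license:
curl -X POST -H 'Content-Type: application/json' -H "X-Dataverse-key:$API_TOKEN" \
  http://localhost:8080/api/licenses --upload-file add-license.json

# List licenses to find the id assigned to the new entry:
curl http://localhost:8080/api/licenses

# Optionally make it the default choice for new datasets:
curl -X PUT -H "X-Dataverse-key:$API_TOKEN" http://localhost:8080/api/licenses/default/2
```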
+
+### PostgreSQL Version 10+ Required
+
+If you are still using PostgreSQL 9.x, now is the time to upgrade. PostgreSQL 9.x is now EOL (no longer supported, as of January 2022), and in the next version of the Dataverse Software we plan to upgrade the Flyway library (used for database migrations) to a version that will no longer work with versions prior to PostgreSQL 10. See PR #8296 for more on this upcoming Flyway upgrade.
+
+The Dataverse Software has been tested with PostgreSQL versions up to 13. The current stable version 13.5 is recommended. If that's not an option for reasons specific to your installation (for example, if PostgreSQL 13.5 is not available for the OS distribution you are using), any 10+ version should work.
+
+See the upgrade section below for more information.
+
+### Providing S3 Storage Credentials via MicroProfile Config
+
+With this release, you may use two new JVM options (`dataverse.files.<id>.access-key` and `dataverse.files.<id>.secret-key`) to pass an access key identifier and a secret access key for S3-based storage definitions without creating the files used by the AWS CLI tools (`~/.aws/config` & `~/.aws/credentials`).
+
+This has been added to ease setups using containers (Docker, Podman, Kubernetes, OpenShift) or testing and development installations. Find additional [documentation and a word of warning in the Installation Guide](https://guides.dataverse.org/en/5.10/installation/config.html#s3-mpconfig).
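A sketch of setting these options with `asadmin` (the store id `mys3` and the key values are placeholders; `<id>` must match your storage definition):

```shell
export PAYARA=/usr/local/payara5
# Placeholder credentials for a storage definition with id "mys3":
$PAYARA/bin/asadmin create-jvm-options "-Ddataverse.files.mys3.access-key=AKIAEXAMPLEKEY"
$PAYARA/bin/asadmin create-jvm-options "-Ddataverse.files.mys3.secret-key=exampleSecretKey"
```

Because these are MicroProfile Config options, other config sources (such as environment variables) may also work; see the linked documentation for the supported mechanisms.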
+
+## New JVM Options and DB Settings
+
+The following JVM settings have been added:
+
+- `dataverse.files.<id>.access-key` - S3 access key ID.
+- `dataverse.files.<id>.secret-key` - S3 secret access key.
+
+See the [JVM Options](https://guides.dataverse.org/en/5.10/installation/config.html#jvm-options) section of the Installation Guide for more information.
+
+The following DB settings have been added:
+
+- `:AllowCustomTermsOfUse` (default: true) - allow users to provide Custom Terms instead of choosing one of the configured standard licenses.
+
+See the [Database Settings](https://guides.dataverse.org/en/5.10/installation/config.html#database-settings) section of the Guides for more information.
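For example, Custom Terms can be disabled via the admin settings API (assuming a local installation; the setting defaults to true):

```shell
curl -X PUT -d false http://localhost:8080/api/admin/settings/:AllowCustomTermsOfUse
```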
+
+## Notes for Developers and Integrators
+
+In the "Backward Incompatibilities" section below, note changes in the API regarding licenses and the native JSON format.
+
+## Backward Incompatibilities
+
+With the change to support multiple licenses, which can include cases where CC0 is not an option, and the decision to prohibit two previously possible cases (no license and no entry in the "Terms of Use" field; a standard license combined with entries in "Terms of Use", "Special Permissions", and related fields), this release contains changes to the display, API payloads, and export metadata that are not backward compatible. These include:
+
+- "CC0 Waiver" has been replaced by "CC0 1.0" (the short name specified by Creative Commons) in the web interface, API payloads, and export formats that include a license name. (Note that installation admins can alter the license name in the database to maintain the original "CC0 Waiver" text, if desired.)
+- Schema.org metadata in page headers and the Schema.org JSON-LD metadata export now reference the license via URL (which should avoid the current warning from Google about an invalid license object in the page metadata).
+- Metadata exports and import methods (including SWORD) use either the license name (e.g. in the JSON export) or URL (e.g. in the OAI_ORE export) rather than the previously hardcoded value of "CC0" or "CC0 Waiver" (if the CC0 license is available, its default name is "CC0 1.0").
+- API calls (e.g. for import, migrate) that specify both a license and custom terms will be considered an error, as will having no license and an empty/blank value for "Terms of Use".
+- Rollback. In general, one should not deploy an earlier release over a database that has been modified by deployment of a later release. (Make a db backup before upgrading and use that copy if you go back to a prior version.) Due to the nature of the db changes in this release, attempts to deploy an earlier version of Dataverse will fail unless the database is also restored to its pre-release state.
+
+Also, note that since "CC0 Waiver" is no longer a hardcoded option, text strings that reference it have been edited or removed from `Bundle.properties`. This means that the ability to provide translations of the CC0 license name/description has been removed. The initial release of multiple license functionality doesn't include an alternative mechanism to provide translations of license names/descriptions, so this is a regression in capability (see #8346). The instructions and help information about licenses and terms remain internationalizable; only the names/descriptions of the licenses themselves cannot yet be translated.
+
+An update to the Social Science metadata block changes the `collectionMode` field to allow multiple values. This changes the way the field is encoded in the native JSON format. From
+
+```
+"typeName": "collectionMode",
+"multiple": false,
+"typeClass": "primitive",
+"value": "some text"
+```
+
+to
+
+```
+"typeName": "collectionMode",
+"multiple": true,
+"typeClass": "primitive",
+"value": ["some text", "more text"]
+```
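Client code that reads the native JSON can smooth over this change by normalizing the field value to an array. A sketch using `jq` (an assumption, not a tool this release requires; the documents are the fragments shown above):

```shell
# Wrap "value" in an array and flatten: a plain string becomes a
# one-element array, while an existing array is left as-is.
old='{"typeName":"collectionMode","multiple":false,"typeClass":"primitive","value":"some text"}'
new='{"typeName":"collectionMode","multiple":true,"typeClass":"primitive","value":["some text","more text"]}'
echo "$old" | jq -c '[.value] | flatten'   # ["some text"]
echo "$new" | jq -c '[.value] | flatten'   # ["some text","more text"]
```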
+
+## Complete List of Changes
+
+For the complete list of code changes in this release, see the [5.10 Milestone](https://github.com/IQSS/dataverse/milestone/101?closed=1) on GitHub.
+
+For help with upgrading, installing, or general questions please post to the [Dataverse Community Google Group](https://groups.google.com/forum/#!forum/dataverse-community) or email support@dataverse.org.
+
+## Installation
+
+If this is a new installation, please see our [Installation Guide](https://guides.dataverse.org/en/5.10/installation/). Please also contact us to get added to the [Dataverse Project Map](https://guides.dataverse.org/en/5.10/installation/config.html#putting-your-dataverse-installation-on-the-map-at-dataverse-org) if you have not done so already.
+
+## Upgrade Instructions
+
+0\. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the [Dataverse Software 5 Release Notes](https://github.com/IQSS/dataverse/releases/tag/v5.0). After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.10.
+
+If you are running Payara as a non-root user (and you should be!), **remember not to execute the commands below as root**. Use `sudo` to change to that user first. For example, `sudo -i -u dataverse` if `dataverse` is your dedicated application user.
+
+In the following commands we assume that Payara 5 is installed in `/usr/local/payara5`. If not, adjust as needed.
+
+`export PAYARA=/usr/local/payara5`
+
+(or `setenv PAYARA /usr/local/payara5` if you are using a `csh`-like shell)
+
+1\. Undeploy the previous version.
+
+- `$PAYARA/bin/asadmin list-applications`
+- `$PAYARA/bin/asadmin undeploy dataverse<-version>`
+
+2\. Stop Payara and remove the generated directory
+
+- `service payara stop`
+- `rm -rf $PAYARA/glassfish/domains/domain1/generated`
+
+3\. Start Payara
+
+- `service payara start`
+
+4\. Deploy this version.
+
+- `$PAYARA/bin/asadmin deploy dataverse-5.10.war`
+
+5\. Restart payara
+
+- `service payara stop`
+- `service payara start`
+
+6\. Update the Social Science metadata block
+
+- `wget https://github.com/IQSS/dataverse/releases/download/v5.10/social_science.tsv`
+- `curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @social_science.tsv -H "Content-type: text/tab-separated-values"`
+- Note that this update also requires an updated Solr schema. We strongly recommend that you upgrade Solr as part of this release by installing the latest stable release from scratch (see below). In the process you will configure it with the latest version of the schema as distributed with this Dataverse release, so no further steps will be needed. If you have already upgraded, or have some **very** good reason to stay on the old version a little longer, please refer to the guides for information on updating your Solr schema in place.
+
+7\. Run ReExportall to update Exports
+
+Follow the directions in the [Admin Guide](http://guides.dataverse.org/en/5.10/admin/metadataexport.html#batch-exports-through-the-api).
+
+8\. Upgrade Solr
+
+See "Additional Release Steps" below for how to upgrade Solr.
+
+## Additional Release Steps
+
+### Solr Upgrade
+
+With this release we upgrade to the latest available stable release in the Solr 8.x branch (8.11.1). We recommend a fresh installation of Solr (the index will be empty) followed by an "index all".
+
+Before you start the "index all", the Dataverse installation will appear to be empty because the search results come from Solr. As indexing progresses, partial results will appear until indexing is complete.
+
+See the Installation Guide for more information.
+
+Please note that after you have followed the instructions above you will have Solr installed with the default schema that lists all the fields in the standard Dataverse metadata blocks. If your installation uses any custom metadata blocks, please refer to the guides for information on updating your Solr schema to include these extra fields.
+
+### PostgreSQL Upgrade
+
+The tested and recommended way of upgrading an existing database is as follows:
+
+- Export your current database with ``pg_dumpall``.
+- Install the new version of PostgreSQL (make sure it's running on the same port, so that no changes are needed in the Payara configuration).
+- Re-import the database with ``psql``, as the user ``postgres``.
+
+It is strongly recommended to use the versions of ``pg_dumpall`` and ``psql`` from the old and new versions of PostgreSQL, respectively. For example, the commands below were used to migrate a database running under PostgreSQL 9.6 to 13.5. Adjust the versions and the path names to match your environment.
+
+Back up/export:
+
+``/usr/pgsql-9.6/bin/pg_dumpall -U postgres > /tmp/backup.sql``
+
+Restore/import:
+
+``/usr/pgsql-13/bin/psql -U postgres -f /tmp/backup.sql``
+
+When upgrading the production database here at Harvard IQSS we were able to go from version 9.6 all the way to 13.3 without any issues.
+
+You may want to try these backup and restore steps on a test server to get an accurate estimate of how much downtime to expect with the final production upgrade. That of course will depend on the size of your database.
+
+Consult the PostgreSQL upgrade documentation for more information.
diff --git a/doc/release-notes/8210-importddi-fix.md b/doc/release-notes/8210-importddi-fix.md
new file mode 100644
index 00000000000..98981263e58
--- /dev/null
+++ b/doc/release-notes/8210-importddi-fix.md
@@ -0,0 +1 @@
+A subject validation problem in the `importddi` API was fixed by filling the subject field with "N/A".
diff --git a/doc/release-notes/8452-multiple-collectionmode.md b/doc/release-notes/8452-multiple-collectionmode.md
new file mode 100644
index 00000000000..b367b9230cd
--- /dev/null
+++ b/doc/release-notes/8452-multiple-collectionmode.md
@@ -0,0 +1,12 @@
+### A small modification to the Social Science metadata block
+
+The metadata block update allows the field "collectionMode" to have multiple values.
+
+For the upgrade, update the Social Science metadata block as follows:
+
+- `wget https://github.com/IQSS/dataverse/releases/download/v5.10/social_science.tsv`
+- `curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @social_science.tsv -H "Content-type: text/tab-separated-values"`
+
+As a general reminder, please note that it is important to keep your metadata block definitions up-to-date.
\ No newline at end of file
diff --git a/doc/sphinx-guides/source/_static/admin/dataverse-external-tools.tsv b/doc/sphinx-guides/source/_static/admin/dataverse-external-tools.tsv
index 521d61735dc..952595837f1 100644
--- a/doc/sphinx-guides/source/_static/admin/dataverse-external-tools.tsv
+++ b/doc/sphinx-guides/source/_static/admin/dataverse-external-tools.tsv
@@ -1,5 +1,4 @@
Tool Type Scope Description
-TwoRavens explore file A system of interlocking statistical tools for data exploration, analysis, and meta-analysis: http://2ra.vn. See the :doc:`/user/data-exploration/tworavens` section of the User Guide for more information on TwoRavens from the user perspective and the :doc:`/installation/r-rapache-tworavens` section of the Installation Guide.
Data Explorer explore file A GUI which lists the variables in a tabular data file allowing searching, charting and cross tabulation analysis. See the README.md file at https://github.com/scholarsportal/dataverse-data-explorer-v2 for the instructions on adding Data Explorer to your Dataverse.
Whole Tale explore dataset A platform for the creation of reproducible research packages that allows users to launch containerized interactive analysis environments based on popular tools such as Jupyter and RStudio. Using this integration, Dataverse users can launch Jupyter and RStudio environments to analyze published datasets. For more information, see the `Whole Tale User Guide `_.
File Previewers explore file A set of tools that display the content of files - including audio, html, `Hypothes.is `_ annotations, images, PDF, text, video, tabular data, and spreadsheets - allowing them to be viewed without downloading. The previewers can be run directly from github.io, so the only required step is using the Dataverse API to register the ones you want to use. Documentation, including how to optionally brand the previewers, and an invitation to contribute through github are in the README.md file. Initial development was led by the Qualitative Data Repository and the spreasdheet previewer was added by the Social Sciences and Humanities Open Cloud (SSHOC) project. https://github.com/GlobalDataverseCommunityConsortium/dataverse-previewers
diff --git a/doc/sphinx-guides/source/_static/api/add-license.json b/doc/sphinx-guides/source/_static/api/add-license.json
new file mode 100644
index 00000000000..969d6d58dab
--- /dev/null
+++ b/doc/sphinx-guides/source/_static/api/add-license.json
@@ -0,0 +1,7 @@
+{
+ "name": "CC-BY-4.0",
+ "uri": "http://creativecommons.org/licenses/by/4.0",
+ "shortDescription": "Creative Commons Attribution 4.0 International License.",
+ "iconUrl": "https://i.creativecommons.org/l/by/4.0/88x31.png",
+ "active": true
+}
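Per the Dataverse guides this diff accompanies, a file like add-license.json is registered with an installation through a superuser licenses endpoint (`/api/licenses` is an assumption drawn from the Dataverse API guide, not from this hunk). A minimal, hedged sanity check one might run on the file before POSTing it:

```python
import json

# License definition copied from add-license.json above.
license_json = """
{
  "name": "CC-BY-4.0",
  "uri": "http://creativecommons.org/licenses/by/4.0",
  "shortDescription": "Creative Commons Attribution 4.0 International License.",
  "iconUrl": "https://i.creativecommons.org/l/by/4.0/88x31.png",
  "active": true
}
"""

doc = json.loads(license_json)

# Fields the example file always carries; treating iconUrl as optional
# here is an assumption, not something this diff states.
required = {"name", "uri", "shortDescription", "active"}
missing = sorted(required - doc.keys())

print(missing)        # -> []
print(doc["active"])  # -> True
```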
diff --git a/doc/sphinx-guides/source/_static/api/dataset-migrate.jsonld b/doc/sphinx-guides/source/_static/api/dataset-migrate.jsonld
index 07cf66ace53..f79dbd30d8f 100644
--- a/doc/sphinx-guides/source/_static/api/dataset-migrate.jsonld
+++ b/doc/sphinx-guides/source/_static/api/dataset-migrate.jsonld
@@ -23,7 +23,7 @@
},
"@id": "doi:10.33564/FK27U7YBV",
"schema:version": "1.0",
-"schema:license": "https://creativecommons.org/publicdomain/zero/1.0/",
+"schema:license": "http://creativecommons.org/publicdomain/zero/1.0",
"schema:datePublished": "2021-07-21",
"dvcore:fileTermsOfAccess": {
"dvcore:fileRequestAccess": false
diff --git a/doc/sphinx-guides/source/_static/api/ddi_dataset.xml b/doc/sphinx-guides/source/_static/api/ddi_dataset.xml
new file mode 100644
index 00000000000..79e0581131e
--- /dev/null
+++ b/doc/sphinx-guides/source/_static/api/ddi_dataset.xml
@@ -0,0 +1,191 @@
+
+
+
+
+
+ Replication Data for: Title
+
+
+ Root
+ 2020-02-19
+
+
+ 1
+
+ LastAuthor1, FirstAuthor1; LastAuthor2, FirstAuthor2, 2020, "Replication Data for: Title", Root, V1
+
+
+
+
+
+ Replication Data for: Title
+ Subtitle
+ Alternative Title
+ OtherIDIdentifier1
+ OtherIDIdentifier2
+
+
+ LastAuthor1, FirstAuthor1
+ LastAuthor2, FirstAuthor2
+ LastContributor1, FirstContributor1
+ LastContributor2, FirstContributor2
+
+
+ LastProducer1, FirstProducer1
+ LastProducer2, FirstProducer2
+ 1003-01-01
+ ProductionPlace
+ SoftwareName1
+ SoftwareName2
+ GrantInformationGrantNumber1
+ GrantInformationGrantNumber2
+
+
+ Root
+ LastDistributor1, FirstDistributor1
+ LastDistributor2, FirstDistributor2
+ LastContact1, FirstContact1
+ LastContact2, FirstContact2
+ 1004-01-01
+ LastDepositor, FirstDepositor
+ 1002-01-01
+
+
+ SeriesName
+ SeriesInformation
+
+
+
+
+ Agricultural Sciences
+ Business and Management
+ Engineering
+ Law
+ KeywordTerm1
+ KeywordTerm2
+
+ DescriptionText 1
+ DescriptionText2
+
+ 1005-01-01
+ 1005-01-02
+ 1005-02-01
+ 1005-02-02
+ 1006-01-01
+ 1006-01-01
+ 1006-02-01
+ 1006-02-02
+ KindOfData1
+ KindOfData2
+ Afghanistan
+ GeographicCoverageCity1
+ GeographicCoverageStateProvince1
+ GeographicCoverageOther1
+ Albania
+ GeographicCoverageCity2
+ GeographicCoverageStateProvince2
+ GeographicCoverageOther2
+
+ 10
+ 20
+ 30
+ 40
+
+
+ 80
+ 70
+ 60
+ 50
+
+ GeographicUnit1
+ GeographicUnit2
+ UnitOfAnalysis1
+ UnitOfAnalysis2
+ Universe1
+ Universe2
+
+ Notes1
+
+
+
+ TimeMethod
+ LastDataCollector1, FirstDataCollector1
+ CollectorTraining
+ Frequency
+ SamplingProcedure
+
+ TargetSampleSizeFormula
+ 100
+
+ MajorDeviationsForSampleDesign
+
+ DataSources1
+ DataSources2
+ OriginOfSources
+ CharacteristicOfSourcesNoted
+ DocumentationAndAccessToSources
+
+ CollectionMode
+ TypeOfResearchInstrument
+ CharacteristicsOfDataCollectionSituation
+ ActionsToMinimizeLosses
+ ControlOperations
+ Weighting
+ CleaningOperations
+
+
+ ResponseRate
+ EstimatesOfSamplingError
+ OtherFormsOfDataAppraisal
+
+ NotesText
+
+
+ Terms of Access
+
+ Data Access Place
+ Original Archive
+ Availability Status
+ Size of Collection
+ Study Completion
+
+
+ Confidentiality Declaration
+ Special Permissions
+ Restrictions
+ Contact for Access
+ Citation Requirements
+ Depositor Requirements
+ Conditions
+ Disclaimer
+
+
+
+ RelatedMaterial1
+ RelatedMaterial2
+ RelatedDatasets1
+ RelatedDatasets2
+
+
+
+ RelatedPublicationIDNumber1
+
+ RelatedPublicationCitation1
+
+
+
+
+
+
+ RelatedPublicationIDNumber2
+
+ RelatedPublicationCitation2
+
+
+
+ OtherReferences1
+ OtherReferences2
+
+ StudyLevelErrorNotes
+
+
diff --git a/doc/sphinx-guides/source/_static/api/subject-update-metadata.json b/doc/sphinx-guides/source/_static/api/subject-update-metadata.json
new file mode 100644
index 00000000000..ad9f15b8f8a
--- /dev/null
+++ b/doc/sphinx-guides/source/_static/api/subject-update-metadata.json
@@ -0,0 +1,11 @@
+{
+ "fields": [
+ {
+ "typeName": "subject",
+ "typeClass": "controlledVocabulary",
+ "value": ["Social Sciences"]
+ }
+ ]
+}
+
+
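The subject-update-metadata.json payload above is shaped for the dataset metadata-edit API: a `fields` array of typed entries (the exact `editMetadata` endpoint name is an assumption from the Dataverse API guide, not this hunk). A small structural check of the payload before sending it:

```python
import json

# Payload copied from subject-update-metadata.json above.
payload = json.loads("""
{
  "fields": [
    {
      "typeName": "subject",
      "typeClass": "controlledVocabulary",
      "value": ["Social Sciences"]
    }
  ]
}
""")

for field in payload["fields"]:
    # Controlled-vocabulary fields carry their value(s) as a list of strings.
    if field["typeClass"] == "controlledVocabulary":
        assert isinstance(field["value"], list)

print(payload["fields"][0]["typeName"])  # -> subject
```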
diff --git a/doc/sphinx-guides/source/_static/installation/files/etc/httpd/conf.d/ssl.conf b/doc/sphinx-guides/source/_static/installation/files/etc/httpd/conf.d/ssl.conf
index a6c1c7b419c..e1e11423d9d 100644
--- a/doc/sphinx-guides/source/_static/installation/files/etc/httpd/conf.d/ssl.conf
+++ b/doc/sphinx-guides/source/_static/installation/files/etc/httpd/conf.d/ssl.conf
@@ -223,10 +223,6 @@ CustomLog logs/ssl_request_log \
# Require all granted
#
-# don't pass paths used by rApache and TwoRavens to app server
-ProxyPassMatch ^/RApacheInfo$ !
-ProxyPassMatch ^/custom !
-ProxyPassMatch ^/dataexplore !
# don't pass paths used by Shibboleth to app server
ProxyPassMatch ^/Shibboleth.sso !
ProxyPassMatch ^/shibboleth-ds !
diff --git a/doc/sphinx-guides/source/_static/installation/files/etc/init.d/solr b/doc/sphinx-guides/source/_static/installation/files/etc/init.d/solr
index 16d364fc9cf..7ca04cdff3f 100755
--- a/doc/sphinx-guides/source/_static/installation/files/etc/init.d/solr
+++ b/doc/sphinx-guides/source/_static/installation/files/etc/init.d/solr
@@ -5,7 +5,7 @@
# chkconfig: 35 92 08
# description: Starts and stops Apache Solr
-SOLR_DIR="/usr/local/solr/solr-8.8.1"
+SOLR_DIR="/usr/local/solr/solr-8.11.1"
SOLR_COMMAND="bin/solr"
SOLR_ARGS="-m 1g -j jetty.host=127.0.0.1"
SOLR_USER=solr
diff --git a/doc/sphinx-guides/source/_static/installation/files/etc/systemd/solr.service b/doc/sphinx-guides/source/_static/installation/files/etc/systemd/solr.service
index 96960793938..d89ee108377 100644
--- a/doc/sphinx-guides/source/_static/installation/files/etc/systemd/solr.service
+++ b/doc/sphinx-guides/source/_static/installation/files/etc/systemd/solr.service
@@ -5,9 +5,9 @@ After = syslog.target network.target remote-fs.target nss-lookup.target
[Service]
User = solr
Type = forking
-WorkingDirectory = /usr/local/solr/solr-8.8.1
-ExecStart = /usr/local/solr/solr-8.8.1/bin/solr start -m 1g -j "jetty.host=127.0.0.1"
-ExecStop = /usr/local/solr/solr-8.8.1/bin/solr stop
+WorkingDirectory = /usr/local/solr/solr-8.11.1
+ExecStart = /usr/local/solr/solr-8.11.1/bin/solr start -m 1g -j "jetty.host=127.0.0.1"
+ExecStop = /usr/local/solr/solr-8.11.1/bin/solr stop
LimitNOFILE=65000
LimitNPROC=65000
Restart=on-failure
diff --git a/doc/sphinx-guides/source/_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.6-rpm0.x86_64.rpm b/doc/sphinx-guides/source/_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.6-rpm0.x86_64.rpm
deleted file mode 100644
index b578d63144f..00000000000
Binary files a/doc/sphinx-guides/source/_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.6-rpm0.x86_64.rpm and /dev/null differ
diff --git a/doc/sphinx-guides/source/_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.7-rpm0.x86_64.rpm b/doc/sphinx-guides/source/_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.7-rpm0.x86_64.rpm
deleted file mode 100644
index 9ca6086c86a..00000000000
Binary files a/doc/sphinx-guides/source/_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.7-rpm0.x86_64.rpm and /dev/null differ
diff --git a/doc/sphinx-guides/source/_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.9_R-3.5-RH6.x86_64.rpm b/doc/sphinx-guides/source/_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.9_R-3.5-RH6.x86_64.rpm
deleted file mode 100644
index 1743a2ce9a7..00000000000
Binary files a/doc/sphinx-guides/source/_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.9_R-3.5-RH6.x86_64.rpm and /dev/null differ
diff --git a/doc/sphinx-guides/source/_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.9_R-3.5.x86_64.rpm b/doc/sphinx-guides/source/_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.9_R-3.5.x86_64.rpm
deleted file mode 100644
index 6c4e30672eb..00000000000
Binary files a/doc/sphinx-guides/source/_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.9_R-3.5.x86_64.rpm and /dev/null differ
diff --git a/doc/sphinx-guides/source/_static/installation/files/root/external-tools/twoRavens.json b/doc/sphinx-guides/source/_static/installation/files/root/external-tools/twoRavens.json
deleted file mode 100644
index 99a40ecb02c..00000000000
--- a/doc/sphinx-guides/source/_static/installation/files/root/external-tools/twoRavens.json
+++ /dev/null
@@ -1,19 +0,0 @@
-{
- "displayName": "TwoRavens",
- "description": "A system of interlocking statistical tools for data exploration, analysis, and meta-analysis.",
- "toolName": "TwoRavens",
- "scope": "file",
- "type": "explore",
- "toolUrl": "https://tworavens.dataverse.example.edu/dataexplore/gui.html",
- "contentType": "text/tab-separated-values",
- "toolParameters": {
- "queryParameters": [
- {
- "dfId": "{fileId}"
- },
- {
- "key": "{apiToken}"
- }
- ]
- }
-}
\ No newline at end of file
diff --git a/doc/sphinx-guides/source/_static/util/default.config b/doc/sphinx-guides/source/_static/util/default.config
index 3fb3d98472c..48252caf1c9 100644
--- a/doc/sphinx-guides/source/_static/util/default.config
+++ b/doc/sphinx-guides/source/_static/util/default.config
@@ -9,7 +9,6 @@ POSTGRES_DATABASE dvndb
POSTGRES_USER dvnapp
POSTGRES_PASSWORD secret
SOLR_LOCATION localhost:8983
-TWORAVENS_LOCATION NOT INSTALLED
RSERVE_HOST localhost
RSERVE_PORT 6311
RSERVE_USER rserve
diff --git a/doc/sphinx-guides/source/_templates/navbar.html b/doc/sphinx-guides/source/_templates/navbar.html
index 08e363408f4..538cccf74d7 100644
--- a/doc/sphinx-guides/source/_templates/navbar.html
+++ b/doc/sphinx-guides/source/_templates/navbar.html
@@ -39,10 +39,11 @@
diff --git a/doc/sphinx-guides/source/admin/integrations.rst b/doc/sphinx-guides/source/admin/integrations.rst
index 5e99f139fbb..5ee6372d56d 100644
--- a/doc/sphinx-guides/source/admin/integrations.rst
+++ b/doc/sphinx-guides/source/admin/integrations.rst
@@ -11,6 +11,13 @@ Getting Data In
A variety of integrations are oriented toward making it easier for your researchers to deposit data into your Dataverse installation.
+GitHub
++++++++
+
+Dataverse integration with GitHub is implemented via the Dataverse Uploader GitHub Action, a reusable, composite workflow for uploading a git repository or subdirectory into a dataset on a target Dataverse installation. The action is customizable, allowing users to replace a dataset, add to it, publish it, or leave it as a draft version on Dataverse. The action provides some metadata to the dataset, such as the origin GitHub repository, and it preserves the directory tree structure.
+
+For instructions on using the Dataverse Uploader GitHub Action, visit https://github.com/marketplace/actions/dataverse-uploader-action
+
Dropbox
+++++++
@@ -50,7 +57,7 @@ their research results and retain links to imported and exported data. Users
can organize their data in "Datasets", which can be exported to a Dataverse installation via
the command-line interface (CLI).
-Renku dataset documentation: https://renku-python.readthedocs.io/en/latest/commands.html#module-renku.cli.dataset
+Renku dataset documentation: https://renku-python.readthedocs.io/en/latest/reference/commands.html#module-renku.cli.dataset
Flagship deployment of the Renku platform: https://renkulab.io
@@ -85,13 +92,6 @@ Data Explorer is a GUI which lists the variables in a tabular data file allowing
For installation instructions, see the :doc:`external-tools` section.
-TwoRavens/Zelig
-+++++++++++++++
-
-TwoRavens is a web application for tabular data exploration and statistical analysis with Zelig.
-
-For installation instructions, see the :doc:`external-tools` section.
-
Compute Button
++++++++++++++
diff --git a/doc/sphinx-guides/source/admin/metadatacustomization.rst b/doc/sphinx-guides/source/admin/metadatacustomization.rst
index c307adb56af..b7d0086e221 100644
--- a/doc/sphinx-guides/source/admin/metadatacustomization.rst
+++ b/doc/sphinx-guides/source/admin/metadatacustomization.rst
@@ -76,453 +76,310 @@ Each of the three main sections own sets of properties:
#metadataBlock properties
~~~~~~~~~~~~~~~~~~~~~~~~~
-+-----------------------+-----------------------+-----------------------+
-| **Property** | **Purpose** | **Allowed values and |
-| | | restrictions** |
-+-----------------------+-----------------------+-----------------------+
-| name | A user-definable | \• No spaces or |
-| | string used to | punctuation, |
-| | identify a | except underscore. |
-| | #metadataBlock | |
-| | | \• By convention, |
-| | | should start with |
-| | | a letter, and use |
-| | | lower camel |
-| | | case [3]_ |
-| | | |
-| | | \• Must not collide |
-| | | with a field of |
-| | | the same name in |
-| | | the same or any |
-| | | other |
-| | | #datasetField |
-| | | definition, |
-| | | including metadata |
-| | | blocks defined |
-| | | elsewhere. [4]_ |
-+-----------------------+-----------------------+-----------------------+
-| dataverseAlias | If specified, this | Free text. For an |
-| | metadata block will | example, see |
-| | be available only to | custom_hbgdki.tsv. |
-| | the Dataverse | |
-| | collection | |
-| | designated here by | |
-| | its alias and to | |
-| | children of that | |
-| | Dataverse collection. | |
-+-----------------------+-----------------------+-----------------------+
-| displayName | Acts as a brief label | Should be relatively |
-| | for display related | brief. The limit is |
-| | to this | 256 character, but |
-| | #metadataBlock. | very long names might |
-| | | cause display |
-| | | problems. |
-+-----------------------+-----------------------+-----------------------+
-| blockURI | Associates the | The citation |
-| | properties in a block | #metadataBlock has |
-| | with an external URI. | the blockURI |
-| | Properties will be | https://dataverse.org |
-| | assigned the global | /schema/citation/ |
-| | identifier | which assigns a |
-| | blockURI in the | global URI to terms |
-| | OAI_ORE metadata | such as 'https:// |
-| | and archival Bags | dataverse.org/schema/ |
-| | | citation/subtitle' |
-+-----------------------+-----------------------+-----------------------+
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| **Property** | **Purpose** | **Allowed values and restrictions** |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| name | A user-definable string used to identify a | \• No spaces or punctuation, except underscore. |
+| | #metadataBlock | |
+| | | \• By convention, should start with a letter, and use |
+| | | lower camel case [3]_ |
+| | | |
+| | | \• Must not collide with a field of the same name in |
+| | | the same or any other #datasetField definition, |
+| | | including metadata blocks defined elsewhere. [4]_ |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| dataverseAlias | If specified, this metadata block will be available | Free text. For an example, see custom_hbgdki.tsv. |
+| | only to the Dataverse collection designated here by | |
+| | its alias and to children of that Dataverse collection. | |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| displayName | Acts as a brief label for display related to this | Should be relatively brief. The limit is 256 characters,|
+| | #metadataBlock. | but very long names might cause display problems. |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| blockURI | Associates the properties in a block with an external | The citation #metadataBlock has the blockURI |
+| | URI. | https://dataverse.org/schema/citation/ which assigns a |
+| | Properties will be assigned the global identifier | global URI to terms such as |
+| | blockURI in the OAI_ORE metadata and archival Bags | https://dataverse.org/schema/citation/subtitle |
+| | | |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
#datasetField (field) properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-+-----------------------+-----------------------+------------------------+
-| **Property** | **Purpose** | **Allowed values and |
-| | | restrictions** |
-+-----------------------+-----------------------+------------------------+
-| name | A user-definable | \• (from |
-| | string used to | DatasetFieldType.java) |
-| | identify a | The internal |
-| | #datasetField. Maps | DDI-like name, no |
-| | directly to field | spaces, etc. |
-| | name used by Solr. | |
-| | | \• (from Solr) Field |
-| | | names should |
-| | | consist of |
-| | | alphanumeric or |
-| | | underscore |
-| | | characters only |
-| | | and not start with |
-| | | a digit. This is |
-| | | not currently |
-| | | strictly enforced, |
-| | | but other field |
-| | | names will not |
-| | | have first class |
-| | | support from all |
-| | | components and |
-| | | back compatibility |
-| | | is not guaranteed. |
-| | | Names with both |
-| | | leading and |
-| | | trailing |
-| | | underscores (e.g. |
-| | | \_version_) are |
-| | | reserved. |
-| | | |
-| | | \• Must not collide |
-| | | with a field of |
-| | | the same same name |
-| | | in another |
-| | | #metadataBlock |
-| | | definition or any |
-| | | name already |
-| | | included as a |
-| | | field in the Solr |
-| | | index. |
-+-----------------------+-----------------------+------------------------+
-| title | Acts as a brief label | Should be relatively |
-| | for display related | brief. |
-| | to this | |
-| | #datasetField. | |
-+-----------------------+-----------------------+------------------------+
-| description | Used to provide a | Free text |
-| | description of the | |
-| | field. | |
-+-----------------------+-----------------------+------------------------+
-| watermark | A string to initially | Free text |
-| | display in a field as | |
-| | a prompt for what the | |
-| | user should enter. | |
-+-----------------------+-----------------------+------------------------+
-| fieldType | Defines the type of | | \• none |
-| | content that the | | \• date |
-| | field, if not empty, | | \• email |
-| | is meant to contain. | | \• text |
-| | | | \• textbox |
-| | | | \• url |
-| | | | \• int |
-| | | | \• float |
-| | | | \• See below for |
-| | | | fieldtype definitions|
-+-----------------------+-----------------------+------------------------+
-| displayOrder | Controls the sequence | Non-negative integer. |
-| | in which the fields | |
-| | are displayed, both | |
-| | for input and | |
-| | presentation. | |
-+-----------------------+-----------------------+------------------------+
-| displayFormat | Controls how the | See below for |
-| | content is displayed | displayFormat |
-| | for presentation (not | variables |
-| | entry). The value of | |
-| | this field may | |
-| | contain one or more | |
-| | special variables | |
-| | (enumerated below). | |
-| | HTML tags, likely in | |
-| | conjunction with one | |
-| | or more of these | |
-| | values, may be used | |
-| | to control the | |
-| | display of content in | |
-| | the web UI. | |
-+-----------------------+-----------------------+------------------------+
-| advancedSearchField | Specify whether this | TRUE (available) or |
-| | field is available in | FALSE (not available) |
-| | advanced search. | |
-+-----------------------+-----------------------+------------------------+
-| allowControlledVocabu\| Specify whether the | TRUE (controlled) or |
-| \lary | possible values of | FALSE (not |
-| | this field are | controlled) |
-| | determined by values | |
-| | in the | |
-| | #controlledVocabulary | |
-| | section. | |
-+-----------------------+-----------------------+------------------------+
-| allowmultiples | Specify whether this | TRUE (repeatable) or |
-| | field is repeatable. | FALSE (not |
-| | | repeatable) |
-+-----------------------+-----------------------+------------------------+
-| facetable | Specify whether the | TRUE (controlled) or |
-| | field is facetable | FALSE (not |
-| | (i.e., if the | controlled) |
-| | expected values for | |
-| | this field are | |
-| | themselves useful | |
-| | search terms for this | |
-| | field). If a field is | |
-| | "facetable" (able to | |
-| | be faceted on), it | |
-| | appears under | |
-| | "Browse/Search | |
-| | Facets" when you edit | |
-| | "General Information" | |
-| | for a Dataverse | |
-| | collection. | |
-| | Setting this value to | |
-| | TRUE generally makes | |
-| | sense for enumerated | |
-| | or controlled | |
-| | vocabulary fields, | |
-| | fields representing | |
-| | identifiers (IDs, | |
-| | names, email | |
-| | addresses), and other | |
-| | fields that are | |
-| | likely to share | |
-| | values across | |
-| | entries. It is less | |
-| | likely to make sense | |
-| | for fields containing | |
-| | descriptions, | |
-| | floating point | |
-| | numbers, and other | |
-| | values that are | |
-| | likely to be unique. | |
-+-----------------------+-----------------------+------------------------+
-| displayoncreate [5]_ | Designate fields that | TRUE (display during |
-| | should display during | creation) or FALSE |
-| | the creation of a new | (don’t display during |
-| | dataset, even before | creation) |
-| | the dataset is saved. | |
-| | Fields not so | |
-| | designated will not | |
-| | be displayed until | |
-| | the dataset has been | |
-| | saved. | |
-+-----------------------+-----------------------+------------------------+
-| required | For primitive | For primitive |
-| | fields, specify | fields, TRUE |
-| | whether or not the | (required) or FALSE |
-| | field is required. | (optional). |
-| | For compound | |
-| | fields, also | For compound fields: |
-| | specify if one or | |
-| | more subfields are | \• To make one or more |
-| | required or | subfields optional, |
-| | conditionally | the parent field and |
-| | required. At least | subfield(s) must be |
-| | one instance of a | FALSE (optional). |
-| | required field must | |
-| | be present. More | \• To make one or more |
-| | than one instance | subfields required, |
-| | of a field may be | the parent field and |
-| | allowed, depending | the required |
-| | on the value of | subfield(s) must be |
-| | allowmultiples. | TRUE (required). |
-| | | |
-| | | \• To make one or more |
-| | | subfields |
-| | | conditionally |
-| | | required, make the |
-| | | parent field FALSE |
-| | | (optional) and make |
-| | | TRUE (required) any |
-| | | subfield or subfields |
-| | | that are required if |
-| | | any other subfields |
-| | | are filled. |
-+-----------------------+-----------------------+------------------------+
-| parent | For subfields, | \• Must not result in |
-| | specify the name of | a cyclical |
-| | the parent or | reference. |
-| | containing field. | |
-| | | \• Must reference an |
-| | | existing field in |
-| | | the same |
-| | | #metadataBlock. |
-+-----------------------+-----------------------+------------------------+
-| metadatablock_id | Specify the name of | \• Must reference an |
-| | the #metadataBlock | existing |
-| | that contains this | #metadataBlock. |
-| | field. | |
-| | | \• As a best |
-| | | practice, the |
-| | | value should |
-| | | reference the |
-| | | #metadataBlock in |
-| | | the current |
-| | | definition |
-| | | (it is technically |
-| | | possible to |
-| | | reference another |
-| | | existing metadata |
-| | | block.) |
-+-----------------------+-----------------------+------------------------+
-| termURI | Specify a global URI | For example, the |
-| | identifying this term | existing citation |
-| | in an external | #metadataBlock |
-| | community vocabulary. | defines the property |
-| | | names 'title' |
-| | This value overrides | as http://purl.org/dc/ |
-| | the default created | terms/title - i.e. |
-| | by appending the | indicating that it can |
-| | property name to the | be interpreted as the |
-| | blockURI defined | Dublin Core term |
-| | for the | 'title' |
-| | #metadataBlock | |
-+-----------------------+-----------------------+------------------------+
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| **Property** | **Purpose** | **Allowed values and restrictions** |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| name | A user-definable string used to identify a | \• (from DatasetFieldType.java) The internal DDI-like |
+| | #datasetField. Maps directly to field name used by | name, no spaces, etc. |
+| | Solr. | |
+| | | \• (from Solr) Field names should consist of |
+| | | alphanumeric or underscore characters only and not start|
+| | | with a digit. This is not currently strictly enforced, |
+| | | but other field names will not have first class |
+| | | support from all components and back compatibility |
+| | | is not guaranteed. |
+| | | Names with both leading and trailing underscores |
+| | | (e.g. \_version_) are reserved. |
+| | | |
+| | | \• Must not collide with a field of |
+| | the same name in another #metadataBlock |
+| | | definition or any name already included as a |
+| | | field in the Solr index. |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| title | Acts as a brief label for display | Should be relatively brief. |
+| | related to this #datasetField. | |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| description | Used to provide a description of the | Free text |
+| | field. | |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| watermark | A string to initially display in a field | Free text |
+| | as a prompt for what the user should enter. | |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| fieldType | Defines the type of content that the | | \• none |
+| | field, if not empty, is meant to contain. | | \• date |
+| | | | \• email |
+| | | | \• text |
+| | | | \• textbox |
+| | | | \• url |
+| | | | \• int |
+| | | | \• float |
+| | | | \• See below for |
+| | | | fieldtype definitions |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| displayOrder | Controls the sequence in which the fields | Non-negative integer. |
+| | are displayed, both for input and | |
+| | presentation. | |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| displayFormat | Controls how the content is displayed | See below for displayFormat |
+| | for presentation (not entry). The value of | variables |
+| | this field may contain one or more | |
+| | special variables (enumerated below). | |
+| | HTML tags, likely in conjunction with one | |
+| | or more of these values, may be used | |
+| | to control the display of content in | |
+| | the web UI. | |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| advancedSearchField | Specify whether this field is available in | TRUE (available) or |
+| | advanced search. | FALSE (not available) |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| allowControlledVocabulary | Specify whether the possible values of | TRUE (controlled) or FALSE (not |
+| | this field are determined by values | controlled) |
+| | in the #controlledVocabulary section. | |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| allowmultiples | Specify whether this field is repeatable. | TRUE (repeatable) or FALSE (not |
+| | | repeatable) |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| facetable | Specify whether the field is facetable | TRUE (facetable) or FALSE (not |
+| | (i.e., if the expected values for | facetable) |
+| | this field are themselves useful | |
+| | search terms for this field). If a field is | |
+| | "facetable" (able to be faceted on), it | |
+| | appears under "Browse/Search | |
+| | Facets" when you edit | |
+| | "General Information" for a Dataverse | |
+| | collection. | |
+| | Setting this value to TRUE generally makes | |
+| | sense for enumerated or controlled | |
+| | vocabulary fields, fields representing | |
+| | identifiers (IDs, names, email | |
+| | addresses), and other fields that are | |
+| | likely to share values across | |
+| | entries. It is less likely to make sense | |
+| | for fields containing descriptions, | |
+| | floating point numbers, and other | |
+| | values that are likely to be unique. | |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| displayoncreate [5]_ | Designate fields that should display during | TRUE (display during creation) or FALSE |
+| | the creation of a new dataset, even before | (don’t display during creation) |
+| | the dataset is saved. | |
+| | Fields not so designated will not | |
+| | be displayed until the dataset has been | |
+| | saved. | |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| required | For primitive fields, specify whether or not the | For primitive fields, TRUE |
+| | field is required. | (required) or FALSE (optional). |
+| | | |
+| | For compound fields, also specify if one or more | For compound fields: |
+| | subfields are required or conditionally required. At | |
+| | least one instance of a required field must be | \• To make one or more |
+| | present. More than one instance of a field may be | subfields optional, the parent |
+| | allowed, depending on the value of allowmultiples. | field and subfield(s) must be |
+| | | FALSE (optional). |
+| | | |
+| | | \• To make one or more subfields |
+| | | required, the parent field and |
+| | | the required subfield(s) must be |
+| | | TRUE (required). |
+| | | |
+| | | \• To make one or more subfields |
+| | | conditionally required, make the |
+| | | parent field FALSE (optional) |
+| | | and make TRUE (required) any |
+| | | subfield or subfields that are |
+| | | required if any other subfields |
+| | | are filled. |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| parent | For subfields, specify the name of the parent or | \• Must not result in a cyclical reference. |
+| | containing field. | |
+| | | \• Must reference an existing field in the same |
+| | | #metadataBlock. |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| metadatablock_id | Specify the name of the #metadataBlock that contains | \• Must reference an existing #metadataBlock. |
+| | this field. | |
+| | | \• As a best practice, the value should reference the |
+| | | #metadataBlock in the current |
+| | | definition (it is technically |
+| | | possible to reference another |
+| | | existing metadata block.) |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| termURI | Specify a global URI identifying this term in an | For example, the existing citation |
+| | external community vocabulary. | #metadataBlock defines the property |
+| | | named 'title' as http://purl.org/dc/terms/title |
+| | This value overrides the default created by appending | - i.e. indicating that it can |
+| | the property name to the blockURI defined for the | be interpreted as the Dublin Core term 'title' |
+| | #metadataBlock. | |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
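The three ``required`` configurations for compound fields described above can be illustrated with a short sketch. This is a hypothetical helper, not part of the Dataverse codebase; it only restates the rules from the table:

```python
# Hypothetical helper illustrating the "required" rules for compound
# fields described in the table above -- not part of the Dataverse codebase.

def classify_required(parent_required, subfield_required_flags):
    """Classify a compound field based on its "required" settings.

    parent_required: the parent field's required value (True/False).
    subfield_required_flags: list of required values, one per subfield.
    """
    if parent_required and any(subfield_required_flags):
        # Parent TRUE and subfield(s) TRUE: at least one instance
        # of the required subfield(s) must be present.
        return "required"
    if not parent_required and any(subfield_required_flags):
        # Parent FALSE but subfield(s) TRUE: the subfields are required
        # only if any other subfields are filled.
        return "conditionally required"
    # Parent FALSE and all subfields FALSE: everything is optional.
    return "optional"

print(classify_required(True, [True, False]))    # required
print(classify_required(False, [True]))          # conditionally required
print(classify_required(False, [False, False]))  # optional
```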
#controlledVocabulary (enumerated) properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-+-----------------------+-----------------------+-----------------------+
-| **Property** | **Purpose** | **Allowed values and |
-| | | restrictions** |
-+-----------------------+-----------------------+-----------------------+
-| DatasetField | Specifies the | Must reference an |
-| | #datasetField to which| existing |
-| | this entry applies. | #datasetField. |
-| | | As a best practice, |
-| | | the value should |
-| | | reference a |
-| | | #datasetField in the |
-| | | current metadata |
-| | | block definition. (It |
-| | | is technically |
-| | | possible to reference |
-| | | an existing |
-| | | #datasetField from |
-| | | another metadata |
-| | | block.) |
-+-----------------------+-----------------------+-----------------------+
-| Value | A short display | Free text |
-| | string, representing | |
-| | an enumerated value | |
-| | for this field. If | |
-| | the identifier | |
-| | property is empty, | |
-| | this value is used as | |
-| | the identifier. | |
-+-----------------------+-----------------------+-----------------------+
-| identifier | A string used to | Free text |
-| | encode the selected | |
-| | enumerated value of a | |
-| | field. If this | |
-| | property is empty, | |
-| | the value of the | |
-| | “Value” field is used | |
-| | as the identifier. | |
-+-----------------------+-----------------------+-----------------------+
-| displayOrder | Control the order in | Non-negative integer. |
-| | which the enumerated | |
-| | values are displayed | |
-| | for selection. | |
-+-----------------------+-----------------------+-----------------------+
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| **Property** | **Purpose** | **Allowed values and restrictions** |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| DatasetField | Specifies the #datasetField to which | Must reference an existing |
+| | this entry applies. | #datasetField. |
+| | | As a best practice, the value should |
+| | | reference a #datasetField in the |
+| | | current metadata block definition. (It |
+| | | is technically possible to reference |
+| | | an existing #datasetField from |
+| | | another metadata block.) |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| Value | A short display string, representing | Free text |
+| | an enumerated value for this field. If | |
+| | the identifier property is empty, | |
+| | this value is used as the identifier. | |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| identifier | A string used to encode the selected | Free text |
+| | enumerated value of a field. If this | |
+| | property is empty, the value of the | |
+| | “Value” field is used as the identifier. | |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
+| displayOrder | Control the order in which the enumerated | Non-negative integer. |
+| | values are displayed for selection. | |
++---------------------------------------------------------+---------------------------------------------------------+---------------------------------------------------------+
FieldType definitions
~~~~~~~~~~~~~~~~~~~~~
-+-----------------------------------+-----------------------------------+
-| **Fieldtype** | **Definition** |
-+-----------------------------------+-----------------------------------+
-| none | Used for compound fields, in which|
-| | case the parent field would have |
-| | no value and display no data |
-| | entry control. |
-+-----------------------------------+-----------------------------------+
-| date | A date, expressed in one of three |
-| | resolutions of the form |
-| | YYYY-MM-DD, YYYY-MM, or YYYY. |
-+-----------------------------------+-----------------------------------+
-| email | A valid email address. Not |
-| | indexed for privacy reasons. |
-+-----------------------------------+-----------------------------------+
-| text | Any text other than newlines may |
-| | be entered into this field. |
-+-----------------------------------+-----------------------------------+
-| textbox | Any text may be entered. For |
-| | input, the Dataverse Software |
-| | presents a |
-| | multi-line area that accepts |
-| | newlines. While any HTML is |
-| | permitted, only a subset of HTML |
-| | tags will be rendered in the UI. |
-| | See the |
-| | :ref:`supported-html-fields` |
-| | section of the Dataset + File |
-| | Management page in the User Guide.|
-+-----------------------------------+-----------------------------------+
-| url | If not empty, field must contain |
-| | a valid URL. |
-+-----------------------------------+-----------------------------------+
-| int | An integer value destined for a |
-| | numeric field. |
-+-----------------------------------+-----------------------------------+
-| float | A floating point number destined |
-| | for a numeric field. |
-+-----------------------------------+-----------------------------------+
++---------------------------------------------------------+---------------------------------------------------------+
+| **Fieldtype** | **Definition** |
++---------------------------------------------------------+---------------------------------------------------------+
+| none | Used for compound fields, in which |
+| | case the parent field would have |
+| | no value and display no data |
+| | entry control. |
++---------------------------------------------------------+---------------------------------------------------------+
+| date | A date, expressed in one of three |
+| | resolutions of the form |
+| | YYYY-MM-DD, YYYY-MM, or YYYY. |
++---------------------------------------------------------+---------------------------------------------------------+
+| email | A valid email address. Not |
+| | indexed for privacy reasons. |
++---------------------------------------------------------+---------------------------------------------------------+
+| text | Any text other than newlines may |
+| | be entered into this field. |
++---------------------------------------------------------+---------------------------------------------------------+
+| textbox | Any text may be entered. For |
+| | input, the Dataverse Software |
+| | presents a |
+| | multi-line area that accepts |
+| | newlines. While any HTML is |
+| | permitted, only a subset of HTML |
+| | tags will be rendered in the UI. |
+| | See the |
+| | :ref:`supported-html-fields` |
+| | section of the Dataset + File |
+| | Management page in the User Guide. |
++---------------------------------------------------------+---------------------------------------------------------+
+| url | If not empty, field must contain |
+| | a valid URL. |
++---------------------------------------------------------+---------------------------------------------------------+
+| int | An integer value destined for a |
+| | numeric field. |
++---------------------------------------------------------+---------------------------------------------------------+
+| float | A floating point number destined |
+| | for a numeric field. |
++---------------------------------------------------------+---------------------------------------------------------+
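For example, the three resolutions accepted by the ``date`` fieldtype above can be checked with a simple pattern. This is an illustration only, not the validation Dataverse itself performs (month and day ranges are not checked here):

```python
import re

# Sketch of the three resolutions accepted by the "date" fieldtype:
# YYYY-MM-DD, YYYY-MM, or YYYY. Month/day ranges are not validated.
DATE_RE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")

def matches_date_fieldtype(value):
    return bool(DATE_RE.match(value))

for v in ("2024-05-17", "2024-05", "2024", "May 2024"):
    print(v, matches_date_fieldtype(v))
```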
displayFormat variables
~~~~~~~~~~~~~~~~~~~~~~~
These are common ways to use the displayFormat to control how values are displayed in the UI. This list is not exhaustive.
-+-----------------------------------+-----------------------------------+
-| **Variable** | **Description** |
-+-----------------------------------+-----------------------------------+
-| (blank) | The displayFormat is left blank |
-| | for primitive fields (e.g. |
-| | subtitle) and fields that do not |
-| | take values (e.g. author), since |
-| | displayFormats do not work for |
-| | these fields. |
-+-----------------------------------+-----------------------------------+
-| #VALUE | The value of the field (instance |
-| | level). |
-+-----------------------------------+-----------------------------------+
-| #NAME | The name of the field (class |
-| | level). |
-+-----------------------------------+-----------------------------------+
-| #EMAIL | For displaying emails. |
-+-----------------------------------+-----------------------------------+
-| #VALUE | For displaying the value as a |
-| | link (if the value entered is a |
-| | link). |
-+-----------------------------------+-----------------------------------+
-| #VALUE | For displaying the value as a |
-| | link, with the value included in |
-| | the URL (e.g. if URL is |
-| | \http://emsearch.rutgers.edu/atla\|
-| | \s/#VALUE_summary.html, |
-| | and the value entered is 1001, |
-| | the field is displayed as |
-| | `1001 `__ |
-| | (hyperlinked to |
-| | \http://emsearch.rutgers.edu/atlas|
-| | /1001_summary.html)). |
-+-----------------------------------+-----------------------------------+
-| | entered image URL (used to |
-| | display images in the producer |
-| | and distributor logos metadata |
-| | fields). |
-+-----------------------------------+-----------------------------------+
-| #VALUE: | Appends and/or prepends |
-| | characters to the value of the |
-| \- #VALUE: | field. e.g. if the displayFormat |
-| | for the distributorAffiliation is |
-| (#VALUE) | (#VALUE) (wrapped with parens) |
-| | and the value entered |
-| | is University of North |
-| | Carolina, the field is displayed |
-| | in the UI as (University of |
-| | North Carolina). |
-+-----------------------------------+-----------------------------------+
-| ; | Displays the character (e.g. |
-| | semicolon, comma) between the |
-| : | values of fields within |
-| | compound fields. For example, |
-| , | if the displayFormat for the |
-| | compound field “series” is a |
-| | colon, and if the value |
-| | entered for seriesName is |
-| | IMPs and for |
-| | seriesInformation is A |
-| | collection of NMR data, the |
-| | compound field is displayed in |
-| | the UI as IMPs: A |
-| | collection of NMR data. |
-+-----------------------------------+-----------------------------------+
++---------------------------------------------------------+---------------------------------------------------------+
+| **Variable** | **Description** |
++---------------------------------------------------------+---------------------------------------------------------+
+| (blank) | The displayFormat is left blank |
+| | for primitive fields (e.g. |
+| | subtitle) and fields that do not |
+| | take values (e.g. author), since |
+| | displayFormats do not work for |
+| | these fields. |
++---------------------------------------------------------+---------------------------------------------------------+
+| #VALUE | The value of the field (instance level). |
++---------------------------------------------------------+---------------------------------------------------------+
+| #NAME | The name of the field (class level). |
++---------------------------------------------------------+---------------------------------------------------------+
+| #EMAIL | For displaying emails. |
++---------------------------------------------------------+---------------------------------------------------------+
+| #VALUE | For displaying the value as a |
+| | link (if the value entered is a |
+| | link). |
++---------------------------------------------------------+---------------------------------------------------------+
+| #VALUE | For displaying the value as a |
+| | link, with the value included in |
+| | the URL (e.g. if URL is |
+| | \http://emsearch.rutgers.edu/atlas/#VALUE_summary.html, |
+| | and the value entered is 1001, |
+| | the field is displayed as |
+| | `1001 `__ |
+| | (hyperlinked to |
+| | http://emsearch.rutgers.edu/atlas/1001_summary.html)). |
++---------------------------------------------------------+---------------------------------------------------------+
+| | Displays the image at the entered image URL (used to |
+| | display images in the producer |
+| | and distributor logos metadata |
+| | fields). |
++---------------------------------------------------------+---------------------------------------------------------+
+| #VALUE: | Appends and/or prepends |
+| | characters to the value of the |
+| \- #VALUE: | field. e.g. if the displayFormat |
+| | for the distributorAffiliation is |
+| (#VALUE) | (#VALUE) (wrapped with parens) |
+| | and the value entered |
+| | is University of North |
+| | Carolina, the field is displayed |
+| | in the UI as (University of |
+| | North Carolina). |
++---------------------------------------------------------+---------------------------------------------------------+
+| ; | Displays the character (e.g. |
+| | semicolon, comma) between the |
+| : | values of fields within |
+| | compound fields. For example, |
+| , | if the displayFormat for the |
+| | compound field “series” is a |
+| | colon, and if the value |
+| | entered for seriesName is |
+| | IMPs and for |
+| | seriesInformation is A |
+| | collection of NMR data, the |
+| | compound field is displayed in |
+| | the UI as IMPs: A |
+| | collection of NMR data. |
++---------------------------------------------------------+---------------------------------------------------------+
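The substitution behavior described above can be sketched as a simple string replacement. This is an illustration of the ``#VALUE`` token only; the real rendering is done by the Dataverse UI layer:

```python
# Illustration of how a displayFormat pattern uses the #VALUE token,
# as described in the table above. Not the actual Dataverse rendering code.

def apply_display_format(display_format, value):
    """Replace the #VALUE token with the field's entered value."""
    return display_format.replace("#VALUE", value)

# Wrapping a value in parentheses, as in the distributorAffiliation example:
print(apply_display_format("(#VALUE)", "University of North Carolina"))
# (University of North Carolina)

# Embedding the value in a URL, as in the atlas example:
print(apply_display_format(
    "http://emsearch.rutgers.edu/atlas/#VALUE_summary.html", "1001"))
# http://emsearch.rutgers.edu/atlas/1001_summary.html
```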
Metadata Block Setup
--------------------
@@ -559,7 +416,7 @@ Editing TSV files
Early in Dataverse Software 4.0 development, metadata blocks were edited in the Google spreadsheet mentioned above and then exported in TSV format. This worked fine when there was only one person editing the Google spreadsheet but now that contributions are coming in from all over, the TSV files are edited directly. We are somewhat painfully aware that another format such as XML might make more sense these days. Please see https://github.com/IQSS/dataverse/issues/4451 for a discussion of non-TSV formats.
-Please note that metadata fields share a common namespace so they must be unique. The following curl command will print list of metadata fields already available in the system:
+Please note that metadata fields share a common namespace so they must be unique. The following curl command will print the list of metadata fields already available in the system:
``curl http://localhost:8080/api/admin/index/solr/schema``
@@ -570,10 +427,10 @@ Loading TSV files into a Dataverse Installation
A number of TSV files are loaded into a newly-installed Dataverse installation, becoming the metadata blocks you see in the UI. For the list of metadata blocks that are included with the Dataverse Software out of the box, see the :doc:`/user/appendix` section of the User Guide.
-Along with TSV file, there are corresponding ResourceBundle property files with key=value pair `here `__. To add other language files, see the :doc:`/installation/config` for dataverse.lang.directory JVM Options section, and add a file, for example: "citation_lang.properties" to the path you specified for the ``dataverse.lang.directory`` JVM option, and then restart the app server.
+Along with each TSV file, there are corresponding ResourceBundle property files with key=value pairs `here `__. To add other language files, see the dataverse.lang.directory JVM option in the :doc:`/installation/config` section, add a file (for example, "citation_lang.properties") to the path you specified for the ``dataverse.lang.directory`` JVM option, and then restart the app server.
If you are improving an existing metadata block, the Dataverse Software installation process will load the TSV for you, assuming you edited the TSV file in place. The TSV file for the Citation metadata block, for example, can be found at ``scripts/api/data/metadatablocks/citation.tsv``.
-If any of the below mentioned property values are changed, corresponsing ResourceBundle property file has to be edited and stored under ``dataverse.lang.directory`` location
+If any of the below-mentioned property values are changed, the corresponding ResourceBundle property file has to be edited and stored under the ``dataverse.lang.directory`` location:
- name, displayName property under #metadataBlock
- name, title, description, watermark properties under #datasetfield
@@ -644,7 +501,7 @@ the Solr schema configuration, including any enabled metadata schemas:
``curl "http://localhost:8080/api/admin/index/solr/schema"``
-You can use :download:`update-fields.sh <../../../../conf/solr/8.8.1/update-fields.sh>` to easily add these to the
+You can use :download:`update-fields.sh <../../../../conf/solr/8.11.1/update-fields.sh>` to easily add these to the
Solr schema you installed for your Dataverse installation.
The script needs a target XML file containing your Solr schema. (See the :doc:`/installation/prerequisites/` section of
@@ -668,14 +525,14 @@ from some place else than your Dataverse installation).
Please note that reconfigurations of your Solr index might require a re-index. Usually release notes indicate
a necessary re-index, but for your custom metadata you will need to keep track on your own.
-Please note also that if you are going to make a pull request updating ``conf/solr/8.8.1/schema.xml`` with fields you have
+Please note also that if you are going to make a pull request updating ``conf/solr/8.11.1/schema.xml`` with fields you have
added, you should first load all the custom metadata blocks in ``scripts/api/data/metadatablocks`` (including ones you
don't care about) to create a complete list of fields. (This might change in the future.)
Reloading a Metadata Block
--------------------------
-As mentioned above, changes to metadata blocks that ship with the Dataverse Software will be made over time to improve them and release notes will sometimes instruct you to reload an existing metadata block. The syntax for reloading is the same as reloading. Here's an example with the "citation" metadata block:
+As mentioned above, changes to metadata blocks that ship with the Dataverse Software will be made over time to improve them and release notes will sometimes instruct you to reload an existing metadata block. The syntax for reloading is the same as loading. Here's an example with the "citation" metadata block:
``curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv``
diff --git a/doc/sphinx-guides/source/admin/metadataexport.rst b/doc/sphinx-guides/source/admin/metadataexport.rst
index 55b1ee3617c..c9518b465fc 100644
--- a/doc/sphinx-guides/source/admin/metadataexport.rst
+++ b/doc/sphinx-guides/source/admin/metadataexport.rst
@@ -23,6 +23,8 @@ In addition to the automated exports, a Dataverse installation admin can start a
The former will attempt to export all the published, local (non-harvested) datasets that haven't been exported yet.
The latter will *force* a re-export of every published, local dataset, regardless of whether it has already been exported or not.
+These calls return a status message informing the administrator that the process has been launched (``{"status":"WORKFLOW_IN_PROGRESS"}``). The administrator can check the progress of the process via log files: ``[Payara directory]/glassfish/domains/domain1/logs/export_[time stamp].log``.
+
Note, that creating, modifying, or re-exporting an OAI set will also attempt to export all the unexported datasets found in the set.
Export Failures
diff --git a/doc/sphinx-guides/source/admin/monitoring.rst b/doc/sphinx-guides/source/admin/monitoring.rst
index 9a21d03ed20..a4affda1302 100644
--- a/doc/sphinx-guides/source/admin/monitoring.rst
+++ b/doc/sphinx-guides/source/admin/monitoring.rst
@@ -103,6 +103,11 @@ actionlogrecord
There is a database table called ``actionlogrecord`` that captures events that may be of interest. See https://github.com/IQSS/dataverse/issues/2729 for more discussion around this table.
+An Important Note About the ActionLogRecord Table
+++++++++++++++++++++++++++++++++++++++++++++++++++
+
+Please note that in a busy production installation this table will be growing constantly. See the note on :ref:`How to Keep ActionLogRecord in Trim <actionlogrecord-trimming>` in the Troubleshooting section of the guide.
+
.. _edit-draft-versions-logging:
Edit Draft Versions Logging
diff --git a/doc/sphinx-guides/source/admin/troubleshooting.rst b/doc/sphinx-guides/source/admin/troubleshooting.rst
index d198aaeb253..ce5ecf866a0 100644
--- a/doc/sphinx-guides/source/admin/troubleshooting.rst
+++ b/doc/sphinx-guides/source/admin/troubleshooting.rst
@@ -148,6 +148,27 @@ Many Files with a File Type of "Unknown", "Application", or "Binary"
From the home page of a Dataverse installation you can get a count of files by file type by clicking "Files" and then scrolling down to "File Type". If you see a lot of files that are "Unknown", "Application", or "Binary" you can have the Dataverse installation attempt to redetect the file type by using the :ref:`Redetect File Type ` API endpoint.
+.. _actionlogrecord-trimming:
+
+What's with this Table "ActionLogRecord" in Our Database, It Seems to be Growing Uncontrollably?
+------------------------------------------------------------------------------------------------
+
+An entry is created in the ActionLogRecord table every time an application command is executed (and, to be precise, certain non-command actions, such as logins, are recorded there as well). This is very useful for investigating problems or usage patterns. However, please note that there is no built-in mechanism in the application for trimming this table, so it will keep growing for as long as your Dataverse installation is in operation. For example, multiple entries are created every time a guest user views the page of a published dataset, and many more when an author is actively working on a dataset, making edits, adding new files, etc. On a busy installation this table is likely to grow at a faster rate than the actual data holdings. For example, after five years of production use at Harvard IQSS, the raw size of ActionLogRecord appeared to exceed the combined size of the rest of the database (!).
+
+The sheer size of this one table does not by itself cause performance issues in any linear way, but it may still be undesirable to keep that much extra data around, especially since for most installations these records are unlikely to have much value past a certain number of months or years. Some installations may be purchasing their database services from cloud computing providers (RDS, etc.) where extra data results in higher costs.
+
+Here at Harvard we chose to periodically trim the table manually, deleting all the entries older than 2 years. We recommend that you check on the size of this table in your database and decide whether, and how often, you want to trim it. You will also need to decide whether you want to archive these older records outside the database before deleting them. If you see no reason to keep them around, older records can be erased with a simple query. For example, to delete everything before the year 2021:
+
+``DELETE FROM ACTIONLOGRECORD WHERE starttime < '2021-01-01 00:00:00';``
+
+If you want to preserve these old entries before deleting them, you can save them with, for example, psql:
+
+``psql -d <database name> -t -c "SELECT * FROM actionlogrecord WHERE starttime < '2021-01-01 00:00:00' ORDER BY starttime;"``
+
+A full backup of the table can be made with pg_dump, for example:
+
+``pg_dump --table=actionlogrecord --data-only > /tmp/actionlogrecord_backup.sql``
+
+(In the example above, the output will be saved in raw SQL format. It is portable and human-readable, but uses a lot of space; it does, however, compress very well. Add the ``-Fc`` option to save the output in PostgreSQL's custom binary format, which is already compressed.)
+
+
Getting Help
------------
diff --git a/doc/sphinx-guides/source/api/apps.rst b/doc/sphinx-guides/source/api/apps.rst
index df56880ac70..75853d3b2f8 100755
--- a/doc/sphinx-guides/source/api/apps.rst
+++ b/doc/sphinx-guides/source/api/apps.rst
@@ -32,13 +32,6 @@ File Previewers are tools that display the content of files - including audio, h
https://github.com/GlobalDataverseCommunityConsortium/dataverse-previewers
-TwoRavens
-~~~~~~~~~
-
-TwoRavens is a system of interlocking statistical tools for data exploration, analysis, and meta-analysis.
-
-https://github.com/IQSS/TwoRavens
-
Python
------
diff --git a/doc/sphinx-guides/source/api/dataaccess.rst b/doc/sphinx-guides/source/api/dataaccess.rst
index c22b1d8c442..e76ea167587 100755
--- a/doc/sphinx-guides/source/api/dataaccess.rst
+++ b/doc/sphinx-guides/source/api/dataaccess.rst
@@ -321,7 +321,7 @@ Preprocessed Data
``/api/access/datafile/$id?format=prep``
-This method provides the "preprocessed data" - a summary record that describes the values of the data vectors in the tabular file, in JSON. These metadata values are used by TwoRavens, an external tool that integrates with a Dataverse installation. Please note that this format might change in the future.
+This method provides the "preprocessed data" - a summary record that describes the values of the data vectors in the tabular file, in JSON. These metadata values are used by earlier versions of Data Explorer, an external tool that integrates with a Dataverse installation (see :doc:`/admin/external-tools`). Please note that this format might change in the future.
Authentication and Authorization
--------------------------------
diff --git a/doc/sphinx-guides/source/api/metrics.rst b/doc/sphinx-guides/source/api/metrics.rst
index d4841803804..6a878d73a98 100755
--- a/doc/sphinx-guides/source/api/metrics.rst
+++ b/doc/sphinx-guides/source/api/metrics.rst
@@ -138,7 +138,7 @@ The following table lists the available metrics endpoints (not including the Mak
/api/info/metrics/datasets/monthly,"date, count","json, csv",collection subtree,"released, choice of all, local or remote (harvested)",y,monthly cumulative timeseries from first date of first entry to now,released means only currently released dataset versions (not unpublished or DEACCESSIONED versions)
/api/info/metrics/datasets/pastDays/{n},count,json,collection subtree,"released, choice of all, local or remote (harvested)",y,aggregate count for past n days,
/api/info/metrics/datasets/bySubject,"subject, count","json, csv",collection subtree,"released, choice of all, local or remote (harvested)",y,total count per subject,
- /api/info/metrics/datasets/bySubjecttoMonth/{yyyy-MM},"subject, count","json, csv",collection subtree,"released, choice of all, local or remote (harvested)",y,cumulative cont per subject up to month specified,
+ /api/info/metrics/datasets/bySubject/toMonth/{yyyy-MM},"subject, count","json, csv",collection subtree,"released, choice of all, local or remote (harvested)",y,cumulative count per subject up to the month specified,
/api/info/metrics/files,count,json,collection subtree,in released datasets,y,as of now/total,
/api/info/metrics/files/toMonth/{yyyy-MM},count,json,collection subtree,in released datasets,y,cumulative up to month specified,
/api/info/metrics/files/monthly,"date, count","json, csv",collection subtree,in released datasets,y,monthly cumulative timeseries from first date of first entry to now,date is the month when the first version containing the file was released (or created for harvested versions)
diff --git a/doc/sphinx-guides/source/api/native-api.rst b/doc/sphinx-guides/source/api/native-api.rst
index 8ec1b4a7ab3..31b6d777526 100644
--- a/doc/sphinx-guides/source/api/native-api.rst
+++ b/doc/sphinx-guides/source/api/native-api.rst
@@ -552,7 +552,9 @@ The optional ``pid`` parameter holds a persistent identifier (such as a DOI or H
The optional ``release`` parameter tells the Dataverse installation to immediately publish the dataset. If the parameter is changed to ``no``, the imported dataset will remain in ``DRAFT`` status.
-The file is a DDI xml file.
+The file is a DDI XML file. A sample DDI XML file may be downloaded here: :download:`ddi_dataset.xml <../_static/api/ddi_dataset.xml>`
+
+Note that DDI XML does not have a field that corresponds to the "Subject" field in Dataverse. Therefore the "Import DDI" API endpoint populates the "Subject" field with ``N/A``. To update the "Subject" field one will need to call the :ref:`edit-dataset-metadata-api` API with a JSON file that contains an update to "Subject" such as :download:`subject-update-metadata.json <../_static/api/subject-update-metadata.json>`. Alternatively, the web interface can be used to add a subject.
.. warning::
@@ -584,9 +586,13 @@ The fully expanded example above (without environment variables) looks like this
You should expect a 200 ("OK") response and JSON output.
+.. _download-guestbook-api:
+
Retrieve Guestbook Responses for a Dataverse Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+For more about guestbooks, see :ref:`dataset-guestbooks` in the User Guide.
+
In order to retrieve the Guestbook Responses for a Dataverse collection, you must know either its "alias" (which the GUI calls an "identifier") or its database ID. If the Dataverse collection has more than one guestbook you may provide the id of a single guestbook as an optional parameter. If no guestbook id is provided the results returned will be the same as pressing the "Download All Responses" button on the Manage Dataset Guestbook page. If the guestbook id is provided then only those responses from that guestbook will be included. The FILENAME parameter is optional, and if it is not included, the responses will be displayed in the console.
.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of ``export`` below.
@@ -709,8 +715,10 @@ It returns a list of versions with their metadata, and file list:
"lastUpdateTime": "2015-04-20T09:58:35Z",
"releaseTime": "2015-04-20T09:58:35Z",
"createTime": "2015-04-20T09:57:32Z",
- "license": "CC0",
- "termsOfUse": "CC0 Waiver",
+ "license": {
+ "name": "CC0 1.0",
+ "uri": "http://creativecommons.org/publicdomain/zero/1.0"
+ },
"termsOfAccess": "You need to request for access.",
"fileAccessRequest": true,
"metadataBlocks": {...},
@@ -728,8 +736,10 @@ It returns a list of versions with their metadata, and file list:
"lastUpdateTime": "2015-04-20T09:56:34Z",
"releaseTime": "2015-04-20T09:56:34Z",
"createTime": "2015-04-20T09:43:45Z",
- "license": "CC0",
- "termsOfUse": "CC0 Waiver",
+ "license": {
+ "name": "CC0 1.0",
+ "uri": "http://creativecommons.org/publicdomain/zero/1.0"
+ },
"termsOfAccess": "You need to request for access.",
"fileAccessRequest": true,
"metadataBlocks": {...},
@@ -991,6 +1001,8 @@ Now that the resulting JSON file only contains the ``metadataBlocks`` key, you c
Now that you've made edits to the metadata in your JSON file, you can send it to a Dataverse installation as described above.
+.. _edit-dataset-metadata-api:
+
Edit Dataset Metadata
~~~~~~~~~~~~~~~~~~~~~
@@ -1166,7 +1178,7 @@ The fully expanded example above (without environment variables) looks like this
.. _assign-role-on-a-dataset-api:
Assign a New Role on a Dataset
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Assigns a new role, based on the POSTed JSON:
@@ -1194,7 +1206,7 @@ POSTed JSON example (the content of ``role.json`` file)::
.. _revoke-role-on-a-dataset-api:
Delete Role Assignment from a Dataset
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Delete the assignment whose id is ``$id``:
@@ -1400,7 +1412,7 @@ In practice, you only need one the ``dataset_id`` or the ``persistentId``. The e
print r.status_code
Report the data (file) size of a Dataset
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Shows the combined size in bytes of all the files uploaded into the dataset ``id``.
@@ -1516,6 +1528,9 @@ The fully expanded example above (without environment variables) looks like this
Dataset Locks
~~~~~~~~~~~~~
+Manage Locks on a Specific Dataset
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
To check if a dataset is locked:
.. code-block:: bash
@@ -1547,7 +1562,7 @@ The fully expanded example above (without environment variables) looks like this
curl "https://demo.dataverse.org/api/datasets/24/locks?type=Ingest"
-Currently implemented lock types are ``Ingest``, ``Workflow``, ``InReview``, ``DcmUpload``, ``pidRegister``, and ``EditInProgress``.
+Currently implemented lock types are ``Ingest``, ``Workflow``, ``InReview``, ``DcmUpload``, ``finalizePublication``, ``EditInProgress`` and ``FileValidationFailed``.
The API will output the list of locks, for example::
@@ -1556,12 +1571,14 @@ The API will output the list of locks, for example::
{
"lockType":"Ingest",
"date":"Fri Aug 17 15:05:51 EDT 2018",
- "user":"dataverseAdmin"
+ "user":"dataverseAdmin",
+ "dataset":"doi:12.34567/FK2/ABCDEF"
},
{
"lockType":"Workflow",
"date":"Fri Aug 17 15:02:00 EDT 2018",
- "user":"dataverseAdmin"
+ "user":"dataverseAdmin",
+ "dataset":"doi:12.34567/FK2/ABCDEF"
}
]
}
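With the ``dataset`` identifier now included in each lock entry, a response like the one above can be grouped per dataset on the client side. A minimal illustrative sketch, not part of any Dataverse client library (the envelope with ``status`` and ``data`` keys is assumed from the standard API response format):

```python
import json

# Sample response body, shaped like the listing above
response_body = """
{
  "status": "OK",
  "data": [
    {"lockType": "Ingest", "date": "Fri Aug 17 15:05:51 EDT 2018",
     "user": "dataverseAdmin", "dataset": "doi:12.34567/FK2/ABCDEF"},
    {"lockType": "Workflow", "date": "Fri Aug 17 15:02:00 EDT 2018",
     "user": "dataverseAdmin", "dataset": "doi:12.34567/FK2/ABCDEF"}
  ]
}
"""

# Group the lock types by the dataset they apply to
locks_by_dataset = {}
for lock in json.loads(response_body)["data"]:
    locks_by_dataset.setdefault(lock["dataset"], []).append(lock["lockType"])

print(locks_by_dataset)
# {'doi:12.34567/FK2/ABCDEF': ['Ingest', 'Workflow']}
```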
@@ -1608,7 +1625,7 @@ Or, to delete a lock of the type specified only. Note that this requires “supe
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export ID=24
- export LOCK_TYPE=pidRegister
+ export LOCK_TYPE=finalizePublication
curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE $SERVER_URL/api/datasets/$ID/locks?type=$LOCK_TYPE
@@ -1616,12 +1633,35 @@ The fully expanded example above (without environment variables) looks like this
.. code-block:: bash
- curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE https://demo.dataverse.org/api/datasets/24/locks?type=pidRegister
+ curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE https://demo.dataverse.org/api/datasets/24/locks?type=finalizePublication
If the dataset is not locked (or if there is no lock of the specified type), the API will exit with a warning message.
(Note that the API calls above all support both the database id and persistent identifier notation for referencing the dataset)
+List Locks Across All Datasets
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Note that this API requires “superuser” credentials. You must supply the ``X-Dataverse-key`` header with the API token of an admin user (as in the example below).
+
+The output of this API is formatted identically to the API that lists the locks for a specific dataset, as in one of the examples above.
+
+Use the following API to list ALL the locks on all the datasets in your installation:
+
+ ``/api/datasets/locks``
+
+The listing can be filtered by a specific lock type **and/or** user, using the following *optional* query parameters:
+
+* ``userIdentifier`` - To list the locks owned by a specific user
+* ``type`` - To list the locks of the type specified. If the supplied value does not match a known lock type, the API will return an error and a list of valid lock types. As of this writing, the implemented lock types are ``Ingest``, ``Workflow``, ``InReview``, ``DcmUpload``, ``finalizePublication``, ``EditInProgress`` and ``FileValidationFailed``.
+
+For example:
+
+.. code-block:: bash
+
+ curl -H "X-Dataverse-key: xxx" "http://localhost:8080/api/datasets/locks?type=Ingest&userIdentifier=davis4ever"
+
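Since both query parameters are optional, client code typically assembles this endpoint URL conditionally. A hedged sketch; the ``build_locks_url`` helper and its client-side validation are ours, not part of Dataverse:

```python
from urllib.parse import urlencode

# Lock types documented for this endpoint; an unknown type makes the API
# return an error plus the list of valid types.
LOCK_TYPES = {"Ingest", "Workflow", "InReview", "DcmUpload",
              "finalizePublication", "EditInProgress", "FileValidationFailed"}

def build_locks_url(server_url, lock_type=None, user_identifier=None):
    """Assemble /api/datasets/locks with the optional filter parameters."""
    if lock_type is not None and lock_type not in LOCK_TYPES:
        raise ValueError(f"unknown lock type: {lock_type}")
    params = {}
    if lock_type is not None:
        params["type"] = lock_type
    if user_identifier is not None:
        params["userIdentifier"] = user_identifier
    url = f"{server_url}/api/datasets/locks"
    return f"{url}?{urlencode(params)}" if params else url

# Example: the same request as the curl call above
print(build_locks_url("http://localhost:8080", "Ingest", "davis4ever"))
# http://localhost:8080/api/datasets/locks?type=Ingest&userIdentifier=davis4ever
```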
+
.. _dataset-metrics-api:
Dataset Metrics
@@ -2511,7 +2551,7 @@ In order to obtain a new token use::
curl -H X-Dataverse-key:$API_TOKEN -X POST $SERVER_URL/api/users/token/recreate
Delete a Token
-~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~
In order to delete a token use::
@@ -2828,7 +2868,7 @@ Shows all Harvesting Sets defined in the installation::
GET http://$SERVER/api/harvest/server/oaisets/
-List A Specific Harvesting Set
+List A Specific Harvesting Set
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Shows a Harvesting Set with a defined specname::
@@ -3153,7 +3193,7 @@ Deletes an authentication provider from the system. The command succeeds even if
DELETE http://$SERVER/api/admin/authenticationProviders/$id/
List Global Roles
-~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~
List all global roles in the system. ::
@@ -3717,3 +3757,51 @@ Recursively applies the role assignments of the specified Dataverse collection,
GET http://$SERVER/api/admin/dataverse/{dataverse alias}/addRoleAssignmentsToChildren
Note: setting ``:InheritParentRoleAssignments`` will automatically trigger inheritance of the parent Dataverse collection's role assignments for a newly created Dataverse collection. Hence this API call is intended as a way to update existing child Dataverse collections or to update children after a change in role assignments has been made on a parent Dataverse collection.
+
+.. _license-management-api:
+
+Manage Available Standard License Terms
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For more context about configuring licenses, see :ref:`license-config` in the Installation Guide.
+
+View the list of standard license terms that can be selected for a dataset:
+
+.. code-block:: bash
+
+ export SERVER_URL=https://demo.dataverse.org
+ curl $SERVER_URL/api/licenses
+
+View the details of the standard license with the database ID specified in ``$ID``:
+
+.. code-block:: bash
+
+ export ID=1
+ curl $SERVER_URL/api/licenses/$ID
+
+
+Superusers can add a new license by posting a JSON file adapted from this example :download:`add-license.json <../_static/api/add-license.json>`. The ``name`` and ``uri`` of the new license must be unique. If you are interested in adding a Creative Commons license, you are encouraged to use the JSON files under :ref:`adding-creative-commons-licenses`:
+
+.. code-block:: bash
+
+ export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+ curl -X POST -H 'Content-Type: application/json' -H X-Dataverse-key:$API_TOKEN --data-binary @add-license.json $SERVER_URL/api/licenses
+
+Superusers can change whether the existing license specified by ``$ID`` is active (usable for new dataset versions) or inactive (only allowed on already-published versions):
+
+.. code-block:: bash
+
+ export STATE=true
+ curl -X PUT -H 'Content-Type: application/json' -H X-Dataverse-key:$API_TOKEN $SERVER_URL/api/licenses/$ID/:active/$STATE
+
+Superusers can set the license specified by ``$ID`` as the default:
+
+.. code-block:: bash
+
+ curl -X PUT -H X-Dataverse-key:$API_TOKEN $SERVER_URL/api/licenses/default/$ID
+
+Superusers can delete a license, as long as it is not in use, by its ``$ID``:
+
+.. code-block:: bash
+
+ curl -X DELETE -H X-Dataverse-key:$API_TOKEN $SERVER_URL/api/licenses/$ID
diff --git a/doc/sphinx-guides/source/api/search.rst b/doc/sphinx-guides/source/api/search.rst
index cae9bb2716a..d5e56543fb1 100755
--- a/doc/sphinx-guides/source/api/search.rst
+++ b/doc/sphinx-guides/source/api/search.rst
@@ -35,6 +35,7 @@ show_relevance boolean Whether or not to show details of which fields were ma
show_facets boolean Whether or not to show facets that can be operated on by the "fq" parameter. False by default. See :ref:`advanced search example `.
fq string A filter query on the search term. Multiple "fq" parameters can be used. See :ref:`advanced search example `.
show_entity_ids boolean Whether or not to show the database IDs of the search results (for developer use).
+metadata_fields string Includes the requested fields for each dataset in the response. Multiple "metadata_fields" parameters can be used to include several fields. The value must be in the form "{metadata_block_name}:{field_name}" to include a specific field from a metadata block (see :ref:`example `) or "{metadata_block_name}:\*" to include all the fields for a metadata block (see :ref:`example `). "{field_name}" cannot be a subfield of a compound field. If "{field_name}" is a compound field, all subfields are included.
=============== ======= ===========
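The "{metadata_block_name}:{field_name}" / "{metadata_block_name}:\*" shape of ``metadata_fields`` values can be checked client-side before issuing a request. An illustrative helper; the function name and the identifier-based field check are our assumptions, not part of any Dataverse client:

```python
def is_valid_metadata_fields_value(value):
    """Check the '{block}:{field}' or '{block}:*' shape of a metadata_fields value.

    This only validates the syntax; whether the block and field actually
    exist is decided server-side.
    """
    block, sep, field = value.partition(":")
    return bool(block) and sep == ":" and (field == "*" or field.isidentifier())

# Examples
print(is_valid_metadata_fields_value("citation:author"))  # True
print(is_valid_metadata_fields_value("citation:*"))       # True
print(is_valid_metadata_fields_value("citation"))         # False (no field part)
```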
Basic Search Example
@@ -150,6 +151,9 @@ https://demo.dataverse.org/api/search?q=trees
Advanced Search Examples
------------------------
+Narrowed to Collection, Show Relevance and Facets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
https://demo.dataverse.org/api/search?q=finch&show_relevance=true&show_facets=true&fq=publicationDate:2016&subtree=birds
In this example, ``show_relevance=true`` matches per field are shown. Available facets are shown with ``show_facets=true`` and of the facets is being used with ``fq=publicationDate:2016``. The search is being narrowed to the Dataverse collection with the identifier "birds" with the parameter ``subtree=birds``.
@@ -262,6 +266,9 @@ In this example, ``show_relevance=true`` matches per field are shown. Available
}
}
+Retrieve Released Versions Only
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
https://demo.dataverse.org/api/search?q=finch&fq=publicationStatus:Published&type=dataset
The above example ``fq=publicationStatus:Published`` retrieves only "RELEASED" versions of datasets. The same could be done to retrieve "DRAFT" versions, ``fq=publicationStatus:Draft``
@@ -346,6 +353,317 @@ The above example ``fq=publicationStatus:Published`` retrieves only "RELEASED" v
"count_in_response": 2
}
}
+
+.. _dynamic-citation-all:
+
+Include Metadata Blocks and/or Metadata Fields
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+https://demo.dataverse.org/api/search?q=\*&type=dataset&metadata_fields=citation:\*
+
+The above example ``metadata_fields=citation:*`` returns under "metadataBlocks" all fields from the "citation" metadata block.
+
+.. code-block:: json
+
+ {
+ "status": "OK",
+ "data": {
+ "q": "*",
+ "total_count": 4,
+ "start": 0,
+ "spelling_alternatives": {},
+ "items": [
+ {
+ "name": "JDD avec GeoJson 2021-07-13T10:23:46.409Z",
+ "type": "dataset",
+ "url": "https://doi.org/10.5072/FK2/GIWCKB",
+ "global_id": "doi:10.5072/FK2/GIWCKB",
+ "description": "Démo sprint 5. Cette couche représente l'emprise des cimetières sur le territoire des Métropole. Ces périmètres d'emprise des cimetières sont issus du recensement des informations des PLU/POS de chaque commune de la métropole, des données du cadastre DGFiP et d'un inventaire terrain du Service Planification et Études Urbaines de Métropole",
+ "publisher": "Sample Data",
+ "citationHtml": "Rennes Métropole, 2021, \"JDD avec GeoJson 2021-07-13T10:23:46.409Z\", https://doi.org/10.5072/FK2/GIWCKB, Root, DRAFT VERSION",
+ "identifier_of_dataverse": "Sample_data",
+ "name_of_dataverse": "Sample Data",
+ "citation": "Métropole, 2021, \"JDD avec GeoJson 2021-07-13T10:23:46.409Z\", https://doi.org/10.5072/FK2/GIWCKB, Root, DRAFT VERSION",
+ "storageIdentifier": "file://10.5072/FK2/GIWCKB",
+ "subjects": [
+ "Other"
+ ],
+ "fileCount": 0,
+ "versionId": 9976,
+ "versionState": "DRAFT",
+ "createdAt": "2021-07-13T10:28:45Z",
+ "updatedAt": "2021-07-13T10:28:45Z",
+ "contacts": [
+ {
+ "name": "string",
+ "affiliation": "string"
+ }
+ ],
+ "metadataBlocks": {
+ "citation": {
+ "displayName": "Citation Metadata",
+ "fields": [
+ {
+ "typeName": "dsDescription",
+ "multiple": true,
+ "typeClass": "compound",
+ "value": [
+ {
+ "dsDescriptionValue": {
+ "typeName": "dsDescriptionValue",
+ "multiple": false,
+ "typeClass": "primitive",
+ "value": "Démo sprint 5. Cette couche représente l'emprise des cimetières sur le territoire des Métropole. Ces périmètres d'emprise des cimetières sont issus du recensement des informations des PLU/POS de chaque commune de la métropole, des données du cadastre DGFiP et d'un inventaire terrain du Service Planification et Études Urbaines de Métropole"
+ },
+ "dsDescriptionDate": {
+ "typeName": "dsDescriptionDate",
+ "multiple": false,
+ "typeClass": "primitive",
+ "value": "2021-07-13"
+ }
+ }
+ ]
+ },
+ {
+ "typeName": "author",
+ "multiple": true,
+ "typeClass": "compound",
+ "value": [
+ {
+ "authorName": {
+ "typeName": "authorName",
+ "multiple": false,
+ "typeClass": "primitive",
+ "value": "Métropole"
+ },
+ "authorAffiliation": {
+ "typeName": "authorAffiliation",
+ "multiple": false,
+ "typeClass": "primitive",
+ "value": "string"
+ }
+ }
+ ]
+ },
+ {
+ "typeName": "datasetContact",
+ "multiple": true,
+ "typeClass": "compound",
+ "value": [
+ {
+ "datasetContactName": {
+ "typeName": "datasetContactName",
+ "multiple": false,
+ "typeClass": "primitive",
+ "value": "string"
+ },
+ "datasetContactAffiliation": {
+ "typeName": "datasetContactAffiliation",
+ "multiple": false,
+ "typeClass": "primitive",
+ "value": "string"
+ },
+ "datasetContactEmail": {
+ "typeName": "datasetContactEmail",
+ "multiple": false,
+ "typeClass": "primitive",
+ "value": "contact@Sample.fr"
+ }
+ }
+ ]
+ },
+ {
+ "typeName": "subject",
+ "multiple": true,
+ "typeClass": "controlledVocabulary",
+ "value": [
+ "Other"
+ ]
+ },
+ {
+ "typeName": "title",
+ "multiple": false,
+ "typeClass": "primitive",
+ "value": "JDD avec GeoJson 2021-07-13T10:23:46.409Z"
+ }
+ ]
+ }
+ },
+ "authors": [
+ "Métropole"
+ ]
+ },
+ {
+ "name": "Raja Ampat Islands",
+ "type": "dataset",
+ "url": "https://doi.org/10.5072/FK2/ITNXGR",
+ "global_id": "doi:10.5072/FK2/ITNXGR",
+ "description": "Raja Ampat is located off the northwest tip of Bird's Head Peninsula on the island of New Guinea, in Indonesia's West Papua province, Raja Ampat, or the Four Kings, is an archipelago comprising over 1,500 small islands, cays, and shoals surrounding the four main islands of Misool, Salawati, Batanta, and Waigeo, and the smaller island of Kofiau. The Raja Ampat archipelago straddles the Equator and forms part of Coral Triangle which contains the richest marine biodiversity on earth. Administratively, the archipelago is part of the province of West Papua (formerly known as Irian Jaya). Most of the islands constitute the Raja Ampat Regency, which was separated out from Sorong Regency in 2004. The regency encompasses around 70,000 square kilometres (27,000 sq mi) of land and sea, and has a population of about 50,000 (as of 2017). (Wikipedia: https://en.wikipedia.org/wiki/Raja_Ampat_Islands)",
+ "published_at": "2020-07-30T09:23:34Z",
+ "publisher": "Root",
+ "citationHtml": "Admin, Dataverse, 2020, \"Raja Ampat Islands\", https://doi.org/10.5072/FK2/ITNXGR, Root, V1",
+ "identifier_of_dataverse": "root",
+ "name_of_dataverse": "Root",
+ "citation": "Admin, Dataverse, 2020, \"Raja Ampat Islands\", https://doi.org/10.5072/FK2/ITNXGR, Root, V1",
+ "authors": [
+ "Admin, Dataverse"
+ ]
+ },
+ {
+ "name": "Sample Test",
+ "type": "dataverse",
+ "url": "https://68b2d8bb37c6/dataverse/Sample_test",
+ "identifier": "Sample_test",
+ "description": "Dataverse utilisé pour les tests unitaires de Sample",
+ "published_at": "2021-03-16T08:11:54Z"
+ },
+ {
+ "name": "Sample Media Test",
+ "type": "dataverse",
+ "url": "https://68b2d8bb37c6/dataverse/Sample_media_test",
+ "identifier": "Sample_media_test",
+ "description": "Dataverse de test contenant les médias de Sample, comme les images des fournisseurs et des producteurs",
+ "published_at": "2021-04-08T15:04:14Z"
+ }
+ ],
+ "count_in_response": 4
+ }
+ }
+
+.. _dynamic-citation-some:
+
+Include Specific Fields Only
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+https://demo.dataverse.org/api/search?q=\*&type=dataset&metadata_fields=citation:dsDescription&metadata_fields=citation:author
+
+The above example ``metadata_fields=citation:dsDescription&metadata_fields=citation:author`` returns under "metadataBlocks" only the compound fields "dsDescription" and "author" from the "citation" metadata block.
+
+.. code-block:: json
+
+ {
+ "status": "OK",
+ "data": {
+ "q": "*",
+ "total_count": 4,
+ "start": 0,
+ "spelling_alternatives": {},
+ "items": [
+ {
+ "name": "JDD avec GeoJson 2021-07-13T10:23:46.409Z",
+ "type": "dataset",
+ "url": "https://doi.org/10.5072/FK2/GIWCKB",
+ "global_id": "doi:10.5072/FK2/GIWCKB",
+ "description": "Démo sprint 5. Cette couche représente l'emprise des cimetières sur le territoire des Métropole. Ces périmètres d'emprise des cimetières sont issus du recensement des informations des PLU/POS de chaque commune de la métropole, des données du cadastre DGFiP et d'un inventaire terrain du Service Planification et Études Urbaines de Métropole",
+ "publisher": "Sample Data",
+ "citationHtml": "Rennes Métropole, 2021, \"JDD avec GeoJson 2021-07-13T10:23:46.409Z\", https://doi.org/10.5072/FK2/GIWCKB, Root, DRAFT VERSION",
+ "identifier_of_dataverse": "Sample_data",
+ "name_of_dataverse": "Sample Data",
+ "citation": "Métropole, 2021, \"JDD avec GeoJson 2021-07-13T10:23:46.409Z\", https://doi.org/10.5072/FK2/GIWCKB, Root, DRAFT VERSION",
+ "storageIdentifier": "file://10.5072/FK2/GIWCKB",
+ "subjects": [
+ "Other"
+ ],
+ "fileCount": 0,
+ "versionId": 9976,
+ "versionState": "DRAFT",
+ "createdAt": "2021-07-13T10:28:45Z",
+ "updatedAt": "2021-07-13T10:28:45Z",
+ "contacts": [
+ {
+ "name": "string",
+ "affiliation": "string"
+ }
+ ],
+ "metadataBlocks": {
+ "citation": {
+ "displayName": "Citation Metadata",
+ "fields": [
+ {
+ "typeName": "dsDescription",
+ "multiple": true,
+ "typeClass": "compound",
+ "value": [
+ {
+ "dsDescriptionValue": {
+ "typeName": "dsDescriptionValue",
+ "multiple": false,
+ "typeClass": "primitive",
+ "value": "Démo sprint 5. Cette couche représente l'emprise des cimetières sur le territoire des Métropole. Ces périmètres d'emprise des cimetières sont issus du recensement des informations des PLU/POS de chaque commune de la métropole, des données du cadastre DGFiP et d'un inventaire terrain du Service Planification et Études Urbaines de Métropole"
+ },
+ "dsDescriptionDate": {
+ "typeName": "dsDescriptionDate",
+ "multiple": false,
+ "typeClass": "primitive",
+ "value": "2021-07-13"
+ }
+ }
+ ]
+ },
+ {
+ "typeName": "author",
+ "multiple": true,
+ "typeClass": "compound",
+ "value": [
+ {
+ "authorName": {
+ "typeName": "authorName",
+ "multiple": false,
+ "typeClass": "primitive",
+ "value": "Métropole"
+ },
+ "authorAffiliation": {
+ "typeName": "authorAffiliation",
+ "multiple": false,
+ "typeClass": "primitive",
+ "value": "string"
+ }
+ }
+ ]
+ }
+ ]
+ }
+ },
+ "authors": [
+ "Métropole"
+ ]
+ },
+ {
+ "name": "Raja Ampat Islands",
+ "type": "dataset",
+ "url": "https://doi.org/10.5072/FK2/ITNXGR",
+ "global_id": "doi:10.5072/FK2/ITNXGR",
+ "description": "Raja Ampat is located off the northwest tip of Bird's Head Peninsula on the island of New Guinea, in Indonesia's West Papua province, Raja Ampat, or the Four Kings, is an archipelago comprising over 1,500 small islands, cays, and shoals surrounding the four main islands of Misool, Salawati, Batanta, and Waigeo, and the smaller island of Kofiau. The Raja Ampat archipelago straddles the Equator and forms part of Coral Triangle which contains the richest marine biodiversity on earth. Administratively, the archipelago is part of the province of West Papua (formerly known as Irian Jaya). Most of the islands constitute the Raja Ampat Regency, which was separated out from Sorong Regency in 2004. The regency encompasses around 70,000 square kilometres (27,000 sq mi) of land and sea, and has a population of about 50,000 (as of 2017). (Wikipedia: https://en.wikipedia.org/wiki/Raja_Ampat_Islands)",
+ "published_at": "2020-07-30T09:23:34Z",
+ "publisher": "Root",
+ "citationHtml": "Admin, Dataverse, 2020, \"Raja Ampat Islands\", https://doi.org/10.5072/FK2/ITNXGR, Root, V1",
+ "identifier_of_dataverse": "root",
+ "name_of_dataverse": "Root",
+ "citation": "Admin, Dataverse, 2020, \"Raja Ampat Islands\", https://doi.org/10.5072/FK2/ITNXGR, Root, V1",
+ "authors": [
+ "Admin, Dataverse"
+ ]
+ },
+ {
+ "name": "Sample Media Test",
+ "type": "dataverse",
+ "url": "https://68b2d8bb37c6/dataverse/Sample_media_test",
+ "identifier": "Sample_media_test",
+ "description": "Dataverse de test contenant les médias de Sample, comme les images des fournisseurs et des producteurs",
+ "published_at": "2021-04-08T15:04:14Z"
+ },
+ {
+ "name": "Sample Test",
+ "type": "dataverse",
+ "url": "https://68b2d8bb37c6/dataverse/Sample_test",
+ "identifier": "Sample_test",
+ "description": "Dataverse utilisé pour les tests unitaires de Sample",
+ "published_at": "2021-03-16T08:11:54Z"
+ }
+ ],
+ "count_in_response": 4
+ }
+ }
.. _search-date-range:
diff --git a/doc/sphinx-guides/source/conf.py b/doc/sphinx-guides/source/conf.py
index 42988690329..2d08c687467 100755
--- a/doc/sphinx-guides/source/conf.py
+++ b/doc/sphinx-guides/source/conf.py
@@ -65,9 +65,9 @@
# built documents.
#
# The short X.Y version.
-version = '5.9'
+version = '5.10'
# The full version, including alpha/beta/rc tags.
-release = '5.9'
+release = '5.10'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
diff --git a/doc/sphinx-guides/source/developers/aux-file-support.rst b/doc/sphinx-guides/source/developers/aux-file-support.rst
index 666685ff987..9b2734b3a25 100644
--- a/doc/sphinx-guides/source/developers/aux-file-support.rst
+++ b/doc/sphinx-guides/source/developers/aux-file-support.rst
@@ -24,8 +24,7 @@ You should expect a 200 ("OK") response and JSON with information about your new
Downloading an Auxiliary File that Belongs to a Datafile
--------------------------------------------------------
-To download an auxiliary file, use the primary key of the datafile, and the
-formatTag and formatVersion (if applicable) associated with the auxiliary file:
+To download an auxiliary file, use the primary key of the datafile, and the formatTag and formatVersion (if applicable) associated with the auxiliary file. An API token is shown in the example below, but it is not necessary if the auxiliary file was uploaded with isPublic=true and the dataset has been published.
.. code-block:: bash
@@ -35,7 +34,7 @@ formatTag and formatVersion (if applicable) associated with the auxiliary file:
export FORMAT_TAG='dpJson'
export FORMAT_VERSION='v1'
- curl "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION"
+ curl -H X-Dataverse-key:$API_TOKEN "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION"
Listing Auxiliary Files for a Datafile by Origin
------------------------------------------------
@@ -48,7 +47,7 @@ To list auxiliary files, specify the primary key of the datafile (FILE_ID), and
export SERVER_URL=https://demo.dataverse.org
export ORIGIN='app1'
- curl "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$ORIGIN"
+ curl -H X-Dataverse-key:$API_TOKEN "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$ORIGIN"
You should expect a 200 ("OK") response and a JSON array with objects representing the auxiliary files found, or a 404/Not Found response if no auxiliary files exist with that origin.
@@ -65,6 +64,6 @@ formatTag and formatVersion (if applicable) associated with the auxiliary file:
export FORMAT_TAG='dpJson'
export FORMAT_VERSION='v1'
- curl -X DELETE "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION"
+ curl -H X-Dataverse-key:$API_TOKEN -X DELETE "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION"
diff --git a/doc/sphinx-guides/source/developers/big-data-support.rst b/doc/sphinx-guides/source/developers/big-data-support.rst
index 37a14b63f34..21675bd4960 100644
--- a/doc/sphinx-guides/source/developers/big-data-support.rst
+++ b/doc/sphinx-guides/source/developers/big-data-support.rst
@@ -59,7 +59,7 @@ with the contents of the file cors.json as follows:
Alternatively, you can enable CORS using the AWS S3 web interface, using json-encoded rules as in the example above.
-Since the direct upload mechanism creates the final file rather than an intermediate temporary file, user actions, such as neither saving or canceling an upload session before closing the browser page, can leave an abandoned file in the store. The direct upload mechanism attempts to use S3 Tags to aid in identifying/removing such files. Upon upload, files are given a "dv-status":"temp" tag which is removed when the dataset changes are saved and the new file(s) are added in the Dataverse installation. Note that not all S3 implementations support Tags: Minio does not. WIth such stores, direct upload works, but Tags are not used.
+Since the direct upload mechanism creates the final file rather than an intermediate temporary file, user actions, such as neither saving nor canceling an upload session before closing the browser page, can leave an abandoned file in the store. The direct upload mechanism attempts to use S3 Tags to aid in identifying/removing such files. Upon upload, files are given a "dv-state":"temp" tag which is removed when the dataset changes are saved and the new file(s) are added in the Dataverse installation. Note that not all S3 implementations support Tags: Minio does not. With such stores, direct upload works, but Tags are not used.
Data Capture Module (DCM)
-------------------------
diff --git a/doc/sphinx-guides/source/developers/dataset-migration-api.rst b/doc/sphinx-guides/source/developers/dataset-migration-api.rst
index 1dc8f7866e0..fc86b7ccdcf 100644
--- a/doc/sphinx-guides/source/developers/dataset-migration-api.rst
+++ b/doc/sphinx-guides/source/developers/dataset-migration-api.rst
@@ -31,7 +31,7 @@ To import a dataset with an existing persistent identifier (PID), the provided j
curl -H X-Dataverse-key:$API_TOKEN -X POST $SERVER_URL/api/dataverses/$DATAVERSE_ID/datasets/:startmigration --upload-file dataset-migrate.jsonld
-An example jsonld file is available at :download:`dataset-migrate.jsonld <../_static/api/dataset-migrate.jsonld>` . Note that you would need to replace the PID in the sample file with one supported in your Dataverse instance. (Also note that `Issue #8028 `_ currently breaks testing this API with DataCite test DOIs.)
+An example jsonld file is available at :download:`dataset-migrate.jsonld <../_static/api/dataset-migrate.jsonld>` . Note that you would need to replace the PID in the sample file with one supported in your Dataverse instance.
Publish a Migrated Dataset
--------------------------
@@ -55,4 +55,4 @@ An optional query parameter: updatepidatprovider (default is false) can be set t
curl -H 'Content-Type: application/ld+json' -H X-Dataverse-key:$API_TOKEN -X POST -d '{"schema:datePublished": "2020-10-26","@context":{ "schema":"http://schema.org/"}}' "$SERVER_URL/api/datasets/{id}/actions/:releasemigrated?updatepidatprovider=true"
-If the parameter is not added and set to true, other existing APIs can be used to update the PID at the provider later, e.g. :ref:`send-metadata-to-pid-provider`
\ No newline at end of file
+If the parameter is not added and set to true, other existing APIs can be used to update the PID at the provider later, e.g. :ref:`send-metadata-to-pid-provider`
diff --git a/doc/sphinx-guides/source/developers/dependencies.rst b/doc/sphinx-guides/source/developers/dependencies.rst
index 564d47c5972..65edfa3ffac 100644
--- a/doc/sphinx-guides/source/developers/dependencies.rst
+++ b/doc/sphinx-guides/source/developers/dependencies.rst
@@ -3,16 +3,24 @@ Dependency Management
=====================
.. contents:: |toctitle|
- :local:
+ :local:
-The Dataverse Software is (currently) a Jakarta EE 8 based application, that uses a lot of additional libraries for special purposes.
-This includes features like support for SWORD-API, S3 storage and many others.
+Introduction
+------------
+
+As explained under :ref:`core-technologies`, the Dataverse Software is a Jakarta EE 8 based application that uses a lot of additional libraries for
+special purposes. This includes support for the SWORD API, S3 storage, and many other features.
+
+Besides the code that glues together individual pieces, any developer needs to describe dependencies used within the
+Maven-based build system. As is familiar to any Maven user, this happens inside the "Project Object Model" (POM) file, ``pom.xml``.
+
+Recursive and convergent dependency resolution makes dependency management with Maven quite easy, but sometimes, in
+projects with many complex dependencies like the Dataverse Software, you have to help Maven make the right choices.
+
+Maven can foster good development practices by enabling modulithic (modular monolithic) architecture: splitting
+functionalities into different Maven submodules while expressing dependencies between them. But there's more: the
+parent-child model allows you to create consistent dependency versioning (see below) within children.
-Besides the code that glues together the single pieces, any developer needs to describe used dependencies for the
-Maven-based build system. As is familiar to any Maven user, this happens inside the "Project Object Model" (POM) living in
-``pom.xml`` at the root of the project repository. Recursive and convergent dependency resolution makes dependency
-management with Maven very easy. But sometimes, in projects with many complex dependencies like the Dataverse Software, you have
-to help Maven make the right choices.
Terms
-----
@@ -23,7 +31,7 @@ As a developer, you should familiarize yourself with the following terms:
- **Transitive dependencies**: things *others use* for things you use, pulled in recursively.
See also: `Maven docs `_.
-.. graphviz::
+ .. graphviz::
digraph {
rankdir="LR";
@@ -44,6 +52,94 @@ As a developer, you should familiarize yourself with the following terms:
yc -> dtz;
}
+- **Project Object Model** (POM): the basic XML file unit to describe a Maven-based project.
+- **Bill Of Materials** (BOM): larger projects like Payara, the Amazon SDK, etc. provide lists of their direct dependencies.
+ This comes in handy when adding these dependencies (transitive for us) as direct dependencies; see below.
+
+ .. graphviz::
+
+ digraph {
+ rankdir="TD";
+ node [fontsize=10]
+ edge [fontsize=8]
+
+ msp [label="Maven Super POM"]
+ sp [label="Your POM"]
+ bom [label="Some BOM"]
+ td [label="Direct & Transitive\nDependency"]
+
+ msp -> sp [label="inherit", dir="back"];
+ bom -> sp [label="import", dir="back"];
+ bom -> td [label="depend on"];
+ sp -> td [label="depend on\n(same version)", constraint=false];
+ }
+
+- **Parent POM**, **Super POM**: any project may be a child of a parent.
+
+ Projects silently inherit from a "super POM", which is the global Maven standard parent POM.
+ Children may also be aggregated by a parent (without them knowing) for convenient builds of larger projects.
+
+ .. graphviz::
+
+ digraph {
+ rankdir="TD";
+ node [fontsize=10]
+ edge [fontsize=8]
+
+ msp [label="Maven Super POM"]
+ ap [label="Any POM"]
+ msp -> ap [label="inherit", dir="back"];
+
+ pp [label="Parent 1 POM"]
+ cp1 [label="Submodule 1 POM"]
+ cp2 [label="Submodule 2 POM"]
+
+ msp -> pp [label="inherit", dir="back", constraint=false];
+ pp -> cp1 [label="aggregate"];
+ pp -> cp2 [label="aggregate"];
+ }
+
+ Children may inherit dependencies, properties, settings, plugins, etc. from the parent (making it possible to share
+ common ground). Both approaches (inheriting and importing) may be combined. Children may import as many BOMs as they
+ want, but can have only a single parent to inherit from at a time.
+
+ .. graphviz::
+
+ digraph {
+ rankdir="TD";
+ node [fontsize=10]
+ edge [fontsize=8]
+
+ msp [label="Maven Super POM"]
+ pp [label="Parent POM"]
+ cp1 [label="Submodule 1 POM"]
+ cp2 [label="Submodule 2 POM"]
+
+ msp -> pp [label="inherit", dir="back", constraint=false];
+ pp -> cp1 [label="aggregate"];
+ pp -> cp2 [label="aggregate"];
+ cp1 -> pp [label="inherit"];
+ cp2 -> pp [label="inherit"];
+
+ d [label="Dependency"]
+ pp -> d [label="depends on"]
+ cp1 -> d [label="inherit:\ndepends on", style=dashed];
+ cp2 -> d [label="inherit:\ndepends on", style=dashed];
+ }
+
+- **Modules**: when using parents and children, the children are officially called "modules", each with its own POM.
+
+ Using modules allows bundling different aspects of (Dataverse) software in their own domains, with their own
+ behavior, dependencies etc. Parent modules allow for sharing of common settings, properties, dependencies and more.
+ Submodules may also be used as parent modules for a lower level of submodules.
+
+ Maven modules within the same software project may also depend on each other, making it possible to create complex
+ structures of packages and projects. Each module may be released on its own (e.g. on Maven Central) and other
+ projects may rely on and reuse it. This is especially useful for parent POMs: they may be reused as BOMs or to share
+ a standard between independent software projects.
+
+ Maven modules should not be confused with the `Java Platform Module System (JPMS) `_ introduced in Java 9 under Project Jigsaw.
+
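+The two mechanics above map to two distinct POM constructs. A minimal sketch (the ``<parent>`` coordinates are made
+up; the Payara BOM coordinates are real, though the version shown is illustrative):
+
+.. code-block:: xml
+
+    <!-- inherit from a parent POM (only one allowed at a time) -->
+    <parent>
+        <groupId>org.example</groupId>
+        <artifactId>example-parent</artifactId>
+        <version>1.0</version>
+    </parent>
+
+    <!-- import a BOM: its managed dependency versions become available to this POM -->
+    <dependencyManagement>
+        <dependencies>
+            <dependency>
+                <groupId>fish.payara.api</groupId>
+                <artifactId>payara-bom</artifactId>
+                <version>5.2021.10</version>
+                <type>pom</type>
+                <scope>import</scope>
+            </dependency>
+        </dependencies>
+    </dependencyManagement>
+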
Direct dependencies
-------------------
@@ -62,24 +158,34 @@ Within the POM, any direct dependencies reside within the ```` tag
Anytime you add a ````, Maven will try to fetch it from defined/configured repositories and use it
-within the build lifecycle. You have to define a ````, but ```` is optional for ``compile``.
-(See `Maven docs: Dep. Scope `_)
+within the build lifecycle. You have to define a ```` (note exception below), but ```` is optional for
+``compile``. (See `Maven docs: Dep. Scope `_)
-
-During fetching, Maven will analyse all transitive dependencies (see graph above) and, if necessary, fetch those, too.
+During fetching, Maven will analyze all transitive dependencies (see graph above) and, if necessary, fetch those too.
Everything downloaded once is cached locally by default, so nothing needs to be fetched again and again, as long as the
dependency definition does not change.
**Rules to follow:**
1. You should only use direct dependencies for **things you are actually using** in your code.
-2. **Clean up** direct dependencies no longer in use. It will bloat the deployment package otherwise!
-3. Care about the **scope**. Do not include "testing only" dependencies in the package - it will hurt you in IDEs and bloat things. [#f1]_
-4. Avoid using different dependencies for the **same purpose**, e. g. different JSON parsing libraries.
-5. Refactor your code to **use Jakarta EE** standards as much as possible.
-6. When you rely on big SDKs or similar big cool stuff, try to **include the smallest portion possible**. Complete SDK
+2. When declaring a direct dependency with its **version** managed by ````, a BOM or a parent POM, do
+ not provide one unless you explicitly want to override it!
+3. **Clean up** direct dependencies no longer in use. It will bloat the deployment package otherwise!
+4. Care about the **scope** [#f1]_:
+
+ * Do not include "testing only" dependencies in the final package - it will hurt you in IDEs and bloat things.
+ There is scope ``test`` for this!
+ * Make sure to use the ``runtime`` scope when you need to ensure a library is present on the classpath at runtime.
+ An example is the SLF4J JUL bridge: we want to route logs from SLF4J into ``java.util.logging``, so it needs to be
+ present on the classpath, although we don't use SLF4J directly, unlike some of our dependencies.
+ * Some dependencies might be ``provided`` by the runtime environment. Good example: everything from Jakarta EE!
+ We use the Payara BOM to ensure the same versions are used during development and at runtime.
+
+5. Avoid using different dependencies for the **same purpose**, e.g. different JSON parsing libraries.
+6. Refactor your code to **use Jakarta EE** standards as much as possible.
+7. When you rely on big SDKs or similar big cool stuff, try to **include the smallest portion possible**. Complete SDK
bundles are typically heavyweight and most of the time unnecessary.
-7. **Don't include transitive dependencies.** [#f2]_
+8. **Don't include transitive dependencies.** [#f2]_
* Exception: if you are relying on it in your code (see *Z* in the graph above), you must declare it. See below
for proper handling in these (rare) cases.
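+
+As a sketch of rule 2, a direct dependency whose version is managed elsewhere is declared without a version tag (the
+coordinates below are illustrative, not taken from our POM):
+
+.. code-block:: xml
+
+    <dependencies>
+        <!-- no version element: Maven resolves it from an imported BOM or the parent POM -->
+        <dependency>
+            <groupId>org.apache.commons</groupId>
+            <artifactId>commons-lang3</artifactId>
+        </dependency>
+    </dependencies>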
@@ -92,8 +198,8 @@ Maven is comfortable for developers; it handles recursive resolution, downloadin
However, as life is a box of chocolates, you might find yourself in *version conflict hell* sooner than later without even
knowing, but experiencing unintended side effects.
-When you look at the graph above, imagine *B* and *TB* rely on different *versions* of *TC*. How does Maven decide
-which version it will include? Easy: the dependent version of the nearest version wins:
+When you look at the topmost graph above, imagine *B* and *TB* rely on different *versions* of *TC*. How does Maven
+decide which version it will include? Easy: the version of the dependency nearest to our project ("Your Code") wins.
+The following graph gives an example:
.. graphviz::
@@ -110,19 +216,19 @@ which version it will include? Easy: the dependent version of the nearest versio
yc -> dtz2;
}
-In this case, version "2.0" will be included. If you know something about semantic versioning, a red alert should ring in your mind right now.
-How do we know that *B* is compatible with *Z v2.0* when depending on *Z v1.0*?
+In this case, version "2.0" will be included. If you know something about semantic versioning, a red alert should ring
+in your mind right now. How do we know that *B* is compatible with *Z v2.0* when depending on *Z v1.0*?
Another scenario getting us in trouble: indirect use of transitive dependencies. Imagine the following: we rely on *Z*
-in our code, but do not include a direct dependency for it within the POM. Now *B* is updated and removed its dependency
-on *Z*. You definitely don't want to head down that road.
+in our code, but do not include a direct dependency for it within the POM. Now assume *B* is updated and removed its
+dependency on *Z*. You definitely don't want to head down that road.
**Follow the rules to be safe:**
-1. Do **not use transitive deps implicit**: add a direct dependency for transitive deps you re-use in your code.
-2. On every build check that no implicit usage was added by accident.
+1. Do **not use transitive deps implicitly**: add a direct dependency for transitive deps you re-use in your code.
+2. On every build, check that no implicit usage was added by accident.
3. **Explicitly declare versions** of transitive dependencies in use by multiple direct dependencies.
-4. On every build check that there are no convergence problems hiding in the shadows.
+4. On every build, check that there are no convergence problems hiding in the shadows.
5. **Do special tests** on every build to verify these explicit combinations work.
Managing transitive dependencies in ``pom.xml``
@@ -130,15 +236,24 @@ Managing transitive dependencies in ``pom.xml``
Maven can manage versions of transitive dependencies in four ways:
-1. Make a transitive-only dependency not used in your code a direct one and add a ```` tag.
- Typically a bad idea, don't do that.
-2. Use ```` or ```` tags on direct dependencies that request the transitive dependency.
- *Last resort*, you really should avoid this. Not explained or used here.
- `See Maven docs `_.
-3. Explicitly declare the transitive dependency in ```` and add a ```` tag.
-4. For more complex transitive dependencies, reuse a "Bill of Materials" (BOM) within ````
- and add a ```` tag. Many bigger and standard use projects provide those, making the POM much less bloated
- compared to adding every bit yourself.
+.. list-table::
+ :align: left
+ :stub-columns: 1
+ :widths: 12 40 40
+
+ * - Safe Good Practice
+ - (1) Explicitly declare the transitive dependency in ```` with a ```` tag.
+ - (2) For more complex transitive dependencies, reuse a "Bill of Materials" (BOM) within ````.
+ Many bigger projects provide them, making the POM much less bloated compared to adding every bit yourself.
+ * - Better Avoid or Don't
+ - (3) Use ```` or ```` tags on direct dependencies that request the transitive dependency.
+ *Last resort*, you really should avoid this. Not explained or used here, but sometimes unavoidable.
+ `See Maven docs `_.
+ - (4) Make a transitive-only dependency not used in your code a direct one and add a ```` tag.
+ Typically a bad idea; don't do that.
+
+**Note:** when the same transitive dependency is used in multiple Maven modules of a software project, it might be added
+to a common ```` section of an inherited parent POM instead. (Overrides are still possible.)
A reduced example, only showing bits relevant to the above cases and usage of an explicit transitive dep directly:
@@ -214,12 +329,12 @@ Helpful tools
Maven provides some plugins that are of great help to detect possible conflicts and implicit usage.
-For *implicit usage detection*, use `mvn dependency:analyze`. Examine the output with great care. Sometimes you will
+For *implicit usage detection*, use ``mvn dependency:analyze``. Examine the output with great care. Sometimes you will
see implicit usages that do no harm, especially if you are using bigger SDKs having some kind of `core` package.
This will also report on any direct dependency which is not in use and can be removed from the POM. Again, do this with
great caution and double check.
-If you want to see the dependencies both direct and transitive in a *dependency tree format*, use `mvn dependency:tree`.
+If you want to see the dependencies both direct and transitive in a *dependency tree format*, use ``mvn dependency:tree``.
This will however not help you with detecting possible version conflicts. For this you need to use the `Enforcer Plugin
`_ with its built in `dependency convergence rule
@@ -228,7 +343,7 @@ This will however not help you with detecting possible version conflicts. For th
Repositories
------------
-Maven receives all dependencies from *repositories*. Those can be public like `Maven Central `_
+Maven receives all dependencies from *repositories*. These can be public like `Maven Central `_
and others, but you can also use a private repository on premises or in the cloud. Last but not least, you can use
local repositories, which can live next to your application code (see ``local_lib`` dir within the Dataverse Software codebase).
@@ -262,6 +377,73 @@ Typically you will skip the addition of the central repository, but adding it to
dependencies are first looked up there (which in theory can speed up downloads). You should keep in mind that repositories
are used in the order they appear.
+
+Dataverse Parent POM
+--------------------
+
+Within ``modules/dataverse-parent`` you will find the parent POM of the Dataverse codebase. It serves several
+purposes:
+
+1. Provide the common version number for a Dataverse release (may be overridden where necessary).
+2. Provide common metadata necessary for releasing modules to repositories like Maven Central.
+3. Declare aggregated submodules via ````.
+4. Collate common BOMs and transitive dependencies within ````.
+ (Remember: a direct dependency declaration may omit the version element when defined in that area!)
+5. Collect common ```` regarding the Maven project (encoding, ...), dependency versions, target Java version, etc.
+6. Gather common ```` and ```` - no need to repeat those in submodules.
+7. Make submodules use current Maven plugin release versions via ````.
+
+As of this writing (2022-02-10), our parent module looks like this:
+
+.. graphviz::
+
+ digraph {
+ rankdir="TD";
+ node [fontsize=10]
+ edge [fontsize=8]
+
+ dvp [label="Dataverse Parent"]
+ dvw [label="Submodule:\nDataverse WAR"]
+ zip [label="Submodule:\nZipdownloader JAR"]
+
+ dvw -> dvp [label="inherit"];
+ dvp -> dvw [label="aggregate"];
+ zip -> dvp [label="inherit"];
+ dvp -> zip [label="aggregate"];
+
+ pay [label="Payara BOM"]
+ aws [label="AWS SDK BOM"]
+ ggl [label="Google Cloud BOM"]
+ tc [label="Testcontainers BOM"]
+ td [label="Multiple (transitive) dependencies\n(PSQL, Logging, Apache Commons, ...)"]
+
+ dvp -> td [label="manage"];
+
+ pay -> dvp [label="import", dir="back"];
+ aws -> dvp [label="import", dir="back"];
+ ggl -> dvp [label="import", dir="back"];
+ tc -> dvp [label="import", dir="back"];
+
+ }
+
+The codebase is structured like this:
+
+.. code-block::
+
+ # Dataverse WAR Module
+ ├── pom.xml # (POM file of WAR module)
+ ├── modules #
+ │ └── dataverse-parent # Dataverse Parent Module
+ │ └── pom.xml # (POM file of Parent Module)
+ └── scripts #
+ └── zipdownload # Zipdownloader JAR Module
+ └── pom.xml # (POM file of Zipdownloader Module)
+
+- Any developer cloning the project and running ``mvn`` within the project root will interact with the Dataverse WAR
+ module, just as it has worked since Dataverse Software 4.0 was released.
+- Running ``mvn`` targets within the parent module will execute all aggregated submodules in one go.
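+
+A submodule opts into the parent via a ``<parent>`` block in its own POM; because our parent POM does not sit in the
+default location (``../pom.xml``), the relative path must be spelled out. A sketch with illustrative coordinates (the
+path matches the WAR module in the tree above):
+
+.. code-block:: xml
+
+    <parent>
+        <groupId>edu.harvard.iq</groupId>
+        <artifactId>dataverse-parent</artifactId>
+        <version>5.10</version>
+        <!-- non-standard parent location, see the tree above -->
+        <relativePath>modules/dataverse-parent</relativePath>
+    </parent>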
+
+
----
.. rubric:: Footnotes
diff --git a/doc/sphinx-guides/source/developers/dev-environment.rst b/doc/sphinx-guides/source/developers/dev-environment.rst
index 2ec5dbea29e..4b76e475da7 100755
--- a/doc/sphinx-guides/source/developers/dev-environment.rst
+++ b/doc/sphinx-guides/source/developers/dev-environment.rst
@@ -94,30 +94,30 @@ To install Payara, run the following commands:
Install PostgreSQL
~~~~~~~~~~~~~~~~~~
-For the past few release cycles much of the development has been done under PostgreSQL 9.6. While that version is known to be very stable, it is nearing its end-of-life (in Nov. 2021). The Dataverse Software has now been tested with versions up to 13 (13.2 is the latest released version as of writing this).
+The Dataverse Software has been tested with PostgreSQL versions up to 13. PostgreSQL version 10+ is required.
-On Mac, go to https://www.postgresql.org/download/macosx/ and choose "Interactive installer by EDB" option. Note that version 9.6 is used in the command line examples below, but the process will be identical for any version up to 13. When prompted to set a password for the "database superuser (postgres)" just enter "password".
+On Mac, go to https://www.postgresql.org/download/macosx/ and choose the "Interactive installer by EDB" option. Note that version 13.5 is used in the command line examples below, but the process should be similar for other versions. When prompted to set a password for the "database superuser (postgres)" just enter "password".
After installation is complete, make a backup of the ``pg_hba.conf`` file like this:
-``sudo cp /Library/PostgreSQL/9.6/data/pg_hba.conf /Library/PostgreSQL/9.6/data/pg_hba.conf.orig``
+``sudo cp /Library/PostgreSQL/13/data/pg_hba.conf /Library/PostgreSQL/13/data/pg_hba.conf.orig``
Then edit ``pg_hba.conf`` with an editor such as vi:
-``sudo vi /Library/PostgreSQL/9.6/data/pg_hba.conf``
+``sudo vi /Library/PostgreSQL/13/data/pg_hba.conf``
-In the "METHOD" column, change all instances of "md5" to "trust". This will make it so PostgreSQL doesn't require a password.
+In the "METHOD" column, change all instances of "scram-sha-256" (or whatever is in that column) to "trust". This will make it so PostgreSQL doesn't require a password.
-In the Finder, click "Applications" then "PostgreSQL 9.6" and launch the "Reload Configuration" app. Click "OK" after you see "server signaled".
+In the Finder, click "Applications" then "PostgreSQL 13" and launch the "Reload Configuration" app. Click "OK" after you see "server signaled".
-Next, to confirm the edit worked, launch the "pgAdmin" application from the same folder. Under "Browser", expand "Servers" and double click "PostgreSQL 9.6". When you are prompted for a password, leave it blank and click "OK". If you have successfully edited "pg_hba.conf", you can get in without a password.
+Next, to confirm the edit worked, launch the "pgAdmin" application from the same folder. Under "Browser", expand "Servers" and double click "PostgreSQL 13". When you are prompted for a password, leave it blank and click "OK". If you have successfully edited "pg_hba.conf", you can get in without a password.
On Linux, you should just install PostgreSQL using your favorite package manager, such as ``yum``. (Consult the PostgreSQL section of :doc:`/installation/prerequisites` in the main Installation guide for more info and command line examples). Find ``pg_hba.conf`` and set the authentication method to "trust" and restart PostgreSQL.
Install Solr
~~~~~~~~~~~~
-`Solr `_ 8.8.1 is required.
+`Solr `_ 8.11.1 is required.
To install Solr, execute the following commands:
@@ -127,25 +127,25 @@ To install Solr, execute the following commands:
``cd /usr/local/solr``
-``curl -O http://archive.apache.org/dist/lucene/solr/8.8.1/solr-8.8.1.tgz``
+``curl -O http://archive.apache.org/dist/lucene/solr/8.11.1/solr-8.11.1.tgz``
-``tar xvfz solr-8.8.1.tgz``
+``tar xvfz solr-8.11.1.tgz``
-``cd solr-8.8.1/server/solr``
+``cd solr-8.11.1/server/solr``
``cp -r configsets/_default collection1``
-``curl -O https://raw.githubusercontent.com/IQSS/dataverse/develop/conf/solr/8.8.1/schema.xml``
+``curl -O https://raw.githubusercontent.com/IQSS/dataverse/develop/conf/solr/8.11.1/schema.xml``
-``curl -O https://raw.githubusercontent.com/IQSS/dataverse/develop/conf/solr/8.8.1/schema_dv_mdb_fields.xml``
+``curl -O https://raw.githubusercontent.com/IQSS/dataverse/develop/conf/solr/8.11.1/schema_dv_mdb_fields.xml``
``mv schema*.xml collection1/conf``
-``curl -O https://raw.githubusercontent.com/IQSS/dataverse/develop/conf/solr/8.8.1/solrconfig.xml``
+``curl -O https://raw.githubusercontent.com/IQSS/dataverse/develop/conf/solr/8.11.1/solrconfig.xml``
``mv solrconfig.xml collection1/conf/solrconfig.xml``
-``cd /usr/local/solr/solr-8.8.1``
+``cd /usr/local/solr/solr-8.11.1``
(Please note that the extra jetty argument below is a security measure to limit connections to Solr to only your computer. For extra security, run a firewall.)
diff --git a/doc/sphinx-guides/source/developers/intro.rst b/doc/sphinx-guides/source/developers/intro.rst
index 8fc0c679a8b..7f4e8c1ba34 100755
--- a/doc/sphinx-guides/source/developers/intro.rst
+++ b/doc/sphinx-guides/source/developers/intro.rst
@@ -21,6 +21,8 @@ Getting Help
If you have any questions at all, please reach out to other developers via the channels listed in https://github.com/IQSS/dataverse/blob/develop/CONTRIBUTING.md such as http://chat.dataverse.org, the `dataverse-dev `_ mailing list, `community calls `_, or support@dataverse.org.
+.. _core-technologies:
+
Core Technologies
-----------------
@@ -65,8 +67,6 @@ As a developer, you also may be interested in these projects related to Datavers
- Configuration management scripts - Ansible, Puppet, etc.: See :ref:`advanced` section in the Installation Guide.
- :doc:`/developers/unf/index` (Java) - a Universal Numerical Fingerprint: https://github.com/IQSS/UNF
- `DataTags `_ (Java and Scala) - tag datasets with privacy levels: https://github.com/IQSS/DataTags
-- `TwoRavens `_ (Javascript) - a `d3.js `_ interface for exploring data and running Zelig models: https://github.com/IQSS/TwoRavens
-- `Zelig `_ (R) - run statistical models on files uploaded to a Dataverse installation: https://github.com/IQSS/Zelig
- `Matrix `_ - a visualization showing the connectedness and collaboration between authors and their affiliations.
- Third party apps - make use of Dataverse installation APIs: :doc:`/api/apps`
- chat.dataverse.org - chat interface for Dataverse Project users and developers: https://github.com/IQSS/chat.dataverse.org
diff --git a/doc/sphinx-guides/source/developers/making-releases.rst b/doc/sphinx-guides/source/developers/making-releases.rst
index cbd88b1a357..064ed6f1b78 100755
--- a/doc/sphinx-guides/source/developers/making-releases.rst
+++ b/doc/sphinx-guides/source/developers/making-releases.rst
@@ -22,7 +22,7 @@ Make the following changes in the release branch:
Increment the version number to the milestone (e.g. 4.6.2) in the following two files:
-- pom.xml
+- modules/dataverse-parent/pom.xml -> ```` -> ````
- doc/sphinx-guides/source/conf.py (two places)
Add the version being released to the lists in the following two files:
@@ -31,6 +31,7 @@ Add the version being released to the lists in the following two files:
- scripts/database/releases.txt
Here's an example commit where three of the four files above were updated at once: https://github.com/IQSS/dataverse/commit/99e23f96ec362ac2f524cb5cd80ca375fa13f196
+(Note: the version has been moved to a property in the parent module since this commit was created.)
2. Check in the Changes Above...
================================
diff --git a/doc/sphinx-guides/source/developers/remote-users.rst b/doc/sphinx-guides/source/developers/remote-users.rst
index 3f8dd836661..a5e51aa5e54 100755
--- a/doc/sphinx-guides/source/developers/remote-users.rst
+++ b/doc/sphinx-guides/source/developers/remote-users.rst
@@ -26,7 +26,7 @@ In addition to setting up OAuth on your laptop for real per above, you can also
For a list of possible values, please "find usages" on the settings key above and look at the enum.
-Now when you go to http://localhost:8080/oauth2/firstLogin.xhtml you should be prompted to create a Shibboleth account.
+Now when you go to http://localhost:8080/oauth2/firstLogin.xhtml you should be prompted to create an OAuth account.
----
diff --git a/doc/sphinx-guides/source/developers/selinux.rst b/doc/sphinx-guides/source/developers/selinux.rst
index fe230e3ff68..dcbf3ee594f 100644
--- a/doc/sphinx-guides/source/developers/selinux.rst
+++ b/doc/sphinx-guides/source/developers/selinux.rst
@@ -44,7 +44,7 @@ Use ``semodule -l | grep shibboleth`` to see if the ``shibboleth.te`` rules are
Exercising SELinux denials
~~~~~~~~~~~~~~~~~~~~~~~~~~
-As of this writing, there are two optional components of the Dataverse Software that are known not to work with SELinux out of the box with SELinux: Shibboleth and rApache.
+As of this writing, the only component of the Dataverse Software which is known not to work with SELinux out of the box is Shibboleth.
We will be exercising SELinux denials with Shibboleth, and the SELinux-related issues are expected out the box:
diff --git a/doc/sphinx-guides/source/developers/version-control.rst b/doc/sphinx-guides/source/developers/version-control.rst
index c88a0b71d82..18cb7ca5cba 100644
--- a/doc/sphinx-guides/source/developers/version-control.rst
+++ b/doc/sphinx-guides/source/developers/version-control.rst
@@ -11,7 +11,7 @@ The Dataverse Project uses git for version control and GitHub for hosting. On th
Where to Find the Dataverse Software Code
-----------------------------------------
-The main Dataverse Software code at https://github.com/IQSS/dataverse but as explained in the :doc:`intro` section under "Related Projects", there are many other code bases you can hack on if you wish!
+The main Dataverse Software code is available at https://github.com/IQSS/dataverse but as explained in the :doc:`intro` section under "Related Projects", there are many other code bases you can hack on if you wish!
Branching Strategy
------------------
@@ -67,7 +67,7 @@ If you tell us your GitHub username we are happy to add you to the "read only" t
Create a New Branch off the develop Branch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Always create your feature branch from the latest code in develop, pulling the latest code if necessary. As mentioned above, your branch should have a name like "3728-doc-apipolicy-fix" that starts with the issue number you are addressing, and ends with a short, descriptive name. Dashes ("-") and underscores ("_") in your branch name are ok, but please try to avoid other special characters such as ampersands ("&") than have special meaning in Unix shells.
+Always create your feature branch from the latest code in develop, pulling the latest code if necessary. As mentioned above, your branch should have a name like "3728-doc-apipolicy-fix" that starts with the issue number you are addressing, and ends with a short, descriptive name. Dashes ("-") and underscores ("_") in your branch name are ok, but please try to avoid other special characters such as ampersands ("&") that have special meaning in Unix shells.
Commit Your Change to Your New Branch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/doc/sphinx-guides/source/installation/advanced.rst b/doc/sphinx-guides/source/installation/advanced.rst
index 19fea65f0ca..4f06ed37d01 100644
--- a/doc/sphinx-guides/source/installation/advanced.rst
+++ b/doc/sphinx-guides/source/installation/advanced.rst
@@ -36,46 +36,80 @@ You would repeat the steps above for all of your app servers. If users seem to b
Please note that :ref:`network-ports` under the Configuration section has more information on fronting your app server with Apache. The :doc:`shibboleth` section talks about the use of ``ProxyPassMatch``.
+Licensing
+---------
+
+Dataverse allows superusers to specify the list of allowed licenses, to define which license is the default, to decide whether users can instead define custom terms, and to mark obsolete licenses as "inactive" to stop further use of them.
+These can be accomplished using the :ref:`native API ` and the :ref:`:AllowCustomTermsOfUse <:AllowCustomTermsOfUse>` setting. See also :ref:`license-config`.
+
+.. _standardizing-custom-licenses:
+
+Standardizing Custom Licenses
++++++++++++++++++++++++++++++
+
+In addition, if many datasets use the same set of Custom Terms, it may make sense to create and register a standard license including those terms. Doing this would include:
+
+- Creating and posting an external document that includes the custom terms, i.e. an HTML document with sections corresponding to the terms fields that are used.
+- Defining a name, short description, URL (where it is posted), and optionally an icon URL for this license.
+- Using the Dataverse API to register the new license as one of the options available in your installation.
+- Using the API to make sure the license is active and deciding whether the license should also be the default.
+- Once the license is registered with Dataverse, making an SQL update to change datasets/versions using that license to reference it instead of having their own copy of those custom terms.
+
+The benefits of this approach are:
+
+- usability: the license can be selected for new datasets without allowing custom terms and without users having to cut/paste terms or collection administrators having to configure templates with those terms
+- efficiency: custom terms are stored per dataset, whereas a license is registered once and all uses of it refer to the same object and external URL
+- security: with the license terms maintained external to Dataverse, users cannot edit specific terms and curators do not need to check for edits
+
+Once a standardized version of your Custom Terms is registered as a license, an SQL update like the following can be used to have datasets use it:
+
+::
+
+ UPDATE termsofuseandaccess
+ SET license_id = (SELECT license.id FROM license WHERE license.name = ''), termsofuse=null, confidentialitydeclaration=null, specialpermissions=null, restrictions=null, citationrequirements=null, depositorrequirements=null, conditions=null, disclaimer=null
+ WHERE termsofuseandaccess.termsofuse LIKE '%%';
+
Optional Components
-------------------
+.. _zipdownloader:
+
Standalone "Zipper" Service Tool
++++++++++++++++++++++++++++++++
-As of Dataverse Software 5.0 we offer an experimental optimization for the multi-file, download-as-zip functionality. If this option
-(``:CustomZipDownloadServiceUrl``) is enabled, instead of enforcing
-the size limit on multi-file zipped downloads (as normally specified
-by the option ``:ZipDownloadLimit``), we attempt to serve all the
-files that the user requested (that they are authorized to download),
-but the request is redirected to a standalone zipper service running
-as a cgi-bin executable under Apache. Thus moving these potentially
-long-running jobs completely outside the Application Server (Payara);
-and preventing worker threads from becoming locked serving them. Since
-zipping is also a CPU-intensive task, it is possible to have this
-service running on a different host system, freeing the cycles on the
-main Application Server. (The system running the service needs to have
-access to the database as well as to the storage filesystem, and/or S3
-bucket).
-
-Please consult the scripts/zipdownload/README.md in the Dataverse Software 5.0+ source tree for more information.
-
-To install: You can follow the instructions in the file above to build
-``ZipDownloadService-v1.0.0.jar``. It will also be available, pre-built as part of the Dataverse Software 5.0 release on GitHub. Copy it, together with the shell
-script scripts/zipdownload/cgi-bin/zipdownload to the cgi-bin
-directory of the chosen Apache server (/var/www/cgi-bin standard).
-
-Make sure the shell script (zipdownload) is executable, and edit it to configure the
-database access credentials. Do note that the executable does not need
-access to the entire Dataverse installation database. A security-conscious admin
-can create a dedicated database user with access to just one table:
-``CUSTOMZIPSERVICEREQUEST``.
-
-You may need to make extra Apache configuration changes to make sure /cgi-bin/zipdownload is accessible from the outside.
-For example, if this is the same Apache that's in front of your Dataverse installation Payara instance, you will need to add another pass through statement to your configuration:
+As of Dataverse Software 5.0 we offer an **experimental** optimization for the multi-file, download-as-zip functionality.
+If this option (``:CustomZipDownloadServiceUrl``) is enabled, instead of enforcing the size limit on multi-file zipped
+downloads (as normally specified by the option ``:ZipDownloadLimit``), we attempt to serve all the files that the user
+requested (and is authorized to download) by redirecting the request to a standalone zipper service running
+as a cgi-bin executable under Apache. This moves these potentially long-running jobs completely outside the
+Application Server (Payara) and prevents worker threads from being tied up serving them. Since zipping is also a
+CPU-intensive task, the service can run on a different host system, freeing cycles on the main Application Server.
+(The system running the service needs access to the database as well as to the storage filesystem and/or S3 bucket.)
+
+Please consult the `README at scripts/zipdownload `_
+in the Dataverse Software 5.0+ source tree for more information.
+
+To install:
+
+1. Follow the instructions in the file above to build ``zipdownloader-0.0.1.jar``. Please note that the package name and
+   the version were changed as of release 5.10, as part of an overall cleanup and reorganization of the project
+   tree. In releases 5.0-5.9 it existed under the name ``ZipDownloadService-v1.0.0``. (A pre-built jar file was
+   distributed under that name as part of the 5.0 release on GitHub. Aside from the name change, there have been no
+   changes in the functionality of the tool.)
+2. Copy it, together with the shell script :download:`cgi-bin/zipdownload <../../../../scripts/zipdownload/cgi-bin/zipdownload>`
+ to the ``cgi-bin`` directory of the chosen Apache server (``/var/www/cgi-bin`` standard).
+3. Make sure the shell script (``zipdownload``) is executable, and edit it to configure the database access credentials.
+ Do note that the executable does not need access to the entire Dataverse installation database. A security-conscious
+ admin can create a dedicated database user with access to just one table: ``CUSTOMZIPSERVICEREQUEST``.
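+
+   For illustration, such a dedicated user could be created along these lines (a sketch; the user name and
+   password are examples, and the zipper may need privileges beyond ``SELECT`` on this table, so consult the
+   README referenced above)::
+
+     CREATE USER zipper PASSWORD 'change-me';
+     GRANT SELECT ON customzipservicerequest TO zipper;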
+
+You may need to make extra Apache configuration changes to make sure ``/cgi-bin/zipdownload`` is accessible from the outside.
+For example, if this is the same Apache that's in front of your Dataverse installation Payara instance, you will need to
+add another pass through statement to your configuration:
``ProxyPassMatch ^/cgi-bin/zipdownload !``
-Test this by accessing it directly at ``/cgi-bin/download``. You should get a ``404 No such download job!``. If instead you are getting an "internal server error", this may be an SELinux issue; try ``setenforce Permissive``. If you are getting a generic Dataverse collection "not found" page, review the ``ProxyPassMatch`` rule you have added.
+Test this by accessing it directly at ``/cgi-bin/download``. You should get a ``404 No such download job!``.
+If instead you are getting an "internal server error", this may be an SELinux issue; try ``setenforce Permissive``.
+If you are getting a generic Dataverse collection "not found" page, review the ``ProxyPassMatch`` rule you have added.
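+
+For illustration, the relevant fragment of such an Apache configuration might look like this (a sketch; the
+``ProxyPass`` target is an assumption based on a typical Payara-behind-Apache setup, and the exclusion rule
+must come before the general pass-through, since Apache evaluates these directives in order)::
+
+  # serve the zipper locally via cgi-bin, do not proxy it to Payara
+  ProxyPassMatch ^/cgi-bin/zipdownload !
+  # pass everything else through to the application server
+  ProxyPass / ajp://localhost:8009/
+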
To activate in your Dataverse installation::
diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst
index e213b08c704..69bec48ed04 100644
--- a/doc/sphinx-guides/source/installation/config.rst
+++ b/doc/sphinx-guides/source/installation/config.rst
@@ -6,7 +6,7 @@ Now that you've successfully logged into your Dataverse installation with a supe
Settings within your Dataverse installation itself are managed via JVM options or by manipulating values in the ``setting`` table directly or through API calls.
-Once you have finished securing and configuring your Dataverse installation, you may proceed to the :doc:`/admin/index` for more information on the ongoing administration of a Dataverse installation. Advanced configuration topics are covered in the :doc:`r-rapache-tworavens`, :doc:`shibboleth` and :doc:`oauth2` sections.
+Once you have finished securing and configuring your Dataverse installation, you may proceed to the :doc:`/admin/index` for more information on the ongoing administration of a Dataverse installation. Advanced configuration topics are covered in the :doc:`shibboleth` and :doc:`oauth2` sections.
.. contents:: |toctitle|
:local:
@@ -112,9 +112,7 @@ The need to redirect port HTTP (port 80) to HTTPS (port 443) for security has al
Your decision to proxy or not should primarily be driven by which features of the Dataverse Software you'd like to use. If you'd like to use Shibboleth, the decision is easy because proxying or "fronting" Payara with Apache is required. The details are covered in the :doc:`shibboleth` section.
-If you'd like to use TwoRavens, you should also consider fronting with Apache because you will be required to install an Apache anyway to make use of the rApache module. For details, see the :doc:`r-rapache-tworavens` section.
-
-Even if you have no interest in Shibboleth nor TwoRavens, you may want to front your Dataverse installation with Apache or nginx to simply the process of installing SSL certificates. There are many tutorials on the Internet for adding certs to Apache, including a some `notes used by the Dataverse Project team `_, but the process of adding a certificate to Payara is arduous and not for the faint of heart. The Dataverse Project team cannot provide much help with adding certificates to Payara beyond linking to `tips `_ on the web.
+Even if you have no interest in Shibboleth, you may want to front your Dataverse installation with Apache or nginx to simplify the process of installing SSL certificates. There are many tutorials on the Internet for adding certs to Apache, including some `notes used by the Dataverse Project team `_, but the process of adding a certificate to Payara is arduous and not for the faint of heart. The Dataverse Project team cannot provide much help with adding certificates to Payara beyond linking to `tips `_ on the web.
Still not convinced you should put Payara behind another web server? Even if you manage to get your SSL certificate into Payara, how are you going to run Payara on low ports such as 80 and 443? Are you going to run Payara as root? Bad idea. This is a security risk. Under "Additional Recommendations" under "Securing Your Installation" above you are advised to configure Payara to run as a user other than root.
@@ -240,8 +238,8 @@ As for the "Remote only" authentication mode, it means that:
- ``:DefaultAuthProvider`` has been set to use the desired authentication provider
- The "builtin" authentication provider has been disabled (:ref:`api-toggle-auth-provider`). Note that disabling the "builtin" authentication provider means that the API endpoint for converting an account from a remote auth provider will not work. Converting directly from one remote authentication provider to another (i.e. from GitHub to Google) is not supported. Conversion from remote is always to "builtin". Then the user initiates a conversion from "builtin" to remote. Note that longer term, the plan is to permit multiple login options to the same Dataverse installation account per https://github.com/IQSS/dataverse/issues/3487 (so all this talk of conversion will be moot) but for now users can only use a single login option, as explained in the :doc:`/user/account` section of the User Guide. In short, "remote only" might work for you if you only plan to use a single remote authentication provider such that no conversion between remote authentication providers will be necessary.
-File Storage: Using a Local Filesystem and/or Swift and/or S3 object stores
----------------------------------------------------------------------------
+File Storage: Using a Local Filesystem and/or Swift and/or object stores
+------------------------------------------------------------------------
By default, a Dataverse installation stores all data files (files uploaded by end users) on the filesystem at ``/usr/local/payara5/glassfish/domains/domain1/files``. This path can vary based on answers you gave to the installer (see the :ref:`dataverse-installer` section of the Installation Guide) or afterward by reconfiguring the ``dataverse.files.\.directory`` JVM option described below.
@@ -386,6 +384,9 @@ of two methods described below:
1. Manually through creation of the credentials and config files or
2. Automatically via the AWS console commands.
+For some usage scenarios it may be preferable to skip generating these files. You may also provide :ref:`static credentials via
+MicroProfile Config `; see below.
+
Preparation When Using Amazon's S3 Service
##########################################
@@ -526,28 +527,69 @@ been tested already and what other options have been set for a successful integr
Lastly, go ahead and restart your Payara server. With Dataverse deployed and the site online, you should be able to upload datasets and data files and see the corresponding files in your S3 bucket. Within a bucket, the folder structure emulates that found in local file storage.
-S3 Storage Options
-##################
-
-=========================================== ================== ========================================================================== =============
-JVM Option Value Description Default value
-=========================================== ================== ========================================================================== =============
-dataverse.files.storage-driver-id Enable as the default storage driver. ``file``
-dataverse.files..bucket-name > The bucket name. See above. (none)
-dataverse.files..download-redirect ``true``/``false`` Enable direct download or proxy through Dataverse. ``false``
-dataverse.files..upload-redirect ``true``/``false`` Enable direct upload of files added to a dataset to the S3 store. ``false``
-dataverse.files..ingestsizelimit Maximum size of directupload files that should be ingested (none)
-dataverse.files..url-expiration-minutes > If direct uploads/downloads: time until links expire. Optional. 60
-dataverse.files..min-part-size > Multipart direct uploads will occur for files larger than this. Optional. ``1024**3``
-dataverse.files..custom-endpoint-url > Use custom S3 endpoint. Needs URL either with or without protocol. (none)
-dataverse.files..custom-endpoint-region > Only used when using custom endpoint. Optional. ``dataverse``
-dataverse.files..profile > Allows the use of AWS profiles for storage spanning multiple AWS accounts. (none)
-dataverse.files..proxy-url > URL of a proxy protecting the S3 store. Optional. (none)
-dataverse.files..path-style-access ``true``/``false`` Use path style buckets instead of subdomains. Optional. ``false``
-dataverse.files..payload-signing ``true``/``false`` Enable payload signing. Optional ``false``
-dataverse.files..chunked-encoding ``true``/``false`` Disable chunked encoding. Optional ``true``
-dataverse.files..connection-pool-size > The maximum number of open connections to the S3 server ``256``
-=========================================== ================== ========================================================================== =============
+List of S3 Storage Options
+##########################
+
+.. table::
+ :align: left
+
+ =========================================== ================== ========================================================================== =============
+ JVM Option Value Description Default value
+ =========================================== ================== ========================================================================== =============
+ dataverse.files.storage-driver-id Enable as the default storage driver. ``file``
+ dataverse.files..type ``s3`` **Required** to mark this storage as S3 based. (none)
+ dataverse.files..label > **Required** label to be shown in the UI for this storage (none)
+ dataverse.files..bucket-name > The bucket name. See above. (none)
+ dataverse.files..download-redirect ``true``/``false`` Enable direct download or proxy through Dataverse. ``false``
+ dataverse.files..upload-redirect ``true``/``false`` Enable direct upload of files added to a dataset to the S3 store. ``false``
+ dataverse.files..ingestsizelimit Maximum size of direct upload files that should be ingested (none)
+ dataverse.files..url-expiration-minutes > If direct uploads/downloads: time until links expire. Optional. 60
+ dataverse.files..min-part-size > Multipart direct uploads will occur for files larger than this. Optional. ``1024**3``
+ dataverse.files..custom-endpoint-url > Use custom S3 endpoint. Needs URL either with or without protocol. (none)
+ dataverse.files..custom-endpoint-region > Only used when using custom endpoint. Optional. ``dataverse``
+ dataverse.files..profile > Allows the use of AWS profiles for storage spanning multiple AWS accounts. (none)
+ dataverse.files..proxy-url > URL of a proxy protecting the S3 store. Optional. (none)
+ dataverse.files..path-style-access ``true``/``false`` Use path style buckets instead of subdomains. Optional. ``false``
+ dataverse.files..payload-signing ``true``/``false`` Enable payload signing. Optional ``false``
+ dataverse.files..chunked-encoding ``true``/``false`` Disable chunked encoding. Optional ``true``
+ dataverse.files..connection-pool-size > The maximum number of open connections to the S3 server ``256``
+ =========================================== ================== ========================================================================== =============
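+
+For illustration, a store with id ``mys3`` (a hypothetical id; all values below are examples) could be defined
+with JVM options like these (note that colons must be escaped for ``asadmin``)::
+
+  ./asadmin create-jvm-options "-Ddataverse.files.mys3.type=s3"
+  ./asadmin create-jvm-options "-Ddataverse.files.mys3.label=MyS3"
+  ./asadmin create-jvm-options "-Ddataverse.files.mys3.bucket-name=mybucket"
+  ./asadmin create-jvm-options "-Ddataverse.files.mys3.custom-endpoint-url=http\://localhost\:9000"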
+
+.. table::
+ :align: left
+
+ =========================================== ================== ========================================================================== =============
+ MicroProfile Config Option Value Description Default value
+ =========================================== ================== ========================================================================== =============
+ dataverse.files..access-key > :ref:`Provide static access key ID. Read before use! ` ``""``
+ dataverse.files..secret-key > :ref:`Provide static secret access key. Read before use! ` ``""``
+ =========================================== ================== ========================================================================== =============
+
+
+.. _s3-mpconfig:
+
+Credentials via MicroProfile Config
+###################################
+
+Optionally, you may provide static credentials for each S3 storage using MicroProfile Config options:
+
+- ``dataverse.files..access-key`` for this storage's "access key ID"
+- ``dataverse.files..secret-key`` for this storage's "secret access key"
+
+You may provide the values for these via any of the
+`supported config sources `_.
+
+**WARNING:**
+
+*For security, do not use the sources "environment variable" or "system property" (JVM option) in a production context!*
+*Rely on a password alias, secrets directory, or cloud-based sources instead!*
+
+**NOTE:**
+
+1. If you provide both AWS CLI profile files (as set up in the first step) and static keys, the credentials from ``~/.aws``
+   will win over the configured keys when valid!
+2. A non-empty ``dataverse.files..profile`` will be ignored when no credentials can be found for this profile name.
+   The current codebase does not make use of "named profiles" as the AWS CLI does, except for credentials.
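+
+As an illustration, a static secret key could be supplied via a Payara password alias rather than a plain JVM
+option (a sketch; the store id ``mys3`` and the file path are examples, and whether your Payara version's
+alias config source maps property names this way should be verified)::
+
+  echo "AS_ADMIN_ALIASPASSWORD=changeme" > /tmp/alias.txt
+  asadmin create-password-alias --passwordfile /tmp/alias.txt dataverse.files.mys3.secret-key
+  rm /tmp/alias.txt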
Reported Working S3-Compatible Storage
######################################
@@ -557,6 +599,11 @@ Reported Working S3-Compatible Storage
**Can be used for quick testing, too:** just use the example values above. Uses the public (read: unsecure and
possibly slow) https://play.minio.io:9000 service.
+`StorJ Object Store `_
+ StorJ is a distributed object store that can be configured with an S3 gateway. Per the S3 Storage instructions above, you'll first set up the StorJ S3 store by defining the id, type, and label. After following the general installation, set the following configurations to use a StorJ object store: ``dataverse.files..payload-signing=true`` and ``dataverse.files..chunked-encoding=false``.
+
+   Note that for direct uploads and downloads, Dataverse redirects to the proxy-url but presigns the URLs based on the ``dataverse.files..custom-endpoint-url``. Also, note that if you choose to enable ``dataverse.files..download-redirect``, the S3 URLs expire after 60 minutes by default. You can change that to a more appropriate timeout value using ``dataverse.files..url-expiration-minutes``.
+
`Surf Object Store v2019-10-30 `_
Set ``dataverse.files..payload-signing=true`` and ``dataverse.files..chunked-encoding=false`` to use Surf Object
Store.
@@ -610,7 +657,7 @@ If you prefer to start with less of a blank slate, you can review the custom hom
Note that the ``custom-homepage.html`` file provided has multiple elements that assume your root Dataverse collection still has an alias of "root". While you were branding your root Dataverse collection, you may have changed the alias to "harvard" or "librascholar" or whatever and you should adjust the custom homepage code as needed.
-For more background on what this curl command above is doing, see the "Database Settings" section below. If you decide you'd like to remove this setting, use the following curl command:
+For more background on what this curl command above is doing, see the :ref:`database-settings` section below. If you decide you'd like to remove this setting, use the following curl command:
``curl -X DELETE http://localhost:8080/api/admin/settings/:HomePageCustomizationFile``
@@ -680,9 +727,10 @@ When a user selects one of the available choices, the Dataverse user interfaces
Allowing the Language Used for Dataset Metadata to be Specified
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-Since dataset metadata can only be entered in one language, and administrators may wish to limit which languages metadata can be entered in, Dataverse also offers a separate setting defining allowed metadata languages.
-The presence of the :ref:`:MetadataLanguages` database setting identifies the available options (which can be different from those in the :Languages setting above, with fewer or more options).
-Dataverse collection admins can select from these options to indicate which language should be used for new Datasets created with that specific collection.
+Since dataset metadata can only be entered in one language, and administrators may wish to limit which languages metadata can be entered in, Dataverse also offers a separate setting defining allowed metadata languages.
+The presence of the :ref:`:MetadataLanguages` database setting identifies the available options (which can be different from those in the ``:Languages`` setting above, with fewer or more options).
+
+Dataverse collection admins can select from these options to indicate which language should be used for new Datasets created with that specific collection. If they do not, users will be asked when creating a dataset to select the language they want to use when entering metadata.
When creating or editing a dataset, users will be asked to enter the metadata in that language. The metadata language selected will also be shown when dataset metadata is viewed and will be included in metadata exports (as appropriate for each format) for published datasets:
@@ -798,6 +846,65 @@ For Google Analytics, the example script at :download:`analytics-code.html `_ are provided below. Note that a new installation of Dataverse already includes CC0 and CC BY.
+
+- :download:`licenseCC0-1.0.json <../../../../scripts/api/data/licenses/licenseCC0-1.0.json>`
+- :download:`licenseCC-BY-4.0.json <../../../../scripts/api/data/licenses/licenseCC-BY-4.0.json>`
+- :download:`licenseCC-BY-SA-4.0.json <../../../../scripts/api/data/licenses/licenseCC-BY-SA-4.0.json>`
+- :download:`licenseCC-BY-NC-4.0.json <../../../../scripts/api/data/licenses/licenseCC-BY-NC-4.0.json>`
+- :download:`licenseCC-BY-NC-SA-4.0.json <../../../../scripts/api/data/licenses/licenseCC-BY-NC-SA-4.0.json>`
+- :download:`licenseCC-BY-ND-4.0.json <../../../../scripts/api/data/licenses/licenseCC-BY-ND-4.0.json>`
+- :download:`licenseCC-BY-NC-ND-4.0.json <../../../../scripts/api/data/licenses/licenseCC-BY-NC-ND-4.0.json>`
+
+.. _adding-custom-licenses:
+
+Adding Custom Licenses
+^^^^^^^^^^^^^^^^^^^^^^
+
+If you are interested in adding a custom license, you will need to create your own JSON file as explained in :ref:`standardizing-custom-licenses`.
+
+Removing Licenses
++++++++++++++++++
+
+Licenses can be removed with a curl command as explained in the API Guide under :ref:`license-management-api`.
+
+Disabling Custom Dataset Terms
+++++++++++++++++++++++++++++++
+
+See :ref:`:AllowCustomTermsOfUse` for how to disable the "Custom Dataset Terms" option.
+
.. _BagIt Export:
BagIt Export
@@ -1108,26 +1215,27 @@ Can also be set via *MicroProfile Config API* sources, e.g. the environment vari
dataverse.rserve.host
+++++++++++++++++++++
-Configuration for :doc:`r-rapache-tworavens`.
+Host name for Rserve, used for tasks that require use of R (to ingest RData files and to save tabular data as RData frames).
dataverse.rserve.port
+++++++++++++++++++++
-Configuration for :doc:`r-rapache-tworavens`.
+Port number for Rserve, used for tasks that require use of R (to ingest RData files and to save tabular data as RData frames).
dataverse.rserve.user
+++++++++++++++++++++
-Configuration for :doc:`r-rapache-tworavens`.
-
-dataverse.rserve.tempdir
-++++++++++++++++++++++++
-Configuration for :doc:`r-rapache-tworavens`.
+Username for Rserve, used for tasks that require use of R (to ingest RData files and to save tabular data as RData frames).
dataverse.rserve.password
+++++++++++++++++++++++++
-Configuration for :doc:`r-rapache-tworavens`.
+Password for Rserve, used for tasks that require use of R (to ingest RData files and to save tabular data as RData frames).
+
+dataverse.rserve.tempdir
+++++++++++++++++++++++++
+
+Temporary directory used by Rserve (defaults to /tmp/Rserv). Note that this location is local to the host on which Rserv is running (specified in ``dataverse.rserve.host`` above). When talking to Rserve, Dataverse needs to know this location in order to generate absolute path names of the files on the other end.
.. _dataverse.dropbox.key:
@@ -1678,6 +1786,8 @@ Note: After making a change to this setting, a reExportAll needs to be run befor
This will *force* a re-export of every published, local dataset, regardless of whether it has already been exported or not.
+The call returns a status message informing the administrator that the process has been launched (``{"status":"WORKFLOW_IN_PROGRESS"}``). The administrator can check the progress of the process via log files: ``[Payara directory]/glassfish/domains/domain1/logs/export_[time stamp].log``.
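+
+For example, to follow the progress of a run (assuming the default Payara location used elsewhere in this guide)::
+
+  tail -f /usr/local/payara5/glassfish/domains/domain1/logs/export_*.log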
+
:NavbarAboutUrl
+++++++++++++++
@@ -1747,7 +1857,12 @@ Notes:
- For larger file upload sizes, you may need to configure your reverse proxy timeout. If using apache2 (httpd) with Shibboleth, add a timeout to the ProxyPass defined in etc/httpd/conf.d/ssl.conf (which is described in the :doc:`/installation/shibboleth` setup).
+:MultipleUploadFilesLimit
++++++++++++++++++++++++++
+This setting controls the number of files that can be uploaded through the UI at once. The default is 1000. It should be set to 1 or higher since 0 has no effect. To limit the number of files in a zip file, see ``:ZipUploadFilesLimit``.
+
+``curl -X PUT -d 500 http://localhost:8080/api/admin/settings/:MultipleUploadFilesLimit``
:ZipDownloadLimit
+++++++++++++++++
@@ -1824,16 +1939,6 @@ In the example below we reduce the timeout to 4 hours:
``curl -X PUT -d 240 http://localhost:8080/api/admin/settings/:LoginSessionTimeout``
-:TwoRavensUrl
-+++++++++++++
-
-The ``:TwoRavensUrl`` option is no longer valid. See :doc:`r-rapache-tworavens` and the :doc:`/admin/external-tools` section of the Admin Guide.
-
-:TwoRavensTabularView
-+++++++++++++++++++++
-
-The ``:TwoRavensTabularView`` option is no longer valid. See :doc:`r-rapache-tworavens` and the :doc:`/admin/external-tools` section of the Admin Guide.
-
.. _:DatasetPublishPopupCustomText:
:DatasetPublishPopupCustomText
@@ -2229,7 +2334,7 @@ See :ref:`i18n` for a curl example and related settings.
:MetadataLanguages
++++++++++++++++++
-Sets which languages can be used when entering dataset metadata.
+Sets which languages can be used when entering dataset metadata.
See :ref:`i18n` for further discussion, a curl example, and related settings.
@@ -2265,7 +2370,10 @@ If you don’t want date facets to be sorted chronologically, set:
:CustomZipDownloadServiceUrl
++++++++++++++++++++++++++++
-The location of the "Standalone Zipper" service. If this option is specified, the Dataverse installation will be redirecing bulk/mutli-file zip download requests to that location, instead of serving them internally. See the "Advanced" section of the Installation guide for information on how to install the external zipper. (This is still an experimental feature, as of Dataverse Software 5.0).
+The location of the "Standalone Zipper" service. If this option is specified, the Dataverse installation will be
+redirecting bulk/multi-file zip download requests to that location, instead of serving them internally.
+See the :ref:`zipdownloader` section of the Advanced Installation Guide for information on how to install the external zipper.
+(This is still an **experimental** feature, as of Dataverse Software 5.0).
To enable redirects to the zipper installed on the same server as the main Dataverse Software application:
@@ -2273,7 +2381,7 @@ To enable redirects to the zipper installed on the same server as the main Datav
To enable redirects to the zipper on a different server:
-``curl -X PUT -d 'https://zipper.example.edu/cgi-bin/zipdownload' http://localhost:8080/api/admin/settings/:CustomZipDownloadServiceUrl``
+``curl -X PUT -d 'https://zipper.example.edu/cgi-bin/zipdownload' http://localhost:8080/api/admin/settings/:CustomZipDownloadServiceUrl``
:ArchiverClassName
++++++++++++++++++
@@ -2367,8 +2475,7 @@ Also refer to the "Datafile Integrity" API :ref:`datafile-integrity`
:SendNotificationOnDatasetCreation
++++++++++++++++++++++++++++++++++
-A boolean setting that, if true will send an email and notification to users when a Dataset is created. Messages go to those, other than the dataset creator,
- who have the ability/permission necessary to publish the dataset. The intent of this functionality is to simplify tracking activity and planning to follow-up contact.
+A boolean setting that, if true, will send an email and notification to users when a Dataset is created. Messages go to those, other than the dataset creator, who have the ability/permission necessary to publish the dataset. The intent of this functionality is to simplify tracking activity and planning to follow-up contact.
``curl -X PUT -d true http://localhost:8080/api/admin/settings/:SendNotificationOnDatasetCreation``
@@ -2379,7 +2486,7 @@ A boolean setting that, if true will send an email and notification to users whe
A JSON-structured setting that configures Dataverse to associate specific metadatablock fields with external vocabulary services and specific vocabularies/sub-vocabularies managed by that service. More information about this capability is available at :doc:`/admin/metadatacustomization`.
-Scripts that implement this association for specific service protocols are maintained at https://github.com/gdcc/dataverse-external-vocab-support. That repository also includes a json-schema for validating the structure required by this setting along with an example metadatablock and sample :CVocConf setting values associating entries in the example block with ORCID and SKOSMOS based services.
+Scripts that implement this association for specific service protocols are maintained at https://github.com/gdcc/dataverse-external-vocab-support. That repository also includes a json-schema for validating the structure required by this setting along with an example metadatablock and sample :CVocConf setting values associating entries in the example block with ORCID and SKOSMOS based services.
``wget https://gdcc.github.io/dataverse-external-vocab-support/examples/config/cvoc-conf.json``
@@ -2389,23 +2496,32 @@ Scripts that implement this association for specific service protocols are maint
:AllowedCurationLabels
++++++++++++++++++++++
-
-A JSON Object containing lists of allowed labels (up to 32 characters, spaces allowed) that can be set, via API or UI by users with the permission to publish a dataset. The set of labels allowed
-for datasets can be selected by a superuser - via the Dataverse collection page (Edit/General Info) or set via API call.
-The labels in a set should correspond to the states in an organization's curation process and are intended to help users/curators track the progress of a dataset through a defined curation process.
-A dataset may only have one label at a time and if a label is set, it will be removed at publication time.
+
+A JSON Object containing lists of allowed labels (up to 32 characters, spaces allowed) that can be set, via API or UI by users with the permission to publish a dataset. The set of labels allowed
+for datasets can be selected by a superuser - via the Dataverse collection page (Edit/General Info) or set via API call.
+The labels in a set should correspond to the states in an organization's curation process and are intended to help users/curators track the progress of a dataset through a defined curation process.
+A dataset may only have one label at a time and if a label is set, it will be removed at publication time.
This functionality is disabled when this setting is empty/not set.
Each set of labels is identified by a curationLabelSet name and a JSON Array of the labels allowed in that set.
``curl -X PUT -d '{"Standard Process":["Author contacted", "Privacy Review", "Awaiting paper publication", "Final Approval"], "Alternate Process":["State 1","State 2","State 3"]}' http://localhost:8080/api/admin/settings/:AllowedCurationLabels``
+.. _:AllowCustomTermsOfUse:
+
+:AllowCustomTermsOfUse
+++++++++++++++++++++++
+
+By default, custom terms of data use and access can be specified after selecting "Custom Terms" from the License/DUA dropdown on the Terms tab. When ``:AllowCustomTermsOfUse`` is set to ``false`` the "Custom Terms" item is not made available to the depositor.
+
+``curl -X PUT -d false http://localhost:8080/api/admin/settings/:AllowCustomTermsOfUse``
+
.. _:MaxEmbargoDurationInMonths:
:MaxEmbargoDurationInMonths
+++++++++++++++++++++++++++
-This setting controls whether embargoes are allowed in a Dataverse instance and can limit the maximum duration users are allowed to specify. A value of 0 months or non-existent
-setting indicates embargoes are not supported. A value of -1 allows embargoes of any length. Any other value indicates the maximum number of months (from the current date) a user
+This setting controls whether embargoes are allowed in a Dataverse instance and can limit the maximum duration users are allowed to specify. A value of 0 months or non-existent
+setting indicates embargoes are not supported. A value of -1 allows embargoes of any length. Any other value indicates the maximum number of months (from the current date) a user
can enter for an embargo end date. This limit will be enforced in the popup dialog in which users enter the embargo date. For example, to set a two year maximum:
``curl -X PUT -d 24 http://localhost:8080/api/admin/settings/:MaxEmbargoDurationInMonths``
diff --git a/doc/sphinx-guides/source/installation/img/3webservers.png b/doc/sphinx-guides/source/installation/img/3webservers.png
index f072411ac10..b8bd222a56f 100644
Binary files a/doc/sphinx-guides/source/installation/img/3webservers.png and b/doc/sphinx-guides/source/installation/img/3webservers.png differ
diff --git a/doc/sphinx-guides/source/installation/img/tworavens_components.png b/doc/sphinx-guides/source/installation/img/tworavens_components.png
deleted file mode 100644
index 23952d74d05..00000000000
Binary files a/doc/sphinx-guides/source/installation/img/tworavens_components.png and /dev/null differ
diff --git a/doc/sphinx-guides/source/installation/img/tworavens_components.sh b/doc/sphinx-guides/source/installation/img/tworavens_components.sh
deleted file mode 100755
index 234bcf15a05..00000000000
--- a/doc/sphinx-guides/source/installation/img/tworavens_components.sh
+++ /dev/null
@@ -1,3 +0,0 @@
-#!/bin/bash -x
-java -jar ~/lib/plantuml.jar -tpng tworavens_components.uml
-#java -jar ~/bin/plantuml.jar -tsvg tworavens_components.uml
diff --git a/doc/sphinx-guides/source/installation/img/tworavens_components.uml b/doc/sphinx-guides/source/installation/img/tworavens_components.uml
deleted file mode 100644
index 6e64b60f672..00000000000
--- a/doc/sphinx-guides/source/installation/img/tworavens_components.uml
+++ /dev/null
@@ -1,26 +0,0 @@
-//http://plantuml.com/component.html#Component
-@startuml
-
-node "Server" {
- component "Apache" {
- component "TwoRavens (static content)" as Static {
- }
- component "rApache" {
- }
- }
- component "Glassfish" as Glassfish {
- component "Dataverse" {
- }
- }
- rApache <-- Dataverse
- component "Rserve" {
- }
-}
-
-Browser <-- Static
-Browser <--> rApache
-Browser <-- Dataverse
-
-Dataverse <--> Rserve
-
-@enduml
diff --git a/doc/sphinx-guides/source/installation/img/tworavens_test_empty.png b/doc/sphinx-guides/source/installation/img/tworavens_test_empty.png
deleted file mode 100644
index aa33825baa5..00000000000
Binary files a/doc/sphinx-guides/source/installation/img/tworavens_test_empty.png and /dev/null differ
diff --git a/doc/sphinx-guides/source/installation/img/tworavens_test_file_ingested.png b/doc/sphinx-guides/source/installation/img/tworavens_test_file_ingested.png
deleted file mode 100644
index d9505d22da6..00000000000
Binary files a/doc/sphinx-guides/source/installation/img/tworavens_test_file_ingested.png and /dev/null differ
diff --git a/doc/sphinx-guides/source/installation/img/tworavens_test_init.png b/doc/sphinx-guides/source/installation/img/tworavens_test_init.png
deleted file mode 100644
index ab94a91a1a7..00000000000
Binary files a/doc/sphinx-guides/source/installation/img/tworavens_test_init.png and /dev/null differ
diff --git a/doc/sphinx-guides/source/installation/img/tworavens_test_output.png b/doc/sphinx-guides/source/installation/img/tworavens_test_output.png
deleted file mode 100644
index b75bf1126ea..00000000000
Binary files a/doc/sphinx-guides/source/installation/img/tworavens_test_output.png and /dev/null differ
diff --git a/doc/sphinx-guides/source/installation/img/tworavens_test_select_model.png b/doc/sphinx-guides/source/installation/img/tworavens_test_select_model.png
deleted file mode 100644
index dda8aa75153..00000000000
Binary files a/doc/sphinx-guides/source/installation/img/tworavens_test_select_model.png and /dev/null differ
diff --git a/doc/sphinx-guides/source/installation/img/tworavens_test_select_var.png b/doc/sphinx-guides/source/installation/img/tworavens_test_select_var.png
deleted file mode 100644
index 91bcfb1be57..00000000000
Binary files a/doc/sphinx-guides/source/installation/img/tworavens_test_select_var.png and /dev/null differ
diff --git a/doc/sphinx-guides/source/installation/index.rst b/doc/sphinx-guides/source/installation/index.rst
index 931c4016394..1965448aedb 100755
--- a/doc/sphinx-guides/source/installation/index.rst
+++ b/doc/sphinx-guides/source/installation/index.rst
@@ -16,7 +16,6 @@ Installation Guide
installation-main
config
upgrading
- r-rapache-tworavens
shibboleth
oauth2
oidc
diff --git a/doc/sphinx-guides/source/installation/installation-main.rst b/doc/sphinx-guides/source/installation/installation-main.rst
index 5d337aeabe3..0042a0db6d8 100755
--- a/doc/sphinx-guides/source/installation/installation-main.rst
+++ b/doc/sphinx-guides/source/installation/installation-main.rst
@@ -22,7 +22,7 @@ You should have already downloaded the installer from https://github.com/IQSS/da
Unpack the zip file - this will create the directory ``dvinstall``.
-**Important:** The installer will need to use the PostgreSQL command line utility ``psql`` in order to configure the database. If the executable is not in your system PATH, the installer will try to locate it on your system. However, we strongly recommend that you check and make sure it is in the PATH. This is especially important if you have multiple versions of PostgreSQL installed on your system. Make sure the psql that came with the version that you want to use with your Dataverse installation is the first on your path. For example, if the PostgreSQL distribution you are running is installed in /Library/PostgreSQL/9.6, add /Library/PostgreSQL/9.6/bin to the beginning of your $PATH variable. If you are *running* multiple PostgreSQL servers, make sure you know the port number of the one you want to use, as the installer will need it in order to connect to the database (the first PostgreSQL distribution installed on your system is likely using the default port 5432; but the second will likely be on 5433, etc.) Does every word in this paragraph make sense? If it does, great - because you definitely need to be comfortable with basic system tasks in order to install the Dataverse Software. If not - if you don't know how to check where your PostgreSQL is installed, or what port it is running on, or what a $PATH is... it's not too late to stop. Because it will most likely not work. And if you contact us for help, these will be the questions we'll be asking you - so, again, you need to be able to answer them comfortably for it to work.
+**Important:** The installer will need to use the PostgreSQL command line utility ``psql`` in order to configure the database. If the executable is not in your system PATH, the installer will try to locate it on your system. However, we strongly recommend that you check and make sure it is in the PATH. This is especially important if you have multiple versions of PostgreSQL installed on your system. Make sure the psql that came with the version that you want to use with your Dataverse installation is the first on your path. For example, if the PostgreSQL distribution you are running is installed in /Library/PostgreSQL/13, add /Library/PostgreSQL/13/bin to the beginning of your $PATH variable. If you are *running* multiple PostgreSQL servers, make sure you know the port number of the one you want to use, as the installer will need it in order to connect to the database (the first PostgreSQL distribution installed on your system is likely using the default port 5432; but the second will likely be on 5433, etc.) Does every word in this paragraph make sense? If it does, great - because you definitely need to be comfortable with basic system tasks in order to install the Dataverse Software. If not - if you don't know how to check where your PostgreSQL is installed, or what port it is running on, or what a $PATH is... it's not too late to stop. Because it will most likely not work. And if you contact us for help, these will be the questions we'll be asking you - so, again, you need to be able to answer them comfortably for it to work.
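The PATH check described above can be sketched in two commands; the ``/Library/PostgreSQL/13/bin`` path is the example from the paragraph and should be replaced with your actual install location:

```shell
# Sketch: confirm which psql the installer will find, then put the
# desired PostgreSQL bin directory first on the PATH.
# The directory below is the example from this guide; substitute yours.
PG_BIN="/Library/PostgreSQL/13/bin"

# Show what currently resolves (a message instead of a path means
# psql is not on the PATH at all):
command -v psql || echo "psql not found on PATH"

# Prepend the desired bin directory so its psql wins over any other:
export PATH="$PG_BIN:$PATH"
```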
**It is no longer necessary to run the installer as root!**
@@ -131,7 +131,7 @@ Next you'll want to check out the :doc:`config` section, especially the section
Troubleshooting
---------------
-If the following doesn't apply, please get in touch as explained in the :doc:`intro`. You may be asked to provide ``payara5/glassfish/domains/domain1/logs/server.log`` for debugging.
+If the following doesn't apply, please get in touch as explained in :ref:`support`.
Dataset Cannot Be Published
^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -254,3 +254,8 @@ Rerun Installer
With all the data cleared out, you should be ready to rerun the installer per above.
Related to all this is a series of scripts at https://github.com/IQSS/dataverse/blob/develop/scripts/deploy/phoenix.dataverse.org/deploy that Dataverse Project Team and Community developers use to have the test server http://phoenix.dataverse.org rise from the ashes before integration tests are run against it. For more on this topic, see the :ref:`rebuilding-dev-environment` section of the Developer Guide.
+
+Getting Support for Installation Trouble
+----------------------------------------
+
+See :ref:`support`.
diff --git a/doc/sphinx-guides/source/installation/intro.rst b/doc/sphinx-guides/source/installation/intro.rst
index 4dd5f9e8795..2251af7b81b 100644
--- a/doc/sphinx-guides/source/installation/intro.rst
+++ b/doc/sphinx-guides/source/installation/intro.rst
@@ -39,6 +39,17 @@ To get help installing or configuring a Dataverse installation, please try one o
- asking at http://chat.dataverse.org
- emailing support@dataverse.org to open a private ticket at https://help.hmdc.harvard.edu
+Information to Send to Support When Installation Fails
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If you've encountered a problem installing Dataverse and are ready to ask for help, please consider sending along the following information so that the Dataverse team and community can more easily assist you.
+
+- Version of Dataverse you are trying to install.
+- Operating system (usually a Linux distribution) and version.
+- Output from the installer (STDOUT, STDERR).
+- The ``scripts/api/setup-all.*.log`` files left behind by the installer.
+- The ``server.log`` file from Payara (by default at ``/usr/local/payara5/glassfish/domains/domain1/logs/server.log``).
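The list above can be gathered into a single archive before opening a ticket. The sketch below is not an official Dataverse tool (``collect_support_logs`` is a hypothetical helper); the Payara path is the default quoted above, and both paths should be adjusted to your installation:

```shell
# Sketch (not an official Dataverse script): bundle the log files listed
# above into one archive to attach to a support request.
collect_support_logs() {
  # $1 = path to server.log, $2 = output archive name
  files=""
  for f in "$1" scripts/api/setup-all.*.log; do
    [ -e "$f" ] && files="$files $f"
  done
  if [ -n "$files" ]; then
    tar czf "$2" $files   # word-splitting on $files is intentional here
    echo "Created $2"
  else
    echo "No log files found; double-check the paths"
  fi
}

collect_support_logs \
  /usr/local/payara5/glassfish/domains/domain1/logs/server.log \
  dataverse-support-logs.tar.gz
```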
+
Improving this Guide
--------------------
diff --git a/doc/sphinx-guides/source/installation/prep.rst b/doc/sphinx-guides/source/installation/prep.rst
index 7be6da6584b..c841cd55fb3 100644
--- a/doc/sphinx-guides/source/installation/prep.rst
+++ b/doc/sphinx-guides/source/installation/prep.rst
@@ -82,7 +82,7 @@ A basic Dataverse installation runs fine on modest hardware. For example, as of
In contrast, before we moved it to the Amazon Cloud, the production installation at https://dataverse.harvard.edu was backed by six servers with two Intel Xeon 2.53 Ghz CPUs and either 48 or 64 GB of RAM. The three servers with 48 GB of RAM were web frontends running Glassfish 4 and Apache and were load balanced by a hardware device. The remaining three servers with 64 GB of RAM were the primary and backup database servers and a server dedicated to running Rserve. Multiple TB of storage were mounted from a SAN via NFS.
-Currently, the Harvard Dataverse Repository is served by four AWS server nodes: two "m4.4xlarge" instances (64GB/16 vCPU) as web frontends, one 32GB/8 vCPU ("m4.2xlarge") instance for the Solr search engine, and one 16GB/4 vCPU ("m4.xlarge") instance for R and TwoRavens. The PostgreSQL database is served by Amazon RDS, and physical files are stored on Amazon S3.
+Currently, the Harvard Dataverse Repository is served by four AWS server nodes: two "m4.4xlarge" instances (64GB/16 vCPU) as web frontends, one 32GB/8 vCPU ("m4.2xlarge") instance for the Solr search engine, and one 16GB/4 vCPU ("m4.xlarge") instance for R. The PostgreSQL database is served by Amazon RDS, and physical files are stored on Amazon S3.
The Dataverse Software installation script will attempt to give your app server the right amount of RAM based on your system.
@@ -106,6 +106,7 @@ Here are some questions to keep in the back of your mind as you test and move in
- Do I want to to run my app server on the standard web ports (80 and 443) or do I want to "front" my app server with a proxy such as Apache or nginx? See "Network Ports" in the :doc:`config` section.
- How many points of failure am I willing to tolerate? How much complexity do I want?
- How much does it cost to subscribe to a service to create persistent identifiers such as DOIs or handles?
+- What licenses should I make available to my users?
Next Steps
----------
diff --git a/doc/sphinx-guides/source/installation/prerequisites.rst b/doc/sphinx-guides/source/installation/prerequisites.rst
index e03b24ef7e6..6778fe6a236 100644
--- a/doc/sphinx-guides/source/installation/prerequisites.rst
+++ b/doc/sphinx-guides/source/installation/prerequisites.rst
@@ -98,7 +98,7 @@ PostgreSQL
Installing PostgreSQL
=======================
-The application has been tested with PostgreSQL versions up to 13. We recommend installing the latest version that is available for your OS distribution. *For example*, to install PostgreSQL 13 under RHEL7/derivative::
+The application has been tested with PostgreSQL versions up to 13; version 10 or higher is required. We recommend installing the latest version that is available for your OS distribution. *For example*, to install PostgreSQL 13 under RHEL7/derivative::
# yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
# yum makecache fast
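The "version 10 or higher" requirement can be checked by parsing the version string psql reports. The helper below is illustrative only (``version_ok`` is a hypothetical name); it is demonstrated against sample strings, but on a real system you would pass it ``"$(psql --version)"``:

```shell
# Sketch: check that an installed PostgreSQL meets the version 10+
# requirement by extracting the major version from the version string.
version_ok() {
  # e.g. "psql (PostgreSQL) 13.5" -> 13
  major=$(echo "$1" | sed 's/[^0-9]*\([0-9][0-9]*\).*/\1/')
  [ "$major" -ge 10 ]
}

version_ok "psql (PostgreSQL) 13.5" && echo "13.5 is supported"
version_ok "psql (PostgreSQL) 9.6.24" || echo "9.6 is too old"
```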
@@ -157,7 +157,7 @@ The Dataverse Software search index is powered by Solr.
Supported Versions
==================
-The Dataverse Software has been tested with Solr version 8.8.1. Future releases in the 8.x series are likely to be compatible; however, this cannot be confirmed until they are officially tested. Major releases above 8.x (e.g. 9.x) are not supported.
+The Dataverse Software has been tested with Solr version 8.11.1. Future releases in the 8.x series are likely to be compatible; however, this cannot be confirmed until they are officially tested. Major releases above 8.x (e.g. 9.x) are not supported.
Installing Solr
===============
@@ -172,19 +172,19 @@ Become the ``solr`` user and then download and configure Solr::
su - solr
cd /usr/local/solr
- wget https://archive.apache.org/dist/lucene/solr/8.8.1/solr-8.8.1.tgz
- tar xvzf solr-8.8.1.tgz
- cd solr-8.8.1
+ wget https://archive.apache.org/dist/lucene/solr/8.11.1/solr-8.11.1.tgz
+ tar xvzf solr-8.11.1.tgz
+ cd solr-8.11.1
cp -r server/solr/configsets/_default server/solr/collection1
You should already have a "dvinstall.zip" file that you downloaded from https://github.com/IQSS/dataverse/releases . Unzip it into ``/tmp``. Then copy the files into place::
- cp /tmp/dvinstall/schema*.xml /usr/local/solr/solr-8.8.1/server/solr/collection1/conf
- cp /tmp/dvinstall/solrconfig.xml /usr/local/solr/solr-8.8.1/server/solr/collection1/conf
+ cp /tmp/dvinstall/schema*.xml /usr/local/solr/solr-8.11.1/server/solr/collection1/conf
+ cp /tmp/dvinstall/solrconfig.xml /usr/local/solr/solr-8.11.1/server/solr/collection1/conf
Note: The Dataverse Project team has customized Solr to boost results that come from certain indexed elements inside the Dataverse installation, for example prioritizing results from Dataverse collections over Datasets. If you would like to remove this, edit your ``solrconfig.xml`` and remove the ```` element and its contents. If you have ideas about how this boosting could be improved, feel free to contact us through our Google Group https://groups.google.com/forum/#!forum/dataverse-dev .
-A Dataverse installation requires a change to the ``jetty.xml`` file that ships with Solr. Edit ``/usr/local/solr/solr-8.8.1/server/etc/jetty.xml`` , increasing ``requestHeaderSize`` from ``8192`` to ``102400``
+A Dataverse installation requires a change to the ``jetty.xml`` file that ships with Solr. Edit ``/usr/local/solr/solr-8.11.1/server/etc/jetty.xml`` , increasing ``requestHeaderSize`` from ``8192`` to ``102400``
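The hand edit above can also be done with sed. The exact markup varies between Solr releases, so the sample line below (mirroring the Solr 8.x ``jetty.xml`` shape, where the size is a ``Property`` default) is an assumption; check your file and adjust the pattern if it differs. The sketch operates on a local copy so the transformation is visible:

```shell
# Sketch: apply the requestHeaderSize change with sed instead of a hand
# edit. Demonstrated on a sample of the relevant line; point sed at
# /usr/local/solr/solr-8.11.1/server/etc/jetty.xml on a real system.
cat > /tmp/jetty-sample.xml <<'EOF'
<Set name="requestHeaderSize"><Property name="solr.jetty.request.header.size" default="8192" /></Set>
EOF

# -i.bak keeps a backup copy alongside the edited file
sed -i.bak 's/default="8192"/default="102400"/' /tmp/jetty-sample.xml
grep -q 'default="102400"' /tmp/jetty-sample.xml && echo "requestHeaderSize updated"
```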
Solr will warn about needing to increase the number of file descriptors and max processes in a production environment but will still run with defaults. We have increased these values to the recommended levels by adding ulimit -n 65000 to the init script, and the following to ``/etc/security/limits.conf``::
@@ -203,7 +203,7 @@ Solr launches asynchronously and attempts to use the ``lsof`` binary to watch fo
Finally, you need to tell Solr to create the core "collection1" on startup::
- echo "name=collection1" > /usr/local/solr/solr-8.8.1/server/solr/collection1/core.properties
+ echo "name=collection1" > /usr/local/solr/solr-8.11.1/server/solr/collection1/core.properties
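A quick sanity check, sketched below, confirms the core definition landed where Solr expects it. The path mirrors the one used throughout this guide:

```shell
# Sketch: confirm the core definition was written where Solr looks for it.
CORE_PROPS="/usr/local/solr/solr-8.11.1/server/solr/collection1/core.properties"

if grep -q '^name=collection1$' "$CORE_PROPS" 2>/dev/null; then
  echo "collection1 is registered"
else
  echo "core.properties missing or wrong; re-run the echo command above"
fi
```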
Solr Init Script
================
@@ -297,11 +297,7 @@ installation. It will allow you to ingest R (.RData) files as tabular
data and to export tabular data as .RData files. R can be considered an optional component, meaning
that if you don't have R installed, you will still be able to run and
use the Dataverse Software - but the functionality specific to tabular data
-mentioned above will not be available to your users. **Note** that if
-you choose to also install `TwoRavens
-`_, it will require some extra R
-components and libraries. Please consult the instructions in the
-:doc:`/installation/r-rapache-tworavens/` section of the Installation Guide.
+mentioned above will not be available to your users.
Installing R
diff --git a/doc/sphinx-guides/source/installation/r-rapache-tworavens.rst b/doc/sphinx-guides/source/installation/r-rapache-tworavens.rst
deleted file mode 100644
index b42c7272dfb..00000000000
--- a/doc/sphinx-guides/source/installation/r-rapache-tworavens.rst
+++ /dev/null
@@ -1,617 +0,0 @@
-.. role:: fixedwidthplain
-
-TwoRavens
-=========
-
-TwoRavens is a web application for tabular data exploration and statistical analysis.
-It can be integrated with your Dataverse installation, as an **optional** component. While TwoRavens was originally created at IQSS, its developers have since left the organization. Plans for the future of the Dataverse Project/TwoRavens collaboration are still being worked out. As such, **support for TwoRavens is somewhat limited at the
-moment (as of Spring of 2017).**
-
-Please note that in the text below, Glassfish was changed to Payara but not tested.
-
-Any questions regarding the features of TwoRavens, bug reports and
-such, should be addressed directly to the developers of the
-application. The `TwoRavens GitHub repository
-`_ and the `TwoRavens project page
-`_ are good places to start.
-
-For now, the Dataverse Project team will continue providing
-installation and integration support. We have created a new (as
-of Dataverse Software 4.6.1) version of the installer scripts and updated this guide. We have tried to improve and simplify the
-installation process, particularly the difficult process of installing
-correct versions of the required third party R packages.
-
-**Note that the installation process below supersedes the basic R
-setup described in the "Prerequisites" portion of the Installation
-Guide. Meaning that once completed, it installs everything needed to
-run TwoRavens, PLUS all the libraries and components required to
-ingest RData files and export as RData.**
-
-
-
-Please be warned:
-
-- This process may still require some system administration skills.
-- The guide below is very Linux-specific. This process has been tested
- on RedHat/derivative servers only. In some ways it *may* actually be
- easier to get it all installed on MacOS X (because
- MacOS X versions of third party R packages are available
- pre-compiled), or even on Windows. But it hasn't been attempted, and
- is not supported by the Dataverse Project team.
-
-In addition to the TwoRavens web application proper, several required
-components need to be installed and configured. This includes R,
-rApache and a collection of required third-party R packages. The
-installation steps for these components are described in the
-individual sections of the document below.
-
-.. contents:: |toctitle|
- :local:
-
-0. Overview
-+++++++++++
-
-TwoRavens is itself a compact JavaScript application that **runs on the user's
-browser**. These JavaScript files, and the accompanying HTML, CSS, etc. files
-are served by an HTTP server (Apache) as static objects.
-
-The statistical calculations are performed by R programs that run **on the server**.
-`rApache `_ is used as the web front end for R on the server, so
-that the browser application can talk to R over HTTP.
-
-See the :ref:`advanced` section of :doc:`prep` for an example of running various components on more than one server.
-
-TwoRavens will need to obtain some tabular-data-specific metadata from
-the Dataverse installation -- the DDI fragment that describes the variables and some pre-processed summary statistics for the data vectors. In order to produce the latter, the Dataverse Software application also needs to be able to execute some R code on the server. Instead of
-``rApache``, the Dataverse Software uses `Rserve `_ to
-communicate to R. Rserve is installed as a "contributor" R package. It runs as a
-daemon process on the server, accepting network connections on a dedicated port.
-The Dataverse Software supplies an :fixedwidthplain:`init.d`-style startup file for the
-daemon. The R setup in step ``2.`` will set it up so that the daemon gets started
-automatically when the system boots.
-
-When a user requests to run
-a statistical model on a data file, TwoRavens will instruct the R code on the
-server to download the file **directly from the Dataverse Software application**. Access
-URLs need to be configured for this to work properly (this is done by the TwoRavens
-installer script in step ``3.``)
-
-If you install all components on a single server and front the app server with Apache
-(see :ref:`network-ports` under the :doc:`config` section), the component and
-data flow diagram might look something like this:
-
-|tworavens_components|
-
-In addition to Rserve, there are 14 more R library packages that the TwoRavens R
-code requires in order to run. These in turn require 30 more as their own dependencies,
-so a total of 45 packages must be installed. "Installed" in the
-context of an R package means R must download the **source code** from the `CRAN
-`_ code repository and compile it locally. This
-historically has been the trickiest, least stable part of the installation process,
-since the packages in question are being constantly (and independently) developed.
-This means that every time you attempt to install these packages, you are building
-from potentially different versions of the source code. An incompatibility introduced
-between any two of the packages can result in a failure to install. In this release
-we have attempted to resolve this by installing the **specific versions of the R
-packages that have been proven** to work together. If you have attempted to
-install TwoRavens in the past, and it didn't work, please see the part of
-section ``1.b.`` where we explain how to completely erase all the previously
-built packages.
-
-1. Prerequisites
-++++++++++++++++
-
-a. httpd (Apache):
-------------------
-
-It's probably installed already, but if not:
-
-``yum install httpd``
-
-This rApache configuration does not work with SELinux. Execute the following commands
-to disable SELinux:
-
-``setenforce permissive``
-
-``getenforce``
-
-(Note: If you can get rApache to work with SELinux, we encourage you to make a pull request! Please see the :doc:`/developers/selinux` section of the Developer Guide to get started.)
-
-If you choose to serve TwoRavens and run rApache under :fixedwidthplain:`https`, a "real" signed certificate (as opposed to self-signed) is recommended.
-
-For security reasons, directory listing needs to be disabled on the web documents folder served by Apache:
-
-In the main Apache configuration file (``/etc/httpd/conf/httpd.conf`` in the default setup), find the section that configures your web directory. For example, if the ``DocumentRoot``, defined elsewhere in the file, is set to the default ``"/var/www/html"``, the opening line of the section will look like this:
-
-``<Directory "/var/www/html">``
-
-Find the ``Options`` line in that section, and make sure that it doesn't contain the ``Indexes`` statement.
-For example, if the options line in your configuration is
-
-``Options Indexes FollowSymLinks``
-
-change it to
-
-``Options FollowSymLinks``
-
-b. R:
------
-
-The simplest way to install R on RHEL/derivative systems is with yum, using the EPEL repository::
-
- yum install epel-release
- yum install R-core R-core-devel
-
-Both EPEL6 and EPEL7 currently provide R 3.5, which has been tested and appears to work well. R 3.4, offered by EPEL until recently, also works well. We recommend using the currently available EPEL version for all new installations. But if you already have a working R 3.4 installation from EPEL and you don't have a specific need to upgrade, you may lock that version in place using the ``yum-versionlock`` yum plugin, or simply add this line to the "epel" section of /etc/yum.repos.d/epel.repo::
-
- exclude=R-*,openblas-*,libRmath*
-
-RHEL users may need to log in to their organization's respective RHN interface, find the particular machine in question and:
-
-• click on "Subscribed Channels: Alter Channel Subscriptions"
-• enable EPEL, Server Extras, Server Optional
-
-If you are upgrading an existing installation of TwoRavens, or if you have attempted to
-install it in the past and it didn't work, **we strongly recommend reinstalling
-R completely**, erasing all the extra R packages that may have been already built.
-
-Uninstall R::
-
- yum erase R-core R-core-devel
-
-Wipe clean any R packages that were left behind::
-
- rm -rf /usr/lib64/R/library/*
- rm -rf /usr/share/R/library/*
-
-... then re-install R with :fixedwidthplain:`yum install`
-
-c. rApache:
------------
-
-We maintain the following rpms of rApache, built for the following version of RedHat/derivative distribution:
-
-For RHEL/CentOS 6 and R 3.4, download :download:`rapache-1.2.6-rpm0.x86_64.rpm <../_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.6-rpm0.x86_64.rpm>` and install it with::
-
- yum install rapache-1.2.6-rpm0.x86_64.rpm
-
-For RHEL/CentOS 6 and R 3.5, download :download:`rapache-1.2.9_R-3.5-RH6.x86_64.rpm <../_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.9_R-3.5-RH6.x86_64.rpm>` and install it with::
-
- yum install rapache-1.2.9_R-3.5-RH6.x86_64.rpm
-
-If you are using RHEL/CentOS 7 and R 3.4, download :download:`rapache-1.2.7-rpm0.x86_64.rpm <../_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.7-rpm0.x86_64.rpm>` and install it with::
-
- yum install rapache-1.2.7-rpm0.x86_64.rpm
-
-If you are using RHEL/CentOS 7 in combination with R 3.5, download :download:`rapache-1.2.9_R-3.5.x86_64.rpm <../_static/installation/files/home/rpmbuild/rpmbuild/RPMS/x86_64/rapache-1.2.9_R-3.5.x86_64.rpm>` and install it with::
-
- yum install rapache-1.2.9_R-3.5.x86_64.rpm
-
-**Please note:**
-The rpms above cannot be *guaranteed* to work on your
-system. You may have a collection of system libraries installed on
-your system that will create a version conflict. If that's the case,
-or if you are trying to install on an operating system that's not listed
-above, do not despair: simply build rApache from `source
-`_ . **Make sure** to build with
-the R that's the same version you are planning on using.
-
-d. Install the build environment for R:
----------------------------------------
-
-Once again, extra R packages will need to be built from sources. Make sure you have the standard GNU compilers installed: ``gcc``, ``gcc-c++`` and ``gcc-gfortran``.
-
-One of the required packages needs :fixedwidthplain:`/bin/ed`. The R package build script needs :fixedwidthplain:`/usr/bin/wget`. If these are missing, the rpms can be installed with::
-
- yum install ed wget
-
-Depending on how your system was originally set up, you may end up needing to install some other missing rpms. We'll explain how to troubleshoot compiler errors caused by missing libraries and/or executables.
-
-2. Install Extra R Packages
-+++++++++++++++++++++++++++
-
-We provide a shell script (``r-setup.sh``) that will try to install all the needed packages. **Note:** the script is now part of the TwoRavens distribution (it **used to be** in the Dataverse Software source tree).
-
-
-The script will attempt to download the packages from CRAN (or a mirror), so the system must have access to the Internet.
-
-In order to run the script:
-
-Download the current snapshot of the "dataverse-distribution" branch
-of TwoRavens from github:
-`https://github.com/IQSS/TwoRavens/archive/dataverse-distribution.zip
-<https://github.com/IQSS/TwoRavens/archive/dataverse-distribution.zip>`_.
-Once again, it is important that you download the
-"dataverse-distribution" branch, and NOT the master distribution!
-Unpack the zip file, then run the script::
-
- unzip dataverse-distribution.zip
- cd TwoRavens-dataverse-distribution/r-setup
- chmod +x r-setup.sh
- ./r-setup.sh
-
-
-See the section ``II.`` of the Appendix for trouble-shooting tips.
-
-For the Rserve package the setup script will also create a system user
-:fixedwidthplain:`rserve`, and install the startup script for the
-daemon (``/etc/init.d/rserve``). The script will skip this part, if
-this has already been done on this system (i.e., it should be safe to
-run it repeatedly).
-
-Note that the setup will set the Rserve password to :fixedwidthplain:`"rserve"`.
-Rserve daemon runs under a non-privileged user id, and there appears to be a
-very limited potential for security damage through unauthorized access. It is however
-still a good idea **to change the password**. The password is specified in ``/etc/Rserv.pwd``.
-Please see `Rserve documentation `_ for more
-information on password encryption and access security.
-
-Make sure the rserve password is correctly specified in the ``domain.xml`` of your Dataverse installation::
-
- -Ddataverse.rserve.password=...
-
-
-3. Install the TwoRavens Application
-++++++++++++++++++++++++++++++++++++
-
-a. download and unzip the application
--------------------------------------
-
-(though you may have already done so, in step ``2.`` above - see the instructions there).
-
-
-b. Rename the resulting directory "dataexplore" ...
---------------------------------------------------------
-
-...and place it in the web root directory of your apache server. We'll assume ``/var/www/html/dataexplore`` in the examples below::
-
- mv TwoRavens-dataverse-distribution /var/www/html/dataexplore
-
-
-c. run the installer
---------------------
-
-A scripted, interactive installer is provided at the top level of the TwoRavens
-distribution.
-
-The installer will ask you to provide the following:
-
-===================== ================================ ===========
-Setting default Comment
-===================== ================================ ===========
-TwoRavens directory ``/var/www/html/dataexplore`` File directory where TwoRavens is installed.
-Apache config dir. ``/etc/httpd`` rApache config file for TwoRavens will be placed under ``conf.d/`` there.
-Apache web dir. ``/var/www/html``
-rApache/TwoRavens URL ``http://{your hostname}:80`` URL of the Apache server hosting TwoRavens and rApache.
-Dataverse URL ``http://{your hostname}:8080`` URL of the Dataverse installation that integrates with this TwoRavens installation.
-===================== ================================ ===========
-
-Please note the default values above. The installer assumes
-
-- that you are running both the Dataverse installation and TwoRavens/rApache on the same host;
-- the default ports for Apache (80) and the app server that is serving your Dataverse installation (8080);
-- ``http`` (not ``https``!) for both .
-
-This configuration is recommended if you are simply trying out/testing Dataverse Software
-and TwoRavens. Accept all the defaults, and you should have a working installation
-in no time.
-
-However, if you are planning to use this installation to actually serve data to
-users, you'll most likely want to run under HTTPS. Please refer to the discussion
-in the Appendix, ``I.`` for more information on setting it up. Configuring HTTPS
-takes a little extra work. But note that the TwoRavens configuration
-can actually end up being simpler. If you use our recommended configuration for
-HTTPS (described in the Appendix), both the "TwoRavens URL" and "Dataverse URL"
-**will be the same**: ``https://{your hostname}``.
-
-Run the installer as::
-
- cd /var/www/html/dataexplore
- chmod +x install.pl
- ./install.pl
-
-
-
-
-Once everything is installed and configured, the installer script will print out a confirmation message with the URL of the TwoRavens application. For example::
-
- The application URL is https://server.dataverse.edu/dataexplore/gui.html
-
-d. Version conflict check (preprocess.R)
------------------------------------------
-
-One of the R files in the TwoRavens distribution, ``rook/preprocess/preprocess.R`` is used by both TwoRavens and
-the Dataverse installation. The Dataverse installation maintains its own copy of the file, ``/applications/dataverse-/WEB-INF/classes/edu/harvard/iq/dataverse/rserve/scripts/preprocess.R``.
-(Why not share the file from the same location? Because the two applications
-can potentially be installed on 2 different servers).
-Compare the two files. **It is important that the two copies are identical**.
-
-**If different**:
-
-- the **TwoRavens version wins**. Meaning, you need to copy the version supplied with this TwoRavens distribution and overwrite the app server version (above); then restart the app server.
-
-- unless this is a brand new Dataverse installation, it may have cached summary statistics fragments that were produced with the older version of this R code. You **must remove** all such cached files::
-
- cd {app server domain docroot}
- find . -name '*.prep' | while read file; do /bin/rm $file; done
-
-*(Yes, this is a HACK! We are working on finding a better way to ensure this compatibility between
-TwoRavens and the Dataverse Software!)*
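The cleanup above can be wrapped in a small helper (a convenience sketch, not part of the TwoRavens distribution; you pass it your app server's docroot):

```shell
# clear_prep_cache DIR -- delete any cached preprocessed summary-statistics
# fragments (*.prep) under DIR, recursively.
clear_prep_cache() {
  find "$1" -type f -name '*.prep' -delete
}
```

For example, ``clear_prep_cache /usr/local/payara5/glassfish/domains/domain1/docroot`` (the path is the example domain layout used elsewhere in this guide; adjust to your installation).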
-
-e. Enable TwoRavens in a Dataverse Installation
------------------------------------------------
-
-Now that you have installed TwoRavens, you can make it available to your users by adding it as an "external tool" for your Dataverse installation. (For more on external tools in general, see the :doc:`/admin/external-tools` section of the Admin Guide.)
-
-First, download :download:`twoRavens.json <../_static/installation/files/root/external-tools/twoRavens.json>` as a starting point and edit ``toolUrl`` in that external tool manifest file to be the URL where you want TwoRavens to run. This is the URL reported by the installer script (as in the example at the end of step ``c.``, above). Please note that this example .json file is not maintained by the TwoRavens developers and may be out of date. For an updated file, contact the TwoRavens team.
-
-Once you have made your edits, make the tool available within your Dataverse installation with the following curl command (assuming ``twoRavens.json`` is in your current working directory):
-
-``curl -X POST -H 'Content-type: application/json' --upload-file twoRavens.json http://localhost:8080/api/admin/externalTools``
-
-Once enabled, TwoRavens will display as an explore tool option for tabular data files. Clicking it will redirect the user to the instance of TwoRavens, initialized with the data variables from the selected file.
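Before POSTing the manifest, you can sanity-check your edits locally (a convenience sketch, not part of the distribution; it assumes ``python3`` is available and only verifies that the file is valid JSON and contains the ``toolUrl`` key mentioned above):

```shell
# validate_manifest FILE -- sanity-check an external tool manifest before
# POSTing it: must be valid JSON and contain a toolUrl key.
validate_manifest() {
  python3 -m json.tool "$1" >/dev/null 2>&1 || { echo "not valid JSON: $1"; return 1; }
  grep -q '"toolUrl"' "$1" || { echo "no toolUrl in $1"; return 1; }
  echo "manifest OK: $1"
}
```

For example, ``validate_manifest twoRavens.json`` before running the curl command.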
-
-f. Perform a quick test of TwoRavens functionality
---------------------------------------------------
-
-Ingest the dummy data file ``50by1000.dta`` (supplied in the Dataverse Software source tree in ``dataverse/scripts/search/data/tabular``). If successfully ingested as tabular data,
-the file should appear on the Dataset page as follows:
-
-|tworavens_test_file_ingested|
-
-
-If the file does NOT appear as Tabular Data - that is, if it is still shown as Stata/dta,
-and no tabular attributes (the numbers of Variables and Observations, and the UNF)
-are displayed - try refreshing the page a couple of times. If that doesn't
-change the view to Tabular, it likely means that something went wrong with the
-tabular ingest. Consult the app server log for any error messages that may
-explain the failure.
-
-If the file type is tabular data, but TwoRavens is not displayed as an explore tool option,
-double-check that the steps in ``e.``, above, were correctly performed.
-
-Selecting the TwoRavens explore tool option will open TwoRavens in a new browser window.
-If the application initializes successfully, you should see the "data pebbles" representing
-the first 3 variables in the file:
-
-|tworavens_test_init|
-
-If instead TwoRavens opens with an empty view - no variables listed on the left, and/or no "data pebbles" in the middle panel - see the diagnostic tips further below.
-
-Otherwise, mouse over ``var1``, and click on ``Dep Var``, selecting the variable as "dependent":
-
-|tworavens_test_select_var|
-
-Then select ``ls`` from the list of models on the right:
-
-|tworavens_test_select_model|
-
-Then click the ``Estimate`` button, above. If the model is successfully executed,
-the results will appear in a new popup panel, with some generated graph images, as shown below:
-
-|tworavens_test_output|
-
-**Troubleshooting:**
-
-If TwoRavens fails to initialize properly:
-
-Symptom: instead of the "data pebbles" display shown in the second image, above, you are getting an empty view:
-
-|tworavens_test_empty|
-
-A very likely cause of this condition is TwoRavens not being able to obtain the metadata describing the variables from your Dataverse installation.
-Specifically, the "preprocessed summary statistics".
-
-To diagnose: note the value of the ``dfId`` URL parameter in the view above.
-Try to request the preprocessed fragment by going to the API endpoint directly::
-
- /api/access/datafile/{ID}?format=prep
-
-Where :fixedwidthplain:`{ID}` is the value of the :fixedwidthplain:`dfId` parameter from the previous view.
-You should get output that looks like this::
-
- {"dataset":{"private":false},"variables":{"var1":{"plottype":"bar","plotvalues":{"1":100,"2":100,"3":100,"4":100,"5":100,"6":100,"7":100,"8":100,"9":100,"10":100},"varnamesSumStat":"var1","median":5.5,"mean":5.5,"mode":"1","max":10,"min":1,"invalid":0,"valid":1000,"sd":2.87371854193452,"uniques":10,"herfindahl":0.1,"freqmode":100,"fewest":"1","mid":"1","freqfewest":"100","freqmid":"100","numchar":"numeric","nature":"ordinal","binary":"no","interval":"discrete","varnamesTypes":"var1","defaultInterval":"discrete","defaultNumchar":"numeric","defaultNature":"ordinal","defaultBinary":"no"},"var3":{"plottype":"bar","plotvalues":
- ...
-
-If you are getting an error message instead, this is likely an Rserve connection problem.
-Consult the app server log for any Rserve-related "connection refused" messages.
-See if Rserve is running, and start it with ``service rserve start``, if necessary.
-Check if the Rserve host name, username and password in the app server configuration match
-the actual Rserve configuration. (This is discussed in section ``2.`` of this guide.)
-Correct this, if necessary, then try again.
-
-If you ARE getting JSON output, but the TwoRavens view is still broken:
-
-- Look closely at the very beginning of the JSON fragment. Does it have the ``{"private":false}`` entry, as shown in the example above? If not, this is likely an R code version mismatch, described in section ``3.d.``, above. Correct the problem as described there, then try again.
-
-- If the JSON looks *exactly* as the fragment above, yet still no data pebbles - enable the JavaScript error console in the TwoRavens window, and try again. Look for any error messages; and, specifically, for any URLs that TwoRavens is failing to access. Look for the debugging entry that shows TwoRavens attempting to download the ``format=prep`` fragment. Does the URL have the correct host name, port and/or the protocol (http vs. https)? If not, re-run the installer, specifying the correct Dataverse installation URL, and try again.
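The two JSON checks above can be scripted for quick re-testing (a sketch; it assumes the ``format=prep`` response has already been saved to a local file, e.g. with curl, where the host, port, and datafile id are installation-specific):

```shell
# check_prep FILE -- quick sanity checks on a saved "format=prep" response,
# mirroring the two diagnostics described above.
check_prep() {
  grep -q '"private":false' "$1" \
    || { echo "no {\"private\":false} entry -- likely a preprocess.R version mismatch (see 3.d)"; return 1; }
  grep -q '"variables"' "$1" \
    || { echo "no variables in prep fragment"; return 1; }
  echo "prep fragment looks OK"
}
```

If ``check_prep`` passes but the view is still broken, proceed to the JavaScript console check above.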
-
-Symptom: the variables view is initialized properly, but no model output appears when you click ``Estimate``, with or without error messages.
-
-- Make sure you properly selected the dependent variable (:fixedwidthplain:`var1`) and the model (:fixedwidthplain:`ls`).
-
-- Consult the Apache error log files (``error_log`` and/or ``ssl_error_log``, in ``/var/log/httpd``) for any error messages. Possible error conditions include: missing R packages (double-check that the R setup in step ``2.`` completed without errors); ``selinux`` ("Secure Linux") errors related to the rApache shared libraries or directory permissions (disable SELinux, as described in ``1.a.``).
-
-
-4. Appendix
-+++++++++++
-
-
-I. Ports configuration discussion
----------------------------------
-
-By default, the app server will install itself on ports 8080 and 8181 (for
-``HTTP`` and ``HTTPS``, respectively). Apache will install itself on port 80
-(the default port for ``HTTP``). Under this configuration, your Dataverse installation will
-be accessible at ``http://{your host}:8080``, and rApache at
-``http://{your host}/``. The TwoRavens installer, above, will default to these
-values (and assume you are running both the Dataverse installation and TwoRavens/rApache on
-the same host).
-
-This configuration is the easiest to set up if you are simply
-trying out/testing the Dataverse Software and TwoRavens integration. Accept all the
-defaults, and you should have a working installation in no
-time. However, if you are planning to use this installation to
-actually serve data to real users, you will most likely want to run your Dataverse installation on a standard port and to use ``HTTPS``. It is certainly possible to configure
-the app server to serve the application under ``HTTPS`` on port 443. However, we
-**do not recommend** this setup, for at least two reasons: 1) running the app server on
-port 443 will require you to **run it as the root user**, which should be avoided,
-if possible, for reasons of security; and 2) installing ``SSL`` certificates under
-the app server is unnecessarily complicated. The alternative configuration that
-we recommend is to "hide" your app server behind Apache. In this setup Apache
-serves as the ``HTTPS`` front end running on port 443, proxying the traffic to
-the app server using ``mod_proxy_ajp``, while the app server runs as
-a non-privileged user on a high port that's not accessible from the outside.
-Unlike the app server, Apache has a mechanism for running on a privileged port (in
-this case, 443) as a non-privileged user. It is possible to use this
-configuration, and have this Apache instance serve TwoRavens and rApache too,
-all on the same server. Please see :ref:`network-ports` under the :doc:`config`
-section, and the :doc:`shibboleth` section of the Installation Guide for more
-information and configuration instructions.
-
-
-II. What the r-setup.sh script does:
-------------------------------------
-
-The script uses the list of 45 R library packages and specified
-package versions, supplied in ``TwoRavens/r-setup/package-versions.txt`` to
-replicate the library environment that has been proven to work on Dataverse
-installations.
-
-If any packages fail to build, the script will alert the user.
-
-For every package, the (potentially verbose) output of the build process is saved in
-its own file, ``RINSTALL.{PACKAGE NAME}.LOG``. So if, for example, the package
-Zelig fails to install, the log file :fixedwidthplain:`RINSTALL.Zelig.LOG` should
-be consulted for any error messages that may explain the reason for the failure;
-such as a missing library, or a missing compiler, etc. Be aware that diagnosing
-compiler errors will require at least some programming and/or system administration
-skills.
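A quick way to find the packages that need attention is to scan the logs (a sketch; the ``RINSTALL.{PACKAGE NAME}.LOG`` naming follows the script's convention described above, while grepping for "ERROR" is an assumption about the R installer's output):

```shell
# failed_r_packages DIR -- list the RINSTALL.*.LOG files under DIR that
# contain an "ERROR" line, i.e. packages whose build likely failed.
failed_r_packages() {
  grep -l 'ERROR' "$1"/RINSTALL.*.LOG 2>/dev/null || true
}
```

Each file it prints (e.g. :fixedwidthplain:`RINSTALL.Zelig.LOG`) should then be read in full for the actual error messages.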
-
-
-III. What the install.pl script does:
--------------------------------------
-
-The steps below are performed by the ``install.pl`` script. **Provided for reference only!**
-The instructions below can be used to configure it all by hand, if necessary, or
-to verify that the installer has done it correctly.
-Once again: **normally you would NOT need to individually perform the steps below**!
-
-TwoRavens is distributed with a few hard-coded host and directory names. So these
-need to be replaced with the values specific to your system.
-
-
-**In the file** ``/var/www/html/dataexplore/app_ddi.js`` **the following 3 lines need to be
-edited:**
-
-1. ``var production=false;``
-
- changed to ``true``;
-
-2. ``hostname="localhost:8080";``
-
- changed to point to the Dataverse installation, from which TwoRavens will be obtaining the metadata and data files. (don't forget to change 8080 to the correct port number!)
-
-3. ``var rappURL = "http://0.0.0.0:8000/custom/";``
-
- changed to the URL of your rApache server, i.e.
-
- ``"http(s)://{rApache host}:{port}/custom/";``
-
-**In** ``dataexplore/rook`` **the following files need to be edited:**
-
-``rookdata.R, rookzelig.R, rooksubset.R, rooktransform.R, rookselector.R, rooksource.R``
-
-replacing *every* instance of the ``production<-FALSE`` line with ``production<-TRUE``.
-
-(yeah, that's why we provide that installer script...)
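The replacement step can be sketched in shell (install.pl itself is Perl; this is just an equivalent one-liner per file, using GNU ``sed -i`` -- BSD/macOS sed needs ``-i ''``; the file list matches the docs above):

```shell
# enable_production DIR -- what the installer effectively does to the rook
# files under DIR: flip the hard-coded production flag in each one.
enable_production() {
  for f in rookdata.R rookzelig.R rooksubset.R rooktransform.R rookselector.R rooksource.R; do
    if [ -f "$1/$f" ]; then
      sed -i 's/production<-FALSE/production<-TRUE/g' "$1/$f"
    fi
  done
}
```

For example, ``enable_production /var/www/html/dataexplore/rook``.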
-
-
-**In** ``dataexplore/rook/rooksource.R`` **the following line:**
-
-``setwd("/usr/local/payara5/glassfish/domains/domain1/docroot/dataexplore/rook")``
-
-needs to be changed to:
-
-``setwd("/var/www/html/dataexplore/rook")``
-
-(or your :fixedwidthplain:`dataexplore` directory, if different from the above)
-
-**In** ``dataexplore/rook/rookutils.R`` **the following lines need to be edited:**
-
-``url <- paste("https://beta.dataverse.org/custom/preprocess_dir/preprocessSubset_",sessionid,".txt",sep="")``
-
-and
-
-``imageVector[[qicount]]<<-paste("https://beta.dataverse.org/custom/pic_dir/", mysessionid,"_",mymodelcount,qicount,".png", sep = "")``
-
-changing the URL to reflect the correct location of your rApache instance. Make sure that the protocol (http vs. https) and the port number are correct too, not just the host name!
-
-
-**Next, in order to configure rApache to serve several TwoRavens "mini-apps",**
-
-the installer creates the file ``tworavens-rapache.conf`` in Apache's ``/etc/httpd/conf.d`` directory with the following configuration:
-
-.. code-block:: none
-
- RSourceOnStartup "/var/www/html/dataexplore/rook/rooksource.R"
-
- <Location /custom/zeligapp>
-    SetHandler r-handler
-    RFileEval /var/www/html/dataexplore/rook/rookzelig.R:Rook::Server$call(zelig.app)
- </Location>
-
- <Location /custom/subsetapp>
-    SetHandler r-handler
-    RFileEval /var/www/html/dataexplore/rook/rooksubset.R:Rook::Server$call(subset.app)
- </Location>
-
- <Location /custom/transformapp>
-    SetHandler r-handler
-    RFileEval /var/www/html/dataexplore/rook/rooktransform.R:Rook::Server$call(transform.app)
- </Location>
-
- <Location /custom/dataapp>
-    SetHandler r-handler
-    RFileEval /var/www/html/dataexplore/rook/rookdata.R:Rook::Server$call(data.app)
- </Location>
-
-**The following directories are created by the installer to store various output files produced by TwoRavens:**
-
-.. code-block:: none
-
- mkdir --parents /var/www/html/custom/pic_dir
-
- mkdir --parents /var/www/html/custom/preprocess_dir
-
- mkdir --parents /var/www/html/custom/log_dir
-
-**The ownership of the TwoRavens directories is changed to user** ``apache``:
-
-.. code-block:: none
-
- chown -R apache.apache /var/www/html/custom
-
- chown -R apache /var/www/html/dataexplore
-
-**Finally, the installer restarts Apache, for all the changes to take effect:**
-
-``service httpd restart``
-
-.. |tworavens_test_file_ingested| image:: ./img/tworavens_test_file_ingested.png
- :class: img-responsive
-
-.. |tworavens_test_init| image:: ./img/tworavens_test_init.png
- :class: img-responsive
-
-.. |tworavens_test_select_var| image:: ./img/tworavens_test_select_var.png
- :class: img-responsive
-
-.. |tworavens_test_select_model| image:: ./img/tworavens_test_select_model.png
- :class: img-responsive
-
-.. |tworavens_test_output| image:: ./img/tworavens_test_output.png
- :class: img-responsive
-
-.. |tworavens_test_empty| image:: ./img/tworavens_test_empty.png
- :class: img-responsive
-
-.. |tworavens_components| image:: ./img/tworavens_components.png
- :class: img-responsive
diff --git a/doc/sphinx-guides/source/installation/shibboleth.rst b/doc/sphinx-guides/source/installation/shibboleth.rst
index 08d69bcad4a..cd0fbda77a6 100644
--- a/doc/sphinx-guides/source/installation/shibboleth.rst
+++ b/doc/sphinx-guides/source/installation/shibboleth.rst
@@ -23,7 +23,7 @@ System Requirements
Support for Shibboleth in the Dataverse Software is built on the popular `"mod_shib" Apache module, "shibd" daemon `_, and the `Embedded Discovery Service (EDS) `_ Javascript library, all of which are distributed by the `Shibboleth Consortium `_. EDS is bundled with the Dataverse Software, but ``mod_shib`` and ``shibd`` must be installed and configured per below.
-Only Red Hat Enterprise Linux (RHEL) and derivatives have been tested (x86_64 versions) by the Dataverse Project team. See https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPLinuxInstall for details and note that (according to that page) as of this writing Ubuntu and Debian are not offically supported by the Shibboleth project.
+Only Red Hat Enterprise Linux (RHEL) and derivatives have been tested (x86_64 versions) by the Dataverse Project team. See https://shibboleth.atlassian.net/wiki/spaces/SP3/pages/2065335547/LinuxInstall for details and note that (according to that page) as of this writing Ubuntu and Debian are not officially supported by the Shibboleth project.
Install Apache
~~~~~~~~~~~~~~
@@ -39,28 +39,12 @@ Install Shibboleth
Installing Shibboleth will give us both the ``shibd`` service and the ``mod_shib`` Apache module.
-Enable Shibboleth Yum Repo
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-This yum repo is recommended at https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPLinuxRPMInstall
-
-``cd /etc/yum.repos.d``
-
-Install ``wget`` if you don't have it already:
-
-``yum install wget``
-
-If you are running el8 (RHEL/derivative 8):
-
-``wget http://download.opensuse.org/repositories/security:/shibboleth/CentOS_8/security:shibboleth.repo``
-
-If you are running el7 (RHEL/CentOS 7):
-
-``wget http://download.opensuse.org/repositories/security:/shibboleth/CentOS_7/security:shibboleth.repo``
+Install Shibboleth Yum Repo
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
-If you are running el6 (RHEL/CentOS 6):
+The Shibboleth project now provides `a web form `_ to generate an appropriate package repository for use with YUM/DNF.
-``wget http://download.opensuse.org/repositories/security:/shibboleth/CentOS_CentOS-6/security:shibboleth.repo``
+You'll want to copy-paste the form results into ``/etc/yum.repos.d/shibboleth.repo`` or wherever is most appropriate for your operating system.
Install Shibboleth Via Yum
^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -124,10 +108,6 @@ Near the bottom of ``/etc/httpd/conf.d/ssl.conf`` but before the closing ```_ says, "At the present time, we do not support the SP in conjunction with SELinux, and at minimum we know that communication between the mod_shib and shibd components will fail if it's enabled. Other problems may also occur."
+The first and easiest option is to set ``SELINUX=permissive`` in ``/etc/selinux/config`` and run ``setenforce permissive`` or otherwise disable SELinux to get Shibboleth to work. This is apparently what the Shibboleth project expects because their `wiki page `_ says, "At the present time, we do not support the SP in conjunction with SELinux, and at minimum we know that communication between the mod_shib and shibd components will fail if it's enabled. Other problems may also occur."
Reconfigure SELinux to Accommodate Shibboleth
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/doc/sphinx-guides/source/style/patterns.rst b/doc/sphinx-guides/source/style/patterns.rst
index 77235e0d40f..e96f17dc2ec 100644
--- a/doc/sphinx-guides/source/style/patterns.rst
+++ b/doc/sphinx-guides/source/style/patterns.rst
@@ -580,11 +580,6 @@ Another variation of icon-only buttons uses the ``.btn-link`` style class from B
Explore