10 changes: 10 additions & 0 deletions doc/release-notes/7980-enhanced-dsd.md
@@ -0,0 +1,10 @@
### Default Values for Database Connections Fixed

A regression introduced in Dataverse release 5.3 may have affected you:
the announced default values for the database connection never actually worked.

With the update to Payara 5.2022.3 it was possible to introduce working
defaults. The documentation has been changed accordingly.

Together with this change, you can now enable advanced connection pool
configurations useful for debugging and monitoring. Of particular interest may be `sslmode=require`. See the docs for details.
151 changes: 151 additions & 0 deletions doc/sphinx-guides/source/installation/config.rst
@@ -263,6 +263,153 @@ As for the "Remote only" authentication mode, it means that:
- ``:DefaultAuthProvider`` has been set to use the desired authentication provider
- The "builtin" authentication provider has been disabled (:ref:`api-toggle-auth-provider`). Note that disabling the "builtin" authentication provider means that the API endpoint for converting an account from a remote auth provider will not work. Converting directly from one remote authentication provider to another (i.e. from GitHub to Google) is not supported. Conversion from remote is always to "builtin". Then the user initiates a conversion from "builtin" to remote. Note that longer term, the plan is to permit multiple login options to the same Dataverse installation account per https://github.com/IQSS/dataverse/issues/3487 (so all this talk of conversion will be moot) but for now users can only use a single login option, as explained in the :doc:`/user/account` section of the User Guide. In short, "remote only" might work for you if you only plan to use a single remote authentication provider such that no conversion between remote authentication providers will be necessary.

.. _database-persistence:

Database Persistence
Member

The docs are great but I wonder if they should be moved to "Advanced Installation" because the default database setup works fine and the list of steps before "going live" is getting longer and longer. We can defer this doc change for now. I created a dedicated issue for later:

--------------------

The Dataverse software uses a PostgreSQL database to store objects users create.
You can configure basic and advanced settings for the PostgreSQL database connection using the
MicroProfile Config API.

Basic Database Settings
+++++++++++++++++++++++

1. Any of these settings can be set via system properties (see :ref:`jvm-options` starting at :ref:`dataverse.db.name`), environment variables or other
MicroProfile Config mechanisms supported by the app server.
`See Payara docs for supported sources <https://docs.payara.fish/community/docs/documentation/microprofile/config/README.html#config-sources>`_.
2. Remember to protect your secrets. For passwords, use an environment variable (bare minimum), a password alias named the same
as the key (OK) or use the "dir config source" of Payara (best).

Alias creation example:

.. code-block:: shell

echo "AS_ADMIN_ALIASPASSWORD=changeme" > /tmp/p.txt
asadmin create-password-alias --passwordfile /tmp/p.txt dataverse.db.password
rm /tmp/p.txt

3. Environment variable names are derived from the key by replacing any dot, colon, dash, etc. with an underscore "_" and
converting all letters to uppercase. Example: ``dataverse.db.host`` -> ``DATAVERSE_DB_HOST``
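
The naming rule in item 3 can be sketched in Java. This is a minimal illustration under simplifying assumptions: the class `MpConfigEnvSketch` and its helper `lookup` are hypothetical names, not part of Dataverse or Payara, and the real MicroProfile Config lookup also tries the unmodified key before the converted one. The order shown (system property before environment variable) follows the default source priorities of MicroProfile Config.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

public class MpConfigEnvSketch {

    // Derive the environment variable name from a config key:
    // every character that is not a letter, digit or underscore becomes "_",
    // then the whole name is upper-cased.
    static String toEnvName(String key) {
        return key.replaceAll("[^A-Za-z0-9_]", "_").toUpperCase(Locale.ROOT);
    }

    // Hypothetical helper (not a Payara or Dataverse API): prefer a system
    // property, then the derived environment variable, then the fallback.
    static String lookup(String key, Properties sysProps, Map<String, String> env, String fallback) {
        String value = sysProps.getProperty(key);
        if (value == null) {
            value = env.get(toEnvName(key));
        }
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        System.out.println(toEnvName("dataverse.db.host")); // prints DATAVERSE_DB_HOST
        // The result depends on the local environment, so no expected output is given.
        System.out.println(lookup("dataverse.db.port", System.getProperties(), System.getenv(), "5432"));
    }
}
```

Passing the properties and environment in as arguments keeps the helper easy to test without touching the real process environment.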

.. list-table::
:widths: 15 60 25
:header-rows: 1
:align: left

* - MPCONFIG Key
- Description
- Default
* - dataverse.db.host
- The PostgreSQL server to connect to.
- ``localhost``
* - dataverse.db.port
- The PostgreSQL server port to connect to.
- ``5432``
* - dataverse.db.user
- The PostgreSQL user name to connect with.
- | ``dataverse``
| (installer sets to ``dvnapp``)
* - dataverse.db.password
- The PostgreSQL user's password to connect with.

**Please note the safety advisory above.**
- *No default*
* - dataverse.db.name
- The PostgreSQL database name to use for the Dataverse installation.
- | ``dataverse``
| (installer sets to ``dvndb``)
Comment on lines +309 to +321
Member

I'm not sure how to see that the default database user and database are "dataverse".

To deploy this branch I used scripts/dev/dev-rebuild.sh which has the names the installer uses hard coded:

DB_NAME=dvndb
DB_USER=dvnapp

I guess I could try changing that script to this...

DB_NAME=dataverse
DB_USER=dataverse

... and then deleting these lines from domain.xml:

  <system-property name="dataverse.db.user" value="dvnapp"></system-property>
  <system-property name="dataverse.db.name" value="dvndb"></system-property>

That is, without these system properties configured, it sounds like the default database user and database name will be "dataverse".

Contributor Author

Yes. Sorry, but for container environments it really is common practice to use the application name for the database name. As the installer sets this explicitly for any installation out there, folks should be fine.

* - dataverse.db.parameters
- Connection parameters, such as ``sslmode=require``. See `Postgres JDBC docs <https://jdbc.postgresql.org/documentation/head/connect.html>`_.
Note: you don't need to provide the initial "?".
- *Empty string*
Member

I tried putting the following in domain.xml...

<jvm-options>-Ddataverse.db.parameters=sslmode=require</jvm-options>

... and it seems to have worked. Dataverse failed to deploy and I get errors like this in server.log:

Connection could not be allocated because: The server does not support SSL.

Error in allocating a connection. Cause: Connection could not be allocated because: The server does not support SSL.

Exception while invoking class org.glassfish.ejb.startup.EjbApplication start method
javax.ejb.EJBException: javax.ejb.CreateException: Initialization failed for Singleton StartupFlywayMigrator
...
Caused by: org.flywaydb.core.internal.exception.FlywaySqlException: Unable to obtain connection from database: Error in allocating a connection. Cause: Connection could not be allocated because: The server does not support SSL.

SQL State : null
Error Code : 0
Message : Error in allocating a connection. Cause: Connection could not be allocated because: The server does not support SSL.
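
The deployment error above is what one would expect if the configured parameters end up in the JDBC URL after the `?`. The following sketch shows how the `${MPCONFIG=key:default}` placeholders used in `DataSourceProducer.java` might expand. It is an illustration only: `PlaceholderSketch` and `expand` are hypothetical names and do not reproduce Payara's actual resolver.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderSketch {

    // URL template as it appears in the DataSourceProducer annotation of this PR.
    static final String URL_TEMPLATE =
            "jdbc:postgresql://${MPCONFIG=dataverse.db.host:localhost}:${MPCONFIG=dataverse.db.port:5432}"
            + "/${MPCONFIG=dataverse.db.name:dataverse}?${MPCONFIG=dataverse.db.parameters:}";

    // Matches ${MPCONFIG=key} and ${MPCONFIG=key:default}.
    // Group 1 is the key; group 3 is the default (null when no ":" is present).
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{MPCONFIG=([^:}]+)(:([^}]*))?\\}");

    // Hypothetical resolver: use the configured value if present, else the inline default.
    static String expand(String template, Map<String, String> config) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String key = m.group(1);
            String value = config.getOrDefault(key, m.group(3));
            if (value == null) {
                throw new IllegalStateException("No value and no default for " + key);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Nothing configured: all inline defaults apply.
        System.out.println(expand(URL_TEMPLATE, Map.of()));
        // prints jdbc:postgresql://localhost:5432/dataverse?

        // Only the connection parameters are configured.
        System.out.println(expand(URL_TEMPLATE, Map.of("dataverse.db.parameters", "sslmode=require")));
        // prints jdbc:postgresql://localhost:5432/dataverse?sslmode=require
    }
}
```

In this sketch an empty default (as for `dataverse.db.parameters`) leaves a trailing `?` in the URL, which matches the literal `?` in the annotation.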


Advanced Database Settings
++++++++++++++++++++++++++

The following options are useful in many scenarios. You might be interested in debug output during development or
monitoring performance in production.

You can find more details within the Payara docs:

- `User Guide: Connection Pool Configuration <https://docs.payara.fish/community/docs/documentation/user-guides/connection-pools/connection-pools.html>`_
- `Tech Doc: Advanced Connection Pool Configuration <https://docs.payara.fish/community/docs/documentation/payara-server/jdbc/advanced-connection-pool-properties.html>`_.

Connection Validation
^^^^^^^^^^^^^^^^^^^^^

.. list-table::
:widths: 15 60 25
:header-rows: 1
:align: left

* - MPCONFIG Key
- Description
- Default
* - dataverse.db.is-connection-validation-required
- ``true``: Validate connections, allow server to reconnect in case of failure.
- ``false``
* - dataverse.db.connection-validation-method
- | The method of connection validation:
| ``table|autocommit|meta-data|custom-validation``.
- *Empty string*
* - dataverse.db.validation-table-name
- The name of the table used for validation if the validation method is set to ``table``.
- *Empty string*
* - dataverse.db.validation-classname
- The name of the custom class used for validation if the ``connection-validation-method`` is set to ``custom-validation``.
- *Empty string*
* - dataverse.db.validate-atmost-once-period-in-seconds
- Specifies the time interval in seconds between successive requests to validate a connection at most once.
- ``0`` (disabled)

Connection & Statement Leaks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
:widths: 15 60 25
:header-rows: 1
:align: left

* - MPCONFIG Key
- Description
- Default
* - dataverse.db.connection-leak-timeout-in-seconds
- Timeout in seconds after which a connection is considered "leaked".
- ``0`` (disabled)
* - dataverse.db.connection-leak-reclaim
- If enabled, leaked connections will be reclaimed by the pool after the connection leak timeout occurs.
- ``false``
* - dataverse.db.statement-leak-timeout-in-seconds
- Timeout in seconds after which a statement is considered "leaked".
- ``0`` (disabled)
* - dataverse.db.statement-leak-reclaim
- If enabled, leaked statements will be reclaimed by the pool after the statement leak timeout occurs.
- ``false``

Logging & Slow Performance
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
:widths: 15 60 25
:header-rows: 1
:align: left

* - MPCONFIG Key
- Description
- Default
* - dataverse.db.statement-timeout-in-seconds
- Timeout property of a connection to enable termination of abnormally long running queries.
- ``-1`` (disabled)
* - dataverse.db.slow-query-threshold-in-seconds
- SQL queries that exceed this time in seconds will be logged.
- ``-1`` (disabled)
* - dataverse.db.log-jdbc-calls
- When set to true, all JDBC calls will be logged allowing tracing of all JDBC interactions including SQL.
- ``false``



.. _file-storage:

File Storage: Using a Local Filesystem and/or Swift and/or Object Stores and/or Trusted Remote Stores
@@ -1561,6 +1708,8 @@ dataverse.auth.password-reset-timeout-in-minutes

Users have 60 minutes to change their passwords by default. You can adjust this value here.

.. _dataverse.db.name:

dataverse.db.name
+++++++++++++++++

@@ -1570,6 +1719,8 @@ Defaults to ``dataverse`` (but the installer sets it to ``dvndb``).

Can also be set via *MicroProfile Config API* sources, e.g. the environment variable ``DATAVERSE_DB_NAME``.

See also :ref:`database-persistence`.

dataverse.db.user
+++++++++++++++++

44 changes: 30 additions & 14 deletions src/main/java/edu/harvard/iq/dataverse/util/DataSourceProducer.java
@@ -16,9 +16,13 @@
// HINT: PGSimpleDataSource would work too, but as we use a connection pool, go with a javax.sql.ConnectionPoolDataSource
// HINT: PGXADataSource is unnecessary (no distributed transactions used) and breaks ingest.
className = "org.postgresql.ds.PGConnectionPoolDataSource",
user = "${MPCONFIG=dataverse.db.user}",

// BEWARE: as this resource is created before defaults are read from META-INF/microprofile-config.properties,
// defaults must be provided in this Payara-proprietary manner.
user = "${MPCONFIG=dataverse.db.user:dataverse}",
password = "${MPCONFIG=dataverse.db.password}",
url = "jdbc:postgresql://${MPCONFIG=dataverse.db.host}:${MPCONFIG=dataverse.db.port}/${MPCONFIG=dataverse.db.name}",
url = "jdbc:postgresql://${MPCONFIG=dataverse.db.host:localhost}:${MPCONFIG=dataverse.db.port:5432}/${MPCONFIG=dataverse.db.name:dataverse}?${MPCONFIG=dataverse.db.parameters:}",

// If we ever need to change these pool settings, we need to remove this class and create the resource
// from web.xml. We can use MicroProfile Config in there for these values, impossible to do in the annotation.
//
@@ -30,18 +34,30 @@
maxPoolSize = 100,
// "The number of seconds that a physical connection should remain unused in the pool before the connection is closed for a connection pool. "
// Payara DataSourceDefinitionDeployer default value = 300 (seconds)
maxIdleTime = 300)
// It's possible to add additional properties like this...
//
//properties = {
// "fish.payara.log-jdbc-calls=true"
//})
//
// ... but at this time we don't think we need any. The full list
// of properties can be found at https://docs.payara.fish/community/docs/5.2021.6/documentation/payara-server/jdbc/advanced-connection-pool-properties.html#full-list-of-properties
//
// All these properties cannot be configured via MPCONFIG as Payara doesn't support this (yet). To be enhanced.
// See also https://github.com/payara/Payara/issues/5024
maxIdleTime = 300,

// Set more options via MPCONFIG, including defaults where applicable.
// TODO: Future versions of Payara might support setting integer properties like pool size,
// idle times, etc. in a Payara-proprietary way. See https://github.com/payara/Payara/pull/5272
properties = {
// The following options are documented here:
// https://docs.payara.fish/community/docs/documentation/payara-server/jdbc/advanced-connection-pool-properties.html
// VALIDATION
"fish.payara.is-connection-validation-required=${MPCONFIG=dataverse.db.is-connection-validation-required:false}",
"fish.payara.connection-validation-method=${MPCONFIG=dataverse.db.connection-validation-method:}",
"fish.payara.validation-table-name=${MPCONFIG=dataverse.db.validation-table-name:}",
"fish.payara.validation-classname=${MPCONFIG=dataverse.db.validation-classname:}",
"fish.payara.validate-atmost-once-period-in-seconds=${MPCONFIG=dataverse.db.validate-atmost-once-period-in-seconds:0}",
// LEAK DETECTION
"fish.payara.connection-leak-timeout-in-seconds=${MPCONFIG=dataverse.db.connection-leak-timeout-in-seconds:0}",
"fish.payara.connection-leak-reclaim=${MPCONFIG=dataverse.db.connection-leak-reclaim:false}",
"fish.payara.statement-leak-timeout-in-seconds=${MPCONFIG=dataverse.db.statement-leak-timeout-in-seconds:0}",
"fish.payara.statement-leak-reclaim=${MPCONFIG=dataverse.db.statement-leak-reclaim:false}",
// LOGGING, SLOWNESS, PERFORMANCE
"fish.payara.statement-timeout-in-seconds=${MPCONFIG=dataverse.db.statement-timeout-in-seconds:-1}",
"fish.payara.slow-query-threshold-in-seconds=${MPCONFIG=dataverse.db.slow-query-threshold-in-seconds:-1}",
"fish.payara.log-jdbc-calls=${MPCONFIG=dataverse.db.log-jdbc-calls:false}"
})
public class DataSourceProducer {

@Resource(lookup = "java:app/jdbc/dataverse")