Closed
Commits
60 commits
f21b117
refactor(solr): move schema.xml and script to schema folder, delete s…
poikilotherm Dec 21, 2021
60a263d
feat(solr): add XSLT scripts to edit solrconfig.xml with our changes …
poikilotherm Dec 21, 2021
c5d3a09
feat(solr): make schema factory XSLT idempotent #7662
poikilotherm Dec 23, 2021
4bbe1e9
feat(solr): make search boosting XSLT idempotent #7662
poikilotherm Dec 23, 2021
1a24a50
fix(solr): adapt pathes in shellspec tests for update-fields.sh #7662
poikilotherm Dec 23, 2021
a6fdaa8
docs(solr): make Sphinx read Solr version from Maven pom.xml #7662
poikilotherm Dec 23, 2021
3641f60
docs(solr): use Sphinx substitution for Solr version in installation …
poikilotherm Dec 23, 2021
69f66f5
docs(metadata): fix update-fields.sh include path #7662
poikilotherm Dec 23, 2021
2d506db
feat(solr): add Makefile to create Dataverse ConfigSet #7662
poikilotherm Jan 3, 2022
76a9a99
fix(solr): make XSLTs output XML proc line & comments #7662
poikilotherm Jan 3, 2022
ebb7ddf
refactor(deps): move SolrJ dependency to parent POM #7662
poikilotherm Feb 28, 2022
17555c8
refactor(deps): move JUnit Jupiter dependency to parent POM #7662
poikilotherm Feb 28, 2022
a7ffb29
refactor(solr): move XSLT and Schema to solr-configset module #7662
poikilotherm Feb 28, 2022
9954ba1
refactor(solr): move update-fields.sh to solr-configset module #7662
poikilotherm Feb 28, 2022
eb2be9c
feat!(solr): create Maven module solr-configset #7662
poikilotherm Feb 28, 2022
91e1983
refactor(solr): remove conf/solr leftovers #7662
poikilotherm Feb 28, 2022
1f03925
chore(solr): remove unnecessary Solr plugin #7662
poikilotherm Mar 10, 2022
1e6edfc
deps(solr): switch all deps to scope provided or test #7662
poikilotherm Mar 10, 2022
c5fc8ad
build(solr): move zip assembly into default profile #7662
poikilotherm Mar 11, 2022
2509beb
build(parent): add a Maven profile "ct" for container image usage
poikilotherm Mar 11, 2022
52dcf8d
build(solr): make test compile in all cases #7662
poikilotherm Mar 11, 2022
e418ef9
build(solr): make packaging type configurable via property #7662
poikilotherm Mar 11, 2022
9e1dcee
feat(solr): add a build for Dataverse Solr container images #7662
poikilotherm Mar 11, 2022
0a32c21
feat(parent,solr): make configset name configurable #7662
poikilotherm Mar 11, 2022
5b8a0bc
refactor(solr,parent): make configsets target path a reusable propert…
poikilotherm Mar 11, 2022
1aa408e
feat(solr): tune container image memory defaults and request size #7662
poikilotherm Mar 11, 2022
b9140f0
feat(solr): add Solr container run configuration #7662
poikilotherm Mar 11, 2022
c81066e
fix(solr): align Solr request size with documentation #7662
poikilotherm Mar 12, 2022
3dba548
docs(solr): refactor Solr installation guide #7662
poikilotherm Mar 12, 2022
4b98326
refactor(solr): rename CompileSolrConfigSet to CompileSolrConfigXML #…
poikilotherm Mar 21, 2022
1dc0ebf
Merge branch 'develop' into 7662-solrconfig
poikilotherm Mar 22, 2022
afa2b15
fix(solr): make container image group name a variable #7662
poikilotherm Apr 1, 2022
fb6dcaf
feat(solr): introduce solrteur JBang app #7662
poikilotherm Apr 1, 2022
0970533
refactor(solr): move CompileSolrConfig to pretty sub #7662
poikilotherm Apr 1, 2022
6ce1b5d
refactor(solrteur): introduce simple log levels #7662
poikilotherm Apr 26, 2022
c0516cc
refactor(solrteur): split up solr config compilation #7662
poikilotherm Apr 26, 2022
efb948b
chore(solrteur): add package-info.java for cli.cmd package #7662
poikilotherm Apr 26, 2022
a1b25c5
chore(solrteur): introduce helper packages cli.util and cli.util.mode…
poikilotherm Apr 26, 2022
39acda2
feat(solrteur): introduce custom exception to handle parsing errors #…
poikilotherm Apr 26, 2022
a5efc7e
feat(solrteur): introduce state machine for TSV file layout #7662
poikilotherm Apr 26, 2022
7fc08b0
feat(solrteur): introduce a validator helper class #7662
poikilotherm Apr 26, 2022
ff0e91c
feat(solrteur): extend the metadata block POJO #7662
poikilotherm Apr 26, 2022
2dae76c
feat(solrteur): introduce configuration POJO #7662
poikilotherm Apr 29, 2022
2649a87
feat(solrteur): extend Field model with minimal header spec #7662
poikilotherm Apr 29, 2022
2aa8a4e
feat(solrteur): extend ControlledVocabulary model with minimal header…
poikilotherm Apr 29, 2022
3733117
refactor(solrteur): switch Block model to use keyword #7662
poikilotherm Apr 29, 2022
3c6a5fa
refactor(solrteur): refactor state factory with state keywords #7662
poikilotherm Apr 29, 2022
aacb890
refactor(solrteur): let Validator use the configuration #7662
poikilotherm Apr 29, 2022
fea4d1c
feat(solrteur): extend Block and make it testable #7662
poikilotherm Apr 29, 2022
03d19e0
feat(solrteur): add types enum to Field #7662
poikilotherm Apr 29, 2022
13de6d5
fix(solrteur): make headers not match null #7662
poikilotherm Apr 29, 2022
52dc5a9
feat(solrteur): add all headers to Field #7662
poikilotherm Apr 29, 2022
c67ec21
Merge branch 'develop' into 7662-solrconfig
poikilotherm Jan 14, 2023
b013dc5
Merge branch 'develop' into 7662-solrconfig
poikilotherm Feb 13, 2023
bc06537
feat(solrteur): move classes and extend field parser #7662
poikilotherm Feb 16, 2023
ae4c7f1
feat(solrteur): continue improving MDB TSV field parsing
poikilotherm Feb 16, 2023
9790e64
fix(solrteur): check for correct length of block line
poikilotherm Mar 8, 2023
329c200
build(solrteur): switch to JAR build to also release the code itself …
poikilotherm Mar 8, 2023
0da8172
test(solrteur): add more tests for parsing fields from TSVs
poikilotherm Mar 8, 2023
a13743a
Merge branch 'develop' into 7662-solrconfig
poikilotherm Aug 23, 2023
1 change: 0 additions & 1 deletion conf/solr/8.11.1/readme.md

This file was deleted.

1,410 changes: 0 additions & 1,410 deletions conf/solr/8.11.1/solrconfig.xml

This file was deleted.

@@ -8,6 +8,7 @@ Type = forking
WorkingDirectory = /usr/local/solr/solr-8.11.1
ExecStart = /usr/local/solr/solr-8.11.1/bin/solr start -m 1g -j "jetty.host=127.0.0.1"
ExecStop = /usr/local/solr/solr-8.11.1/bin/solr stop
Environment="SOLR_OPTS=-Dsolr.jetty.request.header.size=102400"
LimitNOFILE=65000
LimitNPROC=65000
Restart=on-failure
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/admin/metadatacustomization.rst
@@ -516,7 +516,7 @@ the Solr schema configuration, including any enabled metadata schemas:

``curl "http://localhost:8080/api/admin/index/solr/schema"``

You can use :download:`update-fields.sh <../../../../conf/solr/8.11.1/update-fields.sh>` to easily add these to the
You can use :download:`update-fields.sh <../../../../modules/solr-configset/src/main/scripts/update-fields.sh>` to easily add these to the
Solr schema you installed for your Dataverse installation.

The script needs a target XML file containing your Solr schema. (See the :doc:`/installation/prerequisites/` section of
6 changes: 5 additions & 1 deletion doc/sphinx-guides/source/conf.py
@@ -18,6 +18,9 @@
sys.path.insert(0, os.path.abspath('../../'))
import sphinx_bootstrap_theme

import xml.etree.ElementTree as et
pom = et.parse("../../../modules/dataverse-parent/pom.xml")
ns = {"mvn": "http://maven.apache.org/POM/4.0.0"}

# Activate the theme.
# html_theme = 'bootstrap'
@@ -438,4 +441,5 @@
rst_prolog = """
.. |toctitle| replace:: Contents:
.. |anotherSub| replace:: Yes, there can be multiple.
"""
.. |solr_version| replace:: {solr_version}
""".format(solr_version=pom.find("./mvn:properties/mvn:solr.version", ns).text)
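The `conf.py` change above reads the Solr version from the parent POM and exposes it to Sphinx as the `|solr_version|` substitution. Below is a minimal, self-contained sketch of that lookup. The inline `SAMPLE_POM` string and the helper names `read_solr_version` and `build_rst_prolog` are assumptions made for illustration and are not part of the PR. The namespace mapping matters because Maven POMs declare a default XML namespace, so an unprefixed path would not match any element.

```python
import xml.etree.ElementTree as et

# Hypothetical minimal POM standing in for modules/dataverse-parent/pom.xml.
# The version value mirrors the 8.11.1 used elsewhere in this diff.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties>
    <solr.version>8.11.1</solr.version>
  </properties>
</project>
"""

# Same prefix-to-URI mapping as in the conf.py hunk.
NS = {"mvn": "http://maven.apache.org/POM/4.0.0"}


def read_solr_version(pom_xml: str) -> str:
    """Return the text of <solr.version> from a POM given as a string."""
    root = et.fromstring(pom_xml)
    node = root.find("./mvn:properties/mvn:solr.version", NS)
    if node is None or node.text is None:
        # Fail loudly instead of raising AttributeError on None.text.
        raise ValueError("solr.version property not found in POM")
    return node.text.strip()


def build_rst_prolog(solr_version: str) -> str:
    """Build a reduced rst_prolog that defines the |solr_version| substitution."""
    return (
        ".. |toctitle| replace:: Contents:\n"
        ".. |solr_version| replace:: {solr_version}\n"
    ).format(solr_version=solr_version)


if __name__ == "__main__":
    print(build_rst_prolog(read_solr_version(SAMPLE_POM)))
```

Unlike the one-liner in the hunk, the sketch checks for a missing property explicitly, so a renamed or removed `solr.version` would surface as a clear error during the docs build rather than an `AttributeError`.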
169 changes: 125 additions & 44 deletions doc/sphinx-guides/source/installation/prerequisites.rst
@@ -154,86 +154,164 @@ Configuring Database Access for the Dataverse Installation (and the Dataverse So
Solr
----

The Dataverse Software search index is powered by Solr.
The Dataverse Software search index is powered by `Solr <https://solr.apache.org>`_.

Supported Versions
==================

The Dataverse Software has been tested with Solr version 8.11.1. Future releases in the 8.x series are likely to be compatible; however, this cannot be confirmed until they are officially tested. Major releases above 8.x (e.g. 9.x) are not supported.
The Dataverse Software has been tested with Solr version |solr_version|. Future releases in the 8.x series are likely to
be compatible; however, this cannot be confirmed until they are officially tested. Major releases above 8.x (e.g. 9.x)
are not supported.

- Releases up to 4.20 supported Solr 7.x.x.
- Releases 5.0 to 5.3 supported Solr 7.7.2.
- Releases 5.4 to 5.9 supported Solr 8.8.1.
- Releases since 5.10 support Solr |solr_version|.


Installing Solr
===============

You should not run Solr as root. Create a user called ``solr`` and a directory to install Solr into::
Note: this guide describes setting up a small installation, using the Solr *standalone* mode. For larger installations or
higher availability requirements, please take a look at `Solr Cloud <https://solr.apache.org/guide/solrcloud.html>`_ mode.

Optional Step 0:

useradd solr
- Solr launches asynchronously and attempts to use the ``lsof`` binary to watch for its own availability.
Installation of this package isn't required but will prevent a warning in the log at startup. (Use ``dnf``, ``yum`` or
``apt-get`` to install this standard package.)
- Solr 8.x runs on Java 11 (same as your Dataverse installation). Remember to install it when running Solr on a
separate machine.

**Step 1:** You should **not** run Solr as ``root``! Create a user and group called ``solr`` (as root) and a directory to
install Solr into:

.. parsed-literal::
mkdir /usr/local/solr
chown solr:solr /usr/local/solr
groupadd -r --gid 8983 solr
useradd -r --home-dir /usr/local/solr --uid 8983 --gid 8983 solr
chown solr: /usr/local/solr

Become the ``solr`` user and then download and configure Solr::
**Step 2:** Become the ``solr`` user and then download and configure Solr:

su - solr
.. parsed-literal::
sudo -u solr -s
cd /usr/local/solr
wget https://archive.apache.org/dist/lucene/solr/8.11.1/solr-8.11.1.tgz
tar xvzf solr-8.11.1.tgz
cd solr-8.11.1
cp -r server/solr/configsets/_default server/solr/collection1
wget https://archive.apache.org/dist/lucene/solr/|solr_version|/solr-|solr_version|.tgz
tar xvzf solr-|solr_version|.tgz
exit

You should already have a "dvinstall.zip" file that you downloaded from https://github.com/IQSS/dataverse/releases . Unzip it into ``/tmp``. Then copy the files into place::

cp /tmp/dvinstall/schema*.xml /usr/local/solr/solr-8.11.1/server/solr/collection1/conf
cp /tmp/dvinstall/solrconfig.xml /usr/local/solr/solr-8.11.1/server/solr/collection1/conf
Solr Init Script
================

Note: The Dataverse Project team has customized Solr to boost results that come from certain indexed elements inside the Dataverse installation, for example prioritizing results from Dataverse collections over Datasets. If you would like to remove this, edit your ``solrconfig.xml`` and remove the ``<str name="qf">`` element and its contents. If you have ideas about how this boosting could be improved, feel free to contact us through our Google Group https://groups.google.com/forum/#!forum/dataverse-dev .
**Step 3:** Once you have installed Solr, you need to add it to the init system so that it starts on boot, stops on shutdown, etc. Please choose the
right option for your underlying Linux operating system. *It will not be necessary to execute both!*

A Dataverse installation requires a change to the ``jetty.xml`` file that ships with Solr. Edit ``/usr/local/solr/solr-8.11.1/server/etc/jetty.xml`` , increasing ``requestHeaderSize`` from ``8192`` to ``102400``
SystemD based systems
^^^^^^^^^^^^^^^^^^^^^

Solr will warn about needing to increase the number of file descriptors and max processes in a production environment but will still run with defaults. We have increased these values to the recommended levels by adding ulimit -n 65000 to the init script, and the following to ``/etc/security/limits.conf``::
For systems running systemd (like RedHat or derivatives since 7, Debian since 9, Ubuntu since 15.04), as root, download
:download:`solr.service<../_static/installation/files/etc/systemd/solr.service>` and place it in ``/tmp``. Then start
Solr and configure it to start at boot with the following commands (run as root again):

solr soft nproc 65000
solr hard nproc 65000
solr soft nofile 65000
solr hard nofile 65000
.. parsed-literal::
cp /tmp/solr.service /etc/systemd/system
systemctl daemon-reload
systemctl start solr.service
systemctl enable solr.service

On operating systems which use systemd such as RHEL/derivative, you may then add a line like LimitNOFILE=65000 for the number of open file descriptors and a line with LimitNPROC=65000 for the max processes to the systemd unit file, or adjust the limits on a running process using the prlimit tool::
SysVinit based systems
^^^^^^^^^^^^^^^^^^^^^^

# sudo prlimit --pid pid --nofile=65000:65000
For (older) systems using init.d (like CentOS 6 or Devuan), download this :download:`Solr init script <../_static/installation/files/etc/init.d/solr>`
and place it in ``/tmp``. Then start Solr and configure it to start at boot with the following commands (run as root again):

Solr launches asynchronously and attempts to use the ``lsof`` binary to watch for its own availability. Installation of this package isn't required but will prevent a warning in the log at startup::
.. parsed-literal::
cp /tmp/solr /etc/init.d
service solr start
chkconfig solr on

# yum install lsof

Finally, you need to tell Solr to create the core "collection1" on startup::
Creating Solr Core
==================

echo "name=collection1" > /usr/local/solr/solr-8.11.1/server/solr/collection1/core.properties
Solr Cores hold the actual data of your index. They get created from templates called "config sets". We provide a
template that has been tuned carefully for usage within a Dataverse installation and is distributed as a ZIP file.

Solr Init Script
================
Note: The Dataverse Project team has customized the core's ``solrconfig.xml`` to boost Solr search results that come from
certain indexed elements inside the Dataverse installation, for example prioritizing results from Dataverse collections
over Datasets. If you would like to remove this, edit this file and remove the ``<str name="qf">`` element and its
contents. If you have ideas about how this boosting could be improved, feel free to contact us through our
`Google Group <https://groups.google.com/forum/#!forum/dataverse-dev>`_.

Please choose the right option for your underlying Linux operating system.
It will not be necessary to execute both!
**Step 4:** If not already done, please download the latest release package ``dvinstall.zip`` at
https://github.com/IQSS/dataverse/releases.

For systems running systemd (like RedHat or derivatives since 7, Debian since 9, Ubuntu since 15.04), as root, download :download:`solr.service<../_static/installation/files/etc/systemd/solr.service>` and place it in ``/tmp``. Then start Solr and configure it to start at boot with the following commands::
**Step 5:** Extract our Solr Dataverse config set from it and unpack the configset directory:

cp /tmp/solr.service /etc/systemd/system
systemctl daemon-reload
systemctl start solr.service
systemctl enable solr.service
.. parsed-literal::
sudo -u solr -s
cd solr-|solr_version|/server/solr/configsets
unzip path/to/dvinstall.zip solr-configset.zip
unzip solr-configset.zip

For systems using init.d (like CentOS 6), download this :download:`Solr init script <../_static/installation/files/etc/init.d/solr>` and place it in ``/tmp``. Then start Solr and configure it to start at boot with the following commands::
**Step 6:** Create the core within your running Solr instance:

cp /tmp/solr /etc/init.d
service start solr
chkconfig solr on
.. parsed-literal::
/usr/local/solr/solr-|solr_version|/bin/solr create -c collection1 -d dataverse


Tuning Solr
===========

The next steps are mostly extracted from the recommendations for
`"Taking Solr to Production" <https://solr.apache.org/guide/taking-solr-to-production.html>`_.

They are mostly necessary for older Linux distributions using System V init systems. If you are using our
SystemD unit file (see above), they may be skipped.

1. A Dataverse installation requires a change to the ``jetty.xml`` file that ships with Solr.
Edit ``/usr/local/solr/*/server/etc/jetty.xml`` , increasing ``requestHeaderSize`` from ``8192`` to ``102400``.

Alternative: use ``SOLR_OPTS`` to set the system property (see Solr docs linked above).

2. Solr will warn about needing to increase the number of file descriptors and max processes in a production environment
but will still run with defaults. We have increased these values to the recommended levels by adding ulimit -n 65000
to the init script, and the following to ``/etc/security/limits.conf``:

.. parsed-literal::
solr soft nproc 65000
solr hard nproc 65000
solr soft nofile 65000
solr hard nofile 65000

Note: This is not necessary with SystemD, which ignores these settings (see unit file instead)!
If not using our unit file, you may need to add a line like ``LimitNOFILE=65000`` for the number of open file
descriptors and a line with ``LimitNPROC=65000`` for the max processes to the systemd unit file.

Alternative: adjust the limits on a running process using the ``prlimit`` tool:

.. parsed-literal::
sudo prlimit --pid pid --nofile=65000:65000

Securing Solr
=============

Our sample init script and systemd service file linked above tell Solr to only listen on localhost (127.0.0.1). We strongly recommend that you also use a firewall to block access to the Solr port (8983) from outside networks, for added redundancy.
Our sample init script and systemd service file linked above tell Solr to only listen on localhost (127.0.0.1). We
strongly recommend that you also use a firewall to block access to the Solr port (8983) from outside networks, for
added redundancy.

It is **very important** not to allow direct access to the Solr API from outside networks! Otherwise, any host that can reach the Solr port (8983 by default) can add or delete data, search unpublished data, and even reconfigure Solr. For more information, please see https://lucene.apache.org/solr/guide/7_3/securing-solr.html. A particularly serious security issue that has been identified recently allows a potential intruder to remotely execute arbitrary code on the system. See `RCE in Solr via Velocity Template <https://github.com/veracode-research/solr-injection#7-cve-2019-xxxx-rce-via-velocity-template-by-_s00py>`_ for more information.
It is **very important** not to allow direct access to the Solr API from outside networks! Otherwise, any host that can
reach the Solr port (8983 by default) can add or delete data, search unpublished data, and even reconfigure Solr. For
more information, please see https://lucene.apache.org/solr/guide/7_3/securing-solr.html. A particularly serious
security issue that has been identified recently allows a potential intruder to remotely execute arbitrary code on the
system. See `RCE in Solr via Velocity Template <https://github.com/veracode-research/solr-injection#7-cve-2019-xxxx-rce-via-velocity-template-by-_s00py>`_
for more information.

If you're running your Dataverse installation across multiple service hosts you'll want to remove the jetty.host argument (``-j jetty.host=127.0.0.1``) from the startup command line, but make sure Solr is behind a firewall and only accessible by the Dataverse installation host(s), by specific ip address(es).
If you're running your Dataverse installation across multiple service hosts you'll want to remove the jetty.host
argument (``-j jetty.host=127.0.0.1``) from the startup command line, but make sure Solr is behind a firewall and only
accessible by the Dataverse installation host(s), by specific ip address(es).

We additionally recommend that the Solr service account's shell be disabled, as it isn't necessary for daily operation::

@@ -247,7 +325,10 @@ or simply prepend each command you would run as the Solr user with "sudo -u solr

# sudo -u solr command

Finally, we would like to reiterate that it is simply never a good idea to run Solr as root! Running the process as a non-privileged user would substantially minimize any potential damage even in the event that the instance is compromised.
Finally, we would like to reiterate that it is simply never a good idea to run Solr as root! Running the process as
a non-privileged user would substantially minimize any potential damage even in the event that the instance is compromised.



jq
--
17 changes: 14 additions & 3 deletions modules/dataverse-parent/pom.xml
@@ -15,6 +15,7 @@
<module>../../scripts/zipdownload</module>
<module>../container-base</module>
<module>../dataverse-spi</module>
<module>../solr-configset</module>
</modules>

<!-- Transitive dependencies, bigger library "bill of materials" (BOM) and
@@ -53,6 +54,11 @@
<artifactId>postgresql</artifactId>
<version>${postgresql.version}</version>
</dependency>
<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-solrj</artifactId>
<version>${solr.version}</version>
</dependency>
<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
@@ -126,6 +132,12 @@
<scope>import</scope>
<type>pom</type>
</dependency>

<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<version>${junit.jupiter.version}</version>
</dependency>
</dependencies>
</dependencyManagement>

@@ -200,6 +212,8 @@

<!-- Container related -->
<fabric8-dmp.version>0.43.0</fabric8-dmp.version>
<solr.collection>collection1</solr.collection>
<solr.configset>dataverse</solr.configset> <!-- Note: container usage only for now, might be reused in code -->
</properties>

<pluginRepositories>
@@ -407,7 +421,6 @@
-->
<!-- <payara.version>5.2022.5</payara.version> -->
</properties>

<build>
<plugins>
<!-- This will get the current commit id to include in image tags of container builds as Maven properties -->
@@ -433,8 +446,6 @@
</plugin>
</plugins>
</build>

</profile>
</profiles>

</project>