Description
Mistake
When we upgraded from Solr 7.3.0, we made one bad mistake (mea culpa, too): we did not update luceneMatchVersion to match the version of the running server.
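A small automated check could catch this kind of drift. The Python sketch below reads `luceneMatchVersion` from a solrconfig.xml string and compares it with a Solr version that the caller supplies, for instance as reported by the Solr admin UI or its system info endpoint (`/solr/admin/info/system`). It is a minimal sketch under those assumptions: `check_match_version` and `parse_version` are hypothetical helper names, and the script does not contact Solr itself.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '8.8.1' (or '8.8') into a 3-part tuple such as (8, 8, 1)."""
    parts = [int(piece) for piece in text.strip().split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def check_match_version(solrconfig_xml: str, server_version: str) -> str | None:
    """Hypothetical guard: warn if luceneMatchVersion lags the running server.

    server_version has to be supplied by the caller; this sketch does not
    query Solr itself.
    """
    root = ET.fromstring(solrconfig_xml)
    configured = root.findtext("luceneMatchVersion")
    if configured is None:
        return "luceneMatchVersion is not set"
    if parse_version(configured) < parse_version(server_version):
        return (f"luceneMatchVersion {configured.strip()} is older than "
                f"the running Solr {server_version}")
    return None
```

Run against our old config with a server version of 8.8.1, it returns a warning; against the upstream 8.8.1 config it returns `None`.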
Other changes
We also did not incorporate upstream changes to solrconfig.xml:
--- solrconfig.xml 2021-03-08 10:29:37.810488567 +0100
+++ solrconfig-881.xml 2021-02-12 19:56:43.000000000 +0100
@@ -35,7 +35,7 @@
that you fully re-index after changing this setting as it can
affect both how text is indexed and queried.
-->
- <luceneMatchVersion>7.3.0</luceneMatchVersion>
+ <luceneMatchVersion>8.8.1</luceneMatchVersion>
<!-- <lib/> directives can be used to instruct Solr to load any Jars
identified and use them to resolve any "plugins" specified in
@@ -69,20 +69,11 @@
If a 'dir' option (with or without a regex) is used and nothing
is found that matches, a warning will be logged.
The formerly present JARs have been excluded since 8.0, see apache/lucene-solr@dce36c1.
I don't know if we actually use any of those. Remove them and see whether anything breaks.
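Before removing them, one way to see which `<lib>` directives are actually active is to parse the file: Python's ElementTree discards XML comments by default, so commented-out directives (like the `solr-ltr` one upstream) simply do not show up. A minimal sketch; `active_lib_directives` is a hypothetical helper name, and it only lists directives, it does not check whether the JARs exist.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def active_lib_directives(solrconfig_xml: str) -> list[dict[str, str]]:
    """Return the attributes of every <lib> element that is not commented out.

    ElementTree's default parser drops comments, so anything inside
    <!-- ... --> is ignored, much as Solr ignores it.
    """
    root = ET.fromstring(solrconfig_xml)
    return [dict(lib.attrib) for lib in root.iter("lib")]
```

An empty result for our config would support simply dropping the directives.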
- The examples below can be used to load some solr-contribs along
+ The example below can be used to load a solr-contrib along
with their external dependencies.
-->
- <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
- <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />
+ <!-- <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-ltr-\d.*\.jar" /> -->
- <lib dir="${solr.install.dir:../../../..}/contrib/clustering/lib/" regex=".*\.jar" />
- <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-clustering-\d.*\.jar" />
-
- <lib dir="${solr.install.dir:../../../..}/contrib/langid/lib/" regex=".*\.jar" />
- <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-langid-\d.*\.jar" />
-
- <lib dir="${solr.install.dir:../../../..}/contrib/velocity/lib" regex=".*\.jar" />
- <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-velocity-\d.*\.jar" />
<!-- an exact 'path' can be used instead of a 'dir' to specify a
specific jar file. This will cause a serious error to be logged
if it can't be loaded.
These are newer changes we should incorporate.
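Among these newer changes is the memory circuit breaker. Its threshold is a percentage of the maximum heap, which the upstream comment restricts to 50-95; with a 4 GB heap and a threshold of 75, queries are rejected once heap usage exceeds 4 * 0.75 = 3 GB. The sketch below only reproduces that arithmetic as described in the comment, not Solr's implementation; `memory_breaker_limit_bytes` is a hypothetical name.

```python
def memory_breaker_limit_bytes(max_heap_bytes: int, threshold_percent: int) -> int:
    """Heap usage above which the memory circuit breaker rejects queries.

    Mirrors the arithmetic in the solrconfig.xml comment; the 50-95 check
    reflects the valid range stated there.
    """
    if not 50 <= threshold_percent <= 95:
        raise ValueError("threshold must be between 50 and 95")
    return max_heap_bytes * threshold_percent // 100
```

For example, `memory_breaker_limit_bytes(4 * 1024**3, 75)` gives exactly 3 GiB.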
@@ -161,6 +152,15 @@
<!-- <ramBufferSizeMB>100</ramBufferSizeMB> -->
<!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
+ <!-- Expert: ramPerThreadHardLimitMB sets the maximum amount of RAM that can be consumed
+ per thread before they are flushed. When limit is exceeded, this triggers a forced
+ flush even if ramBufferSizeMB has not been exceeded.
+ This is a safety limit to prevent Lucene's DocumentsWriterPerThread from address space
+ exhaustion due to its internal 32 bit signed integer based memory addressing.
+ The specified value should be greater than 0 and less than 2048MB. When not specified,
+ Solr uses Lucene's default value 1945. -->
+ <!-- <ramPerThreadHardLimitMB>1945</ramPerThreadHardLimitMB> -->
+
<!-- Expert: Merge Policy
The Merge Policy in Lucene controls how merging of segments is done.
The default since Solr/Lucene 3.3 is TieredMergePolicy.
@@ -367,23 +367,32 @@
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
<query>
- <!-- Maximum number of clauses in each BooleanQuery, an exception
- is thrown if exceeded. It is safe to increase or remove this setting,
- since it is purely an arbitrary limit to try and catch user errors where
- large boolean queries may not be the best implementation choice.
+ <!-- Maximum number of clauses allowed when parsing a boolean query string.
+
+ This limit only impacts boolean queries specified by a user as part of a query string,
+ and provides per-collection controls on how complex user specified boolean queries can
+ be. Query strings that specify more clauses then this will result in an error.
+
+ If this per-collection limit is greater then the global `maxBooleanClauses` limit
+ specified in `solr.xml`, it will have no effect, as that setting also limits the size
+ of user specified boolean queries.
-->
- <maxBooleanClauses>1024</maxBooleanClauses>
+ <maxBooleanClauses>${solr.max.booleanClauses:1024}</maxBooleanClauses>
<!-- Solr Internal Query Caches
- There are two implementations of cache available for Solr,
- LRUCache, based on a synchronized LinkedHashMap, and
- FastLRUCache, based on a ConcurrentHashMap.
+ There are four implementations of cache available for Solr:
+ LRUCache, based on a synchronized LinkedHashMap,
+ LFUCache and FastLRUCache, based on a ConcurrentHashMap, and CaffeineCache -
+ a modern and robust cache implementation. Note that in Solr 9.0
+ only CaffeineCache will be available, other implementations are now
+ deprecated.
FastLRUCache has faster gets and slower puts in single
threaded operation and thus is generally faster than LRUCache
when the hit ratio of the cache is high (> 75%), and may be
faster under other scenarios on multi-cpu systems.
+ Starting with Solr 9.0 the default cache implementation used is CaffeineCache.
-->
<!-- Filter Cache
@@ -403,13 +412,12 @@
initialSize - the initial capacity (number of entries) of
the cache. (see java.util.HashMap)
autowarmCount - the number of entries to prepopulate from
- and old cache.
+ an old cache.
maxRamMB - the maximum amount of RAM (in MB) that this cache is allowed
to occupy. Note that when this option is specified, the size
and initialSize parameters are ignored.
-->
- <filterCache class="solr.FastLRUCache"
- size="512"
+ <filterCache size="512"
initialSize="512"
autowarmCount="0"/>
@@ -421,8 +429,7 @@
maxRamMB - the maximum amount of RAM (in MB) that this cache is allowed
to occupy
-->
- <queryResultCache class="solr.LRUCache"
- size="512"
+ <queryResultCache size="512"
initialSize="512"
autowarmCount="0"/>
@@ -432,14 +439,12 @@
document). Since Lucene internal document ids are transient,
this cache will not be autowarmed.
-->
- <documentCache class="solr.LRUCache"
- size="512"
+ <documentCache size="512"
initialSize="512"
autowarmCount="0"/>
<!-- custom cache currently used by block join -->
<cache name="perSegFilter"
- class="solr.search.LRUCache"
size="10"
initialSize="0"
autowarmCount="10"
@@ -452,8 +457,7 @@
even if not configured here.
-->
<!--
- <fieldValueCache class="solr.FastLRUCache"
- size="512"
+ <fieldValueCache size="512"
autowarmCount="128"
showItems="32" />
-->
@@ -469,7 +473,6 @@
-->
<!--
<cache name="myUserCache"
- class="solr.LRUCache"
size="4096"
initialSize="1024"
autowarmCount="1024"
@@ -521,6 +524,23 @@
-->
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
+ <!-- Use Filter For Sorted Query
+
+ A possible optimization that attempts to use a filter to
+ satisfy a search. If the requested sort does not include
+ score, then the filterCache will be checked for a filter
+ matching the query. If found, the filter will be used as the
+ source of document ids, and then the sort will be applied to
+ that.
+
+ For most situations, this will not be useful unless you
+ frequently get the same search repeatedly with different sort
+ options, and none of them ever use "score"
+-->
+ <!--
+ <useFilterForSortedQuery>true</useFilterForSortedQuery>
+ -->
+
<!-- Query Related Event Listeners
Various IndexSearcher related events can trigger Listeners to
@@ -569,6 +589,64 @@
</query>
+ <!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Circuit Breaker Section - This section consists of configurations for
+ circuit breakers
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
+
+ <!-- Circuit Breakers
+
+ Circuit breakers are designed to allow stability and predictable query
+ execution. They prevent operations that can take down the node and cause
+ noisy neighbour issues.
+
+ This flag is the uber control switch which controls the activation/deactivation of all circuit
+ breakers. If a circuit breaker wishes to be independently configurable,
+ they are free to add their specific configuration but need to ensure that this flag is always
+ respected - this should have veto over all independent configuration flags.
+ -->
+ <circuitBreakers enabled="true">
+
+ <!-- Memory Circuit Breaker Configuration
+
+ Specific configuration for max JVM heap usage circuit breaker. This configuration defines whether
+ the circuit breaker is enabled and the threshold percentage of maximum heap allocated beyond which queries will be rejected until the
+ current JVM usage goes below the threshold. The valid value range for this value is 50-95.
+
+ Consider a scenario where the max heap allocated is 4 GB and memoryCircuitBreakerThreshold is
+ defined as 75. Threshold JVM usage will be 4 * 0.75 = 3 GB. Its generally a good idea to keep this value between 75 - 80% of maximum heap
+ allocated.
+
+ If, at any point, the current JVM heap usage goes above 3 GB, queries will be rejected until the heap usage goes below 3 GB again.
+ If you see queries getting rejected with 503 error code, check for "Circuit Breakers tripped"
+ in logs and the corresponding error message should tell you what transpired (if the failure
+ was caused by tripped circuit breakers).
+
+ If, at any point, the current JVM heap usage goes above 3 GB, queries will be rejected until the heap usage goes below 3 GB again.
+ If you see queries getting rejected with 503 error code, check for "Circuit Breakers tripped"
+ in logs and the corresponding error message should tell you what transpired (if the failure
+ was caused by tripped circuit breakers).
+ -->
+ <!--
+ <memBreaker enabled="true" threshold="75"/>
+ -->
+
+ <!-- CPU Circuit Breaker Configuration
+
+ Specific configuration for CPU utilization based circuit breaker. This configuration defines whether the circuit breaker is enabled
+ and the average load over the last minute at which the circuit breaker should start rejecting queries.
+
+ Consider a scenario where the max heap allocated is 4 GB and memoryCircuitBreakerThreshold is
+ defined as 75. Threshold JVM usage will be 4 * 0.75 = 3 GB. Its generally a good idea to keep this value between 75 - 80% of maximum heap
+ allocated.
+ -->
+
+ <!--
+ <cpuBreaker enabled="true" threshold="75"/>
+ -->
+
+ </circuitBreakers>
+
<!-- Request Dispatcher
These are definitely changes we made. I don't know why they happened (it's really tricky to trace where they came from) and I don't know if this is actually used.
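To see whether these Dataverse-specific defaults (`defType`, `qf`, `pf`, `bq`) are still in effect, one could extract the `defaults` list of the affected request handler (presumably `/select`, judging from its `echoParams` and `rows` defaults) from a given solrconfig.xml. A sketch under that assumption; `handler_defaults` is a hypothetical helper, and it reads only the file, not any parameters Dataverse may send with each query.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def handler_defaults(solrconfig_xml: str, handler_name: str) -> dict[str, str]:
    """Return the <lst name="defaults"> entries of one request handler.

    Values are whitespace-normalized, so multi-line qf/pf strings become
    single lines such as 'dvName^400 authorName^180'.
    """
    root = ET.fromstring(solrconfig_xml)
    for handler in root.iter("requestHandler"):
        if handler.get("name") != handler_name:
            continue
        defaults = handler.find("lst[@name='defaults']")
        if defaults is None:
            return {}
        return {
            child.get("name"): " ".join((child.text or "").split())
            for child in defaults
            if child.get("name") is not None
        }
    raise KeyError(f"no requestHandler named {handler_name!r}")
```

Comparing the result for our file with the result for the upstream file shows exactly which search defaults we added.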
@@ -693,48 +771,6 @@
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
- <str name="defType">edismax</str>
- <float name="tie">0.075</float>
- <str name="qf">
- dvName^400
- authorName^180
- dvSubject^190
- dvDescription^180
- dvAffiliation^170
- title^130
- subject^120
- keyword^110
- topicClassValue^100
- dsDescriptionValue^90
- authorAffiliation^80
- publicationCitation^60
- producerName^50
- fileName^30
- fileDescription^30
- variableLabel^20
- variableName^10
- _text_^1.0
- </str>
- <str name="pf">
- dvName^200
- authorName^100
- dvSubject^100
- dvDescription^100
- dvAffiliation^100
- title^75
- subject^75
- keyword^75
- topicClassValue^75
- dsDescriptionValue^75
- authorAffiliation^75
- publicationCitation^75
- producerName^75
- </str>
- <!-- Even though this number is huge it only seems to apply a boost of ~1.5x to final result -MAD 4.9.3-->
- <str name="bq">
- isHarvested:false^25000
- </str>
-
<!-- Default search field
<str name="df">text</str>
-->
@@ -805,43 +841,12 @@
</lst>
</requestHandler>
More upstream changes that should be incorporated (they seem related to the same change in apache/lucene-solr@dce36c1).
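This change also trims the `initParams` path list down to handlers that still exist. A quick consistency check could compare each listed path against the declared request handlers. It is a heuristic sketch: wildcard paths such as `/update/**` are skipped because they usually target handlers Solr registers implicitly rather than in solrconfig.xml, and `dangling_init_param_paths` is a hypothetical name.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def dangling_init_param_paths(solrconfig_xml: str) -> list[str]:
    """List initParams paths that match no declared <requestHandler>.

    Wildcard paths (e.g. /update/**) are skipped because they usually target
    implicitly registered handlers that never appear in solrconfig.xml.
    """
    root = ET.fromstring(solrconfig_xml)
    declared = {h.get("name") for h in root.iter("requestHandler")}
    dangling = []
    for init in root.iter("initParams"):
        for path in (init.get("path") or "").split(","):
            path = path.strip()
            if not path or "*" in path:
                continue
            if path not in declared:
                dangling.append(path)
    return dangling
```

Once `/tvrh`, `/elevate` and `/browse` are removed as handlers, the old path list would report exactly those three as stale.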
-
- <!-- A Robust Example
-
- This example SearchHandler declaration shows off usage of the
- SearchHandler with many defaults declared
-
- Note that multiple instances of the same Request Handler
- (SearchHandler) can be registered multiple times with different
- names (and different init parameters)
- -->
- <requestHandler name="/browse" class="solr.SearchHandler" useParams="query,facets,velocity,browse">
- <lst name="defaults">
- <str name="echoParams">explicit</str>
- </lst>
- </requestHandler>
-
- <initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
+ <initParams path="/update/**,/query,/select,/spell">
<lst name="defaults">
<str name="df">_text_</str>
</lst>
</initParams>
- <!-- Solr Cell Update Request Handler
-
- http://wiki.apache.org/solr/ExtractingRequestHandler
-
- -->
- <requestHandler name="/update/extract"
- startup="lazy"
- class="solr.extraction.ExtractingRequestHandler" >
- <lst name="defaults">
- <str name="lowernames">true</str>
- <str name="fmap.meta">ignored_</str>
- <str name="fmap.content">_text_</str>
- </lst>
- </requestHandler>
-
<!-- Search Components
Search components are registered to SolrCore and used by
@@ -972,30 +977,6 @@
</arr>
</requestHandler>
- <!-- Term Vector Component
-
- http://wiki.apache.org/solr/TermVectorComponent
- -->
- <searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
-
- <!-- A request handler for demonstrating the term vector component
-
- This is purely as an example.
-
- In reality you will likely want to add the component to your
- already specified request handlers.
- -->
- <requestHandler name="/tvrh" class="solr.SearchHandler" startup="lazy">
- <lst name="defaults">
- <bool name="tv">true</bool>
- </lst>
- <arr name="last-components">
- <str>tvComponent</str>
- </arr>
- </requestHandler>
-
- <!-- Clustering Component. (Omitted here. See the default Solr example for a typical configuration.) -->
-
<!-- Terms Component
http://wiki.apache.org/solr/TermsComponent
@@ -1016,30 +997,6 @@
</arr>
</requestHandler>
-
- <!-- Query Elevation Component
-
- http://wiki.apache.org/solr/QueryElevationComponent
-
- a search component that enables you to configure the top
- results for a given query regardless of the normal lucene
- scoring.
- -->
- <searchComponent name="elevator" class="solr.QueryElevationComponent" >
- <!-- pick a fieldType to analyze queries -->
- <str name="queryFieldType">string</str>
- </searchComponent>
-
- <!-- A request handler for demonstrating the elevator component -->
- <requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
- <lst name="defaults">
- <str name="echoParams">explicit</str>
- </lst>
- <arr name="last-components">
- <str>elevator</str>
- </arr>
- </requestHandler>
-
<!-- Highlighting Component
http://wiki.apache.org/solr/HighlightingParameters
🚨 THIS IS CRUCIAL FOR US. Newer versions of Solr default to the managed schema factory that @pkiraly suggested in #5989.
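The hunk below drops the explicit `<schemaFactory class="ClassicIndexSchemaFactory"/>`, which changes the schema factory Solr ends up using. As we understand the Solr reference guide, a solrconfig.xml without a `<schemaFactory>` gets the managed (mutable) schema factory when `luceneMatchVersion` is 6.0 or later, and the classic one before that. The sketch encodes that rule as an assumption, not as Solr's actual code; `effective_schema_factory` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def effective_schema_factory(solrconfig_xml: str) -> str:
    """Guess which schema factory a solrconfig.xml ends up with.

    Assumption (from our reading of the reference guide, not Solr's source):
    an explicit <schemaFactory> wins; otherwise luceneMatchVersion >= 6.0
    implies the managed schema factory and anything older the classic one.
    """
    root = ET.fromstring(solrconfig_xml)
    explicit = root.find("schemaFactory")
    if explicit is not None:
        return explicit.get("class", "")
    version = root.findtext("luceneMatchVersion", default="0").strip()
    major = int(version.split(".")[0])
    return "ManagedIndexSchemaFactory" if major >= 6 else "ClassicIndexSchemaFactory"
```

In other words, if we take the upstream file as-is, the classic schema line has to be added back to keep our static schema.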
@@ -1170,8 +1127,6 @@
See http://wiki.apache.org/solr/GuessingFieldTypes
-->
-<schemaFactory class="ClassicIndexSchemaFactory"/>
-
<updateProcessor class="solr.UUIDUpdateProcessorFactory" name="uuid"/>
<updateProcessor class="solr.RemoveBlankFieldUpdateProcessorFactory" name="remove-blank"/>
<updateProcessor class="solr.FieldNameMutatingUpdateProcessorFactory" name="field-name-mutating">
These have been changed upstream and, as they now use optional sections (the parts in square brackets) to cover several formats per pattern, should be OK to incorporate.
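The new `format` entries are java.time `DateTimeFormatter` patterns in which `[...]` marks an optional section. As we understand `DateTimeFormatterBuilder`, sections still open at the end of a pattern are closed implicitly, which is why the upstream entries may look unbalanced. To compare them with the removed explicit list, one can expand the optional sections into the concrete patterns they stand for. This Python sketch only does that expansion (it does not parse any dates), and both function names are hypothetical. Note also that the new patterns use `z` (time-zone name) where the old ones used `Z` (offset), which is worth checking separately.

```python
from __future__ import annotations

from itertools import product


def parse_optional_sections(pattern: str) -> list:
    """Split a DateTimeFormatter-style pattern into literal runs (str) and
    optional sections (nested lists). Brackets inside '...' quotes are
    literal; sections still open at the end are closed implicitly."""
    root: list = []
    stack = [root]
    buf = ""
    in_quote = False
    for ch in pattern:
        if ch == "'":
            in_quote = not in_quote
            buf += ch
        elif ch in "[]" and not in_quote:
            if buf:
                stack[-1].append(buf)
                buf = ""
            if ch == "[":
                section: list = []
                stack[-1].append(section)
                stack.append(section)
            elif len(stack) == 1:
                raise ValueError(f"unbalanced ']' in {pattern!r}")
            else:
                stack.pop()
        else:
            buf += ch
    if buf:
        stack[-1].append(buf)
    return root


def expand(parts: list) -> list[str]:
    """Return every concrete pattern a parsed sequence can stand for."""
    choices = [[p] if isinstance(p, str) else [""] + expand(p) for p in parts]
    return list(dict.fromkeys("".join(combo) for combo in product(*choices)))
```

Expanding the first new entry, `yyyy-MM-dd['T'[HH:mm[:ss[.SSS]][z`, yields eight concrete patterns, from `yyyy-MM-dd` up to `yyyy-MM-dd'T'HH:mm:ss.SSSz`, which can then be checked against the removed list.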
@@ -1183,28 +1138,16 @@
<updateProcessor class="solr.ParseDoubleFieldUpdateProcessorFactory" name="parse-double"/>
<updateProcessor class="solr.ParseDateFieldUpdateProcessorFactory" name="parse-date">
<arr name="format">
- <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
- <str>yyyy-MM-dd'T'HH:mm:ss,SSSZ</str>
- <str>yyyy-MM-dd'T'HH:mm:ss.SSS</str>
- <str>yyyy-MM-dd'T'HH:mm:ss,SSS</str>
- <str>yyyy-MM-dd'T'HH:mm:ssZ</str>
- <str>yyyy-MM-dd'T'HH:mm:ss</str>
- <str>yyyy-MM-dd'T'HH:mmZ</str>
- <str>yyyy-MM-dd'T'HH:mm</str>
- <str>yyyy-MM-dd HH:mm:ss.SSSZ</str>
- <str>yyyy-MM-dd HH:mm:ss,SSSZ</str>
- <str>yyyy-MM-dd HH:mm:ss.SSS</str>
- <str>yyyy-MM-dd HH:mm:ss,SSS</str>
- <str>yyyy-MM-dd HH:mm:ssZ</str>
- <str>yyyy-MM-dd HH:mm:ss</str>
- <str>yyyy-MM-dd HH:mmZ</str>
- <str>yyyy-MM-dd HH:mm</str>
- <str>yyyy-MM-dd</str>
+ <str>yyyy-MM-dd['T'[HH:mm[:ss[.SSS]][z</str>
+ <str>yyyy-MM-dd['T'[HH:mm[:ss[,SSS]][z</str>
+ <str>yyyy-MM-dd HH:mm[:ss[.SSS]][z</str>
+ <str>yyyy-MM-dd HH:mm[:ss[,SSS]][z</str>
+ <str>[EEE, ]dd MMM yyyy HH:mm[:ss] z</str>
+ <str>EEEE, dd-MMM-yy HH:mm:ss z</str>
+ <str>EEE MMM ppd HH:mm:ss [z ]yyyy</str>
</arr>
</updateProcessor>
Is the removal of these processors still a thing?
-
- <!--Dataverse removed-->
-<!-- <updateProcessor class="solr.AddSchemaFieldsUpdateProcessorFactory" name="add-schema-fields">
+ <updateProcessor class="solr.AddSchemaFieldsUpdateProcessorFactory" name="add-schema-fields">
<lst name="typeMapping">
<str name="valueClass">java.lang.String</str>
<str name="fieldType">text_general</str>
@@ -1212,7 +1155,7 @@
<str name="dest">*_str</str>
<int name="maxChars">256</int>
</lst>
-
+ <!-- Use as default mapping instead of defaultFieldType -->
<bool name="default">true</bool>
</lst>
<lst name="typeMapping">
@@ -1232,11 +1175,11 @@
<str name="valueClass">java.lang.Number</str>
<str name="fieldType">pdoubles</str>
</lst>
- </updateProcessor> -->
</updateProcessor>
We should use the setting to disable this instead of changing the default... 🙈
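Like `${solr.max.booleanClauses:1024}` in the query section, the chain's `default="${update.autoCreateFields:true}"` uses Solr's `${name:default}` property substitution: the value after the colon applies only when the property is not set. So instead of editing the default, the property can be set per collection; the schemaless-mode section of the Solr reference guide shows `bin/solr config -c mycollection -p 8983 -action set-user-property -property update.autoCreateFields -value false` for this. The sketch below is a simplified model of the substitution (no nesting or escaping) meant only to illustrate the precedence; it is not Solr's resolver, and `resolve` is a hypothetical name.

```python
from __future__ import annotations

import re

# ${name} or ${name:default}; a simplified model of Solr's syntax (no nesting).
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve(value: str, props: dict[str, str]) -> str:
    """Replace each ${name:default} with props[name], else with the default."""

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(f"property {name!r} is unset and has no default")

    return _PLACEHOLDER.sub(substitute, value)
```

With no properties set the chain is enabled (`"true"`); setting `update.autoCreateFields` to `false` disables it without touching the file.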
<!-- The update.autoCreateFields property can be turned to false to disable schemaless mode -->
- <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:false}"
- processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date">
+ <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:true}"
+ processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
<processor class="solr.LogUpdateProcessorFactory"/>
<processor class="solr.DistributedUpdateProcessorFactory"/>
<processor class="solr.RunUpdateProcessorFactory"/>
@@ -1265,46 +1208,6 @@
</updateRequestProcessorChain>
-->
More upstream changes due to the removed libs. It looks like we never configured those.
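To confirm that we never configured those, one can scan the active (not commented-out) elements for `class` attributes naming contrib classes. The class list below is an assumption taken from the lines removed in this diff, not a complete inventory of Solr contribs, and `contrib_references` is a hypothetical name.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Contrib classes seen in the removed lines of this diff (assumed list, not exhaustive).
CONTRIB_CLASSES = {
    "solr.extraction.ExtractingRequestHandler": "extraction",
    "org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory": "langid",
    "solr.VelocityResponseWriter": "velocity",
}


def contrib_references(solrconfig_xml: str) -> list[tuple[str, str]]:
    """Return (element tag, contrib) for each active element using a contrib class.

    Commented-out elements are invisible to ElementTree's default parser,
    so they are not reported.
    """
    root = ET.fromstring(solrconfig_xml)
    return [
        (elem.tag, CONTRIB_CLASSES[elem.get("class")])
        for elem in root.iter()
        if elem.get("class") in CONTRIB_CLASSES
    ]
```

The commented-out langid chain would not be reported, while the lazily loaded Velocity writer would be.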
- <!-- Language identification
-
- This example update chain identifies the language of the incoming
- documents using the langid contrib. The detected language is
- written to field language_s. No field name mapping is done.
- The fields used for detection are text, title, subject and description,
- making this example suitable for detecting languages form full-text
- rich documents injected via ExtractingRequestHandler.
- See more about langId at http://wiki.apache.org/solr/LanguageDetection
- -->
- <!--
- <updateRequestProcessorChain name="langid">
- <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
- <str name="langid.fl">text,title,subject,description</str>
- <str name="langid.langField">language_s</str>
- <str name="langid.fallback">en</str>
- </processor>
- <processor class="solr.LogUpdateProcessorFactory" />
- <processor class="solr.RunUpdateProcessorFactory" />
- </updateRequestProcessorChain>
- -->
-
- <!-- Script update processor
-
- This example hooks in an update processor implemented using JavaScript.
-
- See more about the script update processor at http://wiki.apache.org/solr/ScriptUpdateProcessor
- -->
- <!--
- <updateRequestProcessorChain name="script">
- <processor class="solr.StatelessScriptUpdateProcessorFactory">
- <str name="script">update-script.js</str>
- <lst name="params">
- <str name="config_param">example config parameter</str>
- </lst>
- </processor>
- <processor class="solr.RunUpdateProcessorFactory" />
- </updateRequestProcessorChain>
- -->
-
<!-- Response Writers
http://wiki.apache.org/solr/QueryResponseWriter
@@ -1340,23 +1243,6 @@
<str name="content-type">text/plain; charset=UTF-8</str>
</queryResponseWriter>
- <!--
- Custom response writers can be declared as needed...
- -->
- <queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" startup="lazy">
- <str name="template.base.dir">${velocity.template.base.dir:}</str>
- <str name="solr.resource.loader.enabled">${velocity.solr.resource.loader.enabled:true}</str>
- <str name="params.resource.loader.enabled">${velocity.params.resource.loader.enabled:false}</str>
- </queryResponseWriter>
-
- <!-- XSLT response writer transforms the XML output by any xslt file found
- in Solr's conf/xslt directory. Changes to xslt files are checked for
- every xsltCacheLifetimeSeconds.
- -->
- <queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
- <int name="xsltCacheLifetimeSeconds">5</int>
- </queryResponseWriter>
-
<!-- Query Parsers
https://lucene.apache.org/solr/guide/query-syntax-and-parsing.html
Conclusion
Instead of maintaining a static config, we should start from the _default configset and apply our changes to it.
At least, this is what I'm going to do in the Dataverse Solr container images.
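As a rough illustration of applying our changes on top of `_default` rather than keeping a static copy, the script below re-adds the one change this issue calls crucial, `<schemaFactory class="ClassicIndexSchemaFactory"/>`, to an upstream solrconfig.xml while keeping the comments inside `<config>`. It assumes Python 3.8+ (for `TreeBuilder(insert_comments=True)`); the function name and the idea of scripting it this way are hypothetical, not how the container images necessarily do it. Disabling schemaless mode is left to the `update.autoCreateFields` property, as suggested in the issue.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

CLASSIC = "ClassicIndexSchemaFactory"


def ensure_classic_schema(solrconfig_xml: str) -> str:
    """Return solrconfig.xml with exactly one classic <schemaFactory> element.

    Comments inside <config> are kept so the upstream documentation survives;
    comments outside the root (e.g. a license header) and the XML declaration
    may be dropped on output. The element is appended at the end of <config>.
    """
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(solrconfig_xml, parser=parser)
    for existing in root.findall("schemaFactory"):
        root.remove(existing)
    factory = ET.SubElement(root, "schemaFactory", {"class": CLASSIC})
    factory.tail = "\n"
    return ET.tostring(root, encoding="unicode")
```

Running it twice leaves a single `<schemaFactory>`, so it can be applied unconditionally when building an image.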