SOLR-17961 Remove deprecated Tika Extraction Backend #3784
Pull Request Overview
This PR removes the deprecated in-process Tika backend and consolidates Solr’s ExtractingRequestHandler to use only the external Tika Server backend.
- Remove LocalTikaExtractionBackend and related code/tests; require Tika Server (tikaserver) exclusively
- Update documentation to reflect tikaserver-only support and configuration
- Clean up module dependencies and licenses associated with Tika parsers and other removed components
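A minimal sketch of the kind of init-time check the overview describes: only the tikaserver backend is accepted, and tikaserver.url must be set. Everything in it is hypothetical and for illustration only. The class name TikaServerConfigCheck, the extraction.backend key, and the http(s) scheme check are assumptions, not code from this PR or Solr's actual API.

```java
import java.net.URI;
import java.util.Map;

/**
 * Hypothetical sketch, NOT Solr's ExtractingRequestHandler code: validates a
 * handler's init args so that only the Tika Server backend is usable.
 */
public class TikaServerConfigCheck {

    // Assumption: "tikaserver" is both the only supported backend and the default.
    static final String DEFAULT_BACKEND = "tikaserver";

    /** Returns the validated Tika Server base URI, or throws IllegalArgumentException. */
    static URI validate(Map<String, String> initArgs) {
        // Hypothetical key "extraction.backend"; if absent, fall back to the default.
        String backend = initArgs.getOrDefault("extraction.backend", DEFAULT_BACKEND);
        if (!DEFAULT_BACKEND.equals(backend)) {
            throw new IllegalArgumentException(
                "Unsupported extraction backend '" + backend + "'; only 'tikaserver' is available");
        }
        // tikaserver.url is required: there is no in-process fallback any more.
        String url = initArgs.get("tikaserver.url");
        if (url == null || url.isBlank()) {
            throw new IllegalArgumentException("tikaserver.url must be configured");
        }
        // URI.create also throws IllegalArgumentException on malformed input.
        URI uri = URI.create(url.trim());
        String scheme = uri.getScheme();
        if (!"http".equals(scheme) && !"https".equals(scheme)) {
            throw new IllegalArgumentException("tikaserver.url must be an http(s) URL: " + url);
        }
        return uri;
    }

    public static void main(String[] args) {
        URI ok = validate(Map.of("tikaserver.url", "http://localhost:9998"));
        System.out.println("Using Tika Server at " + ok);
        try {
            validate(Map.of("extraction.backend", "local", "tikaserver.url", "http://localhost:9998"));
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Validating at init time means a misconfigured handler fails when the core loads rather than on the first extraction request.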
Reviewed Changes
Copilot reviewed 186 out of 194 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| solr/solr-ref-guide/modules/upgrade-notes/pages/major-changes-in-solr-10.adoc | Adds upgrade note about removing LocalTikaExtractionBackend and requiring tikaserver.url |
| solr/solr-ref-guide/modules/indexing-guide/pages/indexing-with-tika.adoc | Rewrites Solr Cell docs for tikaserver-only; updates examples, parameters, and Tika Server config |
| solr/modules/extraction/src/java/org/apache/solr/handler/extraction/ExtractingRequestHandler.java | Removes local backend configuration and enforcement; requires tikaserver.url; validates backend name |
| solr/modules/extraction/src/java/org/apache/solr/handler/extraction/ExtractionBackend.java | Updates method Javadoc to remove local/tikaserver examples |
| solr/modules/extraction/src/java/org/apache/solr/handler/extraction/ParseContextConfig.java | Removes parse-context configuration class (no longer applicable) |
| solr/modules/extraction/src/java/org/apache/solr/handler/extraction/LocalTikaExtractionBackend.java | Removes Local Tika backend implementation |
| solr/modules/extraction/src/test/** | Removes tests specific to the local backend and parse context config; adjusts test solrconfig defaults |
| solr/modules/extraction/build.gradle | Drops Tika parsers dependency, retains tika-core and Jetty client for tikaserver backend |
| solr/modules/extraction/gradle.lockfile | Prunes dependencies associated with removed parsers and transitive artifacts |
| solr/licenses/** | Removes licenses/notice files for dependencies no longer used |
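With the parsers gone and only tika-core plus an HTTP client retained, extraction comes down to an HTTP call to Tika Server. The sketch below builds such a request using the JDK's java.net.http client instead of Jetty, purely for brevity. It assumes Tika Server's standard PUT /tika endpoint on its default port 9998 and is not the PR's implementation; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of a tikaserver-style call: PUT the raw document bytes to
 * Tika Server's /tika endpoint and ask for plain text back. Uses java.net.http
 * for brevity; the PR itself keeps a Jetty client dependency instead.
 */
public class TikaServerExtractSketch {

    /** Builds (but does not send) the extraction request. */
    static HttpRequest buildExtractRequest(URI tikaServerBase, byte[] document) {
        // resolve("/tika") replaces any path already present on the base URI.
        URI endpoint = tikaServerBase.resolve("/tika");
        return HttpRequest.newBuilder(endpoint)
            .header("Accept", "text/plain") // ask Tika Server for extracted plain text
            .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
            .build();
    }

    public static void main(String[] args) throws Exception {
        byte[] doc = "Hello from a plain-text document".getBytes(StandardCharsets.UTF_8);
        HttpRequest req = buildExtractRequest(URI.create("http://localhost:9998"), doc);
        System.out.println(req.method() + " " + req.uri());

        // Sending requires a running Tika Server; only do it when asked to.
        if (args.length > 0 && args[0].equals("--send")) {
            HttpClient client = HttpClient.newHttpClient();
            HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.statusCode() + ": " + resp.body());
        }
    }
}
```

Pass --send only when a Tika Server is listening. Because resolve("/tika") discards any existing path on the base URL, a Tika Server mounted under a path prefix would need different URL handling.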
Review threads:
- solr/solr-ref-guide/modules/indexing-guide/pages/indexing-with-tika.adoc (outdated, resolved)
- ...modules/extraction/src/java/org/apache/solr/handler/extraction/ExtractingRequestHandler.java (outdated, resolved)
- ...modules/extraction/src/java/org/apache/solr/handler/extraction/ExtractingRequestHandler.java (outdated, resolved)
- solr/solr-ref-guide/modules/upgrade-notes/pages/major-changes-in-solr-10.adoc (resolved)
…-tika.adoc Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ction/ExtractingRequestHandler.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…in-solr-10.adoc Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This is an impressive PR! -1001 lines for adding 128 lines. I look forward to seeing Solrbot close a million PRs that are no longer needed!
11,106 lines removed, 131 added :)
epugh left a comment
LGTM. Small questions.
Review threads:
- solr/modules/extraction/src/java/org/apache/solr/handler/extraction/ExtractingParams.java (resolved)
- solr/solr-ref-guide/modules/indexing-guide/pages/indexing-with-tika.adoc (resolved)
# Conflicts: # solr/CHANGES.txt
Pull Request Overview
Copilot reviewed 197 out of 205 changed files in this pull request and generated 5 comments.
Review threads:
- solr/solr-ref-guide/modules/indexing-guide/pages/post-tool.adoc (outdated, resolved)
- solr/solr-ref-guide/modules/indexing-guide/pages/indexing-with-tika.adoc (outdated, resolved)
- solr/solr-ref-guide/modules/indexing-guide/pages/indexing-with-tika.adoc (outdated, resolved)
- solr/modules/extraction/src/test-files/extraction/solr/collection1/conf/solrconfig.xml (resolved)
- solr/solr-ref-guide/modules/getting-started/pages/tutorial-diy.adoc (outdated, resolved)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…-tika.adoc Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…-tika.adoc Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Instruction on how to stop Tika container.
epugh left a comment
Ran through the bats tests, the tutorials and the ref guide pages. A couple of small edits... Feedback on the bats tests, but it isn't critical! LGTM. Ship it!
Thanks for the review. I'll move some bats methods around, remove the last NOCOMMIT, get all tests passing and then ship.
Remove some copied classes from Tika 1.x
Fix @keep for dockerfile-baseimage-java
…into SOLR-17961-remove-tika-backend
# Conflicts: # gradle/libs.versions.toml
Co-authored-by: Eric Pugh <epugh@opensourceconnections.com> (cherry picked from commit d8546d4)
Co-authored-by: Eric Pugh <epugh@opensourceconnections.com>
https://issues.apache.org/jira/browse/SOLR-17961
Making 'tikaserver' the default. This effectively removes all bundled Tika Parsers.
Review checklist (please tick if you reviewed that part):
- licenses folder: are the corresponding files gone?
- indexing-with-tika.adoc: does it still read well?
- tikaserver.url configured
- test_extraction.bats to work, if so it must stand up a TikaServer somehow, or remove it?

Will be merged to main, branch_10x and branch_10_0.