
@janhoy
Contributor

@janhoy janhoy commented Oct 16, 2025

https://issues.apache.org/jira/browse/SOLR-17961

Making 'tikaserver' the default. This effectively removes all bundled Tika Parsers.

Review checklist (please tick if you reviewed that part):

  • Code changes when slicing out the backend from ExtractionHandler
  • Gradle build - did we remove the correct dependencies?
  • licenses folder - are the corresponding files gone?
  • NOTICE file - did we remove the correct Copyright statements?
  • Reference guide indexing-with-tika.adoc - does it still read well?
  • Make sure starting TikaServer is mentioned in various tutorials / docs
  • Give a proper error message if someone starts Solr 10 with the extracting handler but without tikaserver.url configured
  • Give error messages if users try to configure old, removed configuration options
  • Fix the BATS test test_extraction.bats so that it works (which means it must stand up a TikaServer somehow), or remove it?
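The two error-message items above could be handled by validating the handler's init arguments at startup. The following is a minimal, hypothetical Java sketch, not the code in this PR: the class name ExtractionConfigValidator and the option names extraction.backend, tika.config and parseContext.config are assumptions for illustration; only tikaserver.url and the 'tikaserver' backend name come from the PR text.

```java
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical startup validation for a tikaserver-only extracting handler.
 * All option names except "tikaserver.url" are assumptions, not Solr constants.
 */
public final class ExtractionConfigValidator {

  // Required setting named in the PR description.
  static final String TIKASERVER_URL = "tikaserver.url";
  // Assumed name of the backend selector; 'tikaserver' is the only remaining backend.
  static final String BACKEND = "extraction.backend";
  static final String DEFAULT_BACKEND = "tikaserver";
  // Assumed names of options that only applied to the removed local Tika backend.
  static final Set<String> REMOVED_OPTIONS = Set.of("tika.config", "parseContext.config");

  private ExtractionConfigValidator() {}

  /** Returns the backend name to use, or throws with a descriptive message. */
  static String validate(Map<String, String> initArgs) {
    // Reject old configuration options that no longer have any effect.
    for (String removed : REMOVED_OPTIONS) {
      if (initArgs.containsKey(removed)) {
        throw new IllegalArgumentException(
            "Option '" + removed + "' was removed together with the local Tika backend");
      }
    }
    // Default to 'tikaserver' and reject any other backend name, such as 'local'.
    String backend = initArgs.getOrDefault(BACKEND, DEFAULT_BACKEND);
    if (!DEFAULT_BACKEND.equals(backend)) {
      throw new IllegalArgumentException(
          "Unsupported extraction backend '" + backend + "'; only '" + DEFAULT_BACKEND + "' is available");
    }
    // Fail fast if no Tika Server URL is configured.
    String url = initArgs.get(TIKASERVER_URL);
    if (url == null || url.isBlank()) {
      throw new IllegalArgumentException(
          "The extracting handler requires '" + TIKASERVER_URL + "' pointing at a running Tika Server");
    }
    return backend;
  }

  public static void main(String[] args) {
    // Example URL; 9998 is Tika Server's default port.
    System.out.println(validate(Map.of(TIKASERVER_URL, "http://localhost:9998")));
  }
}
```

Validating when the handler is initialized, rather than on the first extract request, would surface a missing or outdated configuration as soon as Solr starts.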

Will be merged to main, branch_10x and branch_10_0

@github-actions github-actions bot added the documentation, dependencies, module:extraction, tool:build, and tests labels Oct 16, 2025
@janhoy janhoy requested review from Copilot and epugh October 16, 2025 23:43
@janhoy janhoy changed the title SOLR-17961 Rremove deprecated Tika Extraction Backend SOLR-17961 Remove deprecated Tika Extraction Backend Oct 16, 2025
@janhoy janhoy requested a review from dsmiley October 16, 2025 23:50
Contributor

Copilot AI left a comment


Pull Request Overview

This PR removes the deprecated in-process Tika backend and consolidates Solr’s ExtractingRequestHandler to use only the external Tika Server backend.

  • Remove LocalTikaExtractionBackend and related code/tests; require Tika Server (tikaserver) exclusively
  • Update documentation to reflect tikaserver-only support and configuration
  • Clean up module dependencies and licenses associated with Tika parsers and other removed components

Reviewed Changes

Copilot reviewed 186 out of 194 changed files in this pull request and generated 4 comments.

Summary per file:

  • solr/solr-ref-guide/modules/upgrade-notes/pages/major-changes-in-solr-10.adoc: Adds upgrade note about removing LocalTikaExtractionBackend and requiring tikaserver.url
  • solr/solr-ref-guide/modules/indexing-guide/pages/indexing-with-tika.adoc: Rewrites Solr Cell docs for tikaserver-only; updates examples, parameters, and Tika Server config
  • solr/modules/extraction/src/java/org/apache/solr/handler/extraction/ExtractingRequestHandler.java: Removes local backend configuration and enforcement; requires tikaserver.url; validates backend name
  • solr/modules/extraction/src/java/org/apache/solr/handler/extraction/ExtractionBackend.java: Updates method Javadoc to remove local/tikaserver examples
  • solr/modules/extraction/src/java/org/apache/solr/handler/extraction/ParseContextConfig.java: Removes parse-context configuration class (no longer applicable)
  • solr/modules/extraction/src/java/org/apache/solr/handler/extraction/LocalTikaExtractionBackend.java: Removes Local Tika backend implementation
  • solr/modules/extraction/src/test/**: Removes tests specific to the local backend and parse context config; adjusts test solrconfig defaults
  • solr/modules/extraction/build.gradle: Drops Tika parsers dependency, retains tika-core and Jetty client for tikaserver backend
  • solr/modules/extraction/gradle.lockfile: Prunes dependencies associated with removed parsers and transitive artifacts
  • solr/licenses/**: Removes licenses/notice files for dependencies no longer used


janhoy and others added 5 commits October 17, 2025 01:56
…-tika.adoc

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ction/ExtractingRequestHandler.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…in-solr-10.adoc

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@epugh
Contributor

epugh commented Oct 17, 2025

This is an impressive PR! -1001 lines for adding 128 lines. I look forward to seeing Solrbot close a million PRs that are no longer needed!

@janhoy
Contributor Author

janhoy commented Oct 17, 2025

> This is an impressive PR! -1001 lines for adding 128 lines. I look forward to seeing Solrbot close a million PRs that are no longer needed!

11,106 lines removed, 131 added :)

@janhoy janhoy requested review from anshumg and gerlowskija October 17, 2025 07:12
Contributor

@epugh epugh left a comment


LGTM. Small questions.

Contributor

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 197 out of 205 changed files in this pull request and generated 5 comments.



janhoy and others added 6 commits October 20, 2025 17:00
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…-tika.adoc

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…-tika.adoc

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Instruction on how to stop Tika container.
Contributor

@epugh epugh left a comment


Ran through the BATS tests, the tutorials, and the ref guide pages. A couple of small edits. Some feedback on the BATS tests, but none of it is critical! LGTM... Ship it!

@janhoy
Contributor Author

janhoy commented Oct 21, 2025

Thanks for the review. I'll do some moving of bats methods, remove last NOCOMMIT, get all tests passing and then ship.

@janhoy janhoy merged commit d8546d4 into apache:main Oct 22, 2025
6 of 7 checks passed
@janhoy janhoy deleted the SOLR-17961-remove-tika-backend branch October 22, 2025 20:32
janhoy added a commit that referenced this pull request Oct 22, 2025
Co-authored-by: Eric Pugh <epugh@opensourceconnections.com>
(cherry picked from commit d8546d4)
janhoy added a commit that referenced this pull request Oct 22, 2025
Co-authored-by: Eric Pugh <epugh@opensourceconnections.com>
(cherry picked from commit d8546d4)
janhoy added a commit to janhoy/solr that referenced this pull request Dec 23, 2025
Co-authored-by: Eric Pugh <epugh@opensourceconnections.com>

Labels

configs, dependencies, documentation, module:extraction, tests, tool:build


2 participants