
@janhoy (Contributor) commented Oct 17, 2025

https://issues.apache.org/jira/browse/SOLR-17960

This will be merged to main, branch_10x and branch_10_0.

@github-actions github-actions bot added documentation Improvements or additions to documentation dependencies Dependency upgrades configs tool:build tests cat:index module:langid labels Oct 17, 2025
{solr-javadocs}/modules/langid/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessorFactory.html[LangDetectLanguageIdentifierUpdateProcessorFactory]::: Identifies the language of a set of input fields using http://code.google.com/p/language-detection.

{solr-javadocs}/modules/langid/org/apache/solr/update/processor/TikaLanguageIdentifierUpdateProcessorFactory.html[TikaLanguageIdentifierUpdateProcessorFactory]::: Identifies the language of a set of input fields using Tika's LanguageIdentifier.
{solr-javadocs}/modules/langid/org/apache/solr/update/processor/OpenNLPLangDetectUpdateProcessorFactory.html[OpenNLPLangDetectUpdateProcessorFactory]::: Identifies the language of a set of input fields using OpenNLP's language detection model.
@janhoy (Contributor Author) commented:

Turned out the ref guide did not list the OpenNLP identifier...
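Because the OpenNLP factory is now the documented alternative, here is a minimal solrconfig.xml sketch of an update chain that uses OpenNLPLangDetectUpdateProcessorFactory. It assumes the langid module is enabled (for example via `SOLR_MODULES=langid`) and that an OpenNLP language-detection model file can be resolved by Solr's resource loader. The model file name `langdetect-183.bin`, the chain name, and the field names are illustrative assumptions and do not come from this PR.

```xml
<!-- Sketch: language detection with OpenNLP (assumes the langid module is enabled). -->
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.OpenNLPLangDetectUpdateProcessorFactory">
    <!-- Illustrative input fields to run language detection on -->
    <str name="langid.fl">title,subject,text,keywords</str>
    <!-- Illustrative field that receives the detected language code -->
    <str name="langid.langField">language_s</str>
    <!-- Assumed model file name; it must be resolvable by Solr's resource loader -->
    <str name="langid.model">langdetect-183.bin</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```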

@janhoy janhoy changed the title from "SOLR-17960 Remove deprecated Deprecate TikaLanguageIdentifierUpdateProcessor" to "SOLR-17960 Remove deprecated TikaLanguageIdentifierUpdateProcessor" Oct 17, 2025
@janhoy janhoy requested review from Copilot and epugh October 17, 2025 07:20
Copilot AI left a comment:

Pull Request Overview

Removes TikaLanguageIdentifierUpdateProcessor, which was deprecated in version 9.10. Users are directed to LangDetectLanguageIdentifierUpdateProcessor or OpenNLPLangDetectUpdateProcessor as alternatives for language detection.

  • Removes all TikaLanguageIdentifierUpdateProcessor-related classes and test files
  • Updates documentation to remove references to Tika language detection
  • Updates configuration examples to use LangDetectLanguageIdentifierUpdateProcessor instead
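To illustrate the configuration change, a chain that previously used the Tika factory could be switched to the LangDetect factory. This is a sketch that assumes the existing chain sets only the common `langid.*` parameters (`langid.fl`, `langid.langField`), so the main change is the processor class. The chain name and field names are hypothetical placeholders.

```xml
<!-- Sketch: moving a chain off the removed Tika factory to LangDetect. -->
<updateRequestProcessorChain name="langid">
  <!-- Previously: class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory" -->
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <!-- Hypothetical input fields to run language detection on -->
      <str name="langid.fl">title,subject,text,keywords</str>
      <!-- Hypothetical field that receives the detected language code -->
      <str name="langid.langField">language_s</str>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Any Tika-specific settings in an existing chain should be checked against the LangDetect factory's documented parameters before migrating.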

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| major-changes-in-solr-10.adoc | Adds upgrade note about TikaLanguageIdentifierUpdateProcessor removal |
| language-detection.adoc | Removes Tika implementation from documentation and examples |
| update-request-processors.adoc | Replaces Tika processor reference with OpenNLP processor |
| solrconfig.xml | Updates sample configuration to use LangDetect instead of Tika |
| TikaLanguageIdentifierUpdateProcessorFactoryTest.java | Removes test file for deprecated processor |
| LanguageIdentifierUpdateProcessorFactoryTestCase.java | Removes Tika processor chain test setup |
| solrconfig-languageidentifier.xml | Removes Tika processor configuration from test config |
| TikaLanguageIdentifierUpdateProcessorFactory.java | Removes deprecated factory class |
| TikaLanguageIdentifierUpdateProcessor.java | Removes deprecated processor class |
| gradle.lockfile | Removes Tika dependency from lockfile |
| build.gradle | Removes Tika core dependency |
| README.md | Removes Tika dependency documentation |
| CHANGES.txt | Adds changelog entry for removal |


janhoy and others added 2 commits October 17, 2025 09:51
…in-solr-10.adoc

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@janhoy janhoy requested a review from tballison October 17, 2025 07:53
@epugh (Contributor) left a comment:

Lgtm!

@janhoy janhoy merged commit a0c8b8a into apache:main Oct 20, 2025
3 checks passed
@janhoy janhoy deleted the SOLR-17960-Remove-tika-langid branch October 20, 2025 14:43
janhoy added a commit that referenced this pull request Oct 20, 2025
janhoy added a commit that referenced this pull request Oct 20, 2025



3 participants