refactor!: vendor the tokenizer stack into lance#6512

Merged
Xuanwo merged 6 commits into main from xuanwo/lance-tokenizer on Apr 15, 2026

@Xuanwo (Collaborator) commented Apr 14, 2026

This PR vendors the tokenizer stack Lance actually uses into a new rust/lance-tokenizer crate and rewires FTS and inverted-index code to depend on it instead of tantivy and lindera-tantivy. It keeps the existing document and query tokenization semantics in-tree, renames the old FTS document adapter module to document_tokenizer, and preserves upstream license headers on vendored code.

@Xuanwo changed the title from "refactor: vendor the tokenizer stack into lance" to "refactor!: vendor the tokenizer stack into lance" Apr 14, 2026
@Xuanwo marked this pull request as ready for review April 14, 2026 12:23

@BubbleCal (Contributor) left a comment

Awesome work!

@Xuanwo merged commit 65ac541 into main Apr 15, 2026
27 checks passed
@Xuanwo deleted the xuanwo/lance-tokenizer branch April 15, 2026 06:05