
build: Extend tokenizer capabilities #1114

Open
benITo47 wants to merge 1 commit into main from @bo/bumpTokenizerCapabilities

@benITo47
Contributor

@benITo47 benITo47 commented Apr 29, 2026

Description

This PR introduces rebuilt binaries that contain new, updated tokenizers.
This iteration adds support for more tokenization models (i.e. Unigram, WordLevel) as well as a number of previously unsupported pre-tokenizers, decoders, and post-processors.
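
For anyone deciding which models to retest, a minimal sketch of how to tell which tokenization model a given tokenizer file uses. It assumes the tokenizer is shipped as a Hugging Face style `tokenizer.json` whose `model.type` field names the model. The helper `tokenizerModelType` is hypothetical and is not part of this PR or of the library API.

```typescript
// Hypothetical helper, for illustration only: not part of this PR.
// Assumes a Hugging Face style tokenizer.json with a "model.type" field.
type TokenizerModelType = "BPE" | "WordPiece" | "WordLevel" | "Unigram";

const KNOWN_MODEL_TYPES: readonly string[] = [
  "BPE",
  "WordPiece",
  "WordLevel",
  "Unigram",
];

function tokenizerModelType(
  tokenizerJson: string
): TokenizerModelType | undefined {
  const parsed: unknown = JSON.parse(tokenizerJson);
  if (typeof parsed !== "object" || parsed === null) return undefined;

  const model = (parsed as { model?: unknown }).model;
  if (typeof model !== "object" || model === null) return undefined;

  // Return undefined when the field is missing or names an unknown model.
  const type = (model as { type?: unknown }).type;
  return typeof type === "string" && KNOWN_MODEL_TYPES.includes(type)
    ? (type as TokenizerModelType)
    : undefined;
}

// Usage: a stub file contents standing in for a real tokenizer.json.
const sample = JSON.stringify({ model: { type: "Unigram", vocab: [] } });
console.log(tokenizerModelType(sample)); // prints "Unigram"
```

A file for which this returns `undefined` may simply omit the field, so it is only a quick filter for picking Unigram and WordLevel test candidates, not a validity check.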

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

Before merging, test all demo applications. Verify that the models that proved problematic during past bumps still work (e.g. kokoro, multi-method models).
Check all LLM models and confirm that they produce output.

  • LLM app on iOS
  • LLM app on Android
  • Speech app on iOS
  • Speech app on Android
  • Text Embeddings on iOS
  • Text Embeddings on Android

Screenshots

Related issues

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

@msluszniak msluszniak force-pushed the @bo/bumpTokenizerCapabilities branch from 78b5a13 to f1341d2 on April 30, 2026 12:14
Member

@msluszniak msluszniak left a comment

OK, I tested all the Android demo apps and they worked. Unfortunately, I don't have an iOS device with me. We need someone to test on iOS as well, and then I think we are ready to ship it.

@chmjkb
Collaborator

chmjkb commented May 4, 2026

> Ok, tested all Android demo apps and they worked. Unfortunately, I don't have any iOS with me. We need someone to test iOS as well and then, I think we are ready to ship it.

I'll take a look tomorrow.

@chmjkb
Collaborator

chmjkb commented May 4, 2026

Is there any particular tokenizer this should be tested with?

@msluszniak
Member

msluszniak commented May 4, 2026

> is there any particular tokenizer this should be tested with?

Yes, Unigram. You can test it by running the model from PR #1115.

Member

@msluszniak msluszniak left a comment

🚀

Labels

chore PRs that are chores
