feat: add PARAPHRASE_MULTILINGUAL_MINILM_L12_V2 text embeddings model by msluszniak · Pull Request #1115 · software-mansion/react-native-executorch

msluszniak · 2026-04-30T10:28:18Z

Description

Adds the paraphrase-multilingual-MiniLM-L12-v2 sentence-transformer model, the second multilingual embeddings model after distiluse, completing #945. All four export variants (xnnpack/{fp32,8da4w} and coreml/{fp32,fp16}) ship under MODEL_REGISTRY.ALL_MODELS; the playground exposes only the two variants RNE recommends (_8DA4W and _COREML).

384-d output, max 126 tokens, 50+ languages. The tokenizer combines a Unigram model, a Precompiled normalizer and a Metaspace decoder, which requires the bumped pytorch/extension/llm/tokenizers runtime from #1114. This PR is therefore blocked on #1114 landing first and should be rebased onto main once #1114 merges.
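For context on the Metaspace decoder mentioned above: SentencePiece-style tokenizers mark word boundaries with the U+2581 character (▁), and decoding maps that marker back to a space and drops the leading space produced by the first token. The sketch below is a simplified illustration, assuming the default ▁ replacement and a prepend scheme that adds a marker before the first word; metaspaceDecode is a hypothetical name and this is not the code in pytorch/extension/llm/tokenizers.

```typescript
// Default Metaspace replacement character (assumed): U+2581 "▁".
const METASPACE = "\u2581";

// Simplified Metaspace decoding: turn every "▁" back into a space,
// then strip the leading space that the first token carries.
function metaspaceDecode(tokens: string[]): string {
  return tokens
    .map((tok, i) => {
      const withSpaces = tok.split(METASPACE).join(" ");
      return i === 0 && withSpaces.startsWith(" ")
        ? withSpaces.slice(1)
        : withSpaces;
    })
    .join("");
}

console.log(metaspaceDecode(["\u2581It", "'", "s", "\u2581so", "\u2581sunny"]));
```

Splitting and joining on the marker is used instead of String.prototype.replaceAll so the sketch compiles against older TypeScript lib targets.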

Hugging Face repo: software-mansion/react-native-executorch-paraphrase-multilingual-MiniLM-L12-v2 (tagged v0.9.0; the layout mirrors distiluse).

Introduces a breaking change?

Yes
No

Type of change

Bug fix (change which fixes an issue)
New feature (change which adds functionality)
Documentation update (improves or adds clarity to existing documentation)
Other (chores, tests, code style improvements etc.)

Tested on

iOS
Android

Testing instructions

Run cd apps/text-embeddings && npx expo run:ios
Pick "Multilingual Paraphrase (8da4w)" in the model picker.
Add a sentence in one language, then query with a semantically equivalent phrase in another (e.g. the Polish "Słoneczko" against "It's so sunny outside!"). The cross-lingual pair should rank at the top of the matches.
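The last testing step relies on ranking stored sentences by similarity to the query embedding. A minimal sketch of that ranking, assuming the app scores entries by cosine similarity (the playground's actual scoring code is not part of this description); Entry, cosineSimilarity and rankByCosine are hypothetical names, and the toy 3-d vectors stand in for the model's real 384-d embeddings.

```typescript
// Hypothetical shape: a sentence the user added plus its embedding.
interface Entry {
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against zero vectors to avoid dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every stored entry against the query and sorts best match first.
function rankByCosine(
  query: number[],
  entries: Entry[],
): Array<Entry & { score: number }> {
  return entries
    .map((e) => ({ ...e, score: cosineSimilarity(query, e.embedding) }))
    .sort((x, y) => y.score - x.score);
}

// Toy usage: pretend [1, 0, 0] is the embedding of the Polish query.
const ranked = rankByCosine([1, 0, 0], [
  { text: "The train is late.", embedding: [0, 0.2, 0.95] },
  { text: "It's so sunny outside!", embedding: [0.9, 0.1, 0] },
]);
console.log(ranked.map((r) => r.text));
```

Cosine similarity ignores vector magnitude, so the ranking stays the same whether or not the exported model normalizes its output embeddings.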

Screenshots

Related issues

Closes the paraphrase-multilingual half of #945 (the distiluse half landed in #1098).

Checklist

I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have updated the documentation accordingly
My changes generate no new warnings

Additional notes

Blocks on #1114. The local commit was made with --no-verify only because this branch sits on top of #1114, whose base predates the vision-camera v5 typecheck fix (#1088). Once #1114 is rebased onto or merged into main, the lefthook failures should go away.

msluszniak self-assigned this Apr 30, 2026

msluszniak added the feature label Apr 30, 2026

msluszniak linked an issue Apr 30, 2026 that may be closed by this pull request: Feature request: multilingual text embeddings model #945 (open)

msluszniak added the blocked label Apr 30, 2026

msluszniak force-pushed the @bo/bumpTokenizerCapabilities branch from 78b5a13 to f1341d2 April 30, 2026 12:14

Commit 27e7204: feat: add PARAPHRASE_MULTILINGUAL_MINILM_L12_V2 text embeddings model

msluszniak force-pushed the @ms/paraphrase-multilingual-minilm branch from 9cd7623 to 27e7204 April 30, 2026 12:19

msluszniak mentioned this pull request May 4, 2026 in build: Extend tokenizer capabilities #1114 (open, 18 tasks)

msluszniak wants to merge 1 commit into @bo/bumpTokenizerCapabilities from @ms/paraphrase-multilingual-minilm