feat: add DISTILUSE_BASE_MULTILINGUAL_CASED_V2 text embeddings model #1098
msluszniak merged 6 commits into main
Conversation
Addresses the multilingual half of #945. Shipping only the WordPiece tokenizer model for now — paraphrase-multilingual-MiniLM-L12-v2 needs Unigram/Precompiled/Metaspace support in executorch/extension/llm/ tokenizers, which is in-flight upstream. The model lives at software-mansion/react-native-executorch-distiluse-base-multilingual-cased-v2 under tag v0.9.0, so the constant uses NEXT_VERSION_TAG. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chmjkb
left a comment
I haven't verified the demo apps, but the code looks good. Any idea how the performance compares to other embedding models? 500 MB seems like a lot.
To be honest, more than half of our current text embedding models are in the same order of magnitude. Look: https://huggingface.co/collections/software-mansion/text-embeddings
Ok, I was biased by the MiniLM, which is super small. Fine.
Follows the same scheme-suffix convention used for LLaMA (`_QLORA`, `_SPINQUANT`) — each variant has its own constant so the caller picks exactly the quantization / backend combo they want:

- `DISTILUSE_BASE_MULTILINGUAL_CASED_V2`: XNNPACK fp32 (baseline)
- `DISTILUSE_BASE_MULTILINGUAL_CASED_V2_8DA4W`: XNNPACK 8da4w
- `DISTILUSE_BASE_MULTILINGUAL_CASED_V2_COREML_FP32`: Core ML fp32 (iOS/macOS)
- `DISTILUSE_BASE_MULTILINGUAL_CASED_V2_COREML_FP16`: Core ML fp16 (iOS/macOS)

All four point at the same HF repo tag v0.9.0; tokenizer.json is shared.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
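For concreteness, here is a minimal sketch of what these constants could look like in `modelUrls.ts`. The constant names and `NEXT_VERSION_TAG` (= `resolve/v0.9.0`) are from this PR; the base-URL shape and the `.pte` filenames are illustrative assumptions, not the literal file contents.

```typescript
// Sketch only: constant names and NEXT_VERSION_TAG are from this PR;
// the base-URL shape and the .pte filenames below are assumptions.
const NEXT_VERSION_TAG = 'resolve/v0.9.0';
const BASE =
  'https://huggingface.co/software-mansion/react-native-executorch-distiluse-base-multilingual-cased-v2';

export const DISTILUSE_BASE_MULTILINGUAL_CASED_V2 =
  `${BASE}/${NEXT_VERSION_TAG}/xnnpack/model_fp32.pte`; // XNNPACK fp32 (baseline)
export const DISTILUSE_BASE_MULTILINGUAL_CASED_V2_8DA4W =
  `${BASE}/${NEXT_VERSION_TAG}/xnnpack/model_8da4w.pte`; // XNNPACK 8da4w
export const DISTILUSE_BASE_MULTILINGUAL_CASED_V2_COREML_FP32 =
  `${BASE}/${NEXT_VERSION_TAG}/coreml/model_fp32.pte`; // Core ML fp32
export const DISTILUSE_BASE_MULTILINGUAL_CASED_V2_COREML_FP16 =
  `${BASE}/${NEXT_VERSION_TAG}/coreml/model_fp16.pte`; // Core ML fp16
```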
…o benchmarks

Size data belongs next to the other model sizes, not inline in the hook reference. The useTextEmbeddings page now lists only the model family (one row) and leaves variant enumeration to the API reference + the model-size benchmark table. The model column in model-size.md is renamed from "XNNPACK [MB]" to just "Size [MB]" since the table now mixes XNNPACK and CoreML rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds inference-time and memory-usage rows for all four distiluse-base-multilingual-cased-v2 variants (XNNPACK fp32, XNNPACK 8da4w, Core ML fp32, Core ML fp16). Captured on a OnePlus 12 (Android, debug build) and an iPhone 17 Pro (iOS, debug build) with a fixed ~80-token sentence over 100 measured forwards, JS-side wall-clock around model.forward(). The memory column reports the peak resident-set delta vs. the pre-model-load baseline, sampled with adb dumpsys meminfo on Android and Xcode's Debug Navigator on iOS.

Also normalizes the text-embeddings table headers to match the Classification section convention: the column header drops the "(XNNPACK)" suffix and the backend now lives in the per-row label, which lets multi-backend models (fp32 / 8da4w / Core ML) share a single table without an artificial column split.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
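A rough sketch of the JS-side measurement loop described above. `model.forward` and the 100 measured forwards are from the commit message; the warm-up count, the model type, and the stats helpers are illustrative assumptions.

```typescript
// Sketch of the wall-clock measurement described above. The 100 measured
// iterations come from the commit message; warm-up count and percentile
// choice are assumptions.
async function benchmarkForward(
  model: { forward: (text: string) => Promise<Float32Array> },
  sentence: string // the fixed ~80-token input
): Promise<{ avgMs: number; p99Ms: number }> {
  for (let i = 0; i < 5; i++) await model.forward(sentence); // warm-up
  const samples: number[] = [];
  for (let i = 0; i < 100; i++) {
    const t0 = performance.now();
    await model.forward(sentence);
    samples.push(performance.now() - t0);
  }
  samples.sort((a, b) => a - b);
  return {
    avgMs: samples.reduce((s, x) => s + x, 0) / samples.length,
    p99Ms: samples[Math.floor(samples.length * 0.99)],
  };
}
```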
NorbertKlockiewicz
left a comment
I think we should ship just 1 version of the model per backend in modelUrls
Renames `DISTILUSE_BASE_MULTILINGUAL_CASED_V2_COREML_FP32` → `DISTILUSE_BASE_MULTILINGUAL_CASED_V2_COREML` to match the existing convention where the default-precision XNNPACK variant has no precision suffix. The fp16 variant keeps its suffix since it's non-default. Per review feedback on #1098.
Tested all 4 variants off-device via the executorch Python runtime on Tatoeba bitext-mining (eng↔X for X ∈ {pol, deu, fra, spa, rus, jpn}, 1000 pairs each, 6000 total). All four land within 0.2 pp R@1 / 0.1 pp R@10 across every language pair. CoreML fp32 is bit-exact with XNNPACK fp32 (cosine 1.0). CoreML fp16 differs only at the 5th decimal. XNNPACK 8da4w drifts ~1% in cosine but retrieval is unaffected (sometimes +0.1 pp R@1, sometimes −0.2 pp — quantization noise either way). Proposing to drop two (details in the commit message below).
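For context, R@1 in this bitext-mining setup is just nearest-neighbor retrieval by cosine similarity over the paired embeddings. A minimal TypeScript sketch of the metric (the actual evaluation ran through the ExecuTorch Python runtime, so nothing here is the real harness):

```typescript
// Illustrative only: R@1 for bitext mining. For each source embedding,
// check whether its gold-paired target is the single nearest neighbor
// by cosine similarity.
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function recallAt1(src: Float32Array[], tgt: Float32Array[]): number {
  let hits = 0;
  for (let i = 0; i < src.length; i++) {
    let best = -1, bestSim = -Infinity;
    for (let j = 0; j < tgt.length; j++) {
      const sim = cosine(src[i], tgt[j]);
      if (sim > bestSim) { bestSim = sim; best = j; }
    }
    if (best === i) hits++; // pair (i, i) is the gold alignment
  }
  return hits / src.length;
}
```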
Naming: with no bare
Tatoeba bitext-mining (eng↔X for X ∈ {pol, deu, fra, spa, rus, jpn},
1000 pairs each) shows all 4 variants land within 0.2 pp R@1 / 0.1 pp
R@10 of each other. CoreML fp32 is bit-exact with XNNPACK fp32; CoreML
fp16 differs at the 5th decimal; 8da4w drifts ~1% cosine but retrieval
is unaffected.
Drop:
- XNNPACK fp32 (bare _V2): Pareto-dominated on iPhone by COREML and on
Android by 8DA4W (speed and memory). No retained quality benefit.
- COREML_FP16: identical retrieval quality to COREML fp32 but slower
on iPhone (19 vs 15 ms) and uses more memory (143 vs 55 MB).
Ships as _8DA4W (Android) and _COREML (iOS) only.
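With one variant per backend, a caller would pick the constant per platform. A sketch, assuming React Native's `Platform` API and that both post-rename constants are exported from the package root:

```typescript
// Sketch: choose the shipped variant per platform, per the decision above.
// The constant names follow the post-rename convention; the import path
// is an assumption.
import { Platform } from 'react-native';
import {
  DISTILUSE_BASE_MULTILINGUAL_CASED_V2_8DA4W,
  DISTILUSE_BASE_MULTILINGUAL_CASED_V2_COREML,
} from 'react-native-executorch';

const DISTILUSE_MODEL =
  Platform.OS === 'ios'
    ? DISTILUSE_BASE_MULTILINGUAL_CASED_V2_COREML // Core ML on iOS
    : DISTILUSE_BASE_MULTILINGUAL_CASED_V2_8DA4W; // XNNPACK 8da4w elsewhere
```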
Description
Addresses the multilingual half of #945 by shipping the first multilingual text embeddings model, `distiluse-base-multilingual-cased-v2`, covering 50+ languages at 512 embedding dims. The `paraphrase-multilingual-MiniLM-L12-v2` model from the same issue is deferred — its tokenizer pipeline is Unigram + Precompiled normalizer + Metaspace decoder, and `executorch/extension/llm/tokenizers` (the C++ lib RNE links) only supports BPE + WordPiece + `BertNormalizer`. Unigram support is in flight upstream; we'll ship paraphrase-multilingual in a follow-up once the runtime picks it up.

What the diff does:

- `modelUrls.ts` — adds `DISTILUSE_BASE_MULTILINGUAL_CASED_V2` (XNNPACK fp32), `_8DA4W` (XNNPACK 8-bit dynamic-act / 4-bit weight via torchao), `_COREML_FP32`, and `_COREML_FP16` pointing at the new HF repo under `NEXT_VERSION_TAG` (= `resolve/v0.9.0`), and registers all four in `MODEL_REGISTRY.ALL_MODELS`. Files are uploaded under `xnnpack/` and `coreml/` subfolders following the newer CLIP repo convention.
- `types/textEmbeddings.ts` — extends `TextEmbeddingsModelName` with `'distiluse-base-multilingual-cased-v2'` (see the sketch below).
- `apps/text-embeddings/.../index.tsx` — adds the model to the playground picker.
- `useTextEmbeddings.md` — adds a row to the "Supported models" table.
- `.cspell-wordlist.txt` — adds `DISTILUSE`, `distiluse`, `Distil` for the spell-check hook.

HF repo (live): software-mansion/react-native-executorch-distiluse-base-multilingual-cased-v2, `main` branch + `v0.9.0` tag.
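To make the caller-facing surface concrete, here is a hedged sketch of the type change plus hook usage. The new union member and constant name are from this PR; the pre-existing union member shown and the exact `useTextEmbeddings` props/return shape are assumptions, not the verified API.

```typescript
// Sketch only: the new union member and constant are from this PR; the
// existing member shown and the hook's exact shape are assumptions.
import {
  useTextEmbeddings,
  DISTILUSE_BASE_MULTILINGUAL_CASED_V2,
} from 'react-native-executorch';

// types/textEmbeddings.ts — the extended union described above:
type TextEmbeddingsModelName =
  | 'all-MiniLM-L6-v2' // illustrative pre-existing member
  | 'distiluse-base-multilingual-cased-v2'; // added by this diff

function EmbedDemo({ text }: { text: string }) {
  const model = useTextEmbeddings({
    modelSource: DISTILUSE_BASE_MULTILINGUAL_CASED_V2,
  });
  // Per the testing instructions below, forward() should resolve to a
  // 512-dim Float32Array in ~35 ms.
  const embed = () => model.forward(text);
  return null;
}
```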
Introduces a breaking change?
Type of change
Tested on
Testing instructions
`cd apps/text-embeddings && yarn ios` (or `yarn android`). `forward` should return a 512-dim `Float32Array` in ~35 ms.

Screenshots
Related issues
Closes the multilingual-BERT half of #945. The `paraphrase-multilingual-MiniLM-L12-v2` half stays open pending Unigram/Precompiled/Metaspace support in ExecuTorch's tokenizer lib.

Checklist
Additional notes
The exporter wrapper passes `attention_mask=None` into the underlying transformer even though the RNE runtime always supplies an all-ones mask. This is deliberate: HF's `create_bidirectional_mask` / `masking_utils` otherwise emits a chain of `where`/`eq`/`any`/`logical_not`/`expand_copy` ops on the mask that XNNPACK can't delegate, costing ~10 ms per forward with no observable effect on the output. With the bypass, the .pte is bit-exact with eager PyTorch (RMSE 0.0 on fp32 random input) and XNNPACK delegation stays around 89–91% of graph runtime.

The concrete non-delegated ops that remain (LayerNorm, the residual mask prep inside HF, the explicit `expand` in mean pooling) are all inside upstream code paths — pushing past ~91% would need either a `StaticLayerNormalization` match in XNNPACK's partitioner or surgery on HF's mask utils. Out of scope here; worth flagging for a future iteration.

Export of fp16 for XNNPACK succeeded but produced NaNs, so it isn't useful.
The exporter script, profiling setup, and the full write-up of the above live in the internal export-scripts repo.