Feature request: multilingual text embeddings model #945

@ErionTp

Summary

Currently all shipped text embedding models (ALL_MINILM_L6_V2, ALL_MPNET_BASE_V2, MULTI_QA_MINILM_L6_COS_V1, MULTI_QA_MPNET_BASE_DOT_V1) are English-only.

For apps that support multiple languages, this limits the usefulness of semantic search and similarity features. In our case, we use useTextEmbeddings to match user-typed label names to icons — but it only works well when the user types in English.
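The label-to-icon matching described above can be sketched roughly as follows. This is an illustration only, not the code from our app: `IconEntry` and `bestIconMatch` are hypothetical names, and it assumes the embedding vectors are already available as numeric arrays (how they are obtained from the hook is left out).

```typescript
// Hypothetical shape: an icon plus the precomputed embedding of its label.
interface IconEntry {
  name: string;
  embedding: ArrayLike<number>;
}

// Cosine similarity between two equal-length vectors.
// Returns 0 when either vector has zero magnitude.
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) {
    throw new Error('vector length mismatch');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Returns the icon whose embedding is closest to the query embedding,
// or null when the icon list is empty.
function bestIconMatch(
  query: ArrayLike<number>,
  icons: IconEntry[],
): IconEntry | null {
  let best: IconEntry | null = null;
  let bestScore = -Infinity;
  for (const icon of icons) {
    const score = cosineSimilarity(query, icon.embedding);
    if (score > bestScore) {
      bestScore = score;
      best = icon;
    }
  }
  return best;
}
```

The query and the icon labels have to be embedded with the same model for the scores to be comparable, which is why an English-only model breaks down as soon as the typed label is not in English.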

Request

Ship a pre-exported multilingual text embeddings model, such as:

This would allow useTextEmbeddings to work with non-English input out of the box, similar to how the English models work today:

// Proposed usage: MULTILINGUAL_MINILM_L12_V2 is the requested constant and is not shipped yet.
import { useTextEmbeddings, MULTILINGUAL_MINILM_L12_V2 } from 'react-native-executorch';

const embeddings = useTextEmbeddings({ model: MULTILINGUAL_MINILM_L12_V2 });

Context

  • The current MULTI_QA_MINILM_L6_COS_V1 works great for English
  • Non-English queries produce poor embeddings since the model was only trained on English data
  • Many React Native apps are multilingual by nature (we support English, Italian, and Albanian)
  • The useTextEmbeddings API wouldn't need to change — just a new model constant
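To illustrate the last point, here is a minimal sketch of how an app might choose between the existing English model and the requested multilingual one based on the user's language. Everything here is an assumption for illustration: `pickEmbeddingModelName` is a hypothetical helper, `MULTILINGUAL_MINILM_L12_V2` is the constant name proposed in this issue (it does not exist yet), and the models are represented by their names as plain strings rather than the real exported constants.

```typescript
// Model names as plain strings. MULTILINGUAL_MINILM_L12_V2 is the
// proposed (not yet shipped) constant from this request.
type EmbeddingModelName =
  | 'MULTI_QA_MINILM_L6_COS_V1'
  | 'MULTILINGUAL_MINILM_L12_V2';

// Picks a model name from a language tag such as "en-US", "it" or "sq_AL".
// English keeps the existing model; every other language uses the
// multilingual one.
function pickEmbeddingModelName(languageTag: string): EmbeddingModelName {
  const primary = languageTag.trim().toLowerCase().split(/[-_]/)[0];
  return primary === 'en'
    ? 'MULTI_QA_MINILM_L6_COS_V1'
    : 'MULTILINGUAL_MINILM_L12_V2';
}
```

An app that goes this route would need to store icon embeddings per model, since vectors produced by different models are not comparable. Using the multilingual model for every language avoids that bookkeeping.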

Thanks for the great library!

Labels: feature request, model
