Tiny Language Detector: detect the language of a Unicode (UTF-8) text.
- Pure JavaScript, no API calls, and no dependencies (Node.js and browser compatible)
- Alternative to libraries like CLD
- Fast, with a low memory footprint (unlike ML methods)
- Supports 62 languages (30 for the web version)
- Returns language codes in ISO 639-1 format
- Playground - Try the library
- Getting Started
- API
- CLI
- TinyLD Web
- Supported Language
- Algorithm
- Developer
yarn add tinyld # or npm install --save tinyld

import { detect, detectAll } from 'tinyld'
// Detect
detect('これは日本語です.') // ja
detect('and this is english.') // en
// DetectAll
detectAll('ceci est un text en francais.')
// [ { lang: 'fr', accuracy: 0.5238 }, { lang: 'ro', accuracy: 0.3802 }, ... ]

tinyld This is the text that I want to check
# [ { lang: 'en', accuracy: 1 } ]

The benchmark was run on the Tatoeba dataset (~9M sentences), covering 16 of the most common languages.
| Library | Script | Properly Identified | Improperly Identified | Not Identified | Avg Execution Time | Disk Size |
|---|---|---|---|---|---|---|
| TinyLD | `yarn bench:tinyld` | **96.1747%** | **2.6938%** | **1.1315%** | 0.1315ms | 778KB |
| TinyLD Web | `yarn bench:tinyld-light` | **92.1169%** | 3.9536% | **3.9295%** | **0.0616ms** | **89KB** |
| node-cld | `yarn bench:cld` | **88.9148%** | **1.7489%** | 9.3363% | **0.0612ms** | > 10MB |
| node-lingua | `yarn bench:lingua` | 82.3157% | **0.2158%** | 17.4685% | 0.7085ms | ~100MB |
| franc | `yarn bench:franc` | 68.7783% | 26.3432% | **4.8785%** | 0.1381ms | 267KB |
| franc-min | `yarn bench:franc-min` | 65.5163% | 23.5794% | 10.9044% | **0.0614ms** | **119KB** |
| languagedetect | `yarn bench:languagedetect` | 61.6068% | 12.295% | 26.0982% | 0.1585ms | **240KB** |
- For each category, the top 3 results are in **bold**
- Languages evaluated in this benchmark:
  - Asia: `jpn`, `cmn`, `kor`, `hin`
  - Europe: `fra`, `spa`, `por`, `ita`, `nld`, `eng`, `deu`, `fin`, `rus`
  - Middle East: `tur`, `heb`, `ara`
- This kind of benchmark is not perfect and the percentages can vary over time, but it gives a good idea of overall performance
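
The three percentage columns in the table can be read as shares of the test sentences. The sketch below shows one way such a summary could be computed. It is not the project's benchmark script: the `summarize` helper, the `expected`/`detected` field names, and the convention that an empty string means "no language returned" are all assumptions made for illustration.

```javascript
// Hypothetical helper (not part of tinyld): turn per-sentence outcomes
// into the three percentage columns used in the benchmark table.
// Assumption: `detected` is an empty string when no language was returned.
function summarize(samples) {
  const counts = { correct: 0, incorrect: 0, unidentified: 0 }
  for (const { expected, detected } of samples) {
    if (!detected) counts.unidentified += 1
    else if (detected === expected) counts.correct += 1
    else counts.incorrect += 1
  }
  // Express a count as a percentage of all samples (0 for an empty input).
  const pct = (n) => (samples.length === 0 ? 0 : (100 * n) / samples.length)
  return {
    properlyIdentified: pct(counts.correct),
    improperlyIdentified: pct(counts.incorrect),
    notIdentified: pct(counts.unidentified)
  }
}

// Made-up outcomes, only to exercise the helper.
const report = summarize([
  { expected: 'fra', detected: 'fra' },
  { expected: 'eng', detected: 'eng' },
  { expected: 'spa', detected: 'por' },
  { expected: 'jpn', detected: '' }
])
console.log(report)
// { properlyIdentified: 50, improperlyIdentified: 25, notIdentified: 25 }
```

Under this reading the three columns always sum to 100%, which is consistent with each row of the table.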
- For Node.js: `TinyLD` or `node-cld` (fast and accurate)
- For the browser: `TinyLD Light` or `franc-min` (small, decent accuracy; `franc` is less accurate but supports more languages)
- `node-lingua` is just too big and slow
- `languagedetect` is light but not accurate enough, and is mostly focused on Indo-European languages (it supports Kazakh but not Chinese, Korean, or Japanese)
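
Since no detector is right every time, it can help to apply a confidence threshold to the `{ lang, accuracy }` candidates that `detectAll` returns in the usage example earlier. The sketch below is an illustration only: `pickLanguage`, its `minAccuracy` and `fallback` parameters, and the 0.5 default are hypothetical and not part of tinyld. The sample data reuses the two candidates shown in that example, whose output is truncated after them.

```javascript
// Hypothetical helper (not part of tinyld): choose a language from a
// detectAll-style result list, or return a fallback when confidence is low.
function pickLanguage(results, minAccuracy = 0.5, fallback = '') {
  // Sort a copy by descending accuracy so the input order does not matter.
  const best = [...results].sort((a, b) => b.accuracy - a.accuracy)[0]
  return best && best.accuracy >= minAccuracy ? best.lang : fallback
}

// The two candidates shown in the detectAll example.
const candidates = [
  { lang: 'fr', accuracy: 0.5238 },
  { lang: 'ro', accuracy: 0.3802 }
]

console.log(pickLanguage(candidates)) // fr
console.log(pickLanguage(candidates, 0.9)) // prints an empty line (fallback '')
console.log(pickLanguage([], 0.5, 'und')) // und
```

Which threshold is sensible depends on the input: short or mixed-language texts tend to produce lower scores, so a strict cutoff trades wrong answers for more "unknown" results.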
