1morello/kaz-morph

kaz-morph

A rule-based morphological analyzer for Kazakh, written in Rust, with bindings for Python and WASM.

The problem

Kazakh is agglutinative. Grammar is encoded as chains of suffixes on a single root:

жүректерімізде
  жүрек  +  тер  +  іміз  +  де
  heart     PL      our      in
  -> "in our hearts"
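The breakdown above can be reproduced with a toy suffix stripper. This is only a sketch for illustration, not how kaz-morph works internally: the `SUFFIXES` table and the `segment` function are hypothetical, the table holds just the three suffixes from this one example, and it ignores allomorphs (such as лар/лер/дар/дер for the plural) that a real analyzer has to handle.

```rust
/// Hypothetical three-entry suffix table, listed from the outermost slot
/// (case) to the innermost (number). Only the allomorphs from the example.
const SUFFIXES: [(&str, &str); 3] = [
    ("де", "LOC"),    // locative case
    ("іміз", "P1PL"), // 1st person plural possessive
    ("тер", "PL"),    // plural
];

/// Strip known suffixes from the right, one slot at a time.
/// Returns the remaining root and the tags in left-to-right order.
fn segment(word: &str) -> (String, Vec<&'static str>) {
    let mut rest = word;
    let mut tags = Vec::new();
    for &(suffix, tag) in SUFFIXES.iter() {
        if let Some(stem) = rest.strip_suffix(suffix) {
            rest = stem;
            tags.push(tag);
        }
    }
    // Suffixes were collected right to left; restore surface order.
    tags.reverse();
    (rest.to_string(), tags)
}

fn main() {
    let (root, tags) = segment("жүректерімізде");
    println!("{} {:?}", root, tags); // prints: жүрек ["PL", "P1PL", "LOC"]
}
```

Each slot is tried at most once and in a fixed order, which mirrors the number, possession, case ordering in the example. Blind stripping like this will mis-segment any root that happens to end in one of the listed strings, which is one reason a rule-based analyzer also needs a lexicon of roots.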

To a computer, жүрек and жүректерімізде are unrelated strings. A spellchecker cannot tell whether a five-suffix chain obeys vowel harmony or breaks it, and an NLP pipeline has no way of knowing that барды, бармады, and барғандар are all forms of бару.
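To make the vowel-harmony point concrete, here is a minimal sketch of a front/back harmony check. It is a simplified model and not kaz-morph's implementation: the `Harmony` enum and the `vowel_class` and `is_harmonic` functions are hypothetical names, the letters и and у and the Russian-loan vowels are treated as neutral, and loanwords that legitimately mix vowel classes would be flagged as violations.

```rust
/// Harmony class of a Kazakh vowel under a simplified front/back model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Harmony {
    Front,
    Back,
}

/// Classify a lowercase character. Consonants and the letters treated
/// as neutral in this sketch return None.
fn vowel_class(c: char) -> Option<Harmony> {
    match c {
        'ә' | 'е' | 'ө' | 'ү' | 'і' => Some(Harmony::Front),
        'а' | 'о' | 'ұ' | 'ы' => Some(Harmony::Back),
        _ => None,
    }
}

/// True if the word never mixes front and back vowels.
fn is_harmonic(word: &str) -> bool {
    let mut seen: Option<Harmony> = None;
    for class in word.to_lowercase().chars().filter_map(vowel_class) {
        match seen {
            None => seen = Some(class),
            Some(prev) if prev != class => return false,
            Some(_) => {}
        }
    }
    true
}

fn main() {
    // All vowels are front: ү, е, е, і, і, е.
    println!("{}", is_harmonic("жүректерімізде")); // prints: true
    // Same root with back-vowel suffixes attached, which breaks harmony.
    println!("{}", is_harmonic("жүректарымызда")); // prints: false
}
```

A check like this only looks at the surface string. Deciding which suffix allomorph should have been used still requires knowing where the root ends, which is the structural information an analyzer provides.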

Without structural understanding of words, every downstream tool for Kazakh is either broken or faking it.

kaz-morph is here to fix this.

Usage

use kaz_morph::Analyzer;

fn main() {
    // Build an analyzer and analyze a single inflected word.
    let a = Analyzer::new();
    let r = a.analyze("жүректерімізде");
    // → lemma: "жүрек", pos: Noun, number: Plural,
    //   possession: P1Pl, case: Locative
}

Status

Early development: the foundation is still being set up.

License

MIT

About

The first real morphological analyzer for the Kazakh language: open, fast, with Rust, Python, and WASM bindings.
