Alex314/lang-morph

lang-morph

Natural language text analyser. Created for fun, rather than for any serious purpose.

Currently works with the Russian and Ukrainian languages.

Includes morphological analysis and basic named entity recognition for personal names.

Content

Loads morphological dictionaries for Ukrainian (https://github.com/LinguisticAndInformationSystems/mphdict) and Russian (http://odict.ru/, now commercialized).
Generates all wordforms present in the dictionaries and saves them in a simpler, less optimized format.
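
As a rough illustration of what "generating all wordforms" can mean, here is a minimal sketch that expands a stem plus a list of endings into flat rows and serialises them as tab-separated text. The `expand` and `to_tsv` helpers, the tag names, and the TSV layout are assumptions made for this example; they are not the format used by the notebooks or by the source dictionaries.

```python
import csv
import io


def expand(stem, endings):
    """Expand one paradigm into flat (wordform, lemma, tag) rows.

    `endings` is a list of (tag, ending) pairs; by assumption the first
    pair describes the dictionary form (lemma).
    """
    lemma = stem + endings[0][1]
    return [(stem + ending, lemma, tag) for tag, ending in endings]


def to_tsv(rows):
    """Serialise rows to a simple, unoptimised tab-separated string."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Toy paradigm for the Ukrainian noun "мама" (hypothetical tag names).
rows = expand("мам", [("nom.sg", "а"), ("gen.sg", "и"), ("dat.sg", "і")])
print(to_tsv(rows))
```

A flat table like this is larger than a stem-and-paradigm dictionary, but every wordform becomes a direct lookup key.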

Describes text tokenization, representing the text as a DAG of entities and analysing this DAG with basic analysers: Spacing, Punctuation, Numbers.
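
The idea of a token DAG can be sketched as follows, assuming character offsets are the nodes and every recognised entity is an edge between two offsets. The `Edge` class, `tokenize`, and `add_decimal_numbers` are hypothetical names for this illustration and do not mirror the actual classes in the `*.py` files.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity
    kind: str   # "space", "punct", "number" or "word"
    text: str


# Whitespace runs, digit runs, other word runs, or a single other character.
TOKEN_RE = re.compile(r"\s+|\d+|\w+|[^\w\s]")


def classify(tok):
    if tok.isspace():
        return "space"
    if tok.isdigit():
        return "number"
    if tok[0].isalnum() or tok[0] == "_":
        return "word"
    return "punct"


def tokenize(text):
    """Base layer of the DAG: one edge per primitive token."""
    return [Edge(m.start(), m.end(), classify(m.group()), m.group())
            for m in TOKEN_RE.finditer(text)]


def add_decimal_numbers(edges):
    """Add an alternative edge over each number, '.' or ',', number run."""
    extra = [Edge(a.start, c.end, "number", a.text + b.text + c.text)
             for a, b, c in zip(edges, edges[1:], edges[2:])
             if a.kind == "number" and b.text in {".", ","} and c.kind == "number"]
    return edges + extra


for edge in add_decimal_numbers(tokenize("Ціна 3.14 грн.")):
    print(edge)
```

Because the decimal edge is added alongside the three primitive tokens instead of replacing them, later analysers can choose either path through the graph.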

Postprocesses the wordforms generated by 1_PipelineOverview.ipynb and builds WordAnalyser to match words in the DAG.
Explores the possibility of enriching the DAG and facilitating wordform matching with NormalizeAnalyzer.
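
One way normalization could help wordform matching is sketched below: the word is looked up as written and, if that fails, looked up again after lowercasing and unifying apostrophe variants and "ё". The `normalize` and `analyse_word` functions and the two-entry `TOY_FORMS` table are assumptions for illustration only; the real WordAnalyser and NormalizeAnalyzer operate on the DAG and may work differently.

```python
# Map common apostrophe look-alikes to the plain ASCII apostrophe.
APOSTROPHES = str.maketrans({"’": "'", "ʼ": "'", "`": "'"})


def normalize(word):
    """Lowercase, unify apostrophes, and fold Russian 'ё' into 'е'."""
    return word.lower().translate(APOSTROPHES).replace("ё", "е")


# Hand-written toy lexicon: wordform -> list of (lemma, tag) readings.
TOY_FORMS = {
    "мами": [("мама", "noun, genitive singular"),
             ("мама", "noun, nominative plural")],
    "п'ять": [("п'ять", "numeral")],
}


def analyse_word(word, forms=TOY_FORMS):
    """Return all readings for the word, trying the normalized form second."""
    return forms.get(word) or forms.get(normalize(word), [])


print(analyse_word("П’ять"))  # typographic apostrophe, capitalised
print(analyse_word("мами"))   # ambiguous: two readings are returned
```

Returning every reading rather than picking one keeps the ambiguity available to later stages.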

Describes Named Entity Recognition on the DAG via PersonNameAnalyser, which matches different variations of a full name, as well as a surname with initials.
Shows a small sample of names matched from real-world data (messages in public social media groups).
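
For the "surname with initials" case, a much simpler stand-in than a DAG-based analyser is a regular expression over plain text. The sketch below uses that swapped-in technique only to show the two patterns involved (surname then initials, or initials then surname); `NAME_RE` and `find_names` are hypothetical and are not part of PersonNameAnalyser.

```python
import re

# Cyrillic letters, including those outside the basic А-Я / а-я ranges.
UP = "А-ЯЁІЇЄҐ"
LO = "а-яёіїєґ'"

INITIALS = rf"[{UP}]\.\s?[{UP}]\."  # e.g. "Т. Г." or "І.Я."
SURNAME = rf"[{UP}][{LO}]+"         # capitalised word

# Either "Surname I. I." or "I. I. Surname".
NAME_RE = re.compile(rf"(?:{SURNAME}\s+{INITIALS}|{INITIALS}\s?{SURNAME})")


def find_names(text):
    """Return every surname-with-initials match found in the text."""
    return [m.group() for m in NAME_RE.finditer(text)]


print(find_names("Доповідав Шевченко Т. Г., а потім І.Я. Франко."))
```

A pattern like this has no dictionary knowledge, so it will also match any capitalised word followed by two initials; checking the candidates against known surnames and given names would reduce such false positives.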

*.py files

Implementation of the features described in the Jupyter notebooks.
