Natural language text analyser. Created for fun, rather than for any serious purpose.
Currently works with Russian and Ukrainian languages.
Includes morphological analysis and basic named entity recognition for personal names.
Loads morphological dictionaries for Ukrainian (https://github.com/LinguisticAndInformationSystems/mphdict) and Russian (http://odict.ru/, now commercialized).
Generates all wordforms present in the dictionaries and saves them in a simpler, less optimized format.
Describes text tokenization: representing the text as a DAG of entities and analysing this DAG with basic analysers (Spacing, Punctuation, Numbers).
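As a rough illustration of the DAG idea, the sketch below treats character offsets as nodes and tokens as edges, so that alternative analyses of the same span become parallel paths. The names `build_dag`, `add_decimal_edges`, `count_paths` and the `Decimal` token kind are hypothetical and are not taken from the project's code; the regular expressions are assumptions chosen only to keep the example small.

```python
import re

# Hypothetical token kinds; they echo the analysers named above,
# but this regex is an assumption, not the project's tokenizer.
TOKEN_RE = re.compile(
    r"(?P<Spacing>\s+)|(?P<Number>\d+)|(?P<Word>[^\W\d_]+)|(?P<Punctuation>.)"
)

def build_dag(text):
    """Map each start offset to a list of (end offset, kind, surface) edges."""
    dag = {}
    for m in TOKEN_RE.finditer(text):
        dag.setdefault(m.start(), []).append((m.end(), m.lastgroup, m.group()))
    return dag

def add_decimal_edges(dag, text):
    """Add one edge per decimal number, parallel to its three basic tokens."""
    for m in re.finditer(r"\d+[.,]\d+", text):
        dag.setdefault(m.start(), []).append((m.end(), "Decimal", m.group()))

def count_paths(dag, start, end):
    """Count distinct readings of the text between two offsets."""
    if start == end:
        return 1
    return sum(count_paths(dag, nxt, end) for nxt, _, _ in dag.get(start, []))

text = "Ціна 3.5 грн."
dag = build_dag(text)
add_decimal_edges(dag, text)
print(count_paths(dag, 0, len(text)))
```

Here the span "3.5" can be read either as Number, Punctuation, Number or as a single Decimal edge, which is why the graph has more than one path from the start of the text to its end.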
Postprocesses the wordforms generated by 1_PipelineOverview.ipynb and builds WordAnalyser to match words in the DAG.
Explores the possibility of enriching the DAG and facilitating wordform matching with NormalizeAnalyzer.
Describes Named Entity Recognition on the DAG via PersonNameAnalyser, which matches different variations of a full name as well as a surname with initials.
Shows a small sample of names matched from real-world data (messages in public social media groups).
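To give a feel for the "surname with initials" case, here is a minimal regex-only sketch. It is a deliberately simplified stand-in, not the DAG-based PersonNameAnalyser: it assumes any capitalised Cyrillic word may be a surname, whereas a dictionary-backed analyser would check the word against known names. `find_names` and the pattern names are hypothetical.

```python
import re

# Assumption: a capitalised Cyrillic word stands in for a surname.
SURNAME = r"[А-ЯІЇЄҐЁ][а-яіїєґё'’-]+"
INITIAL = r"[А-ЯІЇЄҐЁ]\."
INITIALS = rf"{INITIAL}\s?{INITIAL}|{INITIAL}"  # two initials preferred, else one

PATTERN = re.compile(
    rf"(?P<surname>{SURNAME})\s+(?P<initials>{INITIALS})"       # Шевченко Т. Г.
    rf"|(?P<initials2>{INITIALS})\s?(?P<surname2>{SURNAME})"    # Т. Г. Шевченко
)

def find_names(text):
    """Return (surname, initials) pairs with spaces removed from the initials."""
    found = []
    for m in PATTERN.finditer(text):
        surname = m.group("surname") or m.group("surname2")
        initials = m.group("initials") or m.group("initials2")
        found.append((surname, re.sub(r"\s+", "", initials)))
    return found

print(find_names("Зустріч проведе Шевченко Т. Г. завтра."))
print(find_names("Т. Г. Шевченко написав вірш"))
```

Both word orders normalise to the same pair, which is the property a name matcher needs in order to treat the variants as one person. A pure regex like this cannot handle inflected surnames or full first names and patronymics; that is where the morphological dictionaries come in.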
Implementation of features described in Jupyter notebooks.