lang-morph

Natural language text analyser. Created for fun, rather than for any serious purpose.

Currently supports Russian and Ukrainian.

Includes morphological analysis and basic named entity recognition for personal names.

Content

0_DataLoading.ipynb

Loads morphological dictionaries for Ukrainian (https://github.com/LinguisticAndInformationSystems/mphdict) and Russian (http://odict.ru/, now commercialized).
Generates all wordforms present in the dictionaries and saves them in a simpler, less optimized format.
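
As a rough illustration of what "generating all wordforms" can look like, here is a minimal sketch that expands one paradigm into flat rows and writes them as tab-separated text. The names `expand` and `dump_tsv`, the stem-plus-endings input shape, and the tag strings are all assumptions made for this example; the notebook's actual data structures and output format may differ.

```python
import csv
import io


def expand(lemma, stem, endings):
    """Yield (wordform, lemma, tags) rows for one hypothetical paradigm."""
    for ending, tags in endings:
        yield stem + ending, lemma, tags


def dump_tsv(rows, out):
    """Write rows to a text stream, one wordform per line, tab-separated."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


# Toy paradigm; the endings and tags are illustrative only.
rows = list(expand("книга", "книг", [("а", "nom sg"), ("и", "gen sg"), ("у", "acc sg")]))
buffer = io.StringIO()
dump_tsv(rows, buffer)
print(buffer.getvalue())
```

A flat file like this is larger than a stem-and-paradigm dictionary, but each line can be read back without knowing anything about inflection rules.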

1_PipelineOverview.ipynb

Describes text tokenization, the representation of tokenized text as a DAG of entities, and the analysis of this DAG with the basic analysers: Spacing, Punctuation, Numbers.
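
The idea of a DAG of entities can be sketched as follows, assuming that nodes are character offsets and each entity is an edge between two offsets. The `Edge` class, the token pattern, and the `add_decimals` step are hypothetical and written only for this example; they are not taken from `dag.py` or `tokenization.py`.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity
    kind: str   # "space", "number", "token", "punct", or "decimal"
    text: str


# Assumed token pattern: whitespace run, digit run, word run, or any single character.
TOKEN_RE = re.compile(r"\s+|\d+|\w+|.", re.DOTALL)


def tokenize(text):
    """Split text into base edges that cover every character exactly once."""
    edges = []
    for m in TOKEN_RE.finditer(text):
        s = m.group()
        if s.isspace():
            kind = "space"
        elif s.isdigit():
            kind = "number"
        elif s[0].isalnum() or s[0] == "_":
            kind = "token"
        else:
            kind = "punct"
        edges.append(Edge(m.start(), m.end(), kind, s))
    return edges


def add_decimals(edges):
    """Add a longer 'decimal' edge over each number, separator, number triple."""
    extra = []
    for a, b, c in zip(edges, edges[1:], edges[2:]):
        if a.kind == "number" and b.text in {".", ","} and c.kind == "number":
            extra.append(Edge(a.start, c.end, "decimal", a.text + b.text + c.text))
    return edges + extra


dag = add_decimals(tokenize("Pi is 3.14!"))
```

Because the original edges are kept, the graph holds two readings of `3.14` side by side: three small edges and one `decimal` edge spanning the same offsets. Later analysers can pick whichever path suits them.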

2_Words.ipynb

Post-processes the wordforms generated by 0_DataLoading.ipynb and builds WordAnalyser to match words in the DAG.
Explores the possibility of enriching the DAG and facilitating wordform matching with NormalizeAnalyzer.
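
One way normalization can help wordform matching is shown in the sketch below, which assumes a plain dictionary lookup keyed by a normalized form. `normalize`, `WordIndex`, and the specific normalization rules (lowercasing, unifying apostrophe variants, folding `ё` to `е`) are assumptions for illustration and do not describe the actual WordAnalyser or NormalizeAnalyzer.

```python
from collections import defaultdict


def normalize(word):
    """Lowercase, unify apostrophe variants, and fold 'ё' into 'е' (assumed rules)."""
    word = word.lower()
    for ch in ("\u2019", "\u02bc", "`"):
        word = word.replace(ch, "'")
    return word.replace("ё", "е")


class WordIndex:
    """Hypothetical lookup table from a normalized wordform to its analyses."""

    def __init__(self, entries):
        self._forms = defaultdict(list)
        for form, lemma, tags in entries:
            self._forms[normalize(form)].append((lemma, tags))

    def lookup(self, token):
        """Return all (lemma, tags) analyses for a token, or an empty list."""
        return list(self._forms.get(normalize(token), []))


# Toy entries; the tags are illustrative only.
index = WordIndex([
    ("коти", "кіт", "noun pl nom"),
    ("м'яч", "м'яч", "noun sg nom"),
])
print(index.lookup("Коти"))
print(index.lookup("м\u2019яч"))
```

Both the dictionary entries and the incoming token go through the same `normalize` function, so differences in case or apostrophe style do not prevent a match.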

3_Names.ipynb

Describes Named Entity Recognition on the DAG via PersonNameAnalyser, which matches different variations of a full name, as well as a surname with initials.
Shows a small sample of names matched in real-world data (messages in public social media groups).
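
To make the "surname with initials" case concrete, here is a deliberately simplified sketch that uses a regular expression over raw text instead of an analyser over the DAG. The patterns and the `find_names` function are hypothetical; they do not reflect how PersonNameAnalyser works and will produce false positives on any capitalised word followed by initials.

```python
import re

# Capitalised Cyrillic word (Russian and Ukrainian letters), optionally hyphenated.
UPPER = "[А-ЯЁІЇЄҐ]"
LOWER = "[а-яёіїєґ']"
SURNAME = rf"{UPPER}{LOWER}+(?:-{UPPER}{LOWER}+)?"
INITIAL = rf"{UPPER}\."
INITIALS = rf"{INITIAL}(?:\s?{INITIAL})?"

# Either "Surname I. I." or "I. I. Surname".
PATTERN = re.compile(
    rf"(?P<s1>{SURNAME})\s(?P<i1>{INITIALS})"
    rf"|(?P<i2>{INITIALS})\s?(?P<s2>{SURNAME})"
)


def find_names(text):
    """Return (surname, initials) pairs found in text, in order of appearance."""
    results = []
    for m in PATTERN.finditer(text):
        surname = m.group("s1") or m.group("s2")
        initials = m.group("i1") or m.group("i2")
        letters = "".join(re.findall(UPPER, initials))
        results.append((surname, letters))
    return results


print(find_names("Лист від Шевченко Т. Г. та І. Франко."))
```

A flat regex cannot check whether a capitalised word is really a surname, which is where a dictionary-backed analyser on the DAG has an advantage.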

*.py files

Implementation of the features described in the Jupyter notebooks.
