HMM_POS_Tagger

An implementation of a hidden Markov model (HMM) part-of-speech tagger that decodes with the Viterbi algorithm.

Data

The tagger is trained on the tagged Wall Street Journal corpus (WSJ_02-21.pos) and achieves 94.5% accuracy on the development corpus (WSJ_24.words).

Example

The input file must contain one word per line, e.g. test_input.words.
The output file will contain one tab-separated word and POS tag per line, e.g. test_output.pos.
If a truth file is provided, an accuracy score is printed.
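
The two file formats described above can be illustrated with a small sketch. The helper names `write_words` and `read_tagged` are hypothetical and are not part of this repository; the sketch only assumes one word per line on input and a tab between word and tag on output.

```python
import io


def write_words(words, handle):
    """Write one word per line (the input format described above)."""
    for word in words:
        handle.write(f"{word}\n")


def read_tagged(handle):
    """Read tab-separated word/tag lines into a list of (word, tag) pairs."""
    pairs = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # assumption: blank lines carry no word/tag pair
        word, tag = line.split("\t")
        pairs.append((word, tag))
    return pairs


# In-memory usage example; real files would be opened with open(...).
words_file = io.StringIO()
write_words(["The", "dog", "barks"], words_file)

tagged_file = io.StringIO("The\tDT\ndog\tNN\nbarks\tVBZ\n")
tagged_pairs = read_tagged(tagged_file)
```

Here `words_file.getvalue()` holds three lines, one word each, and `tagged_pairs` holds three `(word, tag)` tuples.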

Tag an input file

python run_hmm.py -i test_input.words -o output/test_output.pos

Tag an input file and get an accuracy score

python run_hmm.py -i WSJ_POS_CORPUS_FOR_STUDENTS/WSJ_24.words -o output/WSJ_24.pos -t WSJ_POS_CORPUS_FOR_STUDENTS/WSJ_24.pos
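
For intuition about what the accuracy score measures, here is a minimal sketch of token-level accuracy. It is not the code in scoring.py, whose details are not shown here; the function name `token_accuracy` is hypothetical, and the sketch assumes the predicted and truth files line up token for token.

```python
def token_accuracy(predicted, gold):
    """Fraction of positions where the predicted tag equals the gold tag.

    Both arguments are lists of (word, tag) pairs in the same order.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of tokens")
    if not gold:
        return 0.0  # assumption: report 0.0 for an empty file
    correct = sum(p_tag == g_tag for (_, p_tag), (_, g_tag) in zip(predicted, gold))
    return correct / len(gold)


predicted = [("The", "DT"), ("dog", "NN"), ("barks", "NNS"), (".", ".")]
gold = [("The", "DT"), ("dog", "NN"), ("barks", "VBZ"), (".", ".")]
score = token_accuracy(predicted, gold)  # 3 of 4 tags match
```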

References

Ch 8.4 in Speech and Language Processing by Jurafsky and Martin discusses the components of an HMM tagger and the Viterbi algorithm: https://web.stanford.edu/~jurafsky/slp3/8.pdf
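
As a companion to that reference, the following is a minimal sketch of Viterbi decoding for a bigram HMM. It is not the code in run_hmm.py. The function name, the toy probability tables, and the `floor` used in place of zero probabilities are all assumptions made for illustration; how this repository handles unknown words is not described here.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely tag sequence for `words` under a bigram HMM."""

    def log(p):
        # Assumption: floor zero probabilities instead of real smoothing.
        return math.log(max(p, floor))

    if not words:
        return []

    # best[i][t]: log-probability of the best path that ends in tag t at position i.
    best = [{t: log(start_p.get(t, 0.0)) + log(emit_p[t].get(words[0], 0.0))
             for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + log(trans_p[p].get(t, 0.0))) for p in tags),
                key=lambda item: item[1],
            )
            best[i][t] = score + log(emit_p[t].get(words[i], 0.0))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


# Toy model with made-up probabilities (not estimated from the WSJ corpus).
tags = ["DT", "NN", "VB"]
start_p = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans_p = {
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.2, "VB": 0.7},
    "VB": {"DT": 0.5, "NN": 0.3, "VB": 0.2},
}
emit_p = {
    "DT": {"the": 0.9},
    "NN": {"dog": 0.5, "barks": 0.1},
    "VB": {"dog": 0.05, "barks": 0.6},
}
decoded = viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p)
```

On this toy model `decoded` is `["DT", "NN", "VB"]`. Working in log space avoids numeric underflow on long sentences, which is why the sketch adds log-probabilities rather than multiplying raw probabilities.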
