TCMNER_datasets

TCM NER dataset used in the paper "MC-TCMNER: A Multi-Modal Fusion Model Combining Contrast Learning Method for Traditional Chinese Medicine NER".

Prescipts

Because no large-scale TCMNER dataset is currently publicly available, we collected and annotated one ourselves and used it to evaluate the model's performance. We first used web scraping to extract disease-related information from a website, including brief disease descriptions, symptoms, causes, treatment methods, and other details. We then cleaned the extracted text by removing duplicate sentences and non-Chinese characters. Four clinical experts from China (chief physicians) individually annotated the text with the YEDDA annotation tool, labeling four types of information using BIO tagging. The dataset contains 387,465 Chinese characters in total, with five types of entities: Symptoms, Causes, Herbs, Preparations (already prepared medicines), and Effects.

Files: 1.jpg, README.md, test.txt, train.txt
