CEMR-LM: A domain adaptive pre-training language model for sentence classification of Chinese electronic medical record
CEMR-LM is a classification model for Chinese EMR data. It combines a clinical domain-adaptive pre-trained language model with convolutional layers and attention mechanisms to achieve accurate classification. This repository provides the model implementation along with training, testing, and evaluation scripts.
CEMR-LM is designed to classify Chinese EMR text into multiple categories. It uses a clinical domain-adaptive pre-trained model to encode the text, convolutional layers to extract local features, and an attention mechanism to highlight the most informative parts of the sequence. Performance is evaluated with several standard metrics.
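The encoder-convolution-attention pipeline described above can be sketched as follows. This is a minimal illustration, not the repository's actual implementation: the layer sizes (`hidden_size`, `num_filters`, `num_classes`) and class name are hypothetical, and the pre-trained encoder is stood in for by random embeddings.

```python
# Hypothetical sketch of the CEMR-LM pipeline: token embeddings from a
# pre-trained encoder pass through a 1-D convolution, then an additive
# attention layer pools them into one vector for classification.
import torch
import torch.nn as nn

class ConvAttnClassifier(nn.Module):
    def __init__(self, hidden_size=768, num_filters=256, num_classes=10):
        super().__init__()
        self.conv = nn.Conv1d(hidden_size, num_filters, kernel_size=3, padding=1)
        self.attn = nn.Linear(num_filters, 1)    # additive attention scores
        self.fc = nn.Linear(num_filters, num_classes)

    def forward(self, encoder_out):              # (batch, seq_len, hidden)
        feats = torch.relu(self.conv(encoder_out.transpose(1, 2)))  # (B, F, L)
        feats = feats.transpose(1, 2)            # (B, L, F)
        weights = torch.softmax(self.attn(feats), dim=1)            # (B, L, 1)
        pooled = (weights * feats).sum(dim=1)    # attention-weighted pooling
        return self.fc(pooled)                   # class logits

# In the real model the input would be the domain-adapted encoder's output;
# random embeddings are used here only to check tensor shapes.
logits = ConvAttnClassifier()(torch.randn(2, 32, 768))
```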
To run this code, you need the following dependencies:
- Python 3.7+
- PyTorch 1.6+
- pytorch_pretrained_bert
- scikit-learn
- tqdm
You can install the required packages using the following command:
pip install torch torchvision pytorch_pretrained_bert scikit-learn tqdm
Before using the model, configure the parameters: modify the Config class in CEMR-LM.py to match your dataset and training preferences.
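As an illustration, a Config class of this kind typically groups dataset paths and training hyperparameters in one place. The attribute names and values below are assumptions for the sketch; check CEMR-LM.py for the actual fields.

```python
# Illustrative sketch of a Config class; attribute names and defaults here
# are hypothetical and may differ from those in CEMR-LM.py.
class Config:
    def __init__(self):
        self.model_name = "CEMR-LM"
        self.train_path = "data/train.txt"   # hypothetical dataset paths
        self.dev_path = "data/dev.txt"
        self.test_path = "data/test.txt"
        self.num_classes = 10                # number of EMR categories
        self.pad_size = 128                  # max sequence length in tokens
        self.batch_size = 32
        self.learning_rate = 5e-5
        self.num_epochs = 3

config = Config()
```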
To train the model, execute the following command:
python train.py --model CEMR-LM
This will start the training process using the specified model (in this case, CEMR-LM). The training progress will be logged, and the best model checkpoint will be saved for later use.
After training, run the evaluation to load the best model checkpoint and evaluate it on the test dataset. The test accuracy, loss, precision, recall, and F1-score will be displayed.
The model's performance is reported in the output logs as the test accuracy, loss, and a per-class classification report.
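The reported metrics can be computed with scikit-learn, which is already among the dependencies. The labels below are made up purely to show the calls; the repository's evaluation script may use a different averaging mode.

```python
# Sketch of computing the reported metrics with scikit-learn.
# y_true / y_pred are illustrative labels, not real evaluation output.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 1, 2, 0, 2]
y_pred = [0, 1, 0, 2, 0, 1]

acc = accuracy_score(y_true, y_pred)                      # fraction correct
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0      # unweighted class mean
)
```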