
BIO-R-BERT

Relation extraction on the DDI 2013 biomedical dataset

  • Uses BioBERT as the pretrained model
  • Uses R-BERT for relation-extraction modeling
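At its core, the R-BERT head concatenates the [CLS] vector with the averaged hidden vectors of the two entity spans and feeds the result to a classifier. A simplified NumPy sketch of that fusion step (names and shapes are illustrative, not this repo's actual code; the paper additionally applies per-segment tanh + dense layers and dropout):

```python
import numpy as np

def rbert_logits(cls_vec, e1_vecs, e2_vecs, W_out, b_out):
    """Simplified R-BERT fusion: average each entity's token vectors,
    concatenate with the [CLS] vector, apply a linear classifier."""
    e1 = e1_vecs.mean(axis=0)                 # average over entity-1 tokens
    e2 = e2_vecs.mean(axis=0)                 # average over entity-2 tokens
    feat = np.concatenate([cls_vec, e1, e2])  # shape: (3 * hidden,)
    return W_out @ feat + b_out               # logits over the 5 DDI labels
```

With a hidden size of 768 (BERT-base), the classifier weight `W_out` would have shape (5, 2304) for the 5 DDI labels.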

Model

Dataset

  • DDI (Drug-Drug Interaction) 2013 dataset (link)
    • Relation extraction task in the biomedical domain
    • 175 MEDLINE abstracts and 730 DrugBank documents
    • 5 DDI types (Negative, Mechanism, Effect, Advice, Int)
    • Uses the preprocessed dataset from this repo
    • Unlike other work, drug names were not replaced with DRUG0, DRUG1, or DRUGN
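Since the drug names are kept verbatim, R-BERT instead marks the two candidate entities with special tokens before tokenization. A minimal sketch of that preprocessing step (the `$`/`#` markers follow the R-BERT paper; the exact marker tokens used in this repo may differ):

```python
def mark_entities(text, e1_span, e2_span):
    """Insert R-BERT-style markers around two entity spans.

    e1_span / e2_span are (start, end) character offsets; entity 1 is
    wrapped in '$' markers and entity 2 in '#' markers. Assumes entity 1
    appears before entity 2 and the spans do not overlap.
    """
    (s1, t1), (s2, t2) = e1_span, e2_span
    return (text[:s1] + "$ " + text[s1:t1] + " $"
            + text[t1:s2] + "# " + text[s2:t2] + " #" + text[t2:])

# Illustrative sentence (not from the dataset):
sentence = "Aspirin may increase the effect of warfarin."
marked = mark_entities(sentence, (0, 7), (35, 43))
```

The marked string is then tokenized as usual, and the marker positions tell the model which token spans to average for the two entity vectors.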

How to use BioBERT with the Transformers library

>>> from transformers import BertModel, BertTokenizer
>>> model = BertModel.from_pretrained('monologg/biobert_v1.1_pubmed')
>>> tokenizer = BertTokenizer.from_pretrained('monologg/biobert_v1.1_pubmed')

Dependencies

  • python>=3.5
  • torch==1.1.0
  • transformers>=2.2.2
  • scikit-learn>=0.20.0

$ pip3 install -r requirements.txt

How to run

You must pass the --do_lower_case option if the pretrained model is an uncased model.

$ python3 main.py --do_train --do_eval

Results

Micro-averaged F1 score over the 4 positive types (Mechanism, Effect, Advice, Int); the Negative class is excluded from scoring.

Model           F1 micro (%)
CNN             69.75
AB-LSTM         69.39
MCCNN           70.21
GCNN            72.55
Recursive NN    73.50
RHCNN           75.48
SMGCN           76.64
BIO-R-BERT      82.66
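This metric can be reproduced with scikit-learn by restricting micro-averaging to the four positive labels (the toy predictions below are illustrative, not the repo's evaluation code):

```python
from sklearn.metrics import f1_score

# DDI-2013 scoring: the "negative" class is excluded, and micro-F1 is
# computed over the 4 positive interaction types only.
POSITIVE = ["mechanism", "effect", "advice", "int"]

y_true = ["mechanism", "effect", "negative", "advice", "int", "negative"]
y_pred = ["mechanism", "negative", "negative", "advice", "effect", "negative"]

score = f1_score(y_true, y_pred, labels=POSITIVE, average="micro")
```

Restricting `labels` means a pair predicted as "negative" counts as a false negative for its true positive type, but correct "negative" predictions contribute nothing to the score.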

Comparing the effect of pretrained language models

Using the R-BERT architecture with different pretrained weights

Pretrained weights          F1 micro (%)
Random Init                 47.04
bert-base-cased             80.62
scibert-scivocab-uncased    81.30
biobert_v1.0_pubmed_pmc     82.30
biobert_v1.1_pubmed         82.66
