This repository provides models, source code, and data for BERGAMOT: Biomedical Entity Representation with Graph-Augmented Multi-Objective Transformer, presented at NAACL 2024.
The model supports all the languages available in UMLS version 2020AB: English, Spanish, French, Dutch, German, Finnish, Russian, Turkish, Korean, Chinese, Japanese, Thai, Portuguese, Italian, Swedish, Hungarian, Polish, Estonian, Croatian, Ukrainian, Greek, Danish, Hebrew.
[Poster of the paper presented at NAACL 2024]
Required libraries are listed in `requirements.txt`. We ran our experiments using Python 3.8.

To run the zero-shot evaluation described in our NAACL paper, first download the evaluation data, then run the `eval_bert_ranking.py` script. For example, to evaluate on the Spanish (`es`) `DISO` subset of Mantra:

```bash
python Fair-Evaluation-BERT/eval_bert_ranking.py --model_dir "andorei/BERGAMOT-multilingual-GAT" \
    --data_folder "data_medical_crossing/datasets/mantra/es/DISO-fair_exact_vocab" \
    --vocab "data_medical_crossing/vocabs/mantra_es_dict_DISO.txt"
```
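The script name and its `--vocab` argument suggest a dictionary-ranking setup: mentions and vocabulary entries are embedded, vocabulary concepts are ranked by similarity to each mention, and accuracy counts how often the gold concept appears in the top k. The snippet below is a hypothetical, pure-Python illustration of that idea, not the code of `eval_bert_ranking.py`. The helpers `cosine`, `rank_concepts`, and `accuracy_at_k`, the choice of cosine similarity, and scoring each concept by its best-matching synonym are all assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_concepts(mention: Vector, vocab: List[Tuple[str, Vector]]) -> List[str]:
    """Rank concept IDs by the best cosine score over all of their names.

    Assumption (hypothetical): `vocab` holds (concept_id, name_vector) pairs,
    with one pair per synonym, so a concept may appear several times.
    """
    best: Dict[str, float] = {}
    for concept_id, name_vec in vocab:
        score = cosine(mention, name_vec)
        if score > best.get(concept_id, -math.inf):
            best[concept_id] = score
    return sorted(best, key=lambda cid: best[cid], reverse=True)


def accuracy_at_k(
    mentions: Sequence[Vector],
    gold_ids: Sequence[str],
    vocab: List[Tuple[str, Vector]],
    k: int = 1,
) -> float:
    """Fraction of mentions whose gold concept is among the top-k ranked concepts."""
    if not mentions:
        raise ValueError("no mentions to evaluate")
    hits = sum(
        gold in rank_concepts(mention, vocab)[:k]
        for mention, gold in zip(mentions, gold_ids)
    )
    return hits / len(mentions)
```

Scoring a concept by its best-matching synonym means a concept with many names is not penalized for its less similar ones.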
GAT-BERGAMOT is available on the Hugging Face Hub at https://huggingface.co/andorei/BERGAMOT-multilingual-GAT and can be loaded with `transformers`:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("andorei/BERGAMOT-multilingual-GAT")
model = AutoModel.from_pretrained("andorei/BERGAMOT-multilingual-GAT")
```
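A minimal sketch for turning entity names into vectors with the loaded `tokenizer` and `model` follows. It assumes [CLS] (first-token) pooling, as is common for SapBERT-style biomedical entity encoders; check the model card for the recommended pooling. The helper `embed_names` and its `max_length=32` default are hypothetical choices, not part of this repository.

```python
def embed_names(names, tokenizer, model, max_length=32):
    """Encode a list of entity names into one vector per name.

    Assumption: the [CLS] (first-token) hidden state is used as the entity
    representation; verify the recommended pooling on the model card.
    """
    import torch  # only needed when the helper is actually called

    model.eval()  # disable dropout for deterministic embeddings
    batch = tokenizer(
        names,
        padding=True,           # pad to the longest name in the batch
        truncation=True,        # cut names longer than max_length tokens
        max_length=max_length,  # hypothetical default for short entity strings
        return_tensors="pt",
    )
    with torch.no_grad():
        output = model(**batch)
    # Shape: (len(names), hidden_size)
    return output.last_hidden_state[:, 0, :]
```

For example, `embed_names(["heart attack", "infarto de miocardio", "Herzinfarkt"], tokenizer, model)` encodes English, Spanish, and German names in one batch; the resulting vectors can then be compared with cosine similarity.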
```bibtex
@inproceedings{sakhovskiy-et-al-2024-bergamot,
    title = "Biomedical Entity Representation with Graph-Augmented Multi-Objective Transformer",
    author = "Sakhovskiy, Andrey and Semenova, Natalia and Kadurin, Artur and Tutubalina, Elena",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2024",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
}
```
