This repo contains the submission of the group Gut Instincts in the GutBrainIE CLEF 2025 challenge, part of the BioASQ CLEF Lab 2025. The challenge focuses on extracting structured information from biomedical abstracts related to the gut microbiota and its connections with Parkinson's disease and mental health. The goal is to develop Information Extraction (IE) systems to support experts in understanding the gut-brain interplay.
The challenge is divided into two main subtasks:
- Named Entity Recognition (NER): Identifying and classifying specific text spans into predefined categories.
- Relation Extraction (RE): Determining whether a specific relation holds between two identified entities.
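To make the two subtasks concrete, the sketch below shows what a NER span and an RE triple could look like for one sentence. The field and label names are illustrative assumptions, not the official annotation schema of the challenge.

```python
abstract = "Gut microbiota alterations have been reported in Parkinson's disease."

# NER: labelled character spans (start inclusive, end exclusive).
# Label names are placeholders, not the official tag set.
entities = [
    {"start": 0, "end": 14, "label": "MICROBIOME"},  # "Gut microbiota"
    {"start": 49, "end": 68, "label": "DISEASE"},    # "Parkinson's disease"
]

# RE: a typed link between two recognised entities (indices into `entities`).
relations = [{"head": 0, "tail": 1, "label": "ASSOCIATED_WITH"}]

for ent in entities:
    print(abstract[ent["start"]:ent["end"]], "->", ent["label"])
```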
To reproduce our results, follow these steps:
- Download the Data:
The official challenge data is not included in this repository. Download it and place it in the data/ directory, preserving the original folder structure.
- Prepare the Environment
Follow the guide in Setup to create an environment, activate the environment, and install all dependencies.
- Preprocess and Prepare the Training Data
Choose an existing training configuration from the training_configs/ directory, or create a new one based on the template located at training_configs/_template.yaml. Then run the following command, replacing PATH_TO_TRAINING_CONFIG with the path to the chosen configuration file:
```
python src/preprocessing/create_datasets.py --config PATH_TO_TRAINING_CONFIG
```

Based on the settings in the training configuration, the preprocessing script will load the specified training datasets, apply corrections and cleaning steps, optionally remove HTML content, tokenize the data using the appropriate tokenizer for the specified model, and save the processed data in the appropriate location for subsequent training.
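For orientation, a training configuration might resemble the sketch below. Every field name here is an illustrative assumption; the authoritative template is `training_configs/_template.yaml`.

```yaml
# Illustrative sketch only -- field names are assumptions, not the repo's schema.
model_name: microsoft/deberta-v3-base   # tokenizer is derived from this
datasets:
  - data/Articles/json_format/train
remove_html: true
epochs: 10
batch_size: 16
learning_rate: 2.0e-5
```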
- Train the Models
Once the datasets have been prepared, start the training process using the same training configuration. Run the following command, replacing PATH_TO_TRAINING_CONFIG with the path to the chosen configuration file:
```
python src/training/run_training.py --config PATH_TO_TRAINING_CONFIG
```

This script will load the preprocessed data, initialize the specified model architecture, and begin training according to the parameters defined in the training configuration file (such as the number of epochs, batch size, and learning rate schedule). Progress, metrics, and the best-performing model (based on its F1_micro score) will be saved to the models/ directory.
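For reference, the micro-averaged F1 used for model selection pools true-positive, false-positive, and false-negative counts across all classes before computing precision and recall (unlike macro-F1, which averages per-class scores). A minimal sketch with made-up counts:

```python
def micro_f1(counts):
    """counts: {class_label: (tp, fp, fn)} -- pooled before computing P/R."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy per-class counts (illustrative numbers only)
score = micro_f1({"DISEASE": (8, 2, 1), "MICROBIOME": (5, 1, 3)})
print(round(score, 3))  # -> 0.788
```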
- Create Predictions
Once the models have been trained, the inference process can be started to generate predictions.
NER inference: To run inference with the NER models, execute the following command, replacing PATH_TO_TRAINING_CONFIG with the path to the chosen configuration file:
```
python src/inference/ner_inference.py --config PATH_TO_TRAINING_CONFIG
```

The results will be saved to `data_inference_results`.
NER ensemble inference: To perform ensemble inference with NER models, run the command below, replacing PATH_TO_INFERENCE_CONFIG with the path to the NER inference configuration file. A template configuration can be found at inference_configs/_template_ner_ensemble_inference.yaml.
```
python src/inference/ner_ensemble_inference.py --config PATH_TO_INFERENCE_CONFIG
```

The results will be saved to `data_inference_results`.
RE inference: To perform inference with an RE model, use the script located at src/inference/re_inference.py.
For pipeline-based inference, where an RE model is applied to predictions generated by a NER model, use src/inference/pipeline.py. In this case, the following needs to be specified:
- the path to the NER predictions,
- the path to the data (from the `data/Articles/json_format` folder) used to create the NER predictions,
- the path to the folder containing the training configurations for the RE models to be used in the pipeline.
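As a sketch, a pipeline invocation could look like the following. The flag names are assumptions made for illustration; consult `src/inference/pipeline.py` for the script's actual command-line interface.

```shell
# Flag names below are illustrative assumptions, not the script's verified interface.
python src/inference/pipeline.py \
    --ner_predictions PATH_TO_NER_PREDICTIONS \
    --articles data/Articles/json_format \
    --re_config_dir PATH_TO_RE_TRAINING_CONFIGS
```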
RE ensemble inference: To produce ensemble inference with RE models, run the command below, replacing PATH_TO_INFERENCE_CONFIG with the path to the RE inference configuration file. A template configuration can be found at inference_configs/_template_re_relation_ensemble_inference.yaml.
```
python src/inference/re_ensemble_inference.py --config PATH_TO_INFERENCE_CONFIG
```

- All training was conducted on a computational cluster with GPU resources. Training on local machines may take significantly longer or may not be feasible, depending on hardware.
- If you encounter issues with missing packages, ensure your environment matches the versions specified in `pyproject.toml`.
It is recommended to use a virtual environment to avoid dependency conflicts.
Windows:

```
python -m venv env
env\Scripts\activate
```

Linux/MacOS:

```
python3 -m venv env
source env/bin/activate
```

To deactivate the environment:

```
deactivate
```

Install the necessary dependencies as specified in `pyproject.toml`:

```
pip install -e .
```

This project is licensed under the MIT License. See the LICENSE file for details.