EC-Bench is a benchmark framework for evaluating enzyme annotation models that predict Enzyme Commission (EC) numbers from protein sequences. EC numbers describe the biochemical reactions enzymes catalyze, and predicting them accurately is essential for understanding protein function.
While many EC prediction methods already exist — including homology-based tools, deep learning models, contrastive learning techniques, and protein language models — there's been no consistent way to evaluate and compare their performance. EC-Bench fills this gap by offering a unified, open-source platform.
Enzyme Commission (EC) numbers are four-level hierarchical annotations that classify enzymes based on the chemical reactions they catalyze. Accurate EC prediction is essential for understanding enzyme functions, annotating protein sequences, and advancing functional genomics.
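The four-level hierarchy can be made concrete with a tiny helper. `parse_ec` below is an illustrative function, not part of EC-Bench; a dash marks an unspecified level, as in partial annotations like `3.4.21.-`:

```python
def parse_ec(ec: str) -> list[str]:
    """Split an EC number into its four hierarchical levels:
    class, subclass, sub-subclass, and serial number.
    A '-' marks a level left unspecified in a partial annotation."""
    parts = ec.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 levels, got {ec!r}")
    return parts

# EC 1.1.1.1 (alcohol dehydrogenase): class 1 = oxidoreductases,
# subclass 1.1 = acting on the CH-OH group of donors, and so on.
print(parse_ec("1.1.1.1"))   # ['1', '1', '1', '1']
print(parse_ec("3.4.21.-"))  # ['3', '4', '21', '-']
```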
- Standardized Datasets: Pretraining, training, and testing datasets prepared from UniProtKB (Swiss-Prot and TrEMBL) and Price-149.
- Model Coverage: 10 representative models, spanning homology-based (e.g., BLASTp), deep learning, contrastive learning (e.g., CLEAN), and protein language models (e.g., EnzBERT, ProteinBERT).
- Multiple Evaluation Tasks:
- Exact EC Number Prediction
- EC Number Completion
- Partial/Additional EC Number Recommendation
- Evaluation Metrics:
- Performance: Weighted F1, Precision, Recall
- Resource Usage: Memory, Runtime, Storage
- Consistency: Agreement Rate, Reaction Similarity
- Interoperable Framework: Easily add and evaluate new models.
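To illustrate interoperability, a new model only needs to map protein sequences to EC predictions, after which the benchmark's metrics apply uniformly. The class names below are a hypothetical sketch, not EC-Bench's actual API:

```python
# Hypothetical interface sketch (names are illustrative, not EC-Bench's
# actual API): a model maps sequences to EC numbers, and everything
# downstream (metrics, resource profiling) stays model-agnostic.
class ECPredictor:
    def predict(self, sequences: list[str]) -> list[str]:
        """Return one EC number (possibly '-'-padded partial) per sequence."""
        raise NotImplementedError

class NaiveBaseline(ECPredictor):
    def predict(self, sequences: list[str]) -> list[str]:
        # Always predicts one fixed EC class; useful only as a performance floor.
        return ["2.7.11.1"] * len(sequences)
```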
Predict the full EC number at all levels (1-4). Performance is measured using:
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- Weighted F1 Score = harmonic mean of precision and recall, averaged over EC classes with weights proportional to class support, which accounts for class imbalance
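The weighted F1 computation can be sketched from the formulas above. This minimal version computes per-class precision and recall and averages F1 by class support (scikit-learn's `f1_score(average="weighted")` yields the same quantity):

```python
from collections import Counter

def weighted_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Support-weighted F1 over EC classes, built from per-class
    TP / FP / FN counts as defined above."""
    support = Counter(y_true)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        prec = tp[cls] / (tp[cls] + fp[cls]) if tp[cls] + fp[cls] else 0.0
        rec = tp[cls] / (tp[cls] + fn[cls]) if tp[cls] + fn[cls] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += (n / total) * f1  # weight each class by its support
    return score
```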
Fill in missing parts of partial EC numbers. Since ground truth may be unavailable, we measure:
- Coverage: Fraction of proteins for which a complete EC number was generated.
- Agreement Rate: Percentage of completions where a majority of models agree.
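A minimal sketch of the agreement-rate idea, assuming each protein has one completed EC number per model (the dict layout is illustrative, not EC-Bench's internal format):

```python
from collections import Counter

def agreement_rate(completions: dict[str, list[str]]) -> float:
    """Fraction of proteins whose completed EC number is shared by a
    strict majority of models. `completions` maps protein id -> one
    completed EC number per model."""
    agreed = 0
    for preds in completions.values():
        _, count = Counter(preds).most_common(1)[0]
        if count > len(preds) / 2:
            agreed += 1
    return agreed / len(completions)
```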
Suggest novel or alternate EC numbers. Evaluated via:
- Reaction Similarity Score: Computed using RDKit and reactions from ECReact.
- Weighted/Average Similarity Score: Combines similarity with prediction coverage.
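At its core, the reaction similarity score is a Tanimoto comparison of reaction fingerprints. EC-Bench derives those fingerprints with RDKit from ECReact reactions; in the self-contained sketch below, plain sets of "on" bits stand in for RDKit fingerprint objects:

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity between two fingerprint bit sets:
    |A ∩ B| / |A ∪ B|. Stand-in for comparing RDKit reaction
    fingerprints of the predicted vs. true EC reactions."""
    if not fp_a and not fp_b:
        return 0.0
    inter = len(fp_a & fp_b)
    return inter / (len(fp_a) + len(fp_b) - inter)

print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5
```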
| Metric | Description |
|---|---|
| Precision | Accuracy of positive predictions |
| Recall | Coverage of relevant EC classes |
| F1 Score | Harmonic mean of precision and recall |
| Agreement Rate | Model consensus on EC completions |
| Reaction Similarity | Biochemical similarity of predicted vs. true EC reactions |
| Memory Usage (GiB) | GPU/CPU memory during training |
| Model Size (MiB) | Disk storage footprint |
| Runtime (s) | Time for inference and training |
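The resource metrics above can be captured with lightweight instrumentation. This is a generic sketch using the Python standard library, not EC-Bench's actual profiling code (which may measure GPU memory and wall time differently):

```python
import time
import tracemalloc

def profile(fn, *args):
    """Run fn(*args), returning (result, runtime in seconds,
    peak Python heap usage in MiB). A minimal stand-in for the
    benchmark's runtime/memory measurements."""
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args)
    runtime_s = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, runtime_s, peak / 2**20
```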
| Dataset Type | Source | Version |
|---|---|---|
| Pretraining | TrEMBL | 2018-02 |
| Training | Swiss-Prot | 2018-02 |
| Test | Swiss-Prot | 2023-01 |
| Challenging Test | Price-149 | Manual |
The EC-Bench framework includes a standardized pipeline for preparing pretraining, training, and testing datasets to ensure fair evaluation across models. The table below summarizes the datasets produced by this pipeline:
| Dataset Type | Source | Similarity Threshold | Description |
|---|---|---|---|
| Pretraining | TrEMBL 2018-02 | N/A | For large-scale unsupervised training |
| Training (100%) | Swiss-Prot 2018-02 | 100% | For models tested on high-similarity data |
| Training (30%) | Swiss-Prot 2018-02 | 30% | For models tested on generalization |
| Test | Swiss-Prot 2023-01 | 30%, 100% | Realistic and updated EC annotations |
| Price-149 | CLEAN paper | N/A | Manually curated difficult test cases |
| Model Name | Link | Year |
|---|---|---|
| BLASTp | LINK | 2008 |
| CatFam | LINK | 2008 |
| PRIAM | LINK | 2013 |
| DeepEC | LINK | 2018 |
| ECPred | LINK | 2018 |
| ProteinBERT | LINK | 2022 |
| ECRECer | LINK | 2022 |
| DeepECtransformer | LINK | 2023 |
| EnzBERT | LINK | 2023 |
| CLEAN | LINK | 2023 |
- Install conda and RDKit (used for the reaction similarity calculation). RDKit can be installed with conda:
conda install -c conda-forge rdkit
- Clone the repository:
git clone https://github.com/dsaeedeh/EC-Bench.git
- Install each model by following the instructions in their respective README files.
- Run download_data.sh to download the data:
sbatch download_data.sh
or
./download_data.sh
- Run extract_coordinates.sh to extract the coordinates from the downloaded data:
sbatch extract_coordinates.sh
- Run data_preprocessing.sh to remove duplicate, invalid, and non-enzyme sequences from the data:
sbatch data_preprocessing.sh
- Run run_mmseqs2.sh to concatenate the FASTA files (pretrain.fasta, train.fasta, test.fasta, price.fasta, and ensemble.fasta, if it exists) into all.fasta, which MMseqs2 is then run on. Make sure no other .fasta files exist in the data directory:
sbatch run_mmseqs2.sh
- Run create_data.sh to create the final training, testing, and ensemble data. Output files for each similarity threshold: train_ec.csv, test_ec.csv, and ensemble_ec.csv.
sbatch create_data.sh
- Run go_creator.sh to create the GO terms for the pretraining data. Output file: pretrain_go_final.csv.
sbatch go_creator.sh
After running the above steps, the final data for each similarity threshold is in the data directory: training data in train_ec.csv, test data in test_ec.csv, and ensemble data in ensemble_ec.csv.
Each model has a run_model.sh script containing the commands to run it. Run a model by executing its script, or run all models at once with all_models.sh:
sbatch all_models.sh
After running the models, the results appear in the results directory: each model creates a subdirectory with its name, and its results are saved there for each similarity threshold.
We welcome contributions from the community! You can:
- 🧩 Add new models to the benchmark.
- 📊 Suggest or implement new evaluation metrics.
- 🛠 Improve preprocessing pipelines or dataset support.
- 📝 Report issues or suggest enhancements.
To contribute:
- Fork the repository.
- Open a pull request with your proposed changes.
- If you are adding a model, include clear instructions and dependencies.
Feel free to open an issue for discussion before submitting major changes.
If you use EC-Bench in your research, please cite the following:
@article{EC-Bench,
  title={EC-Bench: A Benchmark for Enzyme Commission Number Prediction},
  author={Davoudi, Saeedeh and others},
  journal={Bioinformatics Advances},
  doi={10.1093/bioadv/vbag004},
  year={2026}
}

For questions or feedback, please contact:
Saeedeh Davoudi
saeedeh.davoudi@ucdenver.edu
EC-Bench is distributed under the MIT License.