LAMAR_baselines

We compared the performance of LAMAR with that of baseline methods on each downstream task.

Downstream tasks

| Task | Baseline methods |
| --- | --- |
| Predict splice sites of pre-mRNA | RNA-FM, RNAErnie, SpliceAI |
| Predict mRNA translation efficiency based on 5' UTR | RNA-FM, RNAErnie, UTR-LM |
| Predict mRNA degradation rate based on 3' UTR | RNA-FM, RNAErnie |
| Predict internal ribosome entry site (IRES) | RNA-FM, RNAErnie |

Deploy baseline methods

Environment

RNA-FM

RNA-FM is a foundation language model pretrained on non-coding RNAs and used for predicting RNA 3D structure (Nature Methods, 2024).
We deployed the model from its GitHub repository (https://github.com/ml4bio/RNA-FM).
We fine-tuned RNA-FM with the Trainer class of the transformers library, which required installing the following additional packages:

transformers==4.36.2  
accelerate==0.26.1  
evaluate==0.4.1  
tokenizers==0.15.0  
datasets==2.18.0  
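
Because the fine-tuning setup depends on these exact versions, it can help to confirm that the active environment matches them before training. The following is a minimal sketch using only the Python standard library; the helper name `check_pins` and the `PINNED` dictionary are illustrative and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the package list above.
PINNED = {
    "transformers": "4.36.2",
    "accelerate": "0.26.1",
    "evaluate": "0.4.1",
    "tokenizers": "0.15.0",
    "datasets": "2.18.0",
}


def check_pins(pins=PINNED, get_version=version):
    """Return {package: (expected, found)} for every mismatch.

    `found` is None when the package is not installed.
    `check_pins` is a hypothetical helper, not part of this repository.
    """
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    mismatches = check_pins()
    for name, (expected, found) in mismatches.items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
    if not mismatches:
        print("All pinned packages match.")
```

An empty result means that all five packages are installed at the listed versions.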

The tokenizer was developed for RNA-FM.

RNAErnie

RNAErnie is a foundation language model pretrained on non-coding RNAs and used for predicting RNA secondary structures and RNA-RNA interactions (Nature Machine Intelligence, 2024).
We deployed the PyTorch version of the model from its GitHub repository (https://github.com/CatIIIIIIII/RNAErnie).
We fine-tuned RNAErnie with the Trainer class of the transformers library, using the same packages as for fine-tuning RNA-FM.
The tokenizer was developed for RNAErnie.

UTR-LM

UTR-LM is a foundation language model pretrained on the sequences and structures of 5' UTRs and used for predicting mRNA translation efficiency from the 5' UTR (Nature Machine Intelligence, 2024).
We deployed the model from its GitHub repository (https://github.com/a96123155/UTR-LM).
We fine-tuned UTR-LM with the Trainer class of the transformers library, using the same packages as for fine-tuning RNA-FM.
The tokenizer was developed for UTR-LM.

SpliceAI

SpliceAI is a CNN model that predicts splice sites from pre-mRNA sequence (Cell, 2019).
We deployed the model from its GitHub repository (https://github.com/Illumina/SpliceAI).
We used the pretrained model directly to predict splice sites, without fine-tuning.
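
As an illustration of using the pretrained SpliceAI model directly, the sketch below follows the usage example in the SpliceAI repository README: the sequence is padded with `N` on both sides to provide the 10,000 nt context, one-hot encoded, and scored by averaging the five bundled models. It is an assumption that the baseline here was run in this way; the helper names `pad_for_spliceai` and `predict_splice_sites`, and the conversion of U to T, are illustrative additions.

```python
def pad_for_spliceai(sequence, context=10000):
    """Upper-case the sequence, map RNA 'U' to DNA 'T', and add N flanks.

    Hypothetical helper: SpliceAI expects context // 2 flanking positions
    on each side of the region to be scored.
    """
    seq = sequence.upper().replace("U", "T")
    flank = "N" * (context // 2)
    return flank + seq + flank


def predict_splice_sites(sequence, context=10000):
    """Return (acceptor_prob, donor_prob) arrays, one value per input position.

    Requires the spliceai package and its dependencies; the imports are
    kept inside the function so the padding helper works without them.
    """
    import numpy as np
    from keras.models import load_model
    from pkg_resources import resource_filename
    from spliceai.utils import one_hot_encode

    # The spliceai package ships five trained models; average their outputs.
    paths = ["models/spliceai{}.h5".format(i) for i in range(1, 6)]
    models = [load_model(resource_filename("spliceai", p)) for p in paths]

    x = one_hot_encode(pad_for_spliceai(sequence, context))[None, :]
    y = np.mean([m.predict(x) for m in models], axis=0)

    # Output channels: 0 = neither, 1 = acceptor, 2 = donor.
    return y[0, :, 1], y[0, :, 2]
```

Calling `predict_splice_sites("CGATCTGACGTGGGTGTCATCGCATTATCGATATTGCAT")` would return two arrays with one probability per nucleotide of the input sequence.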
