rnasys/TranslationAI

TranslationAI: An advanced deep learning framework for precise identification of translation initiation and termination sites within mature mRNA sequences

This package predicts translation initiation sites (TISs) and translation termination sites (TTSs) in a given mature mRNA sequence. As an application, it can predict the effect of genetic variants or RNA editing on the translation product.

License

TranslationAI source code is provided under the GPLv3 license. The trained models used by TranslationAI (located in this package at translationai/models) are provided under the CC BY-NC 4.0 license for academic and non-commercial use.

Installation

TranslationAI can be installed from the GitHub repository:

git clone https://github.com/rnasys/TranslationAI.git
cd TranslationAI
python setup.py install

TranslationAI requires tensorflow>=1.2.0, which is best installed separately via pip or conda (see the TensorFlow website for other installation options):

pip install tensorflow
# or
conda install tensorflow

Usage

TranslationAI can be run from the command line, for example:

translationai -I translationai/examples/query_seq.fa -t 0.5,0.5

Required parameters:

  • -I: Input .fa with query sequence(s).

  • -t: Comma-separated threshold values k for TIS and TTS prediction, e.g. 0.5,0.5. If k>1, the top-k TISs and TTSs in each sequence are output; if k<1, all TISs and TTSs with score>=k are output.
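The two modes of the -t threshold can be illustrated with a short Python sketch. This is not the package's own code: the function name select_sites, the 0-based positions, and the handling of k equal to 1 (treated here as a score cutoff) are assumptions made for illustration only.

```python
from typing import List, Tuple


def select_sites(scores: List[float], k: float) -> List[Tuple[int, float]]:
    """Pick (position, score) pairs from per-position scores using threshold k.

    Hypothetical helper: positions are 0-based indices into `scores`,
    and k == 1 is handled as a score cutoff (the README does not define it).
    """
    # Rank every position by score, highest first.
    ranked = sorted(enumerate(scores), key=lambda item: item[1], reverse=True)
    if k > 1:
        # k > 1: keep the top-k positions.
        return ranked[: int(k)]
    # Otherwise: keep every position whose score is at least k.
    return [(pos, score) for pos, score in ranked if score >= k]
```

With this sketch, select_sites(scores, 3) keeps the three best-scoring positions, while select_sites(scores, 0.5) keeps every position scoring at least 0.5.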

Examples

A sample input file can be found at examples/query_seq.fa. The corresponding output files are examples/query_seq_predTIS_0.1.txt (predicted TISs), examples/query_seq_predTTS_0.1.txt (predicted TTSs) and examples/query_seq_predORFs_0.1_0.1.txt (predicted ORFs).

The header line of each record in the input FASTA file has the format: >chrN:int-int(+/-)(annotation)(int, int,). Example: >chr3:28283123-28361264(+)(CMC1)(241, 568,)
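To check that headers follow this layout before running a prediction, a regular expression along these lines may help. It is a minimal sketch built only from the format string and example above. The names HEADER_RE, Header and parse_header are hypothetical, and the field name `ints` is a placeholder, since the meaning of the trailing integers is not stated here.

```python
import re
from typing import NamedTuple, Tuple

# Pattern derived from: >chrN:int-int(+/-)(annotation)(int, int,)
HEADER_RE = re.compile(
    r"^>(?P<chrom>chr[^:]+):(?P<start>\d+)-(?P<end>\d+)"
    r"\((?P<strand>[+-])\)\((?P<annotation>[^)]*)\)\((?P<ints>[^)]*)\)\s*$"
)


class Header(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str
    annotation: str
    ints: Tuple[int, ...]  # trailing integers; their meaning is not documented here


def parse_header(line: str) -> Header:
    """Parse one FASTA header line; raise ValueError if it does not match."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected header format: {line!r}")
    # The last group may end with a comma, so skip empty pieces.
    ints = tuple(int(x) for x in match.group("ints").split(",") if x.strip())
    return Header(
        chrom=match.group("chrom"),
        start=int(match.group("start")),
        end=int(match.group("end")),
        strand=match.group("strand"),
        annotation=match.group("annotation"),
        ints=ints,
    )
```

Calling parse_header on the example header above yields chrom "chr3", strand "+", annotation "CMC1" and the two trailing integers 241 and 568.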

In the file for predicted TISs (e.g. examples/query_seq_predTIS_0.1.txt), each line represents a sequence and contains the sequence identifier, the predicted TIS position and the corresponding score. The TIS position and score are separated by a comma character. If multiple TISs were predicted for a sequence, they are ranked by predicted score and separated by a tab character. The format of each line is as follows: sequence_identifier TIS_position_1,TIS_score_1 TIS_position_2,TIS_score_2 ...

The file for predicted TTSs (e.g. examples/query_seq_predTTS_0.1.txt) is in the same format as the file for predicted TISs.
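The TIS and TTS files described above could be read with a small parser such as the following sketch. It assumes that the sequence identifier is also separated from the first position,score pair by a tab, which the description does not state explicitly; splitting on tabs rather than on any whitespace matters because identifiers can contain spaces. The function name parse_site_line is hypothetical.

```python
from typing import List, Tuple


def parse_site_line(line: str) -> Tuple[str, List[Tuple[int, float]]]:
    """Split one prediction line into (identifier, [(position, score), ...]).

    Assumption: all fields, including the identifier, are tab-separated.
    """
    fields = line.rstrip("\n").split("\t")
    seq_id, entries = fields[0], fields[1:]
    sites = []
    for entry in entries:
        if not entry:
            continue  # tolerate a trailing tab
        position, score = entry.split(",")
        sites.append((int(position), float(score)))
    return seq_id, sites
```

Each line of a predTIS or predTTS file can be passed to parse_site_line in turn, for example while iterating over the open file.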

In the file for predicted ORFs (e.g. examples/query_seq_predORFs_0.1_0.1.txt), each line represents a sequence and contains the sequence identifier followed by, for each predicted ORF, the predicted TIS position, the predicted TTS position, the TIS score and the TTS score. The positions and scores are separated by comma characters. If multiple ORFs were predicted for a sequence, they are separated by a tab character. The format of each line is as follows: sequence_identifier TIS_position_1,TTS_position_1,TIS_score_1,TTS_score_1 TIS_position_2,TTS_position_2,TIS_score_2,TTS_score_2 ...
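An ORF line can be unpacked in a similar way. The sketch below again assumes that the identifier and the ORF entries are all tab-separated. The names ORF and parse_orf_line are hypothetical and not part of the package.

```python
from typing import List, NamedTuple, Tuple


class ORF(NamedTuple):
    tis_position: int
    tts_position: int
    tis_score: float
    tts_score: float


def parse_orf_line(line: str) -> Tuple[str, List[ORF]]:
    """Split one ORF prediction line into (identifier, [ORF, ...]).

    Assumption: all fields, including the identifier, are tab-separated.
    """
    fields = line.rstrip("\n").split("\t")
    seq_id, entries = fields[0], fields[1:]
    orfs = []
    for entry in entries:
        if not entry:
            continue  # tolerate a trailing tab
        tis_pos, tts_pos, tis_score, tts_score = entry.split(",")
        orfs.append(ORF(int(tis_pos), int(tts_pos), float(tis_score), float(tts_score)))
    return seq_id, orfs
```

Returning named tuples keeps the four comma-separated values labelled, so downstream code can refer to orf.tis_position rather than to an index.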

Contact

Xiaojuan Fan: xiaojuanfan05@gmail.com
