Skip to content

davidlevybooth/Praxis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

83 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PRAXIS: PRokaryotic Activity and eXpression Informatics System

End-to-end de novo transcriptomic pipeline v0.2

Authors: David Levy-Booth and Parker Lloyd

This snakemake pipeline aids in downloading and processing of genomic and transcriptomic sequences, transcriptomic assembly and quantification using modularized, benchmarked tools.

Steps to Initialize:

Download or clone repository:

git clone https://github.com/davidlevybooth/Praxis.git

Change to the Praxis directory:

cd Praxis

Create the conda environment:

conda env create --name Praxis --file envs/environment.yaml

Activate conda environment:

conda activate Praxis

Install Praxis:

python setup.py install

Run pipeline with desired options:

Praxis [ -d featureCounts -a bbmap -q bbduk -s trinity … ]

Pipeline Options:

-d    --method    The tool that will be used to generate the count table of expressed genes. The available tools are featureCounts, htseq, and salmon

-a    --aligner    The tool that will be used to align the RNASeq reads to the reference genome/transcriptome

-q    --trimmer    The tool that will trim contaminant sequences and adapter sequences off the RNASeq reads

-s    --assembler    The tool that will assemble a reference transcriptome if there is no genome URL provided

-t    --threads    The number of threads that the pipeline is allowed to use.

-m    --max_memory    The amount of memory provided to the assembler. Enter in byte format

-u    --genome_url        The NCBI url from which the reference genome can be downloaded

-f    --genome_file        The file containing the reference genome

-j    --jobs        The number of jobs. Analogous to snakemake -j#

-r     --read_dir        The directory in which fastq reads are stored

-o    --outdir        The directory for the final outputs

-c    --configfile         The yaml file containing all of the above configuration details

Declaring a reference:

There are 3 ways to provide a reference to the pipeline. The path to a genome file can be provided using the corresponding flag (-f), the NCBI URL can be provided for the pipeline to download the genome (-u), or the pipeline can use the provided RNASeq reads located in the directory transcriptome/reads/untrimmed to assemble a reference transcriptome.

If the path to a user provided genome file is specified, it will always be used as the reference, regardless of any other options available. To download a reference from NCBI, remove the path to the genome file and specify the NCBI URL (Praxis -f -u [ncbi_url]). To assemble a reference transcriptome, make sure both the genome file and NCBI URL are not specified, and make sure the desired assembler is chosen. User Provided RNASeq Reads:

The columns in the samples.tsv file must be updated to include the file names for each of the forward and reverse reads.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors