A bash pipeline for de novo assembly of viral genomes from Illumina NGS reads. It currently supports the following viruses: HIV-1, RSV, RRV and HMPV.
Current version: V1
Requirements: a conda package manager such as Miniconda3.
Installation:
- Download the environment installation script
wget https://raw.githubusercontent.com/jsede/virus_assembly/main/scripts/install_env.sh
- Run the script in the terminal
bash ./install_env.sh
- Check that the installation worked by activating the environment
conda activate virus_assembly
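A quick way to confirm that the environment is usable is to check that the expected executables are on PATH. The sketch below is not part of the pipeline. It assumes that the virus_assembly environment provides megahit, qualimap and fastqc (the tools named elsewhere in this README) alongside the virus_assembly command itself.

```bash
#!/usr/bin/env bash
# Post-install sanity check (a sketch, not part of the pipeline).
# Assumption: the virus_assembly environment puts these executables on PATH.

# Return 0 if every named command is found on PATH; report each missing one.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Run after `conda activate virus_assembly`.
check_tools virus_assembly megahit qualimap fastqc \
    || echo "Some tools are missing; is the virus_assembly environment active?" >&2
```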
Pipeline: NGS pipeline for viral assembly.
usage: virus_assembly [-h -v -p -q] (-i dir -m value -t value) (-s string)
with:
-h Show help text
-v Version of the pipeline
-n Name of the run
-i Input directory
-s Viral species
-c Perform clipping of primers
-q Perform quality check using FastQC
-m Memory
-t Number of threads
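To illustrate how the options combine, the hypothetical helper below only prints a command line so it can be reviewed before it is run. It assumes that -q is a switch without an argument and that RSV is an accepted -s value. It leaves out -m because the expected memory format is not documented here.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): print a virus_assembly
# command for review before running it.
# Assumptions: -q is an on/off switch and "RSV" is a valid -s value;
# -m is omitted because its expected format is not documented here.

build_virus_assembly_cmd() {
    local main_dir=$1 species=$2
    local threads
    # Use all available cores; fall back to 1 if nproc is unavailable (e.g. macOS).
    threads=$(nproc 2>/dev/null || echo 1)
    local -a cmd=(virus_assembly -i "$main_dir" -s "$species" -t "$threads" -q)
    printf '%s\n' "${cmd[*]}"
}

build_virus_assembly_cmd /path/to/main/directory RSV
```

Printing the command instead of executing it keeps the helper harmless. You can check the thread count and species before pasting the line into the terminal.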
To run the pipeline:
- Activate the environment.
conda activate virus_assembly
- Change to the main directory where you will run the analysis.
- Place the raw fastq.gz files in a directory called 1_reads inside the main directory.
- Create a file called IDs.list containing the sample names of your sequencing files and place it in the main directory.
- Start the pipeline with the following command, replacing VIRUS with the viral species to assemble:
virus_assembly -i path/to/main/directory -s VIRUS
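IDs.list can be generated from the file names in 1_reads. This is a sketch that rests on two assumptions: the files use Illumina-style names such as SampleA_S1_L001_R1_001.fastq.gz, and IDs.list holds one sample name per line, taken as the part of the name before the optional _S<n>_L<nnn> segment and the _R1 tag. Check the result against what the pipeline expects for your own files.

```bash
#!/usr/bin/env bash
# Sketch (not part of the pipeline): write IDs.list from the R1 files in 1_reads.
# Assumptions: Illumina-style names such as SampleA_S1_L001_R1_001.fastq.gz,
# and one sample name per line = the part before _S<n>_L<nnn>_R1 (both optional).

make_ids_list() {
    local main_dir=$1 f
    for f in "$main_dir"/1_reads/*_R1*.fastq.gz; do
        [ -e "$f" ] || continue   # the glob matched nothing
        basename "$f"
    done | sed -E 's/(_S[0-9]+)?(_L[0-9]{3})?_R1.*$//' | LC_ALL=C sort -u > "$main_dir/IDs.list"
}

# Only act when a 1_reads directory exists under the given main directory.
main_dir=${1:-.}
if [ -d "$main_dir/1_reads" ]; then
    make_ids_list "$main_dir"
    cat "$main_dir/IDs.list"
fi
```

Saved as a script and run with the main directory as its argument, it writes IDs.list and prints it. Only R1 files are read, so each paired sample is listed once.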
- When the pipeline finishes, four additional directories will have been created in the main directory:
- 2_ref_map: a BAM file of the trimmed reads mapped against the viral reference genome and a PDF with the Qualimap results.
- 3_contigs: the contigs assembled de novo by MEGAHIT for each sample.
- 4_filter: the high-coverage contigs generated by MEGAHIT (*_hicov.fasta), filtered and reoriented against the viral reference genome (*_reoriented.fa).
- 5_remap: a FASTA file holding the viral contigs for each sample and a BAM file of the trimmed reads mapped against those contigs.
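For a first overview of the results, the sketch below counts the contigs and total bases in each reoriented FASTA file in 4_filter. It assumes that the file names end in reoriented.fa. It searches 4_filter recursively because the layout inside that directory is not described here.

```bash
#!/usr/bin/env bash
# Sketch (not part of the pipeline): summarise the reoriented contigs in 4_filter.
# Assumption: files end in reoriented.fa; the layout inside 4_filter is not
# documented, so the search is recursive.

count_contigs() {
    local main_dir=$1 fa name
    printf 'sample\tcontigs\tbases\n'
    find "$main_dir/4_filter" -type f -name '*reoriented.fa' | LC_ALL=C sort |
    while IFS= read -r fa; do
        name=${fa##*/}               # strip the directory part
        name=${name%reoriented.fa}   # strip the suffix ...
        name=${name%_}               # ... and a trailing underscore, if any
        printf '%s\t' "$name"
        # '>' lines are FASTA headers; every other line is sequence.
        awk '/^>/ { n++; next } { len += length($0) } END { printf "%d\t%d\n", n, len }' "$fa"
    done
}

main_dir=${1:-.}
if [ -d "$main_dir/4_filter" ]; then
    count_contigs "$main_dir"
fi
```

The output is tab-separated, so it can be opened directly in a spreadsheet or piped to `column -t` for reading in the terminal.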
Reference genomes used by the pipeline:
- HIV-1: K03455.1
- RSV: MH760627; MH760652
- RRV: RRV_ref (accession pending)
- HMPV: HMPV205; HMPV218 (accessions pending)
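If a reference has a public GenBank record, you can fetch it through NCBI's E-utilities efetch endpoint and inspect it outside the pipeline. This sketch only builds the URLs. It does not show how the pipeline obtains its references, and the downloaded records are not guaranteed to match the reference files the pipeline itself uses. RRV and HMPV are excluded because their accessions are pending.

```bash
#!/usr/bin/env bash
# Sketch: print the NCBI E-utilities efetch URL that returns a nucleotide
# record in FASTA format. Only the public accessions listed above apply.

efetch_fasta_url() {
    local acc=$1
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=%s&rettype=fasta&retmode=text\n' "$acc"
}

for acc in K03455.1 MH760627 MH760652; do
    efetch_fasta_url "$acc"
done
```

For example, `curl -s "$(efetch_fasta_url K03455.1)" > K03455.1.fasta` downloads the HIV-1 record.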