Navigating common challenges in microbial ecology.
How to trim primer sequences from Illumina-generated 16S rRNA gene reads. Code is currently available for the V4 and V4-V5 regions of the 16S rRNA gene.
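To show what primer trimming involves, here is a minimal Python sketch. It is not the repository's code. It assumes the primer sits at the very start of each read and that degenerate positions use IUPAC codes. The 515F/806R-style V4 sequences are example values to replace with your own primers, and primer_matches and trim_primer are hypothetical helper names.

```python
from typing import Optional, Tuple

# IUPAC nucleotide codes -> the bases each code accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Example V4 primers (assumed); substitute the primers used for your run.
FWD_PRIMER = "GTGYCAGCMGCCGCGGTAA"
REV_PRIMER = "GGACTACNVGGGTWTCTAAT"


def primer_matches(read: str, primer: str, max_mismatches: int = 0) -> bool:
    """True if the 5' end of `read` matches the (possibly degenerate) primer."""
    if len(read) < len(primer):
        return False
    # zip stops at the primer length, so only the 5' end of the read is compared.
    mismatches = sum(1 for base, code in zip(read, primer) if base not in IUPAC[code])
    return mismatches <= max_mismatches


def trim_primer(seq: str, qual: str, primer: str,
                max_mismatches: int = 0) -> Optional[Tuple[str, str]]:
    """Cut the primer off the 5' end of a read; return None if it is not there."""
    if not primer_matches(seq, primer, max_mismatches):
        return None
    n = len(primer)
    return seq[n:], qual[n:]
```

For a V4 paired-end run you would call trim_primer with FWD_PRIMER on R1 and REV_PRIMER on R2, then discard pairs in which either read returns None. Dedicated tools such as cutadapt handle indels, IUPAC wildcards, and linked adapters more robustly than this fixed-position check.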
1. filterAndTrim_bigData.R. At the filter-and-trim step, process samples in groups, one group at a time, rather than all samples at once. This saves time and computing resources and helps avoid crashes (and headaches).
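The batching idea can be sketched in Python as follows. This is an illustration, not the R script itself. It assumes reads are filtered by expected errors, EE = sum(10^(-Q/10)), which is the quantity behind DADA2's maxEE setting. The names batches, expected_errors, passes_filter and run_in_batches are hypothetical, and the max_ee default of 2.0 is an arbitrary example.

```python
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def expected_errors(qual: str, offset: int = 33) -> float:
    """Expected number of errors in a read: sum of 10^(-Q/10) over its Phred+33 scores."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in qual)


def passes_filter(qual: str, max_ee: float = 2.0) -> bool:
    """Keep a read only if its expected errors do not exceed max_ee (example default)."""
    return expected_errors(qual) <= max_ee


def run_in_batches(samples: Sequence[str], batch_size: int,
                   process_batch: Callable[[List[str]], None]) -> None:
    """Hand one group of samples at a time to process_batch (e.g. a filter step)."""
    for group in batches(samples, batch_size):
        process_batch(group)
```

Because each call to process_batch finishes before the next group starts, peak memory scales with the batch size rather than with the total number of samples.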
Concatenate FASTQ files that have identical names. This script was originally written to combine files from two sequencing runs of the same samples (one on a full Illumina flow cell and one on a nano flow cell).
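A minimal Python sketch of the same operation, assuming each sequencing run's FASTQ files sit in their own directory and that matching samples share a file name. concat_matching_fastqs is a hypothetical name, and the suffix list is an assumption about how the files are named.

```python
import shutil
from pathlib import Path

# Assumed FASTQ file-name endings; adjust to match your files.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def concat_matching_fastqs(run_a: Path, run_b: Path, out_dir: Path) -> list:
    """Write run_a/<name> followed by run_b/<name> for every FASTQ name found in both."""
    out_dir.mkdir(parents=True, exist_ok=True)
    names_a = {p.name for p in run_a.iterdir() if p.is_file() and p.name.endswith(FASTQ_SUFFIXES)}
    names_b = {p.name for p in run_b.iterdir() if p.is_file() and p.name.endswith(FASTQ_SUFFIXES)}
    shared = sorted(names_a & names_b)
    for name in shared:
        with open(out_dir / name, "wb") as out:
            for src in (run_a / name, run_b / name):
                with open(src, "rb") as fh:
                    shutil.copyfileobj(fh, out)  # byte copy, no parsing
    return shared
```

Byte-level concatenation also works for gzip-compressed FASTQ, because concatenated gzip members form a valid gzip file. Files present in only one run directory are skipped rather than copied.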
Trim primers and sort reads according to their unique barcodes. Mothur can do this, but it also merges paired-end reads in the process. This script was originally written to sort reads from the 16S rRNA genes of multiple isolates, sequenced together, based on unique oligonucleotides on the 5' ends of the primers. A future update will let it take a list of file names as input.
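Sorting reads by a 5' barcode can be sketched in Python as below. This is a hypothetical demultiplex function, not the repository's script. It assumes each read starts with barcode + primer, matched exactly (no mismatches or degenerate bases), and that no barcode is a prefix of another. Reads that match no barcode go to an "unassigned" bin. Each read is handled on its own, so paired-end reads are never merged.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality)


def demultiplex(reads: Iterable[Read], barcodes: Dict[str, str],
                primer: str) -> Dict[str, List[Read]]:
    """Assign reads to samples by the barcode preceding the primer, then trim both off."""
    by_sample: Dict[str, List[Read]] = defaultdict(list)
    for header, seq, qual in reads:
        for barcode, sample in barcodes.items():
            tag = barcode + primer
            if seq.startswith(tag):
                n = len(tag)
                by_sample[sample].append((header, seq[n:], qual[n:]))
                break
        else:  # no barcode matched this read
            by_sample["unassigned"].append((header, seq, qual))
    return dict(by_sample)
```

For real data, allowing a mismatch or IUPAC codes in the primer (as in primer-trimming tools) usually recovers more reads than exact matching.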
Download multiple files from the NCBI Sequence Read Archive (SRA). Use this when the runs you want are numbered consecutively, which is typical for BioProjects (e.g., runs in project PRJNA597057 range from SRR10755563 to SRR10755886).
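When run accessions form a consecutive series, the full list can be generated from the first and last accession. This Python sketch assumes the SRA Toolkit's fasterq-dump is installed and on your PATH. accession_range and download_commands are hypothetical names, and the commands are only built here, not run.

```python
import re
from typing import List


def accession_range(first: str, last: str) -> List[str]:
    """Expand e.g. SRR10755563..SRR10755886 into every accession in between."""
    m1 = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m2 = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not (m1 and m2) or m1.group(1) != m2.group(1):
        raise ValueError("accessions must share a letter prefix, e.g. SRR")
    prefix, width = m1.group(1), len(m1.group(2))  # keep any zero padding
    start, stop = int(m1.group(2)), int(m2.group(2))
    return [f"{prefix}{n:0{width}d}" for n in range(start, stop + 1)]


def download_commands(accessions: List[str]) -> List[List[str]]:
    """Build one fasterq-dump command per accession (not executed here)."""
    return [["fasterq-dump", acc] for acc in accessions]
```

To download, pass each command to subprocess.run(cmd, check=True). A BioProject's runs are not guaranteed to be contiguous, so it is worth checking the expanded list against the project's run table first.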
Download multiple files from the NCBI Sequence Read Archive (SRA). Use this when the runs you want are not numbered consecutively. Create a text file called "runs.txt" that lists one run accession per line. For example:
lou$ head runs.txt
ERR2129782
ERR2129783
ERR2129800
ERR2129801
ERR2129803
ERR2129872
ERR2129873
ERR2129875
ERR2129891
ERR2129909
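For a non-consecutive list, the accessions can be read straight from runs.txt. This is a minimal Python sketch, assuming one accession per line as in the example above and the SRA Toolkit's fasterq-dump on your PATH. read_run_list and download_commands are hypothetical names. Blank lines and repeated accessions are dropped, and the original order is kept.

```python
from pathlib import Path
from typing import List


def read_run_list(path) -> List[str]:
    """Read run accessions (one per line), skipping blanks and duplicates, keeping order."""
    lines = Path(path).read_text().splitlines()
    accessions = [line.strip() for line in lines if line.strip()]
    return list(dict.fromkeys(accessions))  # dict keys preserve first-seen order


def download_commands(accessions: List[str]) -> List[List[str]]:
    """Build one fasterq-dump command per accession (not executed here)."""
    return [["fasterq-dump", acc] for acc in accessions]
```

Each command can then be passed to subprocess.run(cmd, check=True) to fetch the runs one after another.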