java -jar readSimulator.jar
-length <readLength>
-frlength <fragmentLength>
-SD <standardDeviation>
-mutationrate <mutationRate (1.0 == 1%)>
-gtf <"./inputFiles/Homo_sapiens.GRCh37.75.gtf">
-fasta <"./inputFiles/Homo_sapiens.GRCh37.75.dna.toplevel.fa">
-fidx <"./inputFiles/Homo_sapiens.GRCh37.75.dna.toplevel.fa.fai">
-readcounts <"./inputFiles/readcounts.simulation">
-od <"outputDir">
[-debug]
[-transciptome <"Homo_sapiens.GRCh37.75.cdna.all.fa">]

In bioinformatics, sequencing is the process of gathering genomic data by reading the nucleotides of a DNA molecule. This is done by a sequencer. There are different sequencing technologies (e.g. Illumina (next-generation sequencing), Oxford Nanopore and PacBio (third-generation sequencing)) and different variants of sequencing (ATAC-seq, scRNA-seq, ChIP-seq, ...).
The simplified process of Illumina sequencing is as follows:
Several DNA target sequences are treated with ultrasound in order to break them down into smaller fragments of a certain length with a certain margin of error (e.g. 200 bp +/-
This figure was created using BioRender. For more information, visit BioRender.
Note
This figure is a simplification of the real underlying process (e.g. the mate read is generated by reading the reverse complement of the original fragment and therefore maps to the strand opposite to the first read of the pair).
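
The fragmentation and paired-end reading described above can be sketched in Java. This is a minimal illustration under stated assumptions, not the code inside `readSimulator.jar`: the class `PairedEndSketch` and all of its method names are hypothetical, the fragment length is assumed to follow a normal distribution around the mean fragment length with the given standard deviation, and the mate read is assumed to be the reverse complement of the last `readLength` bases of the fragment.

```java
import java.util.Random;

// Illustrative sketch only: class and method names are hypothetical
// and are not taken from readSimulator.jar.
final class PairedEndSketch {

    // Draws a fragment length from a normal distribution (assumption) and
    // clamps it to [readLength, sequenceLength].
    // Assumes sequenceLength >= readLength.
    static int sampleFragmentLength(Random rng, double mean, double sd,
                                    int readLength, int sequenceLength) {
        int length = (int) Math.round(mean + sd * rng.nextGaussian());
        return Math.max(readLength, Math.min(length, sequenceLength));
    }

    // Complement of a single base; anything other than A, C, G, T becomes N.
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default:  return 'N';
        }
    }

    // Reads the sequence backwards and complements every base.
    static String reverseComplement(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            sb.append(complement(seq.charAt(i)));
        }
        return sb.toString();
    }

    // Returns {firstRead, mateRead}: the first read is the start of the
    // fragment, the mate is the reverse complement of the fragment's end.
    // Assumes fragment.length() >= readLength.
    static String[] readPair(String fragment, int readLength) {
        String first = fragment.substring(0, readLength);
        String mate = reverseComplement(
                fragment.substring(fragment.length() - readLength));
        return new String[] { first, mate };
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        String sequence = "ACGTTGCAAGGCTTAACCGGTATCGATCGTAGCTAGGCTA"; // made-up example sequence
        int readLength = 5;

        int fragmentLength =
                sampleFragmentLength(rng, 20.0, 3.0, readLength, sequence.length());
        int start = rng.nextInt(sequence.length() - fragmentLength + 1);
        String fragment = sequence.substring(start, start + fragmentLength);

        String[] pair = readPair(fragment, readLength);
        System.out.println("fragment:   " + fragment);
        System.out.println("first read: " + pair[0]);
        System.out.println("mate read:  " + pair[1]);
    }
}
```

In this sketch the parameters `mean`, `sd` and `readLength` play the roles of the `-frlength`, `-SD` and `-length` options; how the simulator itself handles fragments shorter than the read length or positions near the sequence end is not specified here, so the clamping is an assumption.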
