eNRSA is an enhanced version of NRSA for analyzing nascent transcriptome data generated by PRO-seq, GRO-seq, (m)NET-seq, and Butt-seq. The source code of eNRSA is available on GitHub, and the Docker image is available on DockerHub.
There are two ways to use eNRSA:
- Command line usage – download eNRSA code and install the dependencies (see Installation and Usage).
- Docker or Singularity container (see Container Usage).
If you encounter any issues using this package, please email support#example.com (replace # with @).
- Qi Liu (@liuqivandy), Email: qi.liu#vumc.org
- Jing Wang (@jingwang), Email: jing.wang.1#vumc.org
- HuaChang Chen (@chc-code), Email: hua-chang.chen#vumc.org
eNRSA has the following dependencies:

Python packages:
- fisher
- pandas
- numpy
- pydeseq2
- matplotlib
- scipy
- statsmodels

Note on the fisher package: for Fisher's exact test, we use the fisher package instead of scipy.stats.fisher_exact because it is significantly faster. However, the version of fisher available on PyPI is outdated and may fail to install with pip install fisher. To resolve this, you can either:

Download the package from its GitHub repository and install it locally:
git clone https://github.com/tylerjereddy/fishers_exact_test.git
pip install ./fishers_exact_test

Or use conda to install it:
conda install -c conda-forge fisher

Both methods give you a functional, up-to-date version of the fisher package.

Other tools:
- bedtools
- homer
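To make clear what the fisher package computes, here is a minimal standard-library sketch of the two-sided Fisher's exact test for a 2x2 table. It is an illustration only, not the fisher package's implementation or eNRSA's code, and the function name fisher_exact_two_sided is hypothetical. It follows the usual two-sided convention (also used by scipy.stats.fisher_exact): sum the probabilities of all tables with the observed margins that are no more likely than the observed table.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical helper for illustration; not the fisher package's code.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    # Possible values of the top-left cell given fixed row and column margins.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Integer weights proportional to the hypergeometric probabilities;
    # exact integers avoid the floating-point tie tolerance scipy needs.
    weights = [comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)]
    observed = comb(row1, a) * comb(row2, col1 - a)
    # The weights sum to comb(n, col1) (Vandermonde's identity).
    return sum(w for w in weights if w <= observed) / comb(n, col1)
```

For example, fisher_exact_two_sided(8, 2, 1, 5) returns about 0.035, which is the value scipy.stats.fisher_exact reports for that table. The fisher package exists because this kind of enumeration becomes a bottleneck when it is repeated for every gene.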
We recommend using conda to install the dependencies. Follow these steps:
# Create an environment for eNRSA
conda create -n enrsa python=3.9 -y
conda activate enrsa
# Install dependencies
conda install -y -c conda-forge -c bioconda fisher pandas numpy pydeseq2 matplotlib scipy statsmodels bedtools homer
# Clone the eNRSA repository
git clone https://github.com/chc-code/eNRSA.git

To work with the built-in reference files for the supported organisms (hg19, hg38, mm10, mm39, dm3, dm6, ce10, danrer10), download them as follows:
# Download and unzip reference files
cd eNRSA
wget https://bioinfo.vanderbilt.edu/eNRSA/download/eNRSA_ref.zip
unzip eNRSA_ref.zip
rm eNRSA_ref.zip

You can also use pre-built Docker or Singularity images for eNRSA, which include all dependencies and reference files.
# Docker
docker pull chccode/enrsa:latest
# Singularity
singularity build enrsa.sif docker://chccode/enrsa:latest

Use the -in1 and -in2 options to specify alignment files for the control and case samples, respectively. Separate multiple files with spaces.
Supported file formats:
- BAM: automatically converted to sorted BED format using bedtools bamtobed. This step may take several minutes, depending on file size.
- BED: if the input BED files are already sorted, add the -sorted flag to skip the sorting step.
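Before passing -sorted, you may want to confirm that a BED file really is sorted. The sketch below assumes "sorted" means the order produced by sort -k1,1 -k2,2n under the C locale (chromosome name, then numeric start); eNRSA's exact expectation is not stated here, so treat that as an assumption. The function name is_bed_sorted is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def is_bed_sorted(lines: Iterable[str]) -> bool:
    """Return True if BED records are ordered by (chromosome, start).

    Assumes the ordering of `sort -k1,1 -k2,2n` with LC_ALL=C: plain string
    comparison for chromosome names, integer comparison for start positions.
    """
    prev: Optional[Tuple[str, int]] = None
    for line in lines:
        # Skip blank lines, comments, and UCSC track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        key = (fields[0], int(fields[1]))
        # Tuple comparison checks the chromosome first, then the start.
        if prev is not None and key < prev:
            return False
        prev = key
    return True
```

Usage: with open("ctrl_sample.bed") as fh: print(is_bed_sorted(fh)). If it prints False, either sort the file first (for example with bedtools sort) or leave out -sorted so eNRSA sorts it.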
Example:
python eRNA.py -in1 ctrl_sample.bed -in2 case_sample.bed -sorted

To include batch correction or complex experimental designs, use the -design option. This option overrides -in1 and -in2.
The design table is a tab-delimited text file with two sections:
- Sample information:
  - Column 1: full path to the alignment file (required)
  - Column 2: group name, used to define comparisons (required)
  - Column 3: batch group for batch correction (optional)
- Comparison definitions:
  - Each row starts with @@.
  - The first column specifies the case group name.
  - The second column specifies the control group name.
Example:
/nobackup/INF_Cre_0hr_1-R1.bam INF_Cre_0hr b1
/nobackup/INF_Cre_0hr_2-R1.bam INF_Cre_0hr b2
/nobackup/INF_EV_0hr_1-R1.bam INF_EV_0hr b1
/nobackup/INF_EV_0hr_2-R1.bam INF_EV_0hr b2
@@INF_Cre_0hr INF_EV_0hr
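The design table layout above can be read with a short parser. This is a minimal sketch, not eNRSA's own parser: it assumes tab-separated columns as described, accepts the @@ marker either attached to the case group (as in the example) or as its own column, treats a missing or empty third column as "no batch", and, as an extra check that may not match eNRSA's behavior, rejects comparisons naming a group that has no samples. The names Design and parse_design are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Design:
    # (alignment file path, group name, batch label or None)
    samples: List[Tuple[str, str, Optional[str]]] = field(default_factory=list)
    # (case group, control group) pairs taken from the "@@" rows
    comparisons: List[Tuple[str, str]] = field(default_factory=list)


def parse_design(text: str) -> Design:
    """Parse a tab-delimited design table (hypothetical helper)."""
    design = Design()
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        stripped = line.lstrip()
        if stripped.startswith("@@"):
            # Comparison row: drop the "@@" marker, then read case and control.
            fields = [f.strip() for f in stripped[2:].split("\t") if f.strip()]
            if len(fields) < 2:
                raise ValueError(f"line {lineno}: expected case and control groups")
            design.comparisons.append((fields[0], fields[1]))
        else:
            cols = [c.strip() for c in line.split("\t")]
            if len(cols) < 2 or not cols[0] or not cols[1]:
                raise ValueError(f"line {lineno}: expected file path and group name")
            batch = cols[2] if len(cols) > 2 and cols[2] else None
            design.samples.append((cols[0], cols[1], batch))
    # Extra sanity check: every compared group must have at least one sample.
    groups = {group for _, group, _ in design.samples}
    for case, control in design.comparisons:
        missing = sorted({case, control} - groups)
        if missing:
            raise ValueError(f"comparison {case} vs {control}: no samples for {missing}")
    return design
```

Applied to the example table (with tabs between columns), this yields four samples in two groups with batches b1/b2 and a single comparison, INF_Cre_0hr (case) vs. INF_EV_0hr (control).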
eNRSA supports 8 built-in organisms (hg19, hg38, mm10, mm39, dm3, dm6, ce10, danrer10). For other organisms or custom annotations, specify a GTF file using -gtf <file_path>.
For a custom GTF file:
- Only rows with exon in the third column (feature) are processed.
- The ninth column (attributes) must include transcript_id and gene_name.
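To check a custom GTF file against these two requirements before running eNRSA, a small reader like the one below can help. It is a sketch under stated assumptions: a standard 9-column, tab-separated GTF with attributes written as key "value"; pairs. It raises an error when an exon row lacks transcript_id or gene_name, which is a choice made here, not necessarily how eNRSA treats such rows. The names Exon and iter_exons are hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Matches quoted attribute pairs such as: gene_name "ACTB"
_ATTR = re.compile(r'(\w+)\s+"([^"]*)"')


class Exon(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (GTF convention)
    end: int    # 1-based, inclusive
    strand: str
    transcript_id: str
    gene_name: str


def iter_exons(lines: Iterable[str]) -> Iterator[Exon]:
    """Yield exon rows of a GTF file, requiring transcript_id and gene_name."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # only rows whose feature column is "exon" are used
        attrs = dict(_ATTR.findall(cols[8]))
        missing = [k for k in ("transcript_id", "gene_name") if k not in attrs]
        if missing:
            raise ValueError(f"line {lineno}: exon row lacks {', '.join(missing)}")
        yield Exon(cols[0], int(cols[3]), int(cols[4]), cols[6],
                   attrs["transcript_id"], attrs["gene_name"])
```

Usage: with open("custom.gtf") as fh: n = sum(1 for _ in iter_exons(fh)). An error points to the first exon row that is missing a required attribute.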
eNRSA generates a variety of results, including:
- Gene-Level Metrics: Nascent RNA abundance in promoter-proximal and gene body regions, pausing index, and significance.
- Differential Analysis: Changes in pausing index across conditions.
- Enhancer and eRNA Detection: Identified active enhancers, long eRNAs, and their quantification.
- Visualizations: Heatmaps and boxplots offering a global view of the data.
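The pausing index listed above is commonly defined as the read density in the promoter-proximal region divided by the read density in the gene body. The sketch below is one plausible reading of the -u, -d, -b, -w, and -s options in the options table (promoter-proximal density taken from the densest w-bp window, sliding in s-bp steps, within TSS - u to TSS + d; gene-body density from TSS + b to the gene end). That reading is an assumption for illustration, not eNRSA's implementation. It handles a plus-strand gene only, and the per-base coverage list and the function name pausing_index are hypothetical.

```python
from typing import Sequence


def pausing_index(coverage: Sequence[int], tss: int, tes: int,
                  up: int = 500, down: int = 500, gb_start: int = 1000,
                  window: int = 50, step: int = 5) -> float:
    """Pausing index of a plus-strand gene from per-base read counts.

    coverage[i] is the read count at 0-based position i. The promoter region
    is [tss - up, tss + down) and the gene body is [tss + gb_start, tes).
    """
    # Promoter-proximal density: densest sliding window in the promoter region.
    pp_lo, pp_hi = max(0, tss - up), tss + down
    if pp_hi - pp_lo < window:
        raise ValueError("promoter region is shorter than the window")
    best = max(sum(coverage[s:s + window])
               for s in range(pp_lo, pp_hi - window + 1, step))
    pp_density = best / window
    # Gene-body density: reads per bp from tss + gb_start to the gene end.
    gb_lo = tss + gb_start
    if tes <= gb_lo:
        raise ValueError("gene is too short to have a gene-body region")
    gb_density = sum(coverage[gb_lo:tes]) / (tes - gb_lo)
    # No gene-body reads: the ratio is undefined, reported here as infinity.
    return pp_density / gb_density if gb_density > 0 else float("inf")
```

Taking the densest window rather than the whole promoter region keeps a sharp pause peak from being diluted by the flanking sequence; this is also why genes shorter than the gene-body start (see -l) cannot be scored.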
Results for known genes are stored in the known_gene folder.
If eRNA.py is run, an additional eRNA folder will be created.
These folders include primary tables and visualization files, such as:
- Differential analysis results.
- Heatmaps of transcription changes.
- Boxplots of promoter and gene body read densities.

Files in the known_gene folder:

| File Name | File Description |
|---|---|
| pindex.txt | Pausing information for each gene in all samples |
| normalized_pp_gb.txt | Normalized read counts in promoter-proximal and gene body regions for each gene in all samples |
| pp_change.txt | Differential expression results of genes within promoter-proximal region across two conditions |
| gb_change.txt | Differential expression results of genes within gene body region across two conditions |
| pindex_change.txt | Differential analysis results of the pausing index of genes across two conditions |
| boxplot_ppdensity.pdf | Box plot of normalized read density of promoter-proximal regions for each sample |
| boxplot_gbdensity.pdf | Box plot of normalized read density of gene body regions for each sample |
| boxplot_pausingIndex.pdf | Box plot of pausing index for each sample |
| pindex_change.pdf | Heatmap of pausing index change across two conditions for genes with adjp < 0.05 |
| heatmap.pdf | Heatmap of condition-dependent transcription changes around TSS for active genes |
| Reps_condition1.tif | Histogram of variation across samples within condition 1 |
| Reps_condition2.tif | Histogram of variation across samples within condition 2 |
| TSS_alternative_isoforms_between_conditions.sig.tsv | The alternative TSS used in different conditions |
| TTS_alternative_isoforms_between_conditions.sig.tsv | The alternative TTS used in different conditions |
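To pull significant genes out of one of the *_change.txt tables (for example, to reproduce the adjp < 0.05 cutoff used for pindex_change.pdf), a standard-library sketch like the following can help. It assumes the file is tab-delimited with a header row. The adjusted p-value column name is not documented here, so you must pass it in (padj in the usage below is only a hypothetical example), and rows with empty or NA values are skipped. The function name significant_rows is hypothetical.

```python
import csv
from typing import Dict, List, TextIO

# Placeholders that DESeq2-style tools write for missing adjusted p-values.
_MISSING = {"", "NA", "NaN", "nan"}


def significant_rows(handle: TextIO, padj_col: str,
                     cutoff: float = 0.05) -> List[Dict[str, str]]:
    """Return rows of a tab-delimited table whose adjusted p-value < cutoff."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Accessing fieldnames reads the header row.
    if reader.fieldnames is None or padj_col not in reader.fieldnames:
        raise KeyError(f"column {padj_col!r} not found in {reader.fieldnames}")
    hits = []
    for row in reader:
        value = (row[padj_col] or "").strip()
        if value in _MISSING:
            continue  # no adjusted p-value for this row
        if float(value) < cutoff:
            hits.append(row)
    return hits
```

Usage: open pindex_change.txt from the known_gene folder of your output directory and call significant_rows(fh, "padj"), replacing "padj" with the actual column name in your file's header.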

Files in the eRNA folder:

| File Name | File Description |
|---|---|
| Enhancer.txt | List of identified enhancers with annotation, predicted target genes from different strategies, and rank scores |
| Enhancer_center.txt | List of enhancer centers |
| normalized_count_enhancer.txt | Normalized counts for each enhancer |
| Enhancer_change.txt | Differential expression results of enhancers across two conditions |
| long_eRNA.txt | Identified long eRNAs (default: length > 10 kb) |
| longeRNA-pindex.txt | Pausing information of long eRNAs for all samples |
| longeRNA-normalized_pp_gb.txt | Normalized read counts in promoter-proximal and gene body regions of long eRNAs |
| longeRNA-pp_change.txt | Differential expression results of promoter-proximal regions of long eRNAs across two conditions |
| longeRNA-gb_change.txt | Differential expression results of gene body regions of long eRNAs across two conditions |
| longeRNA-pindex_change.txt | Differential expression results of pausing index of long eRNAs across two conditions |
| signal_around_enhancer-center.pdf | PRO-seq signal around the enhancer center for all samples |
You can add the eNRSA package folder to your $PATH so that you can run the scripts by name without specifying the full path. For example, if your eNRSA package is under /home/user1/eNRSA:
export PATH=$PATH:/home/user1/eNRSA
pause_PROseq.py <options>
eRNA.py <options>
- Add the -v argument to the docker command to mount the input and output directories from the host. For example, if you are working under the /data directory, add -v /data:/data; otherwise, your data will not be visible inside the container.
- Use absolute paths for all files, including -in1/-in2 and -gtf/-fa (if applicable). Relative paths will not work, because Docker uses an isolated filesystem and the working directory inside the container differs from the one on the host.
docker run -v /data:/data chccode/enrsa:latest pause_PROseq.py <options>
docker run -v /data:/data chccode/enrsa:latest eRNA.py <options>
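If you script your runs, a small helper can apply both Docker rules above: it converts every path to an absolute path and refuses paths outside the mounted directory. This is a hypothetical sketch (docker_command is not part of eNRSA); it only builds the argument list, passing -in1, -in2, and -o as documented in the options table.

```python
import os
from typing import List, Sequence


def docker_command(script: str, in1: Sequence[str], in2: Sequence[str],
                   out_dir: str, mount: str = "/data",
                   image: str = "chccode/enrsa:latest") -> List[str]:
    """Build a `docker run` argument list with absolute paths under one mount."""
    mount = os.path.abspath(mount)
    in1_abs = [os.path.abspath(p) for p in in1]
    in2_abs = [os.path.abspath(p) for p in in2]
    out_abs = os.path.abspath(out_dir)
    for path in in1_abs + in2_abs + [out_abs]:
        # Paths outside the mounted directory are invisible inside the container.
        if os.path.commonpath([mount, path]) != mount:
            raise ValueError(f"{path} is not under the mounted directory {mount}")
    return ["docker", "run", "-v", f"{mount}:{mount}", image, script,
            "-in1", *in1_abs, "-in2", *in2_abs, "-o", out_abs]
```

You can print the result with shlex.join(cmd) to inspect it, or run it with subprocess.run(cmd, check=True). os.path.commonpath compares whole path components, so /data2/x.bam is correctly rejected for a /data mount.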
Singularity typically mounts the current working directory automatically. However, if you encounter mounting issues, you can explicitly bind the input and output directories using the -B option.
singularity exec -B /data:/data enrsa.sif pause_PROseq.py <options>
singularity exec -B /data:/data enrsa.sif eRNA.py <options>
| Option | Description |
|---|---|
| -in1 [bed/bam] | Required. Read alignment files in BED (6 columns) or BAM format for condition 1, separated by spaces. |
| -in2 [bed/bam] | Read alignment files in BED (6 columns) or BAM format for condition 2, separated by spaces. The two conditions do NOT need to have the same number of samples. The differential analysis is performed on condition 2 vs. condition 1. |
| -design / -design_table [file] | Optional. Design table in TSV format with two sections; the first section contains the sample information (2 or 3 columns). For details, see the Input Files section. |
| -gtf | User-specified GTF file. If not specified, the default GTF file for the organism is used. |
| -fa | Full path to the FASTA file for the genome. If not specified, eNRSA searches the fa folder under the package directory. |
| -o [string] | Required. Output/work directory. |
| -f1 [string] | Normalization factors for the samples of condition 1, separated by spaces. If not specified, the default DESeq2 normalization method is used. |
| -f2 [string] | Normalization factors for the samples of condition 2, in the same format as -f1. |
| -u [int] | Extent of the promoter region upstream of the TSS (bp, default: 500). |
| -d [int] | Extent of the promoter region downstream of the TSS (bp, default: 500). |
| -b [int] | Start of the gene body density calculation (bp, default: 1000). |
| -l [int] | Minimum gene length for analysis (bp, default: 1000). Genes shorter than this are ignored. |
| -w [int] | Window size (bp, default: 50). |
| -s [int] | Step size (bp, default: 5). |
| -h | Show the help message. |
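When -f1 and -f2 are omitted, normalization falls back to the DESeq2 default, the median-of-ratios size factor. The standard-library sketch below illustrates that calculation for reference. The function name median_of_ratios is hypothetical, and this is not the code eNRSA runs (which presumably relies on its pydeseq2 dependency). If you plan to supply your own factors through -f1/-f2, check whether eNRSA multiplies or divides counts by them before reusing numbers from this sketch.

```python
import math
from statistics import median
from typing import List, Sequence


def median_of_ratios(counts: Sequence[Sequence[float]]) -> List[float]:
    """DESeq2-style size factors; counts[g][j] is the raw count of feature g in sample j."""
    n_samples = len(counts[0])
    # Features with a zero in any sample have a geometric mean of 0 and are skipped.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no feature has a positive count in every sample")
    # Log of each feature's geometric mean across samples (the pseudo-reference).
    log_ref = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Size factor = median ratio of sample j to the pseudo-reference.
        log_ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_ref)]
        factors.append(math.exp(median(log_ratios)))
    return factors
```

In DESeq2, normalized counts are the raw counts divided by these factors. Using the median makes the factors robust to a minority of features that genuinely change between conditions.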
