DIMPL (Discovery of Intergenic Motifs PipeLine)

==============================

Summary

The DIMPL discovery pipeline enables rapid extraction and selection of bacterial intergenic regions (IGRs) that are enriched for structured noncoding RNAs (ncRNAs). DIMPL also automates the subsequent computational steps necessary for their functional identification.

Requirements

For Local Computer

For Compute Cluster

Quick-start

Cluster Configuration

  1. Download the IGR search database (filename: s50.igr.fasta) from this link to your cluster using Globus FTP.

  2. Ensure the availability of the BLAST nr database on your computational cluster. Follow these instructions for updating/downloading the latest version.
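After the transfer, it can be worth confirming that the IGR database arrived intact before launching any searches. The following is a minimal sketch that counts the FASTA records in the file. The function name `count_fasta_records` and the file location are illustrative assumptions and are not part of DIMPL.

```python
from pathlib import Path


def count_fasta_records(fasta_path):
    """Count sequence records (header lines starting with '>') in a FASTA file."""
    count = 0
    with Path(fasta_path).open("r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count


if __name__ == "__main__":
    # Hypothetical location; replace with wherever the file was transferred.
    db_path = Path("s50.igr.fasta")
    if db_path.is_file():
        print(f"{db_path}: {count_fasta_records(db_path)} records")
    else:
        print(f"{db_path} not found")
```

A count of zero, or a count that differs between the source and the cluster copy, suggests the transfer should be repeated.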

Local Configuration

  1. Clone the repository.

git clone https://github.com/breakerlab/dimpl

  2. Download the Docker image.

docker pull breakerlab/dimpl

  3. Configure Docker to grant containers access to the folder where the DIMPL repository is located.

  4. Modify the script template files found at dimpl/src/shell/*_template.sh with the database locations and appropriate commands for importing utilities on your cluster.

  5. Run ./start.sh in the main repository directory. Follow the first-time configuration instructions.

  6. Follow the link generated by the start.sh script to access the DIMPL Jupyter notebooks.
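To see which template scripts need editing, the files matching the pattern from the steps above can be listed programmatically. This is a minimal sketch that assumes it is run from the root of the cloned repository. The helper name `find_template_scripts` is hypothetical and is not part of DIMPL.

```python
from pathlib import Path


def find_template_scripts(repo_root):
    """Return sorted paths of *_template.sh files under <repo_root>/src/shell."""
    shell_dir = Path(repo_root) / "src" / "shell"
    return sorted(shell_dir.glob("*_template.sh"))


if __name__ == "__main__":
    # Assumes the current directory is the root of the cloned repository.
    for script in find_template_scripts("."):
        print(script)
```

Each listed file should be opened and updated with the database paths and utility-loading commands that apply to your cluster.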

Data Transfer between Local Machine and Cluster

The DIMPL notebooks generate compressed .tar.gz files containing all the scripts and data needed to run the more computationally intensive steps on a cluster. These .tar.gz files are placed in the directory data/export. After transferring a file to the cluster, unpack it using the command tar xzvf data-dir.tar.gz. When the tasks on the cluster complete, recompress the directory using the command tar czvf data-dir.tar.gz data-dir, then place the resulting file in data/import on the local machine.
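The same unpack and repack round trip can be scripted with Python's standard `tarfile` module where that is more convenient than the `tar` command. This is a minimal sketch; the function names `unpack_archive` and `repack_directory` are illustrative and are not part of DIMPL.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir (comparable to `tar xzvf`)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=dest_dir)


def repack_directory(data_dir, archive_path):
    """Compress data_dir into a .tar.gz archive (comparable to `tar czvf`)."""
    data_dir = Path(data_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps only the directory's own name as the top-level entry,
        # so the archive unpacks to data-dir/ rather than an absolute path.
        tar.add(data_dir, arcname=data_dir.name)
```

For example, `unpack_archive("data-dir.tar.gz", ".")` followed later by `repack_directory("data-dir", "data-dir.tar.gz")` mirrors the two commands above. Only archives from a trusted source should be extracted this way.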

File Organization


├── .env               <- File generated during configuration step of start.sh
├── LICENSE
├── README.md          <- This document
├── start.sh           <- Script to perform initial configuration and start the docker container
├── data
│   ├── export         <- Where DIMPL places data and bash script tar.gz files  
│   ├── import         <- Where to place re-compressed tar.gz files that have been run on a compute cluster
│   ├── interim        <- Where processed genomic data is stored during analysis
│   └── raw            <- The original genomic data.
│
├── docs               <- Sphinx documentation for DIMPL
│
├── notebooks          <- Jupyter notebooks for the various steps of DIMPL
│   ├── 1-Genome-IGR-Selection.ipynb
│   ├── 2-BLAST-Processing.ipynb
│   ├── 3-IGR-Report.ipynb
│   └── 4-Motif-Refinement.ipynb
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
└── src                <- Source code for use in this project.
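Given the layout above, one way to track which exported archives still need to be run on the cluster is to compare data/export with data/import. The sketch below assumes that a recompressed archive keeps the same file name as the exported one. The helper name `pending_exports` is hypothetical and is not part of DIMPL.

```python
from pathlib import Path


def pending_exports(repo_root):
    """List archive names present in data/export but not yet in data/import."""
    data_dir = Path(repo_root) / "data"
    exported = {p.name for p in (data_dir / "export").glob("*.tar.gz")}
    imported = {p.name for p in (data_dir / "import").glob("*.tar.gz")}
    # Assumption: an archive returned from the cluster reuses its original name.
    return sorted(exported - imported)


if __name__ == "__main__":
    # Assumes the current directory is the root of the cloned repository.
    for name in pending_exports("."):
        print(name)
```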
