SolvIT is a machine learning model that predicts protein solubility in Escherichia coli and aids enzyme design by prioritizing high-solubility candidates from large design sets. It uses a small graph neural network (GNN) to achieve state-of-the-art performance that rivals much larger models.
- Deep Learning Integration: Uses SolvIT, a GNN-based solubility classifier trained on E. coli expression data.
- Comprehensive Pipeline: Automates feature extraction, solubility prediction, and result formatting.
- Ease of Use: Minimal setup requirements for running pre-defined protein designs.
Before running the SolvIT pipeline, ensure the following tools are installed:
- Apptainer/Singularity: Used for managing the containerized environments. See the official installation instructions.
- Python environment specified in the environment.yaml file provided in this repository.
- Clone the repository:
    git clone https://github.com/Enzymit/SolvIT.git
    cd SolvIT
- Create and activate the Python environment:
    conda env create -f environment.yaml
    conda activate solvit_snakemake
- Download the necessary Singularity container:
    ./singularity/download_sif.sh
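As a quick sanity check after setup, the following minimal sketch reports which command-line tools are missing from PATH. It is not part of the repository. It assumes the executables are named `apptainer` or `singularity`, `conda`, and `snakemake`, and that `snakemake` only becomes visible once the `solvit_snakemake` environment is active. The helper name `find_missing_tools` is hypothetical.

```python
import shutil


def find_missing_tools(which=shutil.which):
    """Return the names of required command-line tools not found on PATH.

    `which` is injectable so the check can be exercised without the tools
    installed; by default it is shutil.which.
    """
    missing = []
    # Either container runtime is acceptable (assumed executable names).
    if not any(which(name) for name in ("apptainer", "singularity")):
        missing.append("apptainer or singularity")
    # conda creates the environment; snakemake is provided by that environment.
    for name in ("conda", "snakemake"):
        if which(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```

Run it inside the activated environment; an empty result only shows that the executables were found, not that their versions are compatible.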
- Modify the config.yaml file as needed. Key parameters include:
    - OUTDIR: Output directory for results.
    - INPUTDIR: Directory containing input .pdb files.
    - SINGULARITY_PATH: Path to the directory containing the downloaded .sif file (usually singularity).
    - OUTFILENAME: Name of the final output file.
- Execute the Snakemake workflow:
    snakemake --cores <number_of_cores> --use-singularity
  Replace <number_of_cores> with the number of CPU cores to use.
The final results will be saved in the output directory specified in config.yaml under the name provided in OUTFILENAME.
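The usage steps can also be wrapped in a small pre-flight script. The sketch below is an illustration, not code from the repository: it takes the four config keys as a plain dictionary (loading config.yaml itself would need a YAML parser such as PyYAML, which is assumed and not shown), checks that the input and container directories contain the expected files, builds the Snakemake command from the usage step, and derives where the results should appear. The helper names `check_config`, `snakemake_command`, and `expected_output` are hypothetical.

```python
from pathlib import Path

# The four keys described for config.yaml.
REQUIRED_KEYS = ("OUTDIR", "INPUTDIR", "SINGULARITY_PATH", "OUTFILENAME")


def check_config(config):
    """Return a list of human-readable problems found in a config mapping."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    if problems:
        return problems
    # The pipeline reads .pdb files from INPUTDIR.
    if not any(Path(config["INPUTDIR"]).glob("*.pdb")):
        problems.append(f"no .pdb files in {config['INPUTDIR']}")
    # The downloaded container image is expected in SINGULARITY_PATH.
    if not any(Path(config["SINGULARITY_PATH"]).glob("*.sif")):
        problems.append(f"no .sif file in {config['SINGULARITY_PATH']}")
    return problems


def snakemake_command(cores):
    """Build the Snakemake invocation from the usage step as an argument list."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return ["snakemake", "--cores", str(cores), "--use-singularity"]


def expected_output(config):
    """Path of the final results file: OUTDIR joined with OUTFILENAME."""
    return Path(config["OUTDIR"]) / config["OUTFILENAME"]


if __name__ == "__main__":
    config = {
        "OUTDIR": "output",
        "INPUTDIR": "example",
        "SINGULARITY_PATH": "singularity",
        "OUTFILENAME": "solvit_out.csv",
    }
    for problem in check_config(config):
        print("Problem:", problem)
    print("Command:", " ".join(snakemake_command(4)))
    print("Results expected at:", expected_output(config))
```

The argument list returned by `snakemake_command` can be passed to `subprocess.run(..., check=True)` from the repository root if you prefer launching the workflow from Python.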
The pipeline consists of the following steps:
- Feature Extraction: Extracts features from input .pdb files using Rosetta and saves them in a compressed format.
- SolvIT Prediction: Runs the solubility prediction model on the extracted features.
- Result Formatting: Processes raw predictions into a user-friendly .csv file.
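Since the goal is to prioritize high-solubility candidates, a natural next step is to rank the rows of the output .csv file. The sketch below shows one way to do that with the standard library. The column layout of the real output file is not documented here, so the column names `name` and `score`, the helper `rank_by_score`, and the assumption that a higher score means higher predicted solubility are all hypothetical; inspect the header of your own output file and adjust accordingly.

```python
import csv
import io


def rank_by_score(csv_text, score_column, top_n=None):
    """Sort CSV rows by a numeric column, highest first.

    score_column is whatever column holds the prediction in your file;
    its real name is not known here and must be supplied by the caller.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: float(row[score_column]), reverse=True)
    return rows if top_n is None else rows[:top_n]


if __name__ == "__main__":
    # Made-up rows for illustration only; this is not real SolvIT output.
    demo = "name,score\ndesign_a,0.42\ndesign_b,0.91\ndesign_c,0.67\n"
    for row in rank_by_score(demo, score_column="score", top_n=2):
        print(row["name"], row["score"])
```

To apply it to a real run, read the file written to OUTDIR under OUTFILENAME and pass its text together with the actual score column name.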
Example config.yaml:
OUTDIR: "output"
INPUTDIR: "example"
SINGULARITY_PATH: "singularity"
OUTFILENAME: "solvit_out.csv"

If you use SolvIT in your research, please cite:
Zimmerman et al., Context-dependent design of induced-fit enzymes using deep learning generates well-expressed, thermally stable, and active enzymes. PNAS, 2024. DOI:10.1073/pnas.2313809121
This project is licensed under the GNU General Public License v3.0 (GPL-3.0).
For any questions or issues, create a new issue in the repository.