Pallatom is an innovative protein generation model that produces protein structures with all-atom coordinates. By learning and modeling the joint distribution
To set up the environment for running Pallatom, follow these steps:
- Create and activate a conda environment:
  conda create --name pallatom python=3.7.16
  conda activate pallatom
- Install JAX:
  First, install the specific version of JAX needed for this project:
  pip install jax==0.3.25
  pip install "jax[cuda]"==0.3.25 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
- Install other dependencies:
  Finally, install the additional required packages from requirements.txt:
  pip install -r requirements.txt
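Once the installation finishes, it can help to confirm that JAX imports and to see which backend it selected. The following is a minimal sketch, not part of the Pallatom repository: the helper name `describe_jax_backend` is hypothetical, and it relies only on the public `jax.devices()` and `jax.default_backend()` calls. If the reported backend is `cpu`, the CUDA build was probably not picked up.

```python
import importlib


def describe_jax_backend():
    """Return a summary of the JAX install, or None if JAX cannot be imported.

    Hypothetical helper for illustration; it is not shipped with Pallatom.
    """
    try:
        jax = importlib.import_module("jax")
    except ImportError:
        return None
    return {
        "version": jax.__version__,
        # "gpu" (or "cuda" on newer releases) means the CUDA build is active.
        "backend": jax.default_backend(),
        "devices": [str(device) for device in jax.devices()],
    }


if __name__ == "__main__":
    info = describe_jax_backend()
    if info is None:
        print("JAX is not installed in this environment.")
    else:
        print("JAX {} on backend '{}'".format(info["version"], info["backend"]))
        for name in info["devices"]:
            print("  device:", name)
```

Run it inside the activated `pallatom` environment before starting a long sampling job.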
If JAX 0.3.25 and Python 3.7 are incompatible with a newer CUDA version on your system, use the following alternative setup with Python 3.10 and JAX with CUDA 12.6:
- Create and activate a conda environment:
  conda create --name pallatom python=3.10
  conda activate pallatom
- Install basic dependencies:
  pip install biopython==1.79 dm-tree==0.1.8 chex==0.1.86 dm-haiku==0.0.12 immutabledict==2.0.0 ml-collections==0.1.0 numpy==1.24.3 pandas==2.0.3 scipy==1.11.1 tensorflow-cpu==2.16.1 rdkit einops tqdm
- Install JAX with CUDA support:
  pip install "jax[cuda]"==0.4.34 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
To run the Pallatom model sampling process, use the pallatom.py script. Below is an example of how to use the script with command-line arguments:
python pallatom.py --savepath ./results --L 100 --cuda_devices 0 --t_min 0.01 --t_max 1.0 --gamma 0.2 --step_scale 2.25 --T 200 --rounds 10
Arguments:
- data_dir: Directory where model parameters are stored (default: ./)
- model_name: Name of the model to use (default: Pallatom)
- savepath: Directory where results will be saved (default: ./results)
- L: Length of the sequence to sample (default: 120)
- batch_num: Number of batches to run (default: 4)
- cuda_devices: CUDA visible device (default: 0)
- t_min: Minimum noise level for add_noise_level (default: 0.01)
- t_max: Maximum noise level for add_noise_level (default: 1.0)
- gamma: Gamma value for add_noise_level (default: 0.2)
- step_scale: Scale of the step (default: 2.25)
- T: Number of steps for the sampling process (default: 200)
- rounds: Number of rounds to run (default: 1)
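The arguments and defaults listed above map naturally onto an `argparse` parser. The sketch below is an illustration only: the flag names and default values come from the list above, but the argument types (for example, treating `cuda_devices` as a string), the help strings, and the function name `build_parser` are assumptions. The actual pallatom.py may define its interface differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (assumed types)."""
    parser = argparse.ArgumentParser(
        description="Illustrative sketch of a Pallatom-style sampling CLI")
    parser.add_argument("--data_dir", type=str, default="./",
                        help="Directory where model parameters are stored")
    parser.add_argument("--model_name", type=str, default="Pallatom",
                        help="Name of the model to use")
    parser.add_argument("--savepath", type=str, default="./results",
                        help="Directory where results will be saved")
    parser.add_argument("--L", type=int, default=120,
                        help="Length of the sequence to sample")
    parser.add_argument("--batch_num", type=int, default=4,
                        help="Number of batches to run")
    # Kept as a string on the assumption that it is used as a device list.
    parser.add_argument("--cuda_devices", type=str, default="0",
                        help="CUDA visible device")
    parser.add_argument("--t_min", type=float, default=0.01,
                        help="Minimum noise level")
    parser.add_argument("--t_max", type=float, default=1.0,
                        help="Maximum noise level")
    parser.add_argument("--gamma", type=float, default=0.2,
                        help="Gamma value for the noise level")
    parser.add_argument("--step_scale", type=float, default=2.25,
                        help="Scale of the step")
    parser.add_argument("--T", type=int, default=200,
                        help="Number of sampling steps")
    parser.add_argument("--rounds", type=int, default=1,
                        help="Number of rounds to run")
    return parser


if __name__ == "__main__":
    # Parse the example invocation shown in this README.
    example = ("--savepath ./results --L 100 --cuda_devices 0 --t_min 0.01 "
               "--t_max 1.0 --gamma 0.2 --step_scale 2.25 --T 200 --rounds 10")
    args = build_parser().parse_args(example.split())
    for key, value in sorted(vars(args).items()):
        print("{} = {!r}".format(key, value))
```

Flags omitted from the command line, such as `batch_num` in the example invocation, fall back to their defaults.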
The results, including the generated sequences in FASTA format and protein structures in PDB format, will be saved in the specified savepath directory.
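To inspect a finished run, the saved files can be collected with a few lines of standard-library Python. This is a hedged sketch rather than part of the repository: the `.fasta` and `.pdb` file extensions, the recursive directory search, and the helper names `read_fasta` and `summarize_results` are all assumptions, since the exact output layout is not described here.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    with open(str(path)) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def summarize_results(savepath):
    """Collect sequences and PDB paths under savepath (assumed extensions)."""
    root = Path(savepath)
    sequences = []
    for fasta in sorted(root.rglob("*.fasta")):
        sequences.extend(read_fasta(fasta))
    pdb_files = sorted(root.rglob("*.pdb"))
    return {"sequences": sequences, "pdb_files": pdb_files}


if __name__ == "__main__":
    summary = summarize_results("./results")
    print("{} sequences, {} PDB files".format(
        len(summary["sequences"]), len(summary["pdb_files"])))
    for header, sequence in summary["sequences"]:
        print(header, len(sequence))
```

Because Biopython is already among the dependencies, `Bio.SeqIO.parse` could replace the hand-written FASTA reader if preferred.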
In ./db_scripts/pipeline.py, we provide the training data processing pipeline, including metric calculation and filtering, deduplication, and final clustering.
If you find Pallatom useful in your research, please consider citing our work:
@article {Qu2024.08.16.608235,
author = {Qu, Wei and Guan, Jiawei and Ma, Rui and Zhai, Ke and Wu, Weikun and Wang, Haobo},
title = {P(all-atom) Is Unlocking New Path For Protein Design},
year = {2024},
doi = {10.1101/2024.08.16.608235},
journal = {bioRxiv}
}
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
