ak422/DDAffinity

DDAffinity-network


Description

This repo contains the code for "Predicting the changes in binding affinity of multiple point mutations using protein three-dimensional structure" by Guanglei Yu, Qichang Zhao, Xuehua Bi and Jianxin Wang.

We propose a ProteinMPNN-inspired $\Delta\Delta G$ predictor that takes the 3D structures and 2D sequences of the wild-type $\mathcal{WT}$ and mutant $\mathcal{MT}$ protein complexes as input. The mutant structure is generated with the BuildModel and Optimize modules of FoldX 5.0.

  • Clipped patches: given $\mathcal{WT}$ and $\mathcal{MT}$, we clip each into a residue patch of 256 residues. These are the 256 nearest neighbors of the mutant residues according to inter-residue $C_{\beta}$ distances, including the mutant residues themselves.
  • Two-step additive Gaussian noising strategy: to improve the performance and generalization of DDAffinity, we apply a two-step additive Gaussian noising strategy to the atomic coordinates of residues. First, Gaussian noise ($std=0.2\mathring{\mathrm A}$) is added to all input atomic coordinates, and the backbone dihedrals $(\phi,\psi,\omega)$ and sidechain dihedrals $(\chi^{(1)},\chi^{(2)},\chi^{(3)},\chi^{(4)})$ are computed from these perturbed coordinates. Second, following ProteinMPNN, where coordinate noise improves predictive performance and makes the prediction algorithm more robust, we add Gaussian noise ($std=0.2\mathring{\mathrm A}$) to the coordinates of the protein backbone atom set $\boldsymbol{A}=\{N,C_\alpha,C,O,C_\beta\}$. This second perturbation does not update the backbone or sidechain dihedrals. Both steps are applied only during training.
  • Construction of the $k$-nearest neighbor graph: we use three kinds of neighbor residues. (1) Spatial distance $k_1$. A residue is connected to its $k_1$ nearest neighbors by spatial Euclidean distance, which keeps the spatial densities of different proteins comparable. (2) Sequential distance $k_2$. A residue $r_i$ is connected to its sequence neighbors whose sequential distance from $r_i$ is at most $(k_2-1)/2$. (3) Long-range distance $k_3$. To capture dependencies that are long-range in sequence but local in 3D Euclidean space, the neighbors of residue $r_i$ are ranked in ascending order of Euclidean distance, and those whose sequence distance is at most $(k_2-1)/2$ are discarded. We then select the $k_3$ nearest neighbors from the remaining ordered list. In total, $k=k_1+k_2+k_3$.
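
The patch clipping and neighbor selection described above can be illustrated with a minimal pure-Python sketch. It is not the repository's implementation. The function names (`clip_patch`, `build_neighbor_lists`) are hypothetical, and several points the text leaves open are filled with assumptions: one representative coordinate per residue, a single chain in sequence order, distance to the closest mutated residue when there are several mutations, the residue itself excluded from the spatial list but counted in the sequence window, and no de-duplication between the three lists.

```python
import math


def clip_patch(cb_coords, mutated, patch_size=256):
    """Return the sorted indices of the `patch_size` residues closest to any mutated residue.

    cb_coords: list of (x, y, z) C-beta coordinates, one per residue.
    mutated:   list of indices of mutated residues (always included, distance 0).
    Assumption: with several mutations, a residue is ranked by its distance
    to the nearest mutated residue.
    """
    dist_to_mut = [
        min(math.dist(xyz, cb_coords[m]) for m in mutated) for xyz in cb_coords
    ]
    order = sorted(range(len(cb_coords)), key=lambda j: (dist_to_mut[j], j))
    return sorted(order[:patch_size])


def build_neighbor_lists(coords, k1, k2, k3):
    """Build spatial, sequential and long-range neighbor lists for every residue.

    coords: list of (x, y, z), one per residue, in sequence order (single chain assumed).
    Returns one dict per residue with keys "spatial", "sequential", "long_range".
    """
    n = len(coords)
    half = (k2 - 1) // 2  # maximum sequence offset on each side
    graph = []
    for i in range(n):
        # Every other residue, ranked by ascending Euclidean distance (ties by index).
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(coords[i], coords[j]), j),
        )
        # (1) k1 spatially nearest residues.
        spatial = ranked[:k1]
        # (2) Sequence window of offset <= half; includes i itself (assumption),
        #     and is shorter than k2 near the chain ends.
        sequential = list(range(max(0, i - half), min(n, i + half + 1)))
        # (3) Drop residues inside the sequence window, then keep the k3 nearest.
        long_range = [j for j in ranked if abs(i - j) > half][:k3]
        graph.append(
            {"spatial": spatial, "sequential": sequential, "long_range": long_range}
        )
    return graph


# Toy hairpin: residues 0-2 run along x, residues 3-5 run back at y = 0.9,
# so residue 5 is far from residue 0 in sequence but close in space.
toy = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 0.9, 0), (1, 0.9, 0), (0, 0.9, 0)]
toy_graph = build_neighbor_lists(toy, k1=2, k2=3, k3=1)
toy_patch = clip_patch(toy, mutated=[0], patch_size=3)
```

In the toy hairpin, residue 5 appears in the long-range list of residue 0 even though it lies outside the sequence window, which is the kind of contact the $k_3$ term is meant to capture.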

Overview of our DDAffinity architecture is shown below.

[Figure: overview of the DDAffinity architecture]

Contact


  • Please check out our latest work, "CATH-ddG: towards robust mutation effect prediction on protein–protein interactions out of CATH homologous superfamily", on mutational effect prediction for protein-protein interactions, available on GitHub.

Install

DDAffinity Environment
conda env create -f env.yml -n DDAffinity
conda activate DDAffinity

The default PyTorch version is 1.12.1 and cudatoolkit version is 11.3. They can be changed in env.yml.

Preparation of processed dataset

We generated the wild-type and mutant complex PDB files from the PDB files in data/SKEMPI2/PDBs, using rde/datasets/PDB_generate.py, data/SKEMPI2/SKEMPI2.csv, and the FoldX tool. We then use rde/datasets/skempi_parallel.py to convert the wild-type and mutant PDB files into the processed dataset SKEMPI2_cache.

python PDB_generate.py 
python skempi_parallel.py --reset

Datasets

Dataset        Download Script          Processed Dataset
SKEMPI v2      data/get_skempi_v2.sh    data/SKEMPI2/SKEMPI2_cache
SKEMPI2.csv                             SKEMPI2_cache
M1707.csv
S1131.csv
M1340.csv
M595.csv
S494.csv
S285.csv                                S285_cache
Ssys.csv

Trained Weights


The weights trained on the overall SKEMPI2 dataset are located in: DDAffinity

The weights trained on M1340 are located in: M1340

Usage


Evaluate DDAffinity
python test_DDAffinity.py ./configs/train/mpnn_ddg.yml --device cuda:0
Blind testing: non-redundant blind testing on the multiple point mutation dataset M595
python case_study.py ./configs/inference/blind_testing.yml --device cuda:0
Case Study 1: Predict Mutation Effects for SARS-CoV-2 RBD
python case_study.py ./configs/inference/case_study_1.yml --device cuda:0
Case Study 2: Human Antibody Optimization
python case_study.py ./configs/inference/case_study_2.yml --device cuda:0
Train DDAffinity
python train_DDAffinity.py ./configs/train/mpnn_ddg.yml --num_cvfolds 10 --device cuda:0
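
The three `case_study.py` invocations above differ only in their config file, so they can be run in sequence from a small driver. The following is a minimal sketch and not part of the repository: `build_command` and `run_all` are hypothetical helper names, and it assumes the script is launched from the repository root with the same Python interpreter. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

# Config paths exactly as listed in the usage commands above.
INFERENCE_CONFIGS = [
    "./configs/inference/blind_testing.yml",
    "./configs/inference/case_study_1.yml",
    "./configs/inference/case_study_2.yml",
]


def build_command(config, device="cuda:0", script="case_study.py"):
    """Assemble the argument list for one inference run."""
    return [sys.executable, script, config, "--device", device]


def run_all(configs=INFERENCE_CONFIGS, device="cuda:0", dry_run=True):
    """Print each command and, unless dry_run is set, execute it and stop on failure."""
    commands = [build_command(config, device) for config in configs]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


planned = run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` would execute the runs one after another; `check=True` makes a failing run raise `subprocess.CalledProcessError` instead of continuing silently.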

Acknowledgements


We acknowledge that parts of our code are adapted from Rotamer Density Estimator (RDE). Thanks to the authors for sharing their code.
