
DDAffinity-network

Description

This repo contains the code for "Predicting the changes in binding affinity of multiple point mutations using protein three-dimensional structure" by Guanglei Yu, Qichang Zhao, Xuehua Bi and Jianxin Wang.

We propose a ProteinMPNN-inspired $\Delta\Delta G$ predictor that takes the 3D structures and sequences of the wild-type ($\mathcal{WT}$) and mutant ($\mathcal{MT}$) protein complexes as input. The mutant structure is generated with the BuildModel and Optimize modules of FoldX 5.0.

Clipped patches: given $\mathcal{WT}$ and $\mathcal{MT}$, we clip each of them into a patch of 256 residues, namely the 256 residues nearest to the mutated residues according to inter-residue $C_{\beta}$ distances, including the mutated residues themselves.
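A minimal NumPy sketch of this patch clipping follows. It is illustrative only, not taken from the repository: the input layout (one $C_{\beta}$ coordinate per residue, mutated residues given by index) and the function name `clip_patch` are assumptions.

```python
import numpy as np


def clip_patch(cb_coords, mutant_idx, patch_size=256):
    """Sorted indices of the patch_size residues closest to the mutation site.

    cb_coords: (N, 3) array of C-beta coordinates, one row per residue
        (hypothetical layout; glycine would need a virtual C-beta, not handled here).
    mutant_idx: indices of the mutated residues.
    """
    mut = np.asarray(mutant_idx, dtype=int)
    # C-beta distances between every residue and every mutated residue: shape (N, M).
    dist = np.linalg.norm(cb_coords[:, None, :] - cb_coords[mut][None, :, :], axis=-1)
    # Distance of each residue to its closest mutated residue.
    min_dist = dist.min(axis=1)
    # Mutated residues sit at distance 0, so they are ranked first and kept.
    nearest = np.argsort(min_dist, kind="stable")[:patch_size]
    return np.sort(nearest)
```

Calling it once with the $\mathcal{WT}$ coordinates and once with the $\mathcal{MT}$ coordinates yields the two patches; proteins with fewer than 256 residues are kept whole.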
Two-step additive Gaussian noising strategy: to improve the performance and generalization of DDAffinity, we apply a two-step additive Gaussian noising strategy to the atomic coordinates of residues. First, additive Gaussian noise ($std=0.2\mathring{\mathrm A}$) is added to all input atomic coordinates, and the backbone dihedrals $(\phi,\psi,\omega)$ and sidechain dihedrals $(\chi^{(1)},\chi^{(2)},\chi^{(3)},\chi^{(4)})$ are computed from these perturbed coordinates. Second, inspired by ProteinMPNN, where backbone noise is used to improve predictive performance and robustness, we also add Gaussian noise ($std=0.2\mathring{\mathrm A}$) to the coordinates of the backbone atom set $\boldsymbol{A}=\{N,C_\alpha,C,O,C_\beta\}$. Importantly, this second perturbation does not update the backbone or sidechain dihedrals. The two-step noising strategy is applied only during training.
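The ordering of the two noising steps can be sketched as follows. This is an illustrative NumPy version, not the DDAffinity code: the atom-name dictionary layout, the function names, and assigning $\omega$ to residue $i$ (atoms $C_\alpha^{i}, C^{i}, N^{i+1}, C_\alpha^{i+1}$) are assumptions, and sidechain $\chi$ angles are omitted for brevity. The point it shows is that dihedrals are computed after step 1 and are left untouched by step 2.

```python
import numpy as np

# Backbone atom set A = {N, C_alpha, C, O, C_beta}; the "CA"/"CB" names are an assumption.
BACKBONE_ATOMS = ("N", "CA", "C", "O", "CB")


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians defined by four 3D points."""
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)
    b2 = p3 - p2
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    return float(np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w)))


def backbone_dihedrals(N, CA, C):
    """phi, psi, omega per residue of one chain; undefined terminal angles stay 0."""
    L = len(CA)
    phi, psi, omega = np.zeros(L), np.zeros(L), np.zeros(L)
    for i in range(L):
        if i > 0:
            phi[i] = dihedral(C[i - 1], N[i], CA[i], C[i])
        if i < L - 1:
            psi[i] = dihedral(N[i], CA[i], C[i], N[i + 1])
            omega[i] = dihedral(CA[i], C[i], N[i + 1], CA[i + 1])
    return phi, psi, omega


def two_step_noise(atoms, std=0.2, training=True, rng=None):
    """atoms: dict mapping atom name -> (L, 3) coordinates in Angstrom (hypothetical layout)."""
    rng = np.random.default_rng() if rng is None else rng

    def jitter(x):
        # Additive Gaussian noise, applied only during training.
        return x + rng.normal(scale=std, size=x.shape) if training else x.copy()

    # Step 1: perturb every atom, then compute dihedrals from the perturbed coordinates.
    coords = {name: jitter(xyz) for name, xyz in atoms.items()}
    dihedrals = backbone_dihedrals(coords["N"], coords["CA"], coords["C"])
    # Step 2: perturb the backbone atom set again; the dihedrals are NOT recomputed.
    for name in BACKBONE_ATOMS:
        coords[name] = jitter(coords[name])
    return coords, dihedrals
```

With `training=False` the coordinates pass through unchanged, matching the statement that noising is used only during training.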
How to construct the $k$-nearest neighbor graph: we use three kinds of neighbor residues. (1) Spatial distance $k_1$. Each residue is connected to its $k_1$ nearest neighbors by spatial Euclidean distance, which keeps the spatial densities of different proteins comparable. (2) Sequential distance $k_2$. Residue $r_i$ is connected to its sequence neighbors whose sequential distance from $r_i$ is no more than $(k_2-1)/2$, which captures the linear interactions of residues. (3) Long-range distance $k_3$. To efficiently capture dependencies that are long-range in sequence but local in 3D Euclidean space, the neighbors of residue $r_i$ are ranked in ascending order of Euclidean distance, and those whose sequence distance is not greater than $(k_2-1)/2$ are discarded. The $k_3$ nearest neighbors are then selected from the remaining ordered list. In total, $k=k_1+k_2+k_3$.
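A sketch of the three neighbor types is given below, under assumptions the text does not settle: distances are measured between $C_\alpha$ atoms, residues form a single chain indexed $0,\dots,N-1$, $k_2$ is odd, and a residue is not counted among its own spatial neighbors. The function name `build_neighbors` is hypothetical.

```python
import numpy as np


def build_neighbors(ca, k1, k2, k3):
    """Per-residue (spatial, sequential, long_range) neighbor index arrays.

    ca: (N, 3) coordinates used for Euclidean distances (C-alpha is an assumption).
    Assumes one chain indexed 0..N-1 and an odd k2.
    """
    n = len(ca)
    half = (k2 - 1) // 2
    dist = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
    idx = np.arange(n)
    seq_gap = np.abs(idx[:, None] - idx[None, :])
    spatial, sequential, long_range = [], [], []
    for i in range(n):
        # All other residues in ascending order of Euclidean distance to r_i.
        order = np.argsort(dist[i], kind="stable")
        order = order[order != i]  # assumption: r_i is not its own spatial neighbor
        # (1) k1 spatially nearest residues.
        spatial.append(order[:k1])
        # (2) residues within (k2 - 1) / 2 positions in sequence, r_i included (up to k2).
        sequential.append(idx[seq_gap[i] <= half])
        # (3) drop sequence-local residues, then keep the k3 spatially nearest of the rest.
        far_in_seq = order[seq_gap[i, order] > half]
        long_range.append(far_in_seq[:k3])
    return spatial, sequential, long_range
```

Concatenating the three arrays gives at most $k=k_1+k_2+k_3$ neighbors per residue. Near chain ends the sequential set is shorter, and spatial and long-range neighbors can overlap, so a full implementation may need padding or deduplication.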

An overview of the DDAffinity architecture is shown below.

Contact

Please check out our latest work "CATH-ddG: towards robust mutation effect prediction on protein–protein interactions out of CATH homologous superfamily" on mutational effect prediction for protein-protein interactions on GitHub.

Install

DDAffinity Environment

conda env create -f env.yml -n DDAffinity
conda activate DDAffinity

The default PyTorch version is 1.12.1 and the default cudatoolkit version is 11.3; both can be changed in env.yml.

Preparation of processed dataset

We generated the PDB files of all mutant and wild-type complexes from the structures in data/SKEMPI2/PDBs and the mutations listed in data/SKEMPI2/SKEMPI2.csv, using rde/datasets/PDB_generate.py and the FoldX tool. We then use rde/datasets/skempi_parallel.py to convert the wild-type and mutant PDB files into the processed dataset SKEMPI2_cache.

python PDB_generate.py 
python skempi_parallel.py --reset
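The FoldX step is only described at a high level here. As an illustration, FoldX's BuildModel reads the mutations to introduce from an individual_list file, in which the mutations of one mutant are comma-separated and each line ends with `;`. The sketch below assumes SKEMPI-style mutation strings (wild-type residue, chain, residue number, mutant residue, as in the made-up example `DA52A`) and uses hypothetical helper names; it is not taken from PDB_generate.py.

```python
import re

# One point mutation: wild-type residue, chain ID, residue number
# (optionally with an insertion code), mutant residue, e.g. "DA52A" (assumed format).
MUTATION = re.compile(r"^([A-Z])([A-Za-z0-9])(-?\d+[A-Za-z]?)([A-Z])$")


def to_foldx_line(mutations):
    """Turn a comma-separated mutation string into one individual_list line."""
    parts = [m.strip() for m in mutations.split(",") if m.strip()]
    for m in parts:
        if not MUTATION.match(m):
            raise ValueError(f"unrecognised mutation: {m!r}")
    # Mutations of one (possibly multi-point) mutant share a line, terminated by ';'.
    return ",".join(parts) + ";"


def write_individual_list(rows, path="individual_list.txt"):
    """Write one line per mutant; rows is an iterable of mutation strings."""
    with open(path, "w") as fh:
        for mutations in rows:
            fh.write(to_foldx_line(mutations) + "\n")
```

For a multiple point mutation, `to_foldx_line("DA52A, KB10G")` returns `"DA52A,KB10G;"`, so FoldX builds both substitutions into a single mutant model.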

Datasets

Dataset	Download Script	Processed Dataset
SKEMPI v2	`data/get_skempi_v2.sh`	`data/SKEMPI2/SKEMPI2_cache`
SKEMPI2.csv	—	SKEMPI2_cache
M1707.csv	—	—
S1131.csv	—	—
M1340.csv	—	—
M595.csv	—	—
S494.csv	—	—
S285.csv	—	S285_cache
Ssys.csv	—	—
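This README does not spell out how the $\Delta\Delta G$ labels are derived from SKEMPI v2. A common convention, shown here as an assumption, converts measured dissociation constants into free energies, $\Delta G = RT\ln K_d$, and takes the mutant-minus-wild-type difference, $\Delta\Delta G = \Delta G_{\mathcal{MT}} - \Delta G_{\mathcal{WT}} = RT\ln(K_d^{\mathcal{MT}}/K_d^{\mathcal{WT}})$. The function names below are hypothetical.

```python
import math

R_KCAL = 8.314 / 4184.0  # gas constant in kcal/(mol*K)


def delta_g(kd_molar, temperature_k):
    """Binding free energy dG = RT ln(Kd) in kcal/mol, with Kd in mol/L."""
    return R_KCAL * temperature_k * math.log(kd_molar)


def ddg(kd_wt, kd_mut, temperature_k=298.15):
    """ddG = dG_mut - dG_wt = RT ln(Kd_mut / Kd_wt); positive means weaker binding."""
    return delta_g(kd_mut, temperature_k) - delta_g(kd_wt, temperature_k)
```

For example, a mutation that weakens binding tenfold at 298.15 K, `ddg(1e-9, 1e-8)`, gives about 1.36 kcal/mol.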

Trained Weights

The overall SKEMPI2 trained weights are located in: DDAffinity

The M1340 trained weights are located in: M1340

Usage

Evaluate DDAffinity

python test_DDAffinity.py ./configs/train/mpnn_ddg.yml --device cuda:0

Blind testing: non-redundant blind testing on the multiple point mutation dataset M595

python case_study.py ./configs/inference/blind_testing.yml --device cuda:0

Case Study 1: Predict Mutation Effects for SARS-CoV-2 RBD

python case_study.py ./configs/inference/case_study_1.yml --device cuda:0

Case Study 2: Human Antibody Optimization

python case_study.py ./configs/inference/case_study_2.yml --device cuda:0

Train DDAffinity

python train_DDAffinity.py ./configs/train/mpnn_ddg.yml --num_cvfolds 10 --device cuda:0
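How `--num_cvfolds 10` partitions SKEMPI2 is not described here. A frequent choice for this dataset, assumed in the sketch below, is to split by complex so that mutations of the same complex never appear in both the training and test folds. The function name and default seed are hypothetical.

```python
import random
from collections import defaultdict


def grouped_kfold(complex_ids, num_folds=10, seed=2022):
    """Yield (train_idx, test_idx) pairs with every complex confined to one fold.

    complex_ids: one complex (e.g. PDB) identifier per mutation entry.
    """
    by_complex = defaultdict(list)
    for i, cid in enumerate(complex_ids):
        by_complex[cid].append(i)
    complexes = sorted(by_complex)
    random.Random(seed).shuffle(complexes)  # seeded, so folds are reproducible
    for fold in range(num_folds):
        test_complexes = set(complexes[fold::num_folds])
        test = [i for c in test_complexes for i in by_complex[c]]
        train = [i for c in complexes if c not in test_complexes for i in by_complex[c]]
        yield sorted(train), sorted(test)
```

Splitting by complex rather than by individual mutation avoids leaking near-identical structures between training and test sets, at the cost of folds of unequal size.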

Acknowledgements

Parts of our code are adapted from Rotamer Density Estimator (RDE). We thank the authors for sharing their code.

ak422/DDAffinity

Folders and files

Latest commit

History

Repository files navigation

DDAffinity-network

Description

Contact

Install

DDAffinity Environment

Preparation of processed dataset

Datasets

Trained Weights

Usage

Evaluate DDAffinity

Blind testing: non-redundant blind testing on the multiple point mutation dataset M595

Case Study 1: Predict Mutation Effects for SARS-CoV-2 RBD

Case Study 2: Human Antibody Optimization

Train DDAffinity

Acknowledgements

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages