conda env create -f env.yaml -n flow
conda activate flow
The default cudatoolkit version is 11.3. You may change it in env.yaml.
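To confirm which cudatoolkit version an environment file pins before creating the environment, a short check along these lines may help. It is only a sketch: it assumes env.yaml lists the dependency in the usual conda form (for example `- cudatoolkit=11.3`), and the helper name `find_cudatoolkit_pin` is illustrative, not part of this repository.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: the dependency is written conda-style, e.g. "- cudatoolkit=11.3".
_PIN = re.compile(r"^\s*-\s*cudatoolkit\s*=+\s*([0-9][0-9.]*)", re.MULTILINE)


def find_cudatoolkit_pin(env_text: str) -> Optional[str]:
    """Return the pinned cudatoolkit version, or None if no pin is found."""
    match = _PIN.search(env_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    env_file = Path("env.yaml")
    # Only inspect the file if it is present in the current directory.
    if env_file.exists():
        print(find_cudatoolkit_pin(env_file.read_text()))
```

If the function returns None, the file may pin CUDA in a different form, and it is worth checking env.yaml by hand.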
Protein structures in the SAbDab dataset can be downloaded here. We also provide the data we used, selected according to sabdab_summary_all.csv. Extract all_structures.zip into the data folder.
We provide the templates used for training and inference in our experiments. Please download template.zip and extract it into the project directory.
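The two extraction steps can also be scripted. The following is a minimal sketch using only the Python standard library; it assumes both archives were downloaded into the project directory and that the script is run from there. The helper name `extract_archive` is hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_archive(zip_path: str, dest_dir: str) -> List[str]:
    """Extract zip_path into dest_dir (created if missing); return member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


if __name__ == "__main__":
    # Targets follow the instructions in the text: structures go into the
    # data folder, templates into the project directory (assumed to be ".").
    for zip_name, target in (("all_structures.zip", "data"), ("template.zip", ".")):
        if Path(zip_name).exists():
            names = extract_archive(zip_name, target)
            print(f"{zip_name}: extracted {len(names)} entries into {target}")
        else:
            print(f"{zip_name}: not found, skipping")
```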
PyRosetta is required to relax the generated structures and compute binding energy. Please follow the instructions here to install it.
Ray is required to relax and evaluate the generated antibodies. Please install Ray using the following command:
pip install -U ray
The data for HIV antibody sampling, along with the corresponding code, can be downloaded here. The download includes all the sequences we sampled as well as their associated structure files.
The trained model for generating CDRH3 and the trained model for sampling HIV antibodies are available in the ./trained_models directory.
Below is the usage of design_testset.py.
python design_testset.py --template_dict YOUR_TEMPLATE_DICT_PATH -c CONFIG_PATH -b 32 1
Usage instructions are also included in test_all.sh. To sample and generate predictions for all proteins in the test set, simply run:
bash test_all.sh
Below is the usage of design_pdb.py.
python design_pdb.py \
<path-to-pdb> \
--heavy <heavy-chain-id> \
--light <light-chain-id> \
--template TEMPLATE_PATH \
--config <path-to-config-file>
For how to generate the template file, see the preparation steps in the ./hiv_target directory.
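When designing for several PDB files, the design_pdb.py invocation shown above can be assembled from a script. The sketch below only builds the argument list from the flags listed in this README; the function name `build_design_pdb_command` and the example file names are hypothetical placeholders, and the script is assumed to be run from the project directory.

```python
import shlex
from typing import List


def build_design_pdb_command(
    pdb_path: str, heavy: str, light: str, template: str, config: str
) -> List[str]:
    """Build the argv list for design_pdb.py using only the documented flags."""
    return [
        "python", "design_pdb.py",
        pdb_path,
        "--heavy", heavy,
        "--light", light,
        "--template", template,
        "--config", config,
    ]


if __name__ == "__main__":
    # Placeholder values: substitute your own PDB, chain IDs, template and config.
    cmd = build_design_pdb_command(
        "my_complex.pdb", "H", "L", "TEMPLATE_PATH", "CONFIG_PATH"
    )
    # Print the command for inspection; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.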
To resample and generate sequences for HIV antibodies, run:
bash test_hiv.sh