A comprehensive implementation of Long-Short-Range Message-Passing and state-of-the-art models for molecular dynamics simulation, optimized for Multi-GPU training.
- Overview
- Architecture
- BRICS Algorithm
- Project Structure
- Prerequisites
- Installation
- Configuration
- Usage
- Troubleshooting
- API Reference
- Development
- Enhanced Troubleshooting
- Citation
- Credits
- License
- Acknowledgments
This repository provides:
- Implementation of Long-Short-Range Message-Passing (LSR-MP)
- Multiple state-of-the-art models for molecular dynamics simulation
- Multi-GPU training optimization
- Support for various fragmentation methods
Note: Preprocessing steps may take longer than expected, especially for larger molecular systems. Please be patient during the initial data preparation phase.
The Breaking of Retrosynthetically Interesting Chemical Substructures (BRICS) method is widely used in quantum chemistry, chemical retrosynthesis, and drug discovery.
- Bond Dissection: Compounds are dissected at 16 predefined bond types selected by organic chemists
- Environmental Awareness: Considers chemical environment (atom types) to maintain reasonable fragment sizes
- Filtering: Removes extremely small fragments, duplicates, and overlapping structures
- Stabilization: Adds supplementary atoms (mainly hydrogen) at bond-breaking points for chemical stability
LSR-MP/
├── README.md # Project documentation
├── build_env.sh # Environment setup script
├── run_ddp.py # Main training script with DDP support
├── prepare_lmdb.py # Data preprocessing utilities
├── graph.py # Graph construction utilities
├── xyz2mol.py # XYZ to molecule conversion
├── lightnp/ # Core framework package
│ ├── LSRM/ # Long-Short Range Message Passing implementation
│ │ ├── models/ # Model architectures
│ │ │ ├── lsrm_modules.py # LSR-MP model implementations
│ │ │ ├── e2former_lsrmp.py # E2Former integration
│ │ │ ├── long_short_interact_modules.py # Interaction modules
│ │ │ ├── nets/ # Neural network components
│ │ │ └── torchmdnet/ # TorchMD-Net integration
│ │ ├── data/ # Data handling and loading
│ │ │ ├── lmdb_dataset.py # LMDB dataset implementation
│ │ │ ├── atoms_loader.py # Atomic data loading
│ │ │ └── pyg_wrapper.py # PyTorch Geometric wrapper
│ │ └── utils/ # Utility functions
│ │ ├── build_group_graph.py # Graph construction
│ │ ├── transforms.py # Data transformations
│ │ └── model_utils.py # Model utilities
│ ├── data/ # General data processing
│ ├── train/ # Training infrastructure
│ │ ├── ddp_trainer.py # Distributed training
│ │ ├── hooks/ # Training hooks and callbacks
│ │ └── loss.py # Loss functions
│ └── utils/ # General utilities
├── checkpoints/ # Model checkpoints and training logs
├── MD22/ # MD22 dataset storage
└── plots/ # Visualization assets
├── LSR-MP.png # Architecture diagram
├── BRICS_algorithm.png # BRICS workflow
└── supplementary-fig.png # Additional figures
Before installation, ensure you have:
- Python 3.7+
- CUDA-compatible GPU(s) for training
- Conda or Miniconda installed
- Git for cloning the repository
If you plan to use the E2Former model, you must first clone the official E2Former repository:
git clone https://github.com/2023huang6385/E2Former.gitThen modify the path in lightnp/LSRM/models/e2former_lsrmp.py (line 14):
# Change this line to point to your E2Former installation
sys.path.append('/path/to/your/E2Former')Replace /path/to/your/E2Former with the actual path where you cloned the E2Former repository.
Create a new conda environment:
conda create -n LSR-MP python=3.8conda activate LSR-MPThe main packages required:
torch- PyTorch framework (v1.9.0+cu111)torch-geometric- Graph neural network library (v2.0.3)torch-scatter- Scatter operations for PyTorch (v2.0.8)ase- Atomic Simulation Environmentrdkit- Chemical informatics toolkitwandb- Experiment trackingpytorch_lightning- Training framework (v1.5.0)- \
Install using the provided script:
chmod +x build_env.sh
./build_env.shFor manual installation or customization:
# Install PyTorch with CUDA support
pip install torch==1.9.0+cu111 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu111
# Install PyTorch Geometric and related packages
pip install torch-scatter==2.0.8 torch-sparse==0.6.10 torch-cluster==1.5.9 -f https://data.pyg.org/whl/torch-1.9.0+cu111.html
pip install torch-geometric==2.0.3
# Install additional dependencies
pip install pytorch_lightning==1.5.0
pip install wandb torch-ema ase sympy
pip install opencv-python-headless
conda install yaml -y
# Install framework-specific requirements
pip install -r lightnp_env_requirements.txtVerify your installation:
python -c "import torch; print(f'PyTorch: {torch.__version__}')"
python -c "import torch_geometric; print(f'PyG: {torch_geometric.__version__}')"
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"| Parameter | Description | Default | Range |
|---|---|---|---|
--model |
Model architecture | Visnorm_shared_LSRMNorm2_2branchSerial |
See Available Models |
--hidden_channels |
Hidden layer dimensions | 128 | 64-512 |
--num_interactions |
Number of interaction layers | 6 | 1-12 |
--long_num_layers |
Long-range interaction layers | 2 | 1-6 |
--dropout |
Dropout rate | 0.1 | 0.0-0.5 |
| Parameter | Description | Default | Range |
|---|---|---|---|
--lr |
Learning rate | 0.0004 | 1e-5 to 1e-2 |
--batch_size |
Training batch size | 16 | 1-128 |
--ema_decay |
Exponential moving average decay | 0.999 | 0.9-0.9999 |
--early_stop_patience |
Early stopping patience | 500 | 100-1000 |
--rho_criteria |
Convergence criteria | 0.001 | 1e-4 to 1e-2 |
| Parameter | Description | Options |
|---|---|---|
--molecule |
Target molecule system | Ac_Ala3_NHMe, DHA, stachyose, AT_AT, AT_AT_CG_CG, double_walled_nanotube, buckyball_catcher |
--group_builder |
Fragmentation method | rdkit (BRICS), k-means, spectral |
--dataset |
Dataset name | my_dataset |
| Parameter | Description | Default | Typical Range |
|---|---|---|---|
--otfcutoff |
On-the-fly cutoff (short-range graph) | 4.0 Å | 3.0-6.0 Å |
--short_cutoff_upper |
Short-range upper cutoff | 4.0 Å | 3.0-6.0 Å |
--long_cutoff_lower |
Long-range lower cutoff | 0.0 Å | 0.0-2.0 Å |
--long_cutoff_upper |
Long-range upper cutoff | 9.0 Å | 6.0-12.0 Å |
Note: --otfcutoff must equal --short_cutoff_upper for proper graph construction.
Before training, prepare your molecular dynamics data:
-
Convert XYZ to LMDB format:
python prepare_lmdb.py --input_path /path/to/xyz/files --output_path ./MD22/
-
Build molecular graphs:
python graph.py --molecule AT_AT_CG_CG --fragmentation rdkit
-
Verify data integrity:
python -c "from lightnp.LSRM.data import LmdbDataset; ds = LmdbDataset('./MD22/AT_AT_CG_CG/my_dataset'); print(f'Dataset size: {len(ds)}')"
- BRICS (
rdkit) - Chemical substructure-based fragmentation - K-Means Clustering (
k-means) - Distance-based clustering - Spectral Clustering (
spectral) - Graph-based clustering
Ac_Ala3_NHMe- Alanine tripeptideDHA- Docosahexaenoic acidstachyose- TetrasaccharideAT_AT- Adenine-thymine base pairsAT_AT_CG_CG- Mixed DNA base pairsdouble_walled_nanotube- Carbon nanotube structurebuckyball_catcher- Fullerene host-guest complex
- VisNet-LSRM (
Visnorm_shared_LSRMNorm2_2branchSerial) - Long-Short Range Message Passing - Equivariant Transformer (
TorchMD_ET) - Transformer-based approach - PaiNN - Polarizable Atom Interaction Neural Network
- Equiformer-LSRM (
dot_product_attention_transformer_exp_l2_md17_lsrmserial) - E(3)-equivariant transformer with LSRM - E2Former (
E2Former) - State-of-the-art E(3)-equivariant transformer with direct force prediction
Train LSR-MP on a single GPU:
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 --master_port=1230 \
run_ddp.py \
--datapath ./ \
--model=Visnorm_shared_LSRMNorm2_2branchSerial \
--molecule AT_AT_CG_CG \
--dataset=my_dataset \
--group_builder rdkit \
--num_interactions=6 --long_num_layers=2 \
--lr=0.0004 --rho_criteria=0.001 \
--dropout=0 --hidden_channels=128 \
--calculate_meanstd --otfcutoff=4 \
--short_cutoff_upper=4 --long_cutoff_lower=0 --long_cutoff_upper=9 \
--early_stop --early_stop_patience=500 \
--no_broadcast --batch_size=16 \
--ema_decay=0.999 --dropout=0.1 \
--wandb --api_key [YOUR_WANDB_API_KEY]For Equiformer-LSRM model, use:
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 --master_port=1230 \
run_ddp.py \
--datapath ./ \
--model=dot_product_attention_transformer_exp_l2_md17_lsrmserial \
--molecule AT_AT_CG_CG \
--dataset=my_dataset \
--group_builder rdkit \
--num_interactions=6 --long_num_layers=2 \
--lr=0.0004 --rho_criteria=0.001 \
--dropout=0 --hidden_channels=128 \
--calculate_meanstd --otfcutoff=4 \
--short_cutoff_upper=4 --long_cutoff_lower=0 --long_cutoff_upper=9 \
--early_stop --early_stop_patience=500 \
--no_broadcast --batch_size=16 \
--ema_decay=0.999 --dropout=0.1 \
--wandb --api_key [YOUR_WANDB_API_KEY]For E2Former model (state-of-the-art E(3)-equivariant transformer), use:
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 --master_port=1230 \
run_ddp.py \
--datapath ./ \
--model=E2Former \
--molecule AT_AT_CG_CG \
--dataset=my_dataset \
--group_builder rdkit \
--num_interactions=6 \
--lr=0.0004 --rho_criteria=0.001 \
--dropout=0 --hidden_channels=128 \
--calculate_meanstd --otfcutoff=4 \
--short_cutoff_upper=4 \
--early_stop --early_stop_patience=500 \
--no_broadcast --batch_size=16 \
--ema_decay=0.999 --dropout=0.1 \
--wandb --api_key [YOUR_WANDB_API_KEY]Note: E2Former uses direct force prediction via eSCN force blocks and does not utilize long-short range message passing. The model is based on the vanilla E2Former architecture without LSR-MP extensions.
Important: Before using E2Former, ensure you have:
- Cloned the official E2Former repository:
git clone https://github.com/2023huang6385/E2Former.git - Updated the path in
lightnp/LSRM/models/e2former_lsrmp.pyline 14 to point to your E2Former installation
--datapath: Path to your dataset directory--model: Model architecture to use--molecule: Target molecule system--group_builder: Fragmentation method (rdkit, k-means, spectral)--num_interactions: Number of interaction layers--lr: Learning rate--hidden_channels: Hidden layer dimensions--batch_size: Training batch size
Train using Distributed Data Parallel across multiple GPUs:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=1230 \
run_ddp.py \
--datapath ./ \
--model=Visnorm_shared_LSRMNorm2_2branchSerial \
--molecule AT_AT_CG_CG \
--dataset=my_dataset \
--group_builder rdkit \
--num_interactions=6 --long_num_layers=2 \
--lr=0.0004 --rho_criteria=0.001 \
--dropout=0 --hidden_channels=128 \
--calculate_meanstd --otfcutoff=4 \
--short_cutoff_upper=4 --long_cutoff_lower=0 --long_cutoff_upper=9 \
--early_stop --early_stop_patience=500 \
--no_broadcast --batch_size=16 \
--ema_decay=0.999 --dropout=0.1 \
--wandb --api_key [YOUR_WANDB_API_KEY]--nproc_per_nodemust equal the number of GPUs inCUDA_VISIBLE_DEVICES--otfcutoffmust equal--short_cutoff_upper(defines short-range graph radius)- Use
--wandbwith your API key for experiment tracking - Recommended settings provide good MD22 benchmark performance
Evaluate a trained model:
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 --master_port=1230 run_ddp.py \
--datapath [YOUR_DATA_PATH] \
--model=Visnorm_shared_LSRMNorm2_2branchSerial \
--molecule AT_AT_CG_CG \
--dataset=[DATASET_NAME] \
--test --restore_run [PATH_TO_TRAINED_MODEL] \
--wandb --api_key [YOUR_WANDB_API_KEY]- Force MAE - Mean absolute error for atomic forces
- Energy MAE - Mean absolute error for system energy
- CUDA Out of Memory: Reduce
--batch_sizeor use fewer--hidden_channels - Preprocessing Delays: Large molecular systems may require extended preprocessing time
- Missing Dependencies: Ensure all packages are installed via
build_env.sh
- Use multiple GPUs for faster training on large datasets
- Adjust cutoff parameters based on your molecular system size
- Monitor training with wandb for optimal hyperparameter tuning
# Data and Model Configuration
--datapath # Path to dataset directory
--model # Model architecture name
--molecule # Target molecule system
--dataset # Dataset identifier
--group_builder # Fragmentation method
# Training Configuration
--lr # Learning rate
--batch_size # Training batch size
--num_interactions # Number of interaction layers
--hidden_channels # Hidden layer dimensions
--dropout # Dropout rate
# Distributed Training
--nproc_per_node # Number of processes per node
--master_port # Master port for communication
# Monitoring and Logging
--wandb # Enable Weights & Biases logging
--api_key # W&B API key
--log_dir # Log directory path# Cutoff Parameters
--otfcutoff # On-the-fly cutoff radius
--short_cutoff_upper # Short-range interaction cutoff
--long_cutoff_lower # Long-range interaction lower bound
--long_cutoff_upper # Long-range interaction upper bound
# Optimization
--ema_decay # Exponential moving average decay
--early_stop # Enable early stopping
--early_stop_patience # Early stopping patience
--rho_criteria # Convergence criteria
# Model-Specific
--long_num_layers # Long-range interaction layers (LSR-MP)
--calculate_meanstd # Calculate dataset statistics
--no_broadcast # Disable parameter broadcastingfrom lightnp.LSRM.models.lsrm_modules import Visnorm_shared_LSRMNorm2_2branchSerial
# Initialize model
model = Visnorm_shared_LSRMNorm2_2branchSerial(
hidden_channels=128,
num_interactions=6,
long_num_layers=2,
dropout=0.1
)
# Forward pass
output = model(data)from lightnp.LSRM.models.e2former_lsrmp import E2Former
# Initialize E2Former model
model = E2Former(
hidden_channels=128,
num_interactions=6
)from lightnp.LSRM.data import LmdbDataset, collate_fn
from torch.utils.data import DataLoader
# Load dataset
dataset = LmdbDataset('./MD22/AT_AT_CG_CG/my_dataset')
# Create data loader
loader = DataLoader(
dataset,
batch_size=16,
shuffle=True,
collate_fn=collate_fn
)# For limited GPU memory
--batch_size 8 --hidden_channels 64 --gradient_checkpointing
# For high-memory GPUs (>24GB)
--batch_size 32 --hidden_channels 256 --mixed_precisionWe welcome contributions! Please follow these guidelines:
- Fork the repository
- Create a feature branch:
git checkout -b feature-name - Commit changes:
git commit -am 'Add feature' - Test your changes
- Push to branch:
git push origin feature-name - Submit a pull request
lightnp/LSRM/
├── models/
│ ├── lsrm_modules.py # Main LSR-MP implementations
│ ├── e2former_lsrmp.py # E2Former integration
│ └── nets/ # Neural network primitives
├── data/
│ ├── lmdb_dataset.py # LMDB data handling
│ └── atoms_loader.py # Atomic structure loading
└── utils/
├── build_group_graph.py # Graph construction
└── transforms.py # Data transformations
- Factory Pattern: Model creation via
create_model() - Strategy Pattern: Fragmentation methods in
group_builder - Observer Pattern: Training hooks and callbacks
- Template Method: Base classes for models and datasets
- Update documentation with model parameters and usage
Issue: CUDA version mismatch
# Error: RuntimeError: CUDA error: no kernel image is available
# Solution: Reinstall PyTorch with correct CUDA version
pip uninstall torch torchvision torchaudio
pip install torch==1.9.0+cu111 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu111Issue: PyTorch Geometric compilation errors
pip install --no-cache-dir --no-binary :all: torch-scatter torch-sparse torch-cluster -f https://data.pyg.org/whl/torch-1.9.0+cu111.htmlIssue: CUDA Out of Memory
# Error: RuntimeError: CUDA out of memory
# Solutions:
1. Reduce batch size: --batch_size 8
2. Reduce model size: --hidden_channels 64Issue: Graph construction errors
# Error: Invalid molecular graph
# Solutions:
1. Adjust cutoff radius: --otfcutoff 5.0
2. Use alternative fragmentation: --group_builder spectral
3. delete your processed lmdb/file path- Check GPU utilization:
nvidia-smi - Profile data loading: Add
--profile_data_loading - Optimize batch size: Use largest batch that fits in memory
- Enable mixed precision:
--mixed_precision - Use multiple GPUs: Scale with
--nproc_per_node
# Memory-efficient settings
--batch_size 4 \
--hidden_channels 64 \
--gradient_checkpointing \
--mixed_precision \
--cpu_offload# Optimize data loading
--num_workers 4 \
--pin_memory \
--prefetch_factor 2 \
--cache_dataIf you use LSR-MP in your research, please consider citing:
@inproceedings{
li2024longshortrange,
title={Long-Short-Range Message-Passing: A Physics-Informed Framework to Capture Non-Local Interaction for Scalable Molecular Dynamics Simulation},
author={Yunyang Li and Yusong Wang and Lin Huang and Han Yang and Xinran Wei and Jia Zhang and Tong Wang and Zun Wang and Bin Shao and Tie-Yan Liu},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=rvDQtdMnOl}
}This work builds upon several foundational contributions:
Equiformer
@inproceedings{
liao2023equiformer,
title={Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs},
author={Yi-Lun Liao and Tess Smidt},
booktitle={International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=KwmPfARgOTD}
}ViSNet
@article{wang2024enhancing,
title={Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing},
author={Wang, Yusong and Wang, Tong and Li, Shaoning and He, Xinheng and Li, Mingyu and Wang, Zun and Zheng, Nanning and Shao, Bin and Liu, Tie-Yan},
journal={Nature Communications},
volume={15},
number={1},
pages={313},
year={2024},
publisher={Nature Publishing Group UK London}
}E2Former
@article{li2025e2former,
title={E2Former: A Linear-time Efficient and Equivariant Transformer for Scalable Molecular Modeling},
author={Li, Yunyang and Huang, Lin and Ding, Zhihao and Wang, Chu and Wei, Xinran and Yang, Han and Wang, Zun and Liu, Chang and Shi, Yu and Jin, Peiran and others},
journal={arXiv preprint arXiv:2501.19216},
year={2025}
}This project is licensed under the MIT License - see the LICENSE file for details.
- TorchMD-Net team for the molecular dynamics foundation
- PyTorch Geometric developers for graph neural network utilities
- E2Former authors for the equivariant transformer architecture
- RDKit community for chemical informatics tools
- MD22 dataset contributors for benchmark data
- Research computing facilities for computational resources


