[ICLR 2024] Long-Short-Range Message-Passing

A comprehensive implementation of Long-Short-Range Message-Passing and state-of-the-art models for molecular dynamics simulation, optimized for Multi-GPU training.

Overview

This repository provides:

Implementation of Long-Short-Range Message-Passing (LSR-MP)
Multiple state-of-the-art models for molecular dynamics simulation
Multi-GPU training optimization
Support for various fragmentation methods

Note: Preprocessing steps may take longer than expected, especially for larger molecular systems. Please be patient during the initial data preparation phase.

Architecture

LSR-MP Illustration

BRICS Algorithm

The Breaking of Retrosynthetically Interesting Chemical Substructures (BRICS) method is widely used in quantum chemistry, chemical retrosynthesis, and drug discovery.

Key Features:

Bond Dissection: Compounds are dissected at 16 predefined bond types selected by organic chemists
Environmental Awareness: Considers chemical environment (atom types) to maintain reasonable fragment sizes
Filtering: Removes extremely small fragments, duplicates, and overlapping structures
Stabilization: Adds supplementary atoms (mainly hydrogen) at bond-breaking points for chemical stability

Visual Representation:

Project Structure

LSR-MP/
├── README.md                    # Project documentation
├── build_env.sh                 # Environment setup script
├── run_ddp.py                   # Main training script with DDP support
├── prepare_lmdb.py              # Data preprocessing utilities
├── graph.py                     # Graph construction utilities
├── xyz2mol.py                   # XYZ to molecule conversion
├── lightnp/                     # Core framework package
│   ├── LSRM/                    # Long-Short Range Message Passing implementation
│   │   ├── models/              # Model architectures
│   │   │   ├── lsrm_modules.py           # LSR-MP model implementations
│   │   │   ├── e2former_lsrmp.py         # E2Former integration
│   │   │   ├── long_short_interact_modules.py # Interaction modules
│   │   │   ├── nets/                     # Neural network components
│   │   │   └── torchmdnet/               # TorchMD-Net integration
│   │   ├── data/                # Data handling and loading
│   │   │   ├── lmdb_dataset.py          # LMDB dataset implementation
│   │   │   ├── atoms_loader.py          # Atomic data loading
│   │   │   └── pyg_wrapper.py           # PyTorch Geometric wrapper
│   │   └── utils/               # Utility functions
│   │       ├── build_group_graph.py     # Graph construction
│   │       ├── transforms.py            # Data transformations
│   │       └── model_utils.py           # Model utilities
│   ├── data/                    # General data processing
│   ├── train/                   # Training infrastructure
│   │   ├── ddp_trainer.py       # Distributed training
│   │   ├── hooks/               # Training hooks and callbacks
│   │   └── loss.py              # Loss functions
│   └── utils/                   # General utilities
├── checkpoints/                 # Model checkpoints and training logs
├── MD22/                        # MD22 dataset storage
└── plots/                       # Visualization assets
    ├── LSR-MP.png              # Architecture diagram
    ├── BRICS_algorithm.png     # BRICS workflow
    └── supplementary-fig.png   # Additional figures

Prerequisites

Before installation, ensure you have:

Python 3.7+
CUDA-compatible GPU(s) for training
Conda or Miniconda installed
Git for cloning the repository

Additional Requirements for E2Former Model

If you plan to use the E2Former model, you must first clone the official E2Former repository:

git clone https://github.com/2023huang6385/E2Former.git

Then modify the path in lightnp/LSRM/models/e2former_lsrmp.py (line 14):

# Change this line to point to your E2Former installation
sys.path.append('/path/to/your/E2Former')

Replace /path/to/your/E2Former with the actual path where you cloned the E2Former repository.

Installation

Step 1: Create Environment

Create a new conda environment:

conda create -n LSR-MP python=3.8

Step 2: Activate Environment

conda activate LSR-MP

Step 3: Install Dependencies

The main packages required:

torch - PyTorch framework (v1.9.0+cu111)
torch-geometric - Graph neural network library (v2.0.3)
torch-scatter - Scatter operations for PyTorch (v2.0.8)
ase - Atomic Simulation Environment
rdkit - Chemical informatics toolkit
wandb - Experiment tracking
pytorch_lightning - Training framework (v1.5.0)
\

Quick Start

Install using the provided script:

chmod +x build_env.sh
./build_env.sh

Detailed Installation

For manual installation or customization:

# Install PyTorch with CUDA support
pip install torch==1.9.0+cu111 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu111

# Install PyTorch Geometric and related packages
pip install torch-scatter==2.0.8 torch-sparse==0.6.10 torch-cluster==1.5.9 -f https://data.pyg.org/whl/torch-1.9.0+cu111.html
pip install torch-geometric==2.0.3

# Install additional dependencies
pip install pytorch_lightning==1.5.0
pip install wandb torch-ema ase sympy
pip install opencv-python-headless
conda install yaml -y

# Install framework-specific requirements
pip install -r lightnp_env_requirements.txt

Environment Verification

Verify your installation:

python -c "import torch; print(f'PyTorch: {torch.__version__}')"
python -c "import torch_geometric; print(f'PyG: {torch_geometric.__version__}')"
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

Configuration

Model Parameters

Parameter	Description	Default	Range
`--model`	Model architecture	`Visnorm_shared_LSRMNorm2_2branchSerial`	See Available Models
`--hidden_channels`	Hidden layer dimensions	128	64-512
`--num_interactions`	Number of interaction layers	6	1-12
`--long_num_layers`	Long-range interaction layers	2	1-6
`--dropout`	Dropout rate	0.1	0.0-0.5

Training Parameters

Parameter	Description	Default	Range
`--lr`	Learning rate	0.0004	1e-5 to 1e-2
`--batch_size`	Training batch size	16	1-128
`--ema_decay`	Exponential moving average decay	0.999	0.9-0.9999
`--early_stop_patience`	Early stopping patience	500	100-1000
`--rho_criteria`	Convergence criteria	0.001	1e-4 to 1e-2

Dataset Configuration

Parameter	Description	Options
`--molecule`	Target molecule system	`Ac_Ala3_NHMe`, `DHA`, `stachyose`, `AT_AT`, `AT_AT_CG_CG`, `double_walled_nanotube`, `buckyball_catcher`
`--group_builder`	Fragmentation method	`rdkit` (BRICS), `k-means`, `spectral`
`--dataset`	Dataset name	`my_dataset`

Cutoff Parameters

Parameter	Description	Default	Typical Range
`--otfcutoff`	On-the-fly cutoff (short-range graph)	4.0 Å	3.0-6.0 Å
`--short_cutoff_upper`	Short-range upper cutoff	4.0 Å	3.0-6.0 Å
`--long_cutoff_lower`	Long-range lower cutoff	0.0 Å	0.0-2.0 Å
`--long_cutoff_upper`	Long-range upper cutoff	9.0 Å	6.0-12.0 Å

Note: --otfcutoff must equal --short_cutoff_upper for proper graph construction.

Usage

Data Preparation

Before training, prepare your molecular dynamics data:

Convert XYZ to LMDB format:

python prepare_lmdb.py --input_path /path/to/xyz/files --output_path ./MD22/

Build molecular graphs:

python graph.py --molecule AT_AT_CG_CG --fragmentation rdkit

Verify data integrity:

python -c "from lightnp.LSRM.data import LmdbDataset; ds = LmdbDataset('./MD22/AT_AT_CG_CG/my_dataset'); print(f'Dataset size: {len(ds)}')"

Supported Components

Fragment Assignment Methods:

BRICS (rdkit) - Chemical substructure-based fragmentation
K-Means Clustering (k-means) - Distance-based clustering
Spectral Clustering (spectral) - Graph-based clustering

Supported Molecules:

Ac_Ala3_NHMe - Alanine tripeptide
DHA - Docosahexaenoic acid
stachyose - Tetrasaccharide
AT_AT - Adenine-thymine base pairs
AT_AT_CG_CG - Mixed DNA base pairs
double_walled_nanotube - Carbon nanotube structure
buckyball_catcher - Fullerene host-guest complex

Available Models:

VisNet-LSRM (Visnorm_shared_LSRMNorm2_2branchSerial) - Long-Short Range Message Passing
Equivariant Transformer (TorchMD_ET) - Transformer-based approach
PaiNN - Polarizable Atom Interaction Neural Network
Equiformer-LSRM (dot_product_attention_transformer_exp_l2_md17_lsrmserial) - E(3)-equivariant transformer with LSRM
E2Former (E2Former) - State-of-the-art E(3)-equivariant transformer with direct force prediction

Single GPU Training

Train LSR-MP on a single GPU:

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 --master_port=1230 \
  run_ddp.py \
    --datapath ./ \
    --model=Visnorm_shared_LSRMNorm2_2branchSerial \
    --molecule AT_AT_CG_CG \
    --dataset=my_dataset \
    --group_builder rdkit \
    --num_interactions=6 --long_num_layers=2 \
    --lr=0.0004 --rho_criteria=0.001 \
    --dropout=0 --hidden_channels=128 \
    --calculate_meanstd --otfcutoff=4 \
    --short_cutoff_upper=4 --long_cutoff_lower=0 --long_cutoff_upper=9 \
    --early_stop --early_stop_patience=500 \
    --no_broadcast --batch_size=16 \
    --ema_decay=0.999 --dropout=0.1 \
    --wandb --api_key [YOUR_WANDB_API_KEY]

For Equiformer-LSRM model, use:

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 --master_port=1230 \
  run_ddp.py \
    --datapath ./ \
    --model=dot_product_attention_transformer_exp_l2_md17_lsrmserial \
    --molecule AT_AT_CG_CG \
    --dataset=my_dataset \
    --group_builder rdkit \
    --num_interactions=6 --long_num_layers=2 \
    --lr=0.0004 --rho_criteria=0.001 \
    --dropout=0 --hidden_channels=128 \
    --calculate_meanstd --otfcutoff=4 \
    --short_cutoff_upper=4 --long_cutoff_lower=0 --long_cutoff_upper=9 \
    --early_stop --early_stop_patience=500 \
    --no_broadcast --batch_size=16 \
    --ema_decay=0.999 --dropout=0.1 \
    --wandb --api_key [YOUR_WANDB_API_KEY]

For E2Former model (state-of-the-art E(3)-equivariant transformer), use:

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 --master_port=1230 \
  run_ddp.py \
    --datapath ./ \
    --model=E2Former \
    --molecule AT_AT_CG_CG \
    --dataset=my_dataset \
    --group_builder rdkit \
    --num_interactions=6 \
    --lr=0.0004 --rho_criteria=0.001 \
    --dropout=0 --hidden_channels=128 \
    --calculate_meanstd --otfcutoff=4 \
    --short_cutoff_upper=4 \
    --early_stop --early_stop_patience=500 \
    --no_broadcast --batch_size=16 \
    --ema_decay=0.999 --dropout=0.1 \
    --wandb --api_key [YOUR_WANDB_API_KEY]

Note: E2Former uses direct force prediction via eSCN force blocks and does not utilize long-short range message passing. The model is based on the vanilla E2Former architecture without LSR-MP extensions.

Important: Before using E2Former, ensure you have:

Cloned the official E2Former repository: git clone https://github.com/2023huang6385/E2Former.git
Updated the path in lightnp/LSRM/models/e2former_lsrmp.py line 14 to point to your E2Former installation

Key Parameters:

--datapath: Path to your dataset directory
--model: Model architecture to use
--molecule: Target molecule system
--group_builder: Fragmentation method (rdkit, k-means, spectral)
--num_interactions: Number of interaction layers
--lr: Learning rate
--hidden_channels: Hidden layer dimensions
--batch_size: Training batch size

Multi-GPU Training

Train using Distributed Data Parallel across multiple GPUs:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=1230 \
  run_ddp.py \
    --datapath ./ \
    --model=Visnorm_shared_LSRMNorm2_2branchSerial \
    --molecule AT_AT_CG_CG \
    --dataset=my_dataset \
    --group_builder rdkit \
    --num_interactions=6 --long_num_layers=2 \
    --lr=0.0004 --rho_criteria=0.001 \
    --dropout=0 --hidden_channels=128 \
    --calculate_meanstd --otfcutoff=4 \
    --short_cutoff_upper=4 --long_cutoff_lower=0 --long_cutoff_upper=9 \
    --early_stop --early_stop_patience=500 \
    --no_broadcast --batch_size=16 \
    --ema_decay=0.999 --dropout=0.1 \
    --wandb --api_key [YOUR_WANDB_API_KEY]

Important Notes:

--nproc_per_node must equal the number of GPUs in CUDA_VISIBLE_DEVICES
--otfcutoff must equal --short_cutoff_upper (defines short-range graph radius)
Use --wandb with your API key for experiment tracking
Recommended settings provide good MD22 benchmark performance

Testing

Evaluate a trained model:

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 --master_port=1230 run_ddp.py \
  --datapath [YOUR_DATA_PATH] \
  --model=Visnorm_shared_LSRMNorm2_2branchSerial \
  --molecule AT_AT_CG_CG \
  --dataset=[DATASET_NAME] \
  --test --restore_run [PATH_TO_TRAINED_MODEL] \
  --wandb --api_key [YOUR_WANDB_API_KEY]

Evaluation Metrics:

Force MAE - Mean absolute error for atomic forces
Energy MAE - Mean absolute error for system energy

Troubleshooting

Common Issues:

CUDA Out of Memory: Reduce --batch_size or use fewer --hidden_channels
Preprocessing Delays: Large molecular systems may require extended preprocessing time
Missing Dependencies: Ensure all packages are installed via build_env.sh

Performance Tips:

Use multiple GPUs for faster training on large datasets
Adjust cutoff parameters based on your molecular system size
Monitor training with wandb for optimal hyperparameter tuning

API Reference

Command Line Arguments

Core Arguments

# Data and Model Configuration
--datapath          # Path to dataset directory
--model             # Model architecture name
--molecule          # Target molecule system
--dataset           # Dataset identifier
--group_builder     # Fragmentation method

# Training Configuration
--lr                # Learning rate
--batch_size        # Training batch size
--num_interactions  # Number of interaction layers
--hidden_channels   # Hidden layer dimensions
--dropout           # Dropout rate

# Distributed Training  
--nproc_per_node    # Number of processes per node
--master_port       # Master port for communication

# Monitoring and Logging
--wandb             # Enable Weights & Biases logging
--api_key           # W&B API key
--log_dir           # Log directory path

Advanced Arguments

# Cutoff Parameters
--otfcutoff         # On-the-fly cutoff radius
--short_cutoff_upper # Short-range interaction cutoff
--long_cutoff_lower # Long-range interaction lower bound
--long_cutoff_upper # Long-range interaction upper bound

# Optimization
--ema_decay         # Exponential moving average decay
--early_stop        # Enable early stopping
--early_stop_patience # Early stopping patience
--rho_criteria      # Convergence criteria

# Model-Specific
--long_num_layers   # Long-range interaction layers (LSR-MP)
--calculate_meanstd # Calculate dataset statistics
--no_broadcast      # Disable parameter broadcasting

Model APIs

LSR-MP Model

from lightnp.LSRM.models.lsrm_modules import Visnorm_shared_LSRMNorm2_2branchSerial

# Initialize model
model = Visnorm_shared_LSRMNorm2_2branchSerial(
    hidden_channels=128,
    num_interactions=6,
    long_num_layers=2,
    dropout=0.1
)

# Forward pass
output = model(data)

E2Former Integration

from lightnp.LSRM.models.e2former_lsrmp import E2Former

# Initialize E2Former model
model = E2Former(
    hidden_channels=128,
    num_interactions=6
)

Data Loading APIs

LMDB Dataset

from lightnp.LSRM.data import LmdbDataset, collate_fn
from torch.utils.data import DataLoader

# Load dataset
dataset = LmdbDataset('./MD22/AT_AT_CG_CG/my_dataset')

# Create data loader
loader = DataLoader(
    dataset,
    batch_size=16,
    shuffle=True,
    collate_fn=collate_fn
)

Performance Optimization

GPU Memory Optimization

# For limited GPU memory
--batch_size 8 --hidden_channels 64 --gradient_checkpointing

# For high-memory GPUs (>24GB)
--batch_size 32 --hidden_channels 256 --mixed_precision

Development

Contributing

We welcome contributions! Please follow these guidelines:

Fork the repository
Create a feature branch: git checkout -b feature-name
Commit changes: git commit -am 'Add feature'
Test your changes
Push to branch: git push origin feature-name
Submit a pull request

Code Structure

Core Components

lightnp/LSRM/
├── models/
│   ├── lsrm_modules.py      # Main LSR-MP implementations
│   ├── e2former_lsrmp.py    # E2Former integration
│   └── nets/                # Neural network primitives
├── data/
│   ├── lmdb_dataset.py      # LMDB data handling
│   └── atoms_loader.py      # Atomic structure loading
└── utils/
    ├── build_group_graph.py # Graph construction
    └── transforms.py        # Data transformations

Design Patterns

Factory Pattern: Model creation via create_model()
Strategy Pattern: Fragmentation methods in group_builder
Observer Pattern: Training hooks and callbacks
Template Method: Base classes for models and datasets

Update documentation with model parameters and usage

Enhanced Troubleshooting

Common Issues

Installation Problems

Issue: CUDA version mismatch

# Error: RuntimeError: CUDA error: no kernel image is available
# Solution: Reinstall PyTorch with correct CUDA version
pip uninstall torch torchvision torchaudio
pip install torch==1.9.0+cu111 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu111

Issue: PyTorch Geometric compilation errors

pip install --no-cache-dir --no-binary :all: torch-scatter torch-sparse torch-cluster -f https://data.pyg.org/whl/torch-1.9.0+cu111.html

Training Issues

Issue: CUDA Out of Memory

# Error: RuntimeError: CUDA out of memory
# Solutions:
1. Reduce batch size: --batch_size 8
2. Reduce model size: --hidden_channels 64

Data Processing Issues

Issue: Graph construction errors

# Error: Invalid molecular graph
# Solutions:
1. Adjust cutoff radius: --otfcutoff 5.0
2. Use alternative fragmentation: --group_builder spectral
3. delete your processed lmdb/file path

Performance Issues

Slow Training

Check GPU utilization: nvidia-smi
Profile data loading: Add --profile_data_loading
Optimize batch size: Use largest batch that fits in memory
Enable mixed precision: --mixed_precision
Use multiple GPUs: Scale with --nproc_per_node

Memory Optimization

# Memory-efficient settings
--batch_size 4 \
--hidden_channels 64 \
--gradient_checkpointing \
--mixed_precision \
--cpu_offload

I/O Bottlenecks

# Optimize data loading
--num_workers 4 \
--pin_memory \
--prefetch_factor 2 \
--cache_data

Citation

If you use LSR-MP in your research, please consider citing:

@inproceedings{
li2024longshortrange,
title={Long-Short-Range Message-Passing: A Physics-Informed Framework to Capture Non-Local Interaction for Scalable Molecular Dynamics Simulation},
author={Yunyang Li and Yusong Wang and Lin Huang and Han Yang and Xinran Wei and Jia Zhang and Tong Wang and Zun Wang and Bin Shao and Tie-Yan Liu},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=rvDQtdMnOl}
}

Credits

This work builds upon several foundational contributions:

Equiformer

@inproceedings{
    liao2023equiformer,
    title={Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs},
    author={Yi-Lun Liao and Tess Smidt},
    booktitle={International Conference on Learning Representations},
    year={2023},
    url={https://openreview.net/forum?id=KwmPfARgOTD}
}

ViSNet

@article{wang2024enhancing,
  title={Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing},
  author={Wang, Yusong and Wang, Tong and Li, Shaoning and He, Xinheng and Li, Mingyu and Wang, Zun and Zheng, Nanning and Shao, Bin and Liu, Tie-Yan},
  journal={Nature Communications},
  volume={15},
  number={1},
  pages={313},
  year={2024},
  publisher={Nature Publishing Group UK London}
}

E2Former

@article{li2025e2former,
  title={E2Former: A Linear-time Efficient and Equivariant Transformer for Scalable Molecular Modeling},
  author={Li, Yunyang and Huang, Lin and Ding, Zhihao and Wang, Chu and Wei, Xinran and Yang, Han and Wang, Zun and Liu, Chang and Shi, Yu and Jin, Peiran and others},
  journal={arXiv preprint arXiv:2501.19216},
  year={2025}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

TorchMD-Net team for the molecular dynamics foundation
PyTorch Geometric developers for graph neural network utilities
E2Former authors for the equivariant transformer architecture
RDKit community for chemical informatics tools
MD22 dataset contributors for benchmark data
Research computing facilities for computational resources

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
checkpoints		checkpoints
lightnp		lightnp
plots		plots
.gitignore		.gitignore
README.md		README.md
build_env.sh		build_env.sh
graph.py		graph.py
lightnp_env_requirements.txt		lightnp_env_requirements.txt
prepare_lmdb.py		prepare_lmdb.py
rdkit_label_builder.py		rdkit_label_builder.py
run_ddp.py		run_ddp.py
xyz2mol.py		xyz2mol.py

liyy2/LSR-MP

Folders and files

Latest commit

History

Repository files navigation