MIST: Molecular Insight SMILES Transformer

GitHub License arXiv:2409.15370 Model on HF

MIST is a family of molecular foundation models for molecular property prediction. The models were pre-trained on Smirk 😏 tokenized SMILES strings from the Enamine REAL Space dataset using the Masked Language Modeling (MLM) objective, then fine-tuned for downstream prediction tasks.
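To make the MLM objective concrete, here is a minimal sketch of the masking step, under stated assumptions. It uses a toy character-level split rather than the Smirk tokenizer, and the `[MASK]` string, the `mask_tokens` helper, and the masking probability are illustrative choices, not values taken from this repository. Real MLM pipelines usually also leave some selected tokens unchanged or swap in random tokens; that is omitted here.

```python
import random

MASK = "[MASK]"  # placeholder mask token (assumption, not Smirk's actual token)


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels) for a toy MLM example.

    labels[i] holds the original token where a mask was applied, else None.
    The model would be trained to predict labels[i] at the masked positions.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# Toy character-level split of acetic acid; Smirk's real tokenization differs.
tokens = list("CC(=O)O")
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=0)
print(masked)
print(labels)
```

During pre-training the loss would be computed only at positions where `labels` is not `None`.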

Installation

The following provides installation instructions for the top-level package (electrolyte_fm). Optional add-ons for our additional analysis and downstream applications (see ./opt) may require additional configuration.

  1. Install uv and, for ./opt tasks only, Julia
  2. Instantiate the environment: uv sync
  3. Use submit/submit.py to submit a training job, or check out one of our applications in ./opt

You may need to install Rust if pre-built wheels for smirk are not available on PyPI for your platform. Feel free to open an issue to request additional pre-built wheels.
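Before running the steps above, it can help to confirm which prerequisites are already on your PATH. The following is a minimal sketch, not part of the repository: the `find_tools` helper is hypothetical, and `cargo` is used here as a stand-in for detecting a Rust toolchain.

```python
import shutil


def find_tools(names):
    """Map each executable name to its path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


# "cargo" stands in for a Rust toolchain; julia is only needed for ./opt tasks.
status = find_tools(["uv", "julia", "cargo"])
for name, path in status.items():
    print(f"{name}: {path or 'not found'}")
```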

Polaris

  1. Install Rust and uv

  2. Load conda

module purge
module use /soft/modulefiles/
module --ignore_cache load conda/2024-04-29
conda activate base

  3. Install the environment

uv sync

Artemis

Same as Polaris, except:

  1. Skip loading conda (just use uv)
  2. Ensure a module for CUDA@12.2 exists; you may need to install it with Spack (make sure the package is marked buildable: True)

Apptainer

  1. Install Apptainer or load it from a module
  2. Build the image: bash container/build.sh. Once built, relocate the image: mv /tmp/mist.sif ./mist.sif
  3. Run training within the image: apptainer run --nv mist.sif python train.py ...

See submit/dgx.j2 or submit/delta.j2 for a more complete example of using the container.

Submitting Jobs

We use a Python script (submit/submit.py) to template training jobs for submission on HPC systems across multiple sites. The templates may need to be modified for your particular HPC cluster, but they should provide a starting point.

source ./activate # Activate Environment
./submit/submit.py ./submit/polaris.j2 --data ./submit/pretrain.yaml | qsub

See submit/submit.py --help for more info.

Note: ./activate is used to activate the Python virtual environment and set various environment variables.
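The idea of templating a job script from a set of parameters can be sketched as follows. This is an illustration only and does not reproduce submit/submit.py: the .j2 extension suggests Jinja2 templates, but the sketch swaps in `string.Template` from the standard library so it runs without extra dependencies. The template text, the field names (`nodes`, `walltime`, `config`), the `--config` flag, and the `render_job` helper are all hypothetical.

```python
from string import Template

# Hypothetical job template; the real submit/*.j2 files will differ.
JOB_TEMPLATE = Template("""#!/bin/bash
#PBS -l select=$nodes
#PBS -l walltime=$walltime
python train.py --config $config
""")


def render_job(params):
    """Fill the template; raises KeyError if a required field is missing."""
    return JOB_TEMPLATE.substitute(params)


# Parameters like these would come from a data file such as pretrain.yaml.
script = render_job({"nodes": 2, "walltime": "01:00:00", "config": "pretrain.yaml"})
print(script)
```

Rendering to standard output is what allows the result to be piped straight into a scheduler command, as in the qsub example above.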

Development

Pre-commit

We use pre-commit to perform various linting checks on the code. To enable:

  1. Install uv (see above)
  2. Run pre-commit manually: uv run pre-commit
  3. Install the git hook so the checks run before each commit: uv run pre-commit install --allow-missing-config