MIST is a family of molecular foundation models for molecular property prediction. The models were pre-trained on Smirk 😏 tokenized SMILES strings from the Enamine REAL Space dataset using the Masked Language Modeling (MLM) objective, then fine-tuned for downstream prediction tasks.
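As a rough illustration of the MLM objective mentioned above, the sketch below masks tokens at random and records the originals as prediction targets. It makes several assumptions that do not come from this repository: a toy character-level split stands in for the Smirk tokenizer, `[MASK]` is a placeholder token name, and the masking rate is arbitrary. It also omits refinements that practical MLM setups often use, such as occasionally keeping or randomly replacing a selected token.

```python
import random

MASK = "[MASK]"  # hypothetical mask token; the real vocabulary may differ


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (inputs, labels) for a masked-language-modeling example.

    Masked positions hold MASK in `inputs` and the original token in
    `labels`; every other label is None (no loss at that position).
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


# Toy character-level split of a SMILES string; NOT the Smirk tokenizer.
tokens = list("CCO")
inputs, labels = mask_tokens(tokens, mask_prob=0.5, seed=1)
print(inputs, labels)
```

During pre-training, a model would be asked to recover the entries of `labels` that are not `None` given `inputs`.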
The following provides installation instructions for the top-level package (electrolyte_fm). Optional add-ons for our various additional analysis and downstream applications (see ./opt) may require additional configuration.
- Install uv and julia (only needed for /opt tasks)
- Instantiate the environment:
  uv sync
- Use submit/submit.py to submit a training job or check out one of our applications in ./opt
You may need to install Rust if pre-built wheels for smirk are not available on PyPI. Feel free to open an issue to request additional pre-built wheels.
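Before running the steps above, it can help to confirm that the required tools are on your PATH. The following is a minimal sketch, not part of the repository: the helper name `missing_tools` is hypothetical, and checking for `cargo` is only an assumed proxy for having a Rust toolchain installed.

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


if __name__ == "__main__":
    # uv and julia come from the steps above; cargo is assumed to
    # indicate a usable Rust toolchain for building smirk from source.
    missing = missing_tools(["uv", "julia", "cargo"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found.")
```

A missing `julia` or `cargo` is not necessarily a problem: julia is only needed for /opt tasks, and Rust only when no pre-built smirk wheel is available.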
module purge
module use /soft/modulefiles/
module --ignore_cache load conda/2024-04-29
conda activate base
- Install the environment:
  uv sync

Same as above except:
- Skip loading conda (just use uv)
- Ensure a module for CUDA@12.2 exists; you may need to install it with spack (make sure buildable: True)
- Install Apptainer or load it from a module
- Build the image:
  bash container/build.sh
  Once built, relocate the image:
  mv /tmp/mist.sif ./mist.sif
- Run training within the image:
  apptainer run --nv mist.sif python train.py ...

See submit/dgx.j2 or submit/delta.j2 for a more complete example of using the container.
We use a Python script (submit/submit.py) to template training jobs for submission on HPC systems across multiple sites.
Templates may need to be modified for your particular HPC cluster, but should provide a starting point.
source ./activate # Activate Environment
./submit/submit.py ./submit/polaris.j2 --data ./submit/pretrain.yaml | qsub
See submit/submit.py --help for more info.
Note: ./activate is used to activate the python virtual environment and set various environment variables.
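The general idea of templating a job script can be sketched as follows. This is not how submit/submit.py is implemented: it uses Python's standard-library `string.Template` in place of the .j2 templates, and the template text, the `render_job` helper, and the field names (`job_name`, `nodes`, `train_args`) are all hypothetical.

```python
from string import Template

# Hypothetical PBS-style job template; the real templates are ./submit/*.j2.
JOB_TEMPLATE = Template("""\
#!/bin/bash
#PBS -N $job_name
#PBS -l select=$nodes
source ./activate
python train.py $train_args
""")


def render_job(job_name, nodes, train_args):
    """Fill in the template and return the job script as a string."""
    return JOB_TEMPLATE.substitute(
        job_name=job_name, nodes=nodes, train_args=train_args
    )


# train_args is a placeholder; see the repository for actual arguments.
script = render_job("mist-pretrain", 4, "<training arguments>")
print(script)
```

Printing the rendered script to standard output mirrors the usage above, where the output of the submit script is piped to the scheduler.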
We use pre-commit to perform various linting checks on the code. To enable:
- Install uv (see above)
- Run pre-commit:
  uv run pre-commit
- Run before committing:
  uv run pre-commit install --allow-missing-config