MIST: Molecular Insight SMILES Transformer

MIST is a family of molecular foundation models for molecular property prediction. The models were pre-trained on Smirk 😏 tokenized SMILES strings from the Enamine REAL Space dataset using the Masked Language Modeling (MLM) objective, then fine-tuned for downstream prediction tasks.
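As a rough illustration of the MLM objective mentioned above, the sketch below hides a random subset of tokens and keeps the originals as prediction targets. It is not the MIST training code: the `[MASK]` token name, the `mask_prob` values, and the character-level split (a stand-in for the Smirk tokenizer) are all assumptions made for the example.

```python
import random

MASK = "[MASK]"  # hypothetical mask token name, not taken from the MIST code


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels) for one masked language modeling example.

    labels[i] holds the original token where position i was masked, else None.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)   # the model must recover this token
            labels.append(tok)
        else:
            masked.append(tok)    # left visible, no loss at this position
            labels.append(None)
    return masked, labels


# Naive character-level split of ethanol's SMILES string; a stand-in for Smirk.
tokens = list("CCO")
masked, labels = mask_tokens(tokens, mask_prob=0.5, seed=1)
print(masked, labels)
```

During pre-training, the model sees the masked sequence and is scored only on the positions whose label is not `None`.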

Installation

The following provides installation instructions for the top-level package (electrolyte_fm). Optional add-ons for our additional analysis and downstream applications (see ./opt) may require additional configuration.

Install uv and julia (julia is only needed for ./opt tasks)
Instantiate the environment: uv sync
Use submit/submit.py to submit a training job, or check out one of our applications in ./opt

You may need to install rust if pre-built wheels for smirk are not available on PyPI. Feel free to open an issue to request additional pre-built wheels.
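Before running uv sync, it can help to confirm which of the tools mentioned above are on your PATH. The following is a minimal sketch, not part of this repository; it assumes the executables are named `uv`, `julia`, and `cargo` (the Rust build tool), and the helper name `missing_tools` is made up for the example.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


required = ["uv"]               # needed for the top-level package
optional = ["julia", "cargo"]   # julia for ./opt tasks, cargo if smirk must be built from source

for name in missing_tools(required):
    print(f"missing required tool: {name}")
for name in missing_tools(optional):
    print(f"missing optional tool: {name}")
```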

Polaris

Install rust and uv
Load conda

module purge
module use /soft/modulefiles/
module --ignore_cache load conda/2024-04-29
conda activate base

Install the environment

uv sync

Artemis

Same as Polaris, except:

Skip loading conda (just use uv)
Ensure a module for CUDA@12.2 exists; you may need to install it with spack (make sure buildable: True is set)

Apptainer

Install Apptainer or load it from a module
Build the image: bash container/build.sh. Once built, relocate the image: mv /tmp/mist.sif ./mist.sif
Run training within the image: apptainer run --nv mist.sif python train.py ...

See submit/dgx.j2 or submit/delta.j2 for a more complete example of using the container
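If you launch the container from Python rather than from a shell script, the run command above can be assembled as an argument list. This is only a sketch: the helper name `apptainer_command` is hypothetical, and the arguments accepted by train.py are left empty because they are not documented here.

```python
import shlex


def apptainer_command(image, train_args=(), gpu=True):
    """Build the argv list for running train.py inside an Apptainer image."""
    cmd = ["apptainer", "run"]
    if gpu:
        cmd.append("--nv")  # expose NVIDIA GPUs inside the container
    cmd += [image, "python", "train.py", *train_args]
    return cmd


cmd = apptainer_command("mist.sif")
# Print the command instead of executing it; pass `cmd` to subprocess.run to launch.
print(shlex.join(cmd))
```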

Submitting Jobs

We use a python script (submit/submit.py) to template training jobs for submission on HPC systems across multiple sites. Templates may need to be modified for your particular HPC cluster, but should provide a starting point.

source ./activate # Activate Environment
./submit/submit.py ./submit/polaris.j2 --data ./submit/pretrain.yaml | qsub

See submit/submit.py --help for more info

Note: ./activate is used to activate the python virtual environment and set various environment variables.
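The general idea of templating a job script from a data file and piping the result to the scheduler can be sketched as follows. This is not how submit/submit.py is implemented: it swaps in the standard library's `string.Template` for the Jinja2 syntax of the .j2 templates, uses a plain dict in place of the YAML data file, and the field names (`nodes`, `walltime`, `command`) are hypothetical.

```python
from string import Template

# Hypothetical job template; the real templates live in submit/*.j2.
JOB_TEMPLATE = Template("""\
#!/bin/bash
#PBS -l select=$nodes
#PBS -l walltime=$walltime
source ./activate
$command
""")


def render_job(data):
    """Fill the template from a mapping; raises KeyError if a field is missing."""
    return JOB_TEMPLATE.substitute(data)


# Hypothetical values standing in for the contents of a YAML data file.
job = render_job({"nodes": 2, "walltime": "01:00:00", "command": "python train.py"})
print(job)  # writing to stdout lets the result be piped into a scheduler command
```

Printing the rendered script rather than submitting it directly keeps the tool scheduler-agnostic, which matches the `| qsub` usage shown above.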

Development

Pre-commit

We use pre-commit to perform various linting checks on the code. To enable:

Install uv (see Installation)
Run pre-commit: uv run pre-commit
Install the git hook so the checks run before each commit: uv run pre-commit install --allow-missing-config
