Steps for training/running the models

Setting up the environment

Modify the prefix in environment.yaml to point to the location where the environment will be stored.
Create a conda environment using:

conda env create -f environment.yaml

Activate the environment:

conda activate subg_matching
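
The prefix edit can also be scripted. The sketch below is one possible way to do it, assuming environment.yaml contains a single top-level line that starts with prefix: (the usual layout of an exported conda environment file). The helper name set_env_prefix and the example target path are hypothetical and are not part of the repository.

```shell
# Hypothetical helper (not part of the repository): rewrite the `prefix:` line
# of a conda environment file so that it points to a new location.
# Assumes the file has a top-level line that starts with "prefix:".
# Paths containing `|` or `&` would need escaping before being passed in.
set_env_prefix() {
    env_file="$1"
    new_prefix="$2"
    tmp_file="$(mktemp)"
    # Write to a temporary file first so the original is replaced only if sed succeeds.
    sed "s|^prefix:.*|prefix: ${new_prefix}|" "$env_file" > "$tmp_file" \
        && mv "$tmp_file" "$env_file"
}

# Example usage (the target path is an assumption; substitute your own):
# set_env_prefix environment.yaml "$HOME/miniconda3/envs/subg_matching"
# conda env create -f environment.yaml
# conda activate subg_matching
```

Going through a temporary file rather than sed -i keeps the sketch portable across GNU and BSD sed.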

Training a model

Navigate to scripts/. Run either run_large_dataset.sh or run_new_dataset.sh, depending on which group of datasets (see Other details) you are training on.
Set the gpus variable in the script to the list of GPU indices available for the experiment. If only GPU 2 is available, set it to (2). The list may have any non-zero length.
Run bash run_large_dataset.sh on the command line to start training. By default, the model is evaluated on the test dataset at the end of training.
Note that our original codebase uses wandb to manage and monitor runs. The bash script sets WANDB_MODE=disabled, since we don't expect every user to be familiar with wandb. If you do use wandb, delete the WANDB_MODE=disabled part of the command so that it starts with CUDA_VISIBLE_DEVICES=...
Results are stored in the <experiment_dir>/<experiment_id> directory, which in this case is experiments/rqX_custom_models. This includes trained models, partial configs and logs. Train and validation scores are written to the corresponding log file at every epoch, and the test score is computed at the end of training.
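
As an illustration of the gpus setting, the sketch below shows the bash array syntax that a value such as (2) corresponds to, and one way such an array can be turned into a CUDA_VISIBLE_DEVICES value. How the training scripts actually consume gpus is not documented here, so the non-empty check and the comma-joining step are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Illustrative only: the real scripts may use `gpus` differently.

# A single available GPU (index 2) is still written as a one-element array.
gpus=(2)
# Several GPUs would be listed space-separated, e.g. gpus=(0 1 3).

# Guard against an empty list, since at least one GPU index is required.
if [ "${#gpus[@]}" -eq 0 ]; then
    echo "gpus must contain at least one GPU index" >&2
    exit 1
fi

# Join the indices with commas (assumed use: a CUDA_VISIBLE_DEVICES value).
visible_devices="$(IFS=,; echo "${gpus[*]}")"
echo "Using ${#gpus[@]} GPU(s): ${visible_devices}"
```

Arrays are a bash feature, so this needs to be run with bash rather than a plain POSIX sh.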

Additional files

Additional files are available at https://rebrand.ly/dessub.

Model names

Some models have a different naming convention in the codebase than in the paper.
Models discussed in the main text
1. Our-Early-Best - configs/edge_models/scoring=sinkhorn_pp=hinge___tp=sinkhorn_pp=hinge_when=post___unify=true.yaml
2. Our-Late-Best - configs/edge_models/scoring=sinkhorn_pp=hinge___tp=none.yaml
3. GMN adaptation - configs/node_models/scoring=attention_pp=identity___tp=attention_pp=identity_when=post.yaml
4. IsoNet adaptation - configs/edge_models/scoring=sinkhorn_pp=lrl___tp=none.yaml
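
Because the config filenames are long, a small lookup can help avoid typos when switching between models. The sketch below maps the paper names to the config paths listed above. The function name config_for_model and the short keys GMN and IsoNet are hypothetical choices for illustration and are not part of the repository.

```shell
# Hypothetical helper (not part of the repository): print the config path for
# a model name as used in the paper. Paths are copied from the list above.
config_for_model() {
    case "$1" in
        Our-Early-Best)
            echo "configs/edge_models/scoring=sinkhorn_pp=hinge___tp=sinkhorn_pp=hinge_when=post___unify=true.yaml" ;;
        Our-Late-Best)
            echo "configs/edge_models/scoring=sinkhorn_pp=hinge___tp=none.yaml" ;;
        GMN)
            echo "configs/node_models/scoring=attention_pp=identity___tp=attention_pp=identity_when=post.yaml" ;;
        IsoNet)
            echo "configs/edge_models/scoring=sinkhorn_pp=lrl___tp=none.yaml" ;;
        *)
            echo "unknown model name: $1" >&2
            return 1 ;;
    esac
}

# Example usage:
# config_for_model Our-Late-Best
```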

Other details

The AIDS, MUTAG, PTC-FM, PTC-FR, PTC-MM and PTC-MR datasets are under the large_dataset directory, while NCI-H23H, MOLT-4H, MCF-7H and MSRC-21 are under new_dataset.
Training GraphSim on the new datasets requires the conv_pool_size: [4,4,3,3] line to be active, whereas the large datasets require conv_pool_size: [3,3,2,2].
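
The two rules above can be summarized as a lookup from dataset name to dataset directory and GraphSim conv_pool_size. This is a minimal sketch for reference only: the function name graphsim_settings is hypothetical, and it only prints the values stated above rather than editing any config file.

```shell
# Hypothetical helper (not part of the repository): print the dataset
# directory and the GraphSim conv_pool_size line for a dataset name.
graphsim_settings() {
    case "$1" in
        AIDS|MUTAG|PTC-FM|PTC-FR|PTC-MM|PTC-MR)
            echo "large_dataset conv_pool_size: [3,3,2,2]" ;;
        NCI-H23H|MOLT-4H|MCF-7H|MSRC-21)
            echo "new_dataset conv_pool_size: [4,4,3,3]" ;;
        *)
            echo "unknown dataset: $1" >&2
            return 1 ;;
    esac
}

# Example usage:
# graphsim_settings MUTAG     prints: large_dataset conv_pool_size: [3,3,2,2]
# graphsim_settings MSRC-21   prints: new_dataset conv_pool_size: [4,4,3,3]
```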
