Interactive visualization and clustering tool for retrosynthesis tree datasets using Tree Edit Distance (TED) and tmap

raweru/synthmap

SynthMap is a tool for visualizing and clustering retrosynthesis tree datasets. It reads AiZynthFinder output files and first calculates the Tree Edit Distance (TED) to measure similarity between all trees. These distances are then used to generate a tmap layout, which forms the basis for an interactive HTML visualization for exploring and analyzing synthesis strategies.
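To make the idea of TED concrete, here is a minimal sketch of the textbook unit-cost edit distance between ordered, labeled trees. It is not SynthMap's implementation: the tuple-based tree encoding, the "mol" and "reaction" labels, the unit costs, and the function names are all assumptions chosen for illustration, and SynthMap may treat child order and costs differently.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), children is a tuple of trees,
# and a forest is a tuple of trees. Nested tuples are hashable, so results
# can be memoized.


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1 and not f2:
        return 0
    if not f2:
        _, kids = f1[-1]
        # Delete the rightmost root; its children move up into its place.
        return forest_distance(f1[:-1] + kids, ()) + 1
    if not f1:
        _, kids = f2[-1]
        # Insert the rightmost root of the second forest.
        return forest_distance((), f2[:-1] + kids) + 1

    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # Match the two rightmost roots (relabel if needed), then solve the
    # children forests and the remaining forests independently.
    match = (
        forest_distance(kids1, kids2)
        + forest_distance(f1[:-1], f2[:-1])
        + (0 if label1 == label2 else 1)
    )
    return min(delete, insert, match)


def tree_edit_distance(t1, t2):
    """Edit distance between two trees, each treated as a one-tree forest."""
    return forest_distance((t1,), (t2,))


# Hypothetical routes: a target made from two precursors in one step ...
route_a = ("mol", (("reaction", (("mol", ()), ("mol", ()))),))
# ... the same step with a third precursor ...
route_b = ("mol", (("reaction", (("mol", ()), ("mol", ()), ("mol", ()))),))
# ... and a route where one precursor is itself made in a second step.
route_c = (
    "mol",
    (("reaction", (("mol", ()), ("mol", (("reaction", (("mol", ()),)),)))),),
)

print(tree_edit_distance(route_a, route_b))
print(tree_edit_distance(route_a, route_c))
```

Identical trees are at distance 0, and each node that must be inserted, deleted, or relabeled adds 1, so smaller values indicate more similar route shapes.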

Features

  • Interactive Visualizations: Create beautiful HTML maps of your synthesis routes
  • Smart Clustering: Group similar routes using Tree Edit Distance (TED)
  • Customizable Views: Toggle tree visibility, adjust colors, and scale points
  • AiZynthFinder Integration: Works seamlessly with .json.gz output files

Installation

  1. Clone the repository:

    git clone https://github.com/raweru/synthmap.git
    cd synthmap
  2. Create and activate the Conda environment: Make sure you have Anaconda or Miniconda installed, then create the environment using the file for your operating system:

    • Windows:
      conda env create -f env_win.yml
      conda activate synthmap
    • Linux:
      conda env create -f env_linux.yml
      conda activate synthmap

    Note: This tool has been tested on Windows 11 and Red Hat Linux.

Usage

Run the main script, providing the path to your AiZynthFinder output file:

python synthmap.py <path/to/aizynthfinder_output.json.gz> [OPTIONS]

Example:

python synthmap.py output.json.gz --ted-threshold 3.0 --output my_visualization
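If you want to process several files or compare TED modes, the command above can be assembled from Python. The sketch below is a hypothetical helper, not part of SynthMap: it only uses flags listed under Options, and it assumes the script is started from the repository root with the synthmap environment active.

```python
import subprocess
import sys

# The two mode names documented for --ted-mode.
TED_MODES = {"shape", "classification_aware"}


def build_synthmap_command(input_path, output_prefix, ted_threshold=3.0, ted_mode="shape"):
    """Return the argument list for one synthmap.py run (hypothetical helper)."""
    if ted_mode not in TED_MODES:
        raise ValueError(f"unknown TED mode: {ted_mode!r}")
    return [
        sys.executable,  # the Python interpreter of the active environment
        "synthmap.py",   # assumes the current directory is the repository root
        str(input_path),
        "--ted-threshold", str(ted_threshold),
        "--ted-mode", ted_mode,
        "--output", output_prefix,
    ]


def run_both_modes(input_path, output_prefix):
    """Run SynthMap once per TED mode, writing one output per mode."""
    for mode in sorted(TED_MODES):
        cmd = build_synthmap_command(input_path, f"{output_prefix}_{mode}", ted_mode=mode)
        subprocess.run(cmd, check=True)  # raises if synthmap.py exits with an error


example = build_synthmap_command("output.json.gz", "my_visualization")
print(" ".join(example[1:]))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.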

Options:

  • --ted-threshold: Maximum TED for connecting trees (default: 3.0). Not used if --visualize-all is set.
  • --ted-mode: TED calculation mode for clustering or distance calculation (default: "shape")
    • shape: Considers only tree structure and node types
    • classification_aware: Considers reaction classifications for finer-grained similarity
  • --visualize-all: Visualize all trees in a global tmap layout without explicit TED threshold-based clustering
  • --output: Output file name prefix (without the .html extension). Can include a relative or absolute path
  • --bg-color: Background color for the Faerun plot (hex code)
  • --title: Title for the Faerun visualization
  • --point-scale: Scale factor for plotted points
  • --max-point-size: Maximum size of plotted points
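To illustrate what a TED threshold for connecting trees can mean, the sketch below links every pair of trees whose distance is at most the threshold and reports the resulting connected groups. This is an illustration under assumptions, not SynthMap's code: the function name, the inclusive comparison, the use of connected components, and the example distance matrix are all made up for the example.

```python
def cluster_by_threshold(distances, threshold=3.0):
    """Group tree indices that are linked by pairwise distances <= threshold.

    `distances` is a symmetric square matrix given as a list of lists.
    """
    n = len(distances)
    parent = list(range(n))  # union-find forest, one set per tree initially

    def find(i):
        # Follow parent links to the set representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if distances[i][j] <= threshold:
                parent[find(j)] = find(i)  # merge the two sets

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Made-up TED values for four trees.
example_distances = [
    [0, 2, 5, 9],
    [2, 0, 3, 8],
    [5, 3, 0, 7],
    [9, 8, 7, 0],
]

print(cluster_by_threshold(example_distances, threshold=3.0))
```

In this example trees 0 and 2 end up in the same group even though their direct distance is 5, because both are within the threshold of tree 1. Lowering the threshold splits groups apart; raising it merges them.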

Point Scaling Recommendations:

  • For small datasets (around 100 trees): use --point-scale 10 --max-point-size 100
  • For medium datasets: use smaller values, keeping max-point-size approximately 10x point-scale
  • For large datasets: use --point-scale 1 --max-point-size 10
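The recommendations above can be turned into a small helper that picks flag values from the number of trees. This is only a sketch: the README gives "around 100 trees" for small datasets but no cutoff for large ones, so `large_cutoff` and the medium value `medium_scale` below are hypothetical defaults that you should adjust to your data.

```python
def suggest_point_flags(n_trees, small_cutoff=100, large_cutoff=10_000, medium_scale=5):
    """Suggest --point-scale and --max-point-size for a dataset size.

    small_cutoff follows the "around 100 trees" guidance; large_cutoff and
    medium_scale are hypothetical values, not taken from SynthMap.
    """
    if n_trees <= small_cutoff:
        scale = 10
    elif n_trees >= large_cutoff:
        scale = 1
    else:
        scale = medium_scale
    # Keep the maximum point size at roughly 10x the point scale.
    return ["--point-scale", str(scale), "--max-point-size", str(10 * scale)]


print(" ".join(suggest_point_flags(100)))
```

The returned list can be appended to the synthmap.py command line.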

Example Output

Here's an example of the interactive visualization generated by SynthMap:

[Image: Example SynthMap Output]

You can also open the pre-generated example HTML visualizations in the /examples folder in your web browser. The .js files in the same folder are required for the visualizations to work, so keep them together with their corresponding HTML files.

License

This project is licensed under the MIT License. See the LICENSE.md file for details. It incorporates code derived from the original tmap library, available at https://github.com/reymond-group/tmap.
