Larch is available via conda on the matsengrp channel:
conda install -c matsengrp larch-phylo
Currently only available on Linux.
Requirements:
- GCC 7.5
- cmake 3.16
- Boost 1.85
For Ubuntu 18.04 LTS, the following command installs the requirements:
sudo apt install --no-install-recommends git git-lfs cmake make g++ mpi-default-dev libprotobuf-dev libboost-dev libboost-program-options-dev libboost-filesystem-dev libboost-iostreams-dev libboost-date-time-dev protobuf-compiler automake autoconf libtool nasm
To get a recent cmake, download it from https://cmake.org/download/, for example:
wget https://github.com/Kitware/CMake/releases/download/v3.23.1/cmake-3.23.1-linux-x86_64.tar.gz
- singularity 3.5.3
- conda 22.9.0
Larch can be built utilizing a Singularity container or a Conda environment.
To build the Singularity image, use the definition file provided:
singularity build larch-singularity.sif larch-singularity.def
singularity shell larch-singularity.sif --net
To set up a conda environment capable of building Larch, create the larch environment from the standard environment file provided:
conda env create -f environment.yml
To set up a conda environment capable of building Larch with development tools included, create the larch-dev environment from the development environment file provided:
conda env create -f environment-dev.yml
Four executables are built automatically as part of the larch package. They provide various methods for exploring tree space and manipulating DAGs/trees:
- larch-test is the suite of tests used to validate the various routines.
- larch-usher is a tool that takes an input tree/DAG and explores tree space through SPR moves.
- larch-dagutil is a utility that manipulates (e.g. merges, prunes) or inspects DAGs/trees.
- larch-dag2dot is a utility that writes a DAG in the DOT file format for easier viewing.
Note: If you run into memory limitations during the cmake step, you can limit the number of parallel threads with export CMAKE_NUM_THREADS="8" (reduce the number as necessary).
To build everything from the larch/ directory, run:
git submodule update --init --recursive
mkdir build
cd build
cmake ..
make -j16
# optionally, to install outside of build directory
make install
CMake build options:
- add -DCMAKE_BUILD_TYPE=Debug to build in debug mode. -DCMAKE_BUILD_TYPE=Release is enabled by default.
- add -DCMAKE_CXX_CLANG_TIDY="clang-tidy" to enable clang-tidy.
- add -DUSE_ASAN=yes to enable asan and ubsan.
- add -DCMAKE_INSTALL_PREFIX=path/to/install to select the install location. By default, this will perform a system-wide installation. To install into the current conda environment, use -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX.
All tools in this suite support a number of file formats for loading and storing MATs and MADAGs. When passing filepaths as arguments, the file format can be specified explicitly with the --input-format/--output-format options. Alternatively, the program infers the file format when the filepath has a recognized file extension.
File format options:
- MADAG dagbin: Supported as input and output. *.dagbin is the recognized extension.
- MADAG protobuf: Supported as input and output. *.pb_dag is the recognized extension, or *.pb WITHOUT a --MAT-refseq-file option.
- MAT protobuf: Supported as input only. *.pb_tree is the recognized extension, or *.pb WITH a --MAT-refseq-file option.
- MADAG json: Supported as input only. *.json_dag or *.json is the recognized extension.
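The extension rules above can be sketched as a small lookup. This is only an illustration of the documented behaviour, not larch's implementation: the function name `infer_format`, the `has_refseq` parameter, and the returned label strings are hypothetical, and ignoring a trailing `.gz` is an assumption based on the `*.pb.gz` inputs used in the examples.

```python
from pathlib import Path
from typing import Optional

# Extension -> format table taken from the list above.
# The labels are descriptive strings, not identifiers used by larch.
EXTENSION_FORMATS = {
    ".dagbin": "MADAG dagbin",
    ".pb_dag": "MADAG protobuf",
    ".pb_tree": "MAT protobuf",
    ".json_dag": "MADAG json",
    ".json": "MADAG json",
}


def infer_format(filepath: str, has_refseq: bool = False) -> Optional[str]:
    """Return the format implied by the file extension, or None if unrecognized."""
    path = Path(filepath)
    # Assumption: a trailing .gz is ignored when looking at the extension.
    if path.suffix == ".gz":
        path = path.with_suffix("")
    suffix = path.suffix
    if suffix == ".pb":
        # A bare *.pb counts as a MAT only when a reference sequence file is given.
        return "MAT protobuf" if has_refseq else "MADAG protobuf"
    return EXTENSION_FORMATS.get(suffix)
```

When the extension is not recognized (the sketch returns `None`), the format has to be given explicitly with --input-format/--output-format.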
From the larch/build/bin directory:
ln -s ../../data
./larch-test
Passing nocatch to the test executable allows exceptions to escape, which is useful for debugging. A gdb session can be started with gdb --args build/larch-test nocatch.
larch-test options:
- nocatch allows test exceptions to escape, which is useful for debugging. A gdb session can be started with gdb --args build/larch-test nocatch.
- --list produces a list of all available tests, along with an ID number.
- --range runs tests by ID with a string of comma-separated range or single ID arguments [e.g. 1-5,7,9,12-13].
- -tag excludes tests with a given tag.
- +tag includes tests with a given tag.
  - For example, -tag "slow" removes tests which require a long runtime to complete.
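To make the --range syntax concrete, here is a minimal sketch of how a string such as 1-5,7,9,12-13 expands into test IDs. It is not the parser used by larch-test: the function name `parse_test_range` is hypothetical, and treating both endpoints of a range as inclusive is an assumption.

```python
from typing import List


def parse_test_range(spec: str) -> List[int]:
    """Expand a spec such as "1-5,7,9,12-13" into a sorted list of test IDs."""
    ids = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = part.split("-", 1)
            first, last = int(start), int(end)
            if first > last:
                raise ValueError(f"invalid range: {part}")
            # Assumption: both endpoints are included.
            ids.update(range(first, last + 1))
        else:
            ids.add(int(part))
    return sorted(ids)
```

Under these assumptions the example string from the option description selects tests 1 through 5, 7, 9, 12, and 13.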
From the larch/build/bin directory:
./larch-usher -i ../data/testcase/tree_1.pb.gz -o output_dag.pb -c 10
This command runs 10 iterations of larch-usher on the provided tree, and writes the final result to the file output_dag.pb.
larch-usher options:
- -i,--input [REQUIRED] Filepath to the input tree/DAG (accepted file formats are: MADAG protobuf, MAT protobuf, JSON, Dagbin).
- -o,--output [REQUIRED] Filepath to the output tree/DAG (accepted file formats are: MADAG protobuf, Dagbin).
- -c,--count [Default: 1] Number of larch-usher iterations to run.
- -r,--MAT-refseq-file [REQUIRED if provided input file is a MAT protobuf] Filepath to json reference sequence.
- -v,--VCF-input-file Filepath to VCF containing ambiguous sequence data.
- -l,--logpath [Default: optimization_log] Filepath to write summary log.
- -s,--switch-subtrees [Default: never] Switch to optimizing subtrees after the specified number of iterations.
- --min-subtree-clade-size [Default: 100] The minimum number of leaves in a subtree sampled for optimization (ignored without option -s).
- --max-subtree-clade-size [Default: 1000] The maximum number of leaves in a subtree sampled for optimization (ignored without option -s).
- --move-coeff-nodes [Default: 1] New node coefficient for scoring moves. Set to 0 to apply only parsimony-optimal SPR moves.
- --move-coeff-pscore [Default: 1] Parsimony score coefficient for scoring moves. Set to 0 to apply only topologically novel SPR moves.
- --sample-method [Default: parsimony] Select method for sampling optimization tree from the DAG. Options are: (parsimony, random, rf-minsum, rf-maxsum).
- --sample-uniformly [Default: use natural distribution] Use a uniform distribution to sample trees for optimization.
  - For example, if the sampling method is parsimony and --sample-uniformly is provided, then a uniform distribution on parsimony-optimal trees is sampled from.
- --callback-option [Default: best-moves] Specify which SPR moves are chosen and applied. Options are: (all-moves, best-moves-fixed-tree, best-moves-treebased, best-moves).
- --trim [Default: do not trim] Trim optimized DAG to contain only parsimony-optimal trees before writing to protobuf.
- --keep-fragment-uncollapsed [Default: collapse] Do not collapse empty (non-mutation-bearing) edges in the optimization tree.
- --quiet [Default: write intermediate files] Do not write intermediate protobuf file at each iteration.
- --input-format [Default: format inferred by file extension] Specify the format of the input file. Options are: (dagbin, pb, dag-pb, tree-pb, json, dag-json).
- --output-format [Default: format inferred by file extension] Specify the format of the output file. Options are: (dagbin, pb, dag-pb).
- -S Enable smart stopping: larch-usher will terminate when parsimony improvement ceases to occur.
- -T Specify a hard time limit after which larch-usher will terminate.
- --ignore-root-edge-mutations Ignore the contribution of edges directly descending from the UA node to the parsimony score.
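When larch-usher is driven from a script, it can help to assemble the argument list in one place. The sketch below is a hypothetical helper, not part of larch: the name `build_usher_command` and its parameters are illustrative, it only uses the -i, -o, -c, and -r options documented above, and the check on the *.pb_tree extension follows the documented requirement that MAT protobuf inputs need a reference sequence.

```python
from typing import List, Optional


def build_usher_command(
    input_path: str,
    output_path: str,
    count: int = 1,
    refseq_path: Optional[str] = None,
    executable: str = "./larch-usher",
) -> List[str]:
    """Assemble a larch-usher argument list from the documented options."""
    # *.pb_tree is the MAT protobuf extension, which requires -r/--MAT-refseq-file.
    if input_path.endswith(".pb_tree") and refseq_path is None:
        raise ValueError("MAT protobuf input requires a reference sequence file")
    cmd = [executable, "-i", input_path, "-o", output_path, "-c", str(count)]
    if refseq_path is not None:
        cmd += ["-r", refseq_path]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the larch/build/bin directory; with the arguments from the example above it reproduces that command line.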
From the larch/build/bin directory:
./larch-dagutil -i ../data/testcase/tree_1.pb.gz -i ../data/testcase/tree_2.pb.gz -o merged_trees.pb
This executable takes a list of protobuf files and merges the resulting DAGs together into one.
There is some non-determinism in parsimony score that can happen when merging multiple DAGs on the same ambiguous leafset without providing a VCF. The larch-dagutil implementation can merge multiple DAGs whose leafsets contain matching sampleIds into a single DAG, but the protobuf format only stores edge mutations, which are fully disambiguated. So the ambiguities are recovered by passing a VCF file to the program. When a VCF is not supplied, the overall parsimony score of the merged DAG is not well-defined. This is because the nodes are added in parallel, and so the disambiguation assigned to any given leaf node is determined by the order in which the parallel algorithm accesses the leaves from each DAG. So the disambiguation for each leaf is based on a random choice of the trees from which the DAG is constructed, and is not necessarily consistent with the disambiguation for its sister leaves.
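The order dependence described above can be illustrated with a toy model. This is not how larch-dagutil is implemented: it assumes a single site, two sister leaves whose true bases are ambiguous, and a "first visit wins" rule for which disambiguation is kept. All names (`merge_leaves`, `sister_cost`, the sample IDs) are hypothetical.

```python
from typing import Dict, List, Tuple

# One visit = (sample ID, disambiguated base stored by some input tree).
Visit = Tuple[str, str]


def merge_leaves(visits: List[Visit]) -> Dict[str, str]:
    """Keep, for each sample ID, the base from whichever visit arrives first."""
    merged: Dict[str, str] = {}
    for sample_id, base in visits:
        merged.setdefault(sample_id, base)  # later visits are ignored
    return merged


def sister_cost(merged: Dict[str, str]) -> int:
    """Minimum mutations at this site for sister leaves under one parent."""
    return len(set(merged.values())) - 1


# Tree A disambiguated both leaves to "A"; tree B disambiguated both to "G".
tree_a = [("s1", "A"), ("s2", "A")]
tree_b = [("s1", "G"), ("s2", "G")]

# Leaves visited tree by tree: the sisters agree.
sequential = merge_leaves(tree_a + tree_b)
# Visits interleaved, as a parallel merge might do: the sisters disagree.
interleaved = merge_leaves([tree_a[0], tree_b[1], tree_b[0], tree_a[1]])
```

In this toy model the sequential order gives a cost of 0, while the interleaved order keeps "A" for one leaf and "G" for its sister and costs one extra mutation. Supplying a VCF avoids the issue because the ambiguities are then recovered from the VCF rather than from whichever tree is seen first.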
larch-dagutil options:
- -i,--input Filepath to the input Tree/DAG (accepted file formats are: MADAG protobuf, MAT protobuf, JSON, Dagbin).
- -o,--output [Default: does not print output] Filepath to the output Tree/DAG (accepted file formats are: MADAG protobuf, Dagbin).
- -r,--MAT-refseq-file [REQUIRED if input protobufs are MAT protobuf format] Filepath to json reference sequence.
- -t,--trim Trim output (Default trimming method is trim to best parsimony).
- --rf Trim output to minimize RF distance to the provided DAG file (Ignored if -t flag is not provided).
- -s,--sample Write a sampled single tree from DAG to file, rather than the whole DAG.
- --dag-info Print stats about the DAG (tree count, all parsimony scores, all RF distances).
- --parsimony Print all parsimony scores.
- --sum-rf-distance Print all sum RF distances.
- --input-format [Default: format inferred by file extension] Specify the format of the input file(s). Options are: (dagbin, pb, dag-pb, tree-pb, json, dag-json).
- --output-format [Default: format inferred by file extension] Specify the format of the output file. Options are: (dagbin, pb, dag-pb).
- --rf-format [Default: format inferred by file extension] Specify the format of the RF file. Options are: (dagbin, pb, dag-pb, tree-pb, json, dag-json).
From the larch/build/bin directory:
./larch-dag2dot -i ../data/testcase/full_dag.pb
This command writes the provided DAG in DOT format to stdout.
larch-dag2dot options:
- -i,--input Filepath to the input Tree/DAG (accepted file formats are: MADAG protobuf, MAT protobuf, JSON, Dagbin).
- -o,--output [Default: DOT written to stdout] Filepath to the output DOT file.
- --input-format [Default: format inferred by file extension] Specify the format of the input file. Options are: (dagbin, pb, dag-pb, tree-pb, json, dag-json).
- --dag/--tree [REQUIRED if file extension is *.pb] Specify whether input file is a DAG or a Tree.
- Lohmann, N. (2022). JSON for Modern C++ (Version 3.10.5) [Computer software]. https://github.com/nlohmann
- Eric Niebler. Range library for C++14/17/20. https://github.com/ericniebler/range-v3