From d0920a000814a1f5216518af40030211902b01f1 Mon Sep 17 00:00:00 2001
From: Josh Horton
Date: Fri, 24 Jan 2025 13:39:56 +0000
Subject: [PATCH 1/6] add info on parallel execution and gathering

---
 docs/guide/execution/quickrun_execution.rst | 62 +++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/docs/guide/execution/quickrun_execution.rst b/docs/guide/execution/quickrun_execution.rst
index 30987a096..c0b675969 100644
--- a/docs/guide/execution/quickrun_execution.rst
+++ b/docs/guide/execution/quickrun_execution.rst
@@ -44,6 +44,68 @@ The ``quickrun`` command can be integrated into as:
 
   openfe quickrun transformation.json -o results.json
 
+Parallel execution of repeats with Quickrun
+===========================================
+
+Serial execution of multiple repeats of a transformation can be inefficient when working with a HPC, in this case higher
+throughput can be achieved by running one repeat per HPC job allowing for parallel execution. Most protocols are setup to
+run ``three repeats in seral`` by default, however, this can be changed via the protocol setting ``protocol_repeats``, see the
+:ref:`protocol configuration guide ` for more details. Each transformation can then be executed
+multiple times via the ``openfe quickrun`` command to produce a set of repeats, however, you need to ensure to use unique results
+files for each repeat to ensure they don't overwrite each other. We recommend using folders named ``results_x`` where x is 0-2
+to store the repeated calculations as our :ref:`openfe gather ` command also supports this file structure.
+
+Here is an example of a simple script that will create and submit a separate job script (\*.job named file)
+for every alchemical transformation (for the simplest SLURM use case) in a network running each repeat in parallel and writing the
+results to a unique folder:
+
+.. code-block:: bash
+
+  for file in network_setup/transformations/*.json; do
+    relpath="${file:30}" # strip off "network_setup/"
+    dirpath=${relpath%.*} # strip off final ".json"
+    jobpath="network_setup/transformations/${dirpath}.job"
+    if [ -f "${jobpath}" ]; then
+      echo "${jobpath} already exists"
+      exit 1
+    fi
+    for repeat in {0..2}; do
+      cmd="openfe quickrun ${file} -o results_${repeat}/${relpath} -d results_${repeat}/${dirpath}"
+      echo -e "#!/usr/bin/env bash\n${cmd}" > "${jobpath}"
+      sbatch "${jobpath}"
+    done
+  done
+
+This should result in the following file structure after execution:
+
+::
+
+  results_parallel/
+  ├── results_0
+  │   ├── easy_rbfe_lig_ejm_31_complex_lig_ejm_42_complex
+  │   │   └── shared_RelativeHybridTopologyProtocolUnit-79c279f04ec84218b7935bc0447539a9_attempt_0
+  │   │       ├── checkpoint.nc
+  │   │       ├── simulation.nc
+  │   ├── easy_rbfe_lig_ejm_31_complex_lig_ejm_42_complex.json
+  ├── results_1
+  │   ├── easy_rbfe_lig_ejm_31_complex_lig_ejm_42_complex
+  │   │   └── shared_RelativeHybridTopologyProtocolUnit-a3cef34132aa4e9cbb824fcbcd043b0e_attempt_0
+  │   │       ├── checkpoint.nc
+  │   │       ├── simulation.nc
+  │   ├── easy_rbfe_lig_ejm_31_complex_lig_ejm_42_complex.json
+  └── results_2
+      ├── easy_rbfe_lig_ejm_31_complex_lig_ejm_42_complex
+      │   └── shared_RelativeHybridTopologyProtocolUnit-abb2b104151c45fc8b0993fa0a7ee0af_attempt_0
+      │       ├── checkpoint.nc
+      │       ├── simulation.nc
+      └── easy_rbfe_lig_ejm_31_complex_lig_ejm_42_complex.json
+
+The results of which can be gathered from the CLI using the ``openfe gather`` command, in this case you should direct
+it to the root directory which includes the repeat results and it will automatically collate the information
+
+::
+
+  openfe gather results_parallel
 See Also
 --------
 

From b0405cf16bdc90293ef7a9db2eff3bc2a259f8ca Mon Sep 17 00:00:00 2001
From: Josh Horton
Date: Fri, 24 Jan 2025 13:49:41 +0000
Subject: [PATCH 2/6] add --n-protocol-repeats flag info

---
 docs/guide/execution/quickrun_execution.rst | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/docs/guide/execution/quickrun_execution.rst b/docs/guide/execution/quickrun_execution.rst
index c0b675969..0ea977af2 100644
--- a/docs/guide/execution/quickrun_execution.rst
+++ b/docs/guide/execution/quickrun_execution.rst
@@ -49,9 +49,10 @@ Parallel execution of repeats with Quickrun
 
 Serial execution of multiple repeats of a transformation can be inefficient when working with a HPC, in this case higher
 throughput can be achieved by running one repeat per HPC job allowing for parallel execution. Most protocols are setup to
-run ``three repeats in seral`` by default, however, this can be changed via the protocol setting ``protocol_repeats``, see the
-:ref:`protocol configuration guide ` for more details. Each transformation can then be executed
-multiple times via the ``openfe quickrun`` command to produce a set of repeats, however, you need to ensure to use unique results
+run ``three repeats in seral`` by default, however, this can be changed either via the protocol setting ``protocol_repeats``, see the
+:ref:`protocol configuration guide ` for more details or overridden via the
+``--n-protocol-repeats`` flag of the ``openfe quickrun`` command. Each transformation can then be executed multiple times via the
+``openfe quickrun`` command to produce a set of repeats, however, you need to ensure to use unique results
 files for each repeat to ensure they don't overwrite each other. We recommend using folders named ``results_x`` where x is 0-2
 to store the repeated calculations as our :ref:`openfe gather ` command also supports this file structure.
 
@@ -70,7 +71,7 @@ results to a unique folder:
       exit 1
     fi
     for repeat in {0..2}; do
-      cmd="openfe quickrun ${file} -o results_${repeat}/${relpath} -d results_${repeat}/${dirpath}"
+      cmd="openfe quickrun ${file} -o results_${repeat}/${relpath} -d results_${repeat}/${dirpath} --n-protocol-repeats 1"
       echo -e "#!/usr/bin/env bash\n${cmd}" > "${jobpath}"
       sbatch "${jobpath}"
     done

From 7e320e46303ec0cbcfc6cb738daa7adb39c23d37 Mon Sep 17 00:00:00 2001
From: Josh Horton
Date: Wed, 29 Jan 2025 11:10:29 +0000
Subject: [PATCH 3/6] Update docs/guide/execution/quickrun_execution.rst

Co-authored-by: Alyssa Travitz <31974495+atravitz@users.noreply.github.com>
---
 docs/guide/execution/quickrun_execution.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/guide/execution/quickrun_execution.rst b/docs/guide/execution/quickrun_execution.rst
index 0ea977af2..08843843f 100644
--- a/docs/guide/execution/quickrun_execution.rst
+++ b/docs/guide/execution/quickrun_execution.rst
@@ -47,8 +47,8 @@ The ``quickrun`` command can be integrated into as:
 Parallel execution of repeats with Quickrun
 ===========================================
 
-Serial execution of multiple repeats of a transformation can be inefficient when working with a HPC, in this case higher
-throughput can be achieved by running one repeat per HPC job allowing for parallel execution. Most protocols are setup to
+Serial execution of multiple repeats of a transformation can be inefficient when simulation times are long.
+Higher throughput can be achieved with parallel execution by running one repeat per HPC job. Most protocols are set up to
 run ``three repeats in seral`` by default, however, this can be changed either via the protocol setting ``protocol_repeats``, see the
 :ref:`protocol configuration guide ` for more details or overridden via the
 ``--n-protocol-repeats`` flag of the ``openfe quickrun`` command. Each transformation can then be executed multiple times via the

From c4d102f363eb3132f9180dc1bf0189f2cc599fab Mon Sep 17 00:00:00 2001
From: Josh Horton
Date: Wed, 29 Jan 2025 11:10:36 +0000
Subject: [PATCH 4/6] Update docs/guide/execution/quickrun_execution.rst

Co-authored-by: Alyssa Travitz <31974495+atravitz@users.noreply.github.com>
---
 docs/guide/execution/quickrun_execution.rst | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/guide/execution/quickrun_execution.rst b/docs/guide/execution/quickrun_execution.rst
index 08843843f..c1664ec49 100644
--- a/docs/guide/execution/quickrun_execution.rst
+++ b/docs/guide/execution/quickrun_execution.rst
@@ -49,9 +49,10 @@ Parallel execution of repeats with Quickrun
 
 Serial execution of multiple repeats of a transformation can be inefficient when simulation times are long.
 Higher throughput can be achieved with parallel execution by running one repeat per HPC job. Most protocols are set up to
-run ``three repeats in seral`` by default, however, this can be changed either via the protocol setting ``protocol_repeats``, see the
-:ref:`protocol configuration guide ` for more details or overridden via the
-``--n-protocol-repeats`` flag of the ``openfe quickrun`` command. Each transformation can then be executed multiple times via the
+run three repeats in serial by default, but this can be changed by either:
+
+  1. Defining the protocol setting ``protocol_repeats`` - see the :ref:`protocol configuration guide ` for more details.
+  2. Using the ``openfe plan-rhfe-network`` (or ``plan-rbfe-network``) command line flag ``--n-protocol-repeats``. Each transformation can then be executed multiple times via the
 ``openfe quickrun`` command to produce a set of repeats, however, you need to ensure to use unique results
 files for each repeat to ensure they don't overwrite each other. We recommend using folders named ``results_x`` where x is 0-2
 to store the repeated calculations as our :ref:`openfe gather ` command also supports this file structure.

From 4caf47c1731e826d24d9e9fb6b57d91a1022b6ea Mon Sep 17 00:00:00 2001
From: Josh Horton
Date: Wed, 29 Jan 2025 11:11:06 +0000
Subject: [PATCH 5/6] Update docs/guide/execution/quickrun_execution.rst

Co-authored-by: Alyssa Travitz <31974495+atravitz@users.noreply.github.com>
---
 docs/guide/execution/quickrun_execution.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/guide/execution/quickrun_execution.rst b/docs/guide/execution/quickrun_execution.rst
index c1664ec49..1fc8cc686 100644
--- a/docs/guide/execution/quickrun_execution.rst
+++ b/docs/guide/execution/quickrun_execution.rst
@@ -57,7 +57,7 @@ run three repeats in serial by default, but this can be changed by either:
 files for each repeat to ensure they don't overwrite each other. We recommend using folders named ``results_x`` where x is 0-2
 to store the repeated calculations as our :ref:`openfe gather ` command also supports this file structure.
 
-Here is an example of a simple script that will create and submit a separate job script (\*.job named file)
+Here is an example of a simple script that will create and submit a separate job script (``\*.job`` named file)
 for every alchemical transformation (for the simplest SLURM use case) in a network running each repeat in parallel and writing the
 results to a unique folder:
 

From 8552fff0a3615017d2fedca28536b1b85f29f7bc Mon Sep 17 00:00:00 2001
From: Josh Horton
Date: Wed, 29 Jan 2025 11:18:47 +0000
Subject: [PATCH 6/6] format list

---
 docs/guide/execution/quickrun_execution.rst | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/guide/execution/quickrun_execution.rst b/docs/guide/execution/quickrun_execution.rst
index 1fc8cc686..f306696d6 100644
--- a/docs/guide/execution/quickrun_execution.rst
+++ b/docs/guide/execution/quickrun_execution.rst
@@ -52,7 +52,9 @@ Higher throughput can be achieved with parallel execution by running one repeat
 run three repeats in serial by default, but this can be changed by either:
 
   1. Defining the protocol setting ``protocol_repeats`` - see the :ref:`protocol configuration guide ` for more details.
-  2. Using the ``openfe plan-rhfe-network`` (or ``plan-rbfe-network``) command line flag ``--n-protocol-repeats``. Each transformation can then be executed multiple times via the
+  2. Using the ``openfe plan-rhfe-network`` (or ``plan-rbfe-network``) command line flag ``--n-protocol-repeats``.
+
+Each transformation can then be executed multiple times via the
 ``openfe quickrun`` command to produce a set of repeats, however, you need to ensure to use unique results
 files for each repeat to ensure they don't overwrite each other. We recommend using folders named ``results_x`` where x is 0-2
 to store the repeated calculations as our :ref:`openfe gather ` command also supports this file structure.
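
Review note on the submission script in this series: the inner ``for repeat`` loop writes every repeat's command to the same ``${jobpath}``, so only the last repeat's job file is left on disk after the loop, and ``${file:30}`` strips the full 30-character prefix ``network_setup/transformations/`` rather than only ``network_setup/`` as the comment says. The following is a minimal sketch, not the documented script, of a variant that writes one job file per repeat. The function name ``write_repeat_jobs``, the ``_repeat_N.job`` suffix and the ``SUBMIT`` variable are hypothetical; the ``openfe quickrun`` options ``-o`` and ``-d`` are the ones used in the patch, and ``--n-protocol-repeats`` is left out on the assumption that the network was planned with a single repeat per transformation.

```shell
# Sketch only: one job file per transformation and per repeat.
# write_repeat_jobs TRANSFORMATION_DIR N_REPEATS
# Set SUBMIT=1 to also hand each job file to sbatch (hypothetical switch).
write_repeat_jobs() {
  local tdir="$1" n_repeats="$2"
  local file name repeat jobpath cmd
  for file in "${tdir}"/*.json; do
    [ -e "${file}" ] || continue            # glob matched nothing
    name="$(basename "${file}" .json)"      # transformation name without ".json"
    repeat=0
    while [ "${repeat}" -lt "${n_repeats}" ]; do
      jobpath="${tdir}/${name}_repeat_${repeat}.job"
      if [ -f "${jobpath}" ]; then
        echo "${jobpath} already exists" >&2
        return 1
      fi
      # Same output layout as the patch: results_<repeat>/<name>.json and results_<repeat>/<name>/
      cmd="openfe quickrun ${file} -o results_${repeat}/${name}.json -d results_${repeat}/${name}"
      printf '#!/usr/bin/env bash\n%s\n' "${cmd}" > "${jobpath}"
      if [ "${SUBMIT:-0}" = "1" ]; then
        sbatch "${jobpath}"
      fi
      repeat=$((repeat + 1))
    done
  done
}
```

Calling ``write_repeat_jobs network_setup/transformations 3`` only writes the job files, which makes it easy to inspect them first; ``SUBMIT=1 write_repeat_jobs network_setup/transformations 3`` also submits them.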