MontgomeryLab · taimontgomery · Oct 19, 2022 · Oct 14, 2022
diff --git a/README.md b/README.md
@@ -95,9 +95,8 @@ tiny get-template
 | Reference annotations<br/>[(example)](START_HERE/reference_data/ram1.gff3) | GFF3 / GFF2 / GTF | Column 9 attributes (defined as "tag=value" or "tag "):<ul><li>Each feature must have an `ID` or `gene_id`  or `Parent` tag (referred to as `ID` henceforth).</li><li>Feature classes can be defined with the `Class` tag. If undefined, the default value \__UNKNOWN_\_ will be used.</li><li>Discontinuous features must be defined with the `Parent` tag whose value is the logical parent's `ID`, or by sharing the same `ID`.</li><li>Attribute values containing commas must represent lists.</li><li>`Parent` tags with multiple values are not yet supported.</li><li>See the example link (left) for col. 9 formatting.</li></ul> |
 | Sequencing data<br/>[(example)](START_HERE/fastq_files)                    | FASTQ(.gz)        | Files must be demultiplexed.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
 | Reference genome<br/>[(example)](START_HERE/reference_data/ram1.fa)        | FASTA             | Chromosome identifiers (e.g. Chr1): <ul><li>Must match your reference annotation file chromosome identifiers</li><li>Are case sensitive</li></ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
-| Bowtie indexes (optional) <sup>1</sup>                                     | ebwt              | Must be small indexes (.ebwtl indexes are not supported)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
 
-<br/><sup>1</sup> Bowtie indexes can be created for you. See the [configuration file documentation](doc/Configuration.md#building-bowtie-indexes).
+
 
 ### Running an End-to-End Analysis
 In most cases you will use this toolset as an end-to-end pipeline. This will run a full, standard small RNA sequencing data analysis according to your configuration file. Before starting, you will need the following:

diff --git a/START_HERE/TUTORIAL.md b/START_HERE/TUTORIAL.md
@@ -35,9 +35,7 @@ The output you see on your terminal is from `cwltool`, which coordinates the exe
 When the analysis is complete you'll notice a new folder has appeared whose name contains the date and time of the run. Inside you'll find subdirectories containing the file and terminal outputs for each step, and the processed Run Config file for auto-documentation of the run.
 
 ### Bowtie indexes
-Bowtie indexes were built during this run because paths.yml didn't define an `ebwt` prefix. Now, you'll see the `ebwt` points to the freshly built indexes in your run directory. This means that indexes won't be rebuilt during any subsequent runs that use this `paths.yml` file. If you need to rebuild your indexes:
-1. Change the value of ebwt to `ebwt: ''` in paths.yml
-2. Ensure that your Run Config file contains `run_bowtie_build: True`
+Bowtie indexes were built during this run because `paths.yml` didn't define an `ebwt` prefix. You'll now see that `ebwt` points to the freshly built indexes in your run directory, so indexes won't be rebuilt during subsequent runs that use this `paths.yml` file. If you need to rebuild your indexes, simply delete the value to the right of `ebwt` in `paths.yml`.
 
 ## Running Your Data
 Expected runtime: ~10-60 minutes (expect longer runtimes if a bowtie index must be built)

diff --git a/START_HERE/paths.yml b/START_HERE/paths.yml
@@ -23,9 +23,8 @@ tmp_directory:
 ######-------------------------------- BOWTIE-BUILD ---------------------------------######
 #
 # To build bowtie indexes:
-#   1. Your Run Config file must contain run_bowtie_build: true
-#   2. Your reference genome file(s) must be listed under reference_genome_files (below)
-#   3. ebwt (below) must be an empty string, or ''
+#   1. Your reference genome file(s) must be listed under reference_genome_files (below)
+#   2. ebwt (below) must be empty (nothing after ":")
 #
 # Once your indexes have been built, this config file will be modified such
 # that ebwt points to their location (prefix) within your Run Directory. This

diff --git a/START_HERE/run_config.yml b/START_HERE/run_config.yml
@@ -28,10 +28,6 @@ paths_config: ./paths.yml
 ##-- If none provided, the default of user_tinyrna will be used --##
 run_name: tinyrna
 
-##-- If True: run bowtie-build before analyzing libraries --##
-##-- NOTE: this option may be ignored depending on your Paths file. See Paths file. --##
-run_bowtie_build: True
-
 ##-- Number of threads to use when a step supports multi-threading --##
 ##-- For best performance, this should be equal to your computer's processor core count --##
 threads: 4
@@ -334,6 +330,7 @@ run_directory: ~
 tmp_directory: ~
 features_csv: { }
 samples_csv: { }
+run_bowtie_build: false
 reference_genome_files: [ ]
 plot_style_sheet: ~
 adapter_fasta: ~

diff --git a/doc/Configuration.md b/doc/Configuration.md
@@ -95,12 +95,11 @@ When the pipeline starts up, tinyRNA will process the Run Config based on the co
 If you don't have bowtie indexes already built for your reference genome, tinyRNA can build them for you at the beginning of an end-to-end run and reuse them on subsequent runs with the same Paths File.
 
 To build bowtie indexes:
-1. Open your Run Config in a text editor and find the `run_bowtie_build` key. Set its value to `true` and save it.
-2. Open your Paths File in a text editor and find the `reference_genome_files` key. Add your reference genome file(s) under this key, one per line with a `- ` in front.
-3. Find the `ebwt` key and delete its value.
-4. Execute an end-to-end pipeline run.
+1. Open your Paths File in a text editor and find the `reference_genome_files` key. Add your reference genome file(s) under this key, one per line with a `- ` in front.
+2. Find the `ebwt` key and delete its value.
+3. Execute an end-to-end pipeline run.
 
-Once your indexes have been built, your Paths File will be modified such that `ebwt` points to their location (prefix) within your Run Directory. This means that indexes will not be unnecessarily rebuilt on subsequent runs as long as the same Paths File is used. If you need them rebuilt, simply repeat steps 3 and 4 above.
+Once your indexes have been built, your Paths File will be modified such that `ebwt` points to their location (prefix) within your Run Directory. This means that indexes will not be unnecessarily rebuilt on subsequent runs as long as the same Paths File is used. If you need them rebuilt, simply repeat steps 2 and 3 above.
 
 ## Samples Sheet Details
 |  _Column:_ | Input FASTQ Files   | Sample/Group Name | Replicate Number | Control | Normalization |

diff --git a/tests/unit_tests_configuration.py b/tests/unit_tests_configuration.py
@@ -0,0 +1,139 @@
+import contextlib
+import io
+import os
+import unittest
+from unittest.mock import patch, mock_open, call
+
+from tiny.rna.configuration import Configuration
+
+
+class ConfigurationTests(unittest.TestCase):
+    @classmethod
+    def setUpClass(cls):
+        cls.root_cfg_dir = os.path.abspath("../tiny/templates")
+        cls.run_config = cls.root_cfg_dir + "/run_config_template.yml"
+        cls.paths = cls.root_cfg_dir + "/paths.yml"
+
+        cls.default_prefix = os.path.join(
+            cls.root_cfg_dir,
+            Configuration(cls.run_config)['run_directory'],
+            "bowtie-build/ram1"
+        )
+        cls.maxDiff = 1522
+
+    """============ Helper functions ============"""
+
+    def config_with(self, prefs):
+        config = Configuration(self.run_config)
+        for key, val in prefs.items():
+            config[key] = val
+        return config
+
+    def bt_idx_files_from_prefix(self, prefix):
+        return [
+            {'path': f"{prefix}.{subext}.ebwt", 'class': 'File'}
+            for subext in ['1', '2', '3', '4', 'rev.1', 'rev.2']
+        ]
+
+    """================ Tests =================="""
+
+    """Does get_ebwt_prefix() produce the expected prefix path?"""
+
+    def test_get_ebwt_prefix(self):
+        config = Configuration(self.run_config)
+        actual_prefix = config.get_ebwt_prefix()
+        expected_prefix = self.default_prefix
+
+        self.assertEqual(actual_prefix, expected_prefix)
+
+    """Does get_ebwt_prefix() throw an error if reference genome files aren't provided?"""
+
+    def test_get_ebwt_prefix_no_genome(self):
+        config = Configuration(self.run_config)
+        config['reference_genome_files'] = None
+
+        with self.assertRaises(ValueError):
+            config.get_ebwt_prefix()
+
+    """Does get_bt_index_files() output the paths of indexes that have already been built?"""
+
+    def test_get_bt_index_files_prebuilt_indexes(self):
+        config = self.config_with({'run_bowtie_build': False})
+        prefix = config.paths['ebwt'] = os.path.abspath("./testdata/counter/validation/ebwt/ram1")
+        expected = self.bt_idx_files_from_prefix(prefix)
+        self.assertListEqual(config.get_bt_index_files(), expected)
+
+    """Does get_bt_index_files() output the paths of the index files that are expected
+    to be built from the reference genome?"""
+
+    def test_get_bt_index_files_unbuilt_indexes_with_genome(self):
+        config = self.config_with({'run_bowtie_build': True})
+        prefix = config.paths['ebwt'] = "mock_prefix"
+        expected = self.bt_idx_files_from_prefix(prefix)
+        self.assertListEqual(config.get_bt_index_files(), expected)
+
+    """Does get_bt_index_files() produce an error and quit when index files are
+    missing and a reference genome has not been provided?"""
+
+    def test_get_bt_index_files_missing_indexes_without_genome(self):
+        config = self.config_with({'run_bowtie_build': False, 'reference_genome_files': None})
+        prefix = config.paths['ebwt'] = "missing"
+        errmsg = '\n'.join([
+            "The following Bowtie index file couldn't be found:",
+            "\t" + f"{prefix}.1.ebwt",
+            "\nPlease either correct your ebwt prefix or add reference genomes in the Paths File."
+        ])
+
+        with self.assertRaisesRegex(SystemExit, errmsg):
+            config.get_bt_index_files()
+
+    """Does get_bt_index_files() produce an error without quitting when index files
+    are missing but a reference genome was provided, and does it return the list of
+    index files that will be built from the genome?"""
+
+    def test_get_bt_index_files_missing_indexes_with_genome(self):
+        config = self.config_with({'run_bowtie_build': False})
+        bad_prefix = config.paths['ebwt'] = "missing"
+        genome_prefix = self.default_prefix
+
+        expected_files = self.bt_idx_files_from_prefix(genome_prefix)
+        expected_error = '\n'.join([
+            "The following Bowtie index file couldn't be found:",
+            "\t" + f"{bad_prefix}.1.ebwt",
+            "\nIndexes will be built from your reference genome files during this run.",
+            ""
+        ])
+
+        stderr = io.StringIO()
+        with contextlib.redirect_stderr(stderr):
+            actual = config.get_bt_index_files()
+
+        self.assertEqual(stderr.getvalue(), expected_error)
+        self.assertListEqual(actual, expected_files)
+
+    """Does verify_bowtie_build_outputs() update the paths in ["bt_index_files"] and rewrite 
+    these changes to the processed Run Config if long indexes were produced? Does it also
+    write to the Paths File to update the new ebwt prefix?"""
+
+    def test_verify_bowtie_build_outputs(self):
+        ebwt_short = ["1.ebwt", "2.ebwt", "3.ebwt"]
+        ebwt_long = ["1.ebwtl", "2.ebwtl", "3.ebwtl"]
+        run_conf_ebwt = [Configuration.cwl_file(f, verify=False) for f in ebwt_short]
+        expected_ebwt = [Configuration.cwl_file(f, verify=False) for f in ebwt_long]
+
+        config = self.config_with({'bt_index_files': run_conf_ebwt})
+
+        with patch('tiny.rna.configuration.open', mock_open()) as mo, \
+                patch('tiny.rna.configuration.glob', return_value=ebwt_long) as g:
+            config.verify_bowtie_build_outputs()
+
+        expected_writes = [
+            call(self.paths, 'w'),
+            call(os.path.join(self.root_cfg_dir, config['run_directory'], os.path.basename(self.run_config)), 'w')
+        ]
+
+        self.assertListEqual(config['bt_index_files'], expected_ebwt)
+        self.assertListEqual(mo.call_args_list, expected_writes)
+
+if __name__ == '__main__':
+    unittest.main()
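
The unit tests above imply a decision flow for choosing between prebuilt indexes and a fresh build. The sketch below restates that flow as a standalone function. It is not the code in `tiny/rna/configuration.py`: `resolve_index_files`, its parameters, and the injectable `exists` check are hypothetical names, and deriving the built prefix from the first genome file's basename is an assumption inferred from the `bowtie-build/ram1` prefix expected by the tests. It also ignores the case where `run_bowtie_build` is set explicitly.

```python
import os
import sys

# Sub-extensions of the six small-index files, as listed in the unit tests
INDEX_SUFFIXES = ['1', '2', '3', '4', 'rev.1', 'rev.2']


def index_files_from_prefix(prefix):
    """Expand an ebwt prefix into the six expected small-index paths."""
    return [f"{prefix}.{suffix}.ebwt" for suffix in INDEX_SUFFIXES]


def resolve_index_files(ebwt_prefix, genome_files, run_directory, exists=os.path.isfile):
    """Hypothetical helper: return (index_files, needs_build)."""
    if ebwt_prefix:
        candidates = index_files_from_prefix(ebwt_prefix)
        missing = [path for path in candidates if not exists(path)]
        if not missing:
            # Prebuilt indexes are all present, so nothing needs building
            return candidates, False

        message = "The following Bowtie index file couldn't be found:\n\t" + missing[0]
        if not genome_files:
            # No way to recover: quit with an explanatory message
            sys.exit(message + "\n\nPlease either correct your ebwt prefix "
                               "or add reference genomes in the Paths File.")
        # Recoverable: warn, then fall through to the build case
        print(message + "\n\nIndexes will be built from your reference genome "
                        "files during this run.", file=sys.stderr)
    elif not genome_files:
        raise ValueError("Reference genome files are required to build indexes.")

    # Assumption: the built prefix is <run_directory>/bowtie-build/<genome basename>
    genome_name = os.path.splitext(os.path.basename(genome_files[0]))[0]
    built_prefix = os.path.join(run_directory, "bowtie-build", genome_name)
    return index_files_from_prefix(built_prefix), True
```

Passing `exists` as a parameter keeps the sketch testable without touching the filesystem, which plays the same role as the mocks in the test module above.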
diff --git a/tiny/cwl/tools/bowtie-build.cwl b/tiny/cwl/tools/bowtie-build.cwl
@@ -83,7 +83,7 @@ outputs:
   index_files:
     type: File[]
     outputBinding:
-      glob: $(inputs.ebwt_base).*.ebwt
+      glob: $(inputs.ebwt_base).*.ebwt*
 
   console_output:
     type: stdout
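
The effect of widening the glob from `.*.ebwt` to `.*.ebwt*` can be illustrated with Python's `fnmatch` module. This is only an approximation, since CWL runners apply their own glob implementation, and the file names below are made-up examples using the `ram1` prefix from the test data.

```python
from fnmatch import fnmatchcase

# Patterns before and after the change, with the prefix already substituted
OLD_PATTERN = "ram1.*.ebwt"
NEW_PATTERN = "ram1.*.ebwt*"

# Example outputs: small indexes (.ebwt), large indexes (.ebwtl), and the genome
outputs = ["ram1.1.ebwt", "ram1.rev.1.ebwt", "ram1.1.ebwtl", "ram1.rev.2.ebwtl", "ram1.fa"]

# The old pattern requires the name to end in ".ebwt", so large indexes are skipped
old_matches = [name for name in outputs if fnmatchcase(name, OLD_PATTERN)]

# The trailing "*" also accepts the ".ebwtl" extension
new_matches = [name for name in outputs if fnmatchcase(name, NEW_PATTERN)]

print(old_matches)
print(new_matches)
```

With the new pattern, both small and large index files are captured as step outputs, which lets a later check such as `verify_bowtie_build_outputs()` detect when large indexes were produced.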
diff --git a/tiny/entry.py b/tiny/entry.py
@@ -100,11 +100,7 @@ def run(tinyrna_cwl_path: str, config_file: str) -> None:
         # Use the cwltool CWL runner via command line
         return_code = run_cwltool_subprocess(config_object, workflow, run_directory)
 
-    # If the workflow completed without errors, we want to update
-    # the Paths Sheet to point to the new bowtie index prefix
-    if config_object['run_bowtie_build'] and return_code == 0:
-        paths_sheet_filename = config_object.paths.inf
-        config_object.paths.write_processed_config(paths_sheet_filename)
+    config_object.execute_post_run_tasks(return_code)
 
 
 @report_execution_time("Pipeline resume runtime")
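
The change to `run()` above moves the post-run bookkeeping out of `entry.py` and behind a single method call. The body of `execute_post_run_tasks()` is not shown in this diff, so the following is only a sketch of how such a method could wrap the removed condition. `PostRunSketch`, `settings`, and `write_paths_file` are hypothetical stand-ins, not tinyRNA's API.

```python
class PostRunSketch:
    """Hypothetical stand-in for the configuration object used by run()."""

    def __init__(self, settings, write_paths_file):
        self.settings = settings                  # dict-like run settings
        self.write_paths_file = write_paths_file  # callable that rewrites the Paths File

    def execute_post_run_tasks(self, return_code):
        """Update the Paths File only if indexes were built and the run succeeded."""
        if self.settings.get("run_bowtie_build") and return_code == 0:
            self.write_paths_file()
            return True
        return False


# Record writes in a list instead of touching the filesystem
writes = []
config = PostRunSketch({"run_bowtie_build": True}, lambda: writes.append("paths.yml"))

updated_on_success = config.execute_post_run_tasks(0)  # workflow exited cleanly
updated_on_failure = config.execute_post_run_tasks(1)  # workflow failed
```

Keeping the condition inside the configuration object means the entry point no longer needs to know which settings trigger a Paths File update.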