3 changes: 1 addition & 2 deletions README.md
@@ -95,9 +95,8 @@ tiny get-template
| Reference annotations<br/>[(example)](START_HERE/reference_data/ram1.gff3) | GFF3 / GFF2 / GTF | Column 9 attributes (defined as "tag=value" or "tag "):<ul><li>Each feature must have an `ID` or `gene_id` or `Parent` tag (referred to as `ID` henceforth).</li><li>Feature classes can be defined with the `Class` tag. If undefined, the default value \__UNKNOWN_\_ will be used.</li><li>Discontinuous features must be defined with the `Parent` tag whose value is the logical parent's `ID`, or by sharing the same `ID`.</li><li>Attribute values containing commas must represent lists.</li><li>`Parent` tags with multiple values are not yet supported.</li><li>See the example link (left) for col. 9 formatting.</li></ul> |
| Sequencing data<br/>[(example)](START_HERE/fastq_files) | FASTQ(.gz) | Files must be demultiplexed. |
| Reference genome<br/>[(example)](START_HERE/reference_data/ram1.fa) | FASTA | Chromosome identifiers (e.g. Chr1): <ul><li>Must match your reference annotation file chromosome identifiers</li><li>Are case sensitive</li></ul> |
| Bowtie indexes (optional) <sup>1</sup> | ebwt | Must be small indexes (.ebwtl indexes are not supported) |

<br/><sup>1</sup> Bowtie indexes can be created for you. See the [configuration file documentation](doc/Configuration.md#building-bowtie-indexes).


### Running an End-to-End Analysis
In most cases you will use this toolset as an end-to-end pipeline. This will run a full, standard small RNA sequencing data analysis according to your configuration file. Before starting, you will need the following:
4 changes: 1 addition & 3 deletions START_HERE/TUTORIAL.md
@@ -35,9 +35,7 @@ The output you see on your terminal is from `cwltool`, which coordinates the exe
When the analysis is complete you'll notice a new folder has appeared whose name contains the date and time of the run. Inside you'll find subdirectories containing the file and terminal outputs for each step, and the processed Run Config file for auto-documentation of the run.

### Bowtie indexes
-Bowtie indexes were built during this run because paths.yml didn't define an `ebwt` prefix. Now, you'll see the `ebwt` points to the freshly built indexes in your run directory. This means that indexes won't be rebuilt during any subsequent runs that use this `paths.yml` file. If you need to rebuild your indexes:
-1. Change the value of ebwt to `ebwt: ''` in paths.yml
-2. Ensure that your Run Config file contains `run_bowtie_build: True`
+Bowtie indexes were built during this run because `paths.yml` didn't define an `ebwt` prefix. Now, you'll see the `ebwt` points to the freshly built indexes in your run directory. This means that indexes won't be rebuilt during any subsequent runs that use this `paths.yml` file. If you need to rebuild your indexes, simply delete the value to the right of `ebwt` in paths.yml

## Running Your Data
Expected runtime: ~10-60 minutes (expect longer runtimes if a bowtie index must be built)
5 changes: 2 additions & 3 deletions START_HERE/paths.yml
@@ -23,9 +23,8 @@ tmp_directory:
######-------------------------------- BOWTIE-BUILD ---------------------------------######
#
# To build bowtie indexes:
-# 1. Your Run Config file must contain run_bowtie_build: true
-# 2. Your reference genome file(s) must be listed under reference_genome_files (below)
-# 3. ebwt (below) must be an empty string, or ''
+# 1. Your reference genome file(s) must be listed under reference_genome_files (below)
+# 2. ebwt (below) must be empty (nothing after ":")
#
# Once your indexes have been built, this config file will be modified such
# that ebwt points to their location (prefix) within your Run Directory. This
5 changes: 1 addition & 4 deletions START_HERE/run_config.yml
@@ -28,10 +28,6 @@ paths_config: ./paths.yml
##-- If none provided, the default of user_tinyrna will be used --##
run_name: tinyrna

-##-- If True: run bowtie-build before analyzing libraries --##
-##-- NOTE: this option may be ignored depending on your Paths file. See Paths file. --##
-run_bowtie_build: True
-
##-- Number of threads to use when a step supports multi-threading --##
##-- For best performance, this should be equal to your computer's processor core count --##
threads: 4
@@ -334,6 +330,7 @@ run_directory: ~
tmp_directory: ~
features_csv: { }
samples_csv: { }
+run_bowtie_build: false
reference_genome_files: [ ]
plot_style_sheet: ~
adapter_fasta: ~
9 changes: 4 additions & 5 deletions doc/Configuration.md
@@ -95,12 +95,11 @@ When the pipeline starts up, tinyRNA will process the Run Config based on the co
If you don't have bowtie indexes already built for your reference genome, tinyRNA can build them for you at the beginning of an end-to-end run and reuse them on subsequent runs with the same Paths File.

To build bowtie indexes:
-1. Open your Run Config in a text editor and find the `run_bowtie_build` key. Set its value to `true` and save it.
-2. Open your Paths File in a text editor and find the `reference_genome_files` key. Add your reference genome file(s) under this key, one per line with a `- ` in front.
-3. Find the `ebwt` key and delete its value.
-4. Execute an end-to-end pipeline run.
+1. Open your Paths File in a text editor and find the `reference_genome_files` key. Add your reference genome file(s) under this key, one per line with a `- ` in front.
+2. Find the `ebwt` key and delete its value.
+3. Execute an end-to-end pipeline run.

-Once your indexes have been built, your Paths File will be modified such that `ebwt` points to their location (prefix) within your Run Directory. This means that indexes will not be unnecessarily rebuilt on subsequent runs as long as the same Paths File is used. If you need them rebuilt, simply repeat steps 3 and 4 above.
+Once your indexes have been built, your Paths File will be modified such that `ebwt` points to their location (prefix) within your Run Directory. This means that indexes will not be unnecessarily rebuilt on subsequent runs as long as the same Paths File is used. If you need them rebuilt, simply repeat steps 2 and 3 above.
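
Reviewer note: the build condition described in the revised steps (genome files listed, `ebwt` left empty) can be sketched roughly as follows. This is only an illustration under assumptions: `needs_bowtie_build` is a hypothetical helper, not a function from tinyRNA, and the plain dicts stand in for a parsed Paths File.

```python
from typing import Any, Mapping


def needs_bowtie_build(paths: Mapping[str, Any]) -> bool:
    """Hypothetical helper: True when the Paths File asks for a fresh index build."""
    ebwt = paths.get("ebwt")
    genomes = paths.get("reference_genome_files") or []
    # An empty ebwt value (None or '') together with at least one
    # reference genome file is the condition the steps describe.
    return not ebwt and len(genomes) > 0


# Plain dicts standing in for two parsed Paths Files (illustrative values)
fresh = {"ebwt": None, "reference_genome_files": ["./reference_data/ram1.fa"]}
built = {"ebwt": "run_dir/bowtie-build/ram1",
         "reference_genome_files": ["./reference_data/ram1.fa"]}

print(needs_bowtie_build(fresh))  # True
print(needs_bowtie_build(built))  # False
```

Under this reading, deleting the `ebwt` value is the only switch a user has to flip to trigger a rebuild, which appears to be the intent of dropping `run_bowtie_build` from the user-facing Run Config.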

## Samples Sheet Details
| _Column:_ | Input FASTQ Files | Sample/Group Name | Replicate Number | Control | Normalization |
139 changes: 139 additions & 0 deletions tests/unit_tests_configuration.py
@@ -0,0 +1,139 @@
import contextlib
import io
import os
import unittest
from unittest.mock import patch, mock_open, call

from tiny.rna.configuration import Configuration


class ConfigurationTests(unittest.TestCase):
    @classmethod
    def setUpClass(self):
        self.root_cfg_dir = os.path.abspath("../tiny/templates")
        self.run_config = self.root_cfg_dir + "/run_config_template.yml"
        self.paths = self.root_cfg_dir + "/paths.yml"

        self.default_prefix = os.path.join(
            self.root_cfg_dir,
            Configuration(self.run_config)['run_directory'],
            "bowtie-build/ram1"
        )
        self.maxDiff = 1522

    """============ Helper functions ============"""

    def config_with(self, prefs):
        config = Configuration(self.run_config)
        for key, val in prefs.items():
            config[key] = val
        return config

    def bt_idx_files_from_prefix(self, prefix):
        return [
            {'path': f"{prefix}.{subext}.ebwt", 'class': 'File'}
            for subext in ['1', '2', '3', '4', 'rev.1', 'rev.2']
        ]

    """================ Tests =================="""

    """Does get_ebwt_prefix() produce the expected prefix path?"""

    def test_get_ebwt_prefix(self):
        config = Configuration(self.run_config)
        actual_prefix = config.get_ebwt_prefix()
        expected_prefix = self.default_prefix

        self.assertEqual(actual_prefix, expected_prefix)

    """Does get_ebwt_prefix() throw an error if reference genome files aren't provided?"""

    def test_get_ebwt_prefix_no_genome(self):
        config = Configuration(self.run_config)
        config['reference_genome_files'] = None

        with self.assertRaises(ValueError):
            config.get_ebwt_prefix()

    """Does get_bt_index_files() output the paths of indexes that have already been built?"""

    def test_get_bt_index_files_prebuilt_indexes(self):
        config = self.config_with({'run_bowtie_build': False})
        prefix = config.paths['ebwt'] = os.path.abspath("./testdata/counter/validation/ebwt/ram1")
        expected = self.bt_idx_files_from_prefix(prefix)
        self.assertListEqual(config.get_bt_index_files(), expected)

    """Does get_bt_index_files() output the paths of the index files that are expected
    to be built from the reference genome?"""

    def test_get_bt_index_files_unbuilt_indexes_with_genome(self):
        config = self.config_with({'run_bowtie_build': True})
        prefix = config.paths['ebwt'] = "mock_prefix"
        expected = self.bt_idx_files_from_prefix(prefix)
        self.assertListEqual(config.get_bt_index_files(), expected)

    """Does get_bt_index_files() produce an error and quit when index files are
    missing and a reference genome has not been provided?"""

    def test_get_bt_index_files_missing_indexes_without_genome(self):
        config = self.config_with({'run_bowtie_build': False, 'reference_genome_files': None})
        prefix = config.paths['ebwt'] = "missing"
        errmsg = '\n'.join([
            "The following Bowtie index file couldn't be found:",
            "\t" + f"{prefix}.1.ebwt",
            "\nPlease either correct your ebwt prefix or add reference genomes in the Paths File."
        ])

        with self.assertRaisesRegex(SystemExit, errmsg):
            config.get_bt_index_files()

    """Does get_bt_index_files() produce an error without quitting when index files
    are missing but a reference genome was provided, and does it return the list of
    index files that will be built from the genome?"""

    def test_get_bt_index_files_missing_indexes_with_genome(self):
        config = self.config_with({'run_bowtie_build': False})
        bad_prefix = config.paths['ebwt'] = "missing"
        genome_prefix = self.default_prefix

        expected_files = self.bt_idx_files_from_prefix(genome_prefix)
        expected_error = '\n'.join([
            "The following Bowtie index file couldn't be found:",
            "\t" + f"{bad_prefix}.1.ebwt",
            "\nIndexes will be built from your reference genome files during this run.",
            ""
        ])

        stderr = io.StringIO()
        with contextlib.redirect_stderr(stderr):
            actual = config.get_bt_index_files()

        self.assertEqual(stderr.getvalue(), expected_error)
        self.assertListEqual(actual, expected_files)

    """Does verify_bowtie_build_outputs() update the paths in ["bt_index_files"] and rewrite
    these changes to the processed Run Config if long indexes were produced? Does it also
    write to the Paths File to update the new ebwt prefix?"""

    def test_verify_bowtie_build_outputs(self):
        ebwt_short = ["1.ebwt", "2.ebwt", "3.ebwt"]
        ebwt_long = ["1.ebwtl", "2.ebwtl", "3.ebwtl"]
        run_conf_ebwt = [Configuration.cwl_file(f, verify=False) for f in ebwt_short]
        expected_ebwt = [Configuration.cwl_file(f, verify=False) for f in ebwt_long]

        config = self.config_with({'bt_index_files': run_conf_ebwt})

        with patch('tiny.rna.configuration.open', mock_open()) as mo, \
             patch('tiny.rna.configuration.glob', return_value=ebwt_long) as g:
            config.verify_bowtie_build_outputs()

        expected_writes = [
            call(self.paths, 'w'),
            call(os.path.join(self.root_cfg_dir, config['run_directory'], os.path.basename(self.run_config)), 'w')
        ]

        self.assertListEqual(config['bt_index_files'], expected_ebwt)
        self.assertListEqual(mo.call_args_list, expected_writes)


if __name__ == '__main__':
    unittest.main()
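
Reviewer note: the tests above revolve around the six files that share a Bowtie index prefix and what happens when one is missing. A minimal, self-contained sketch of that check is below. `index_paths` and `missing_index_files` are hypothetical helpers written for illustration; they are not taken from `tiny.rna.configuration`, whose actual logic is not shown in this diff.

```python
import os
import tempfile

# The six sub-extensions used by bt_idx_files_from_prefix() in the tests
SUBEXTS = ["1", "2", "3", "4", "rev.1", "rev.2"]


def index_paths(prefix, ext="ebwt"):
    """Hypothetical helper: index file paths expected for a given prefix."""
    return [f"{prefix}.{sub}.{ext}" for sub in SUBEXTS]


def missing_index_files(prefix):
    """Hypothetical helper: expected index files that are not on disk."""
    return [p for p in index_paths(prefix) if not os.path.isfile(p)]


with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "ram1")
    # Create every expected file except the last one
    for path in index_paths(prefix)[:-1]:
        open(path, "w").close()
    missing = missing_index_files(prefix)
    print([os.path.basename(p) for p in missing])  # ['ram1.rev.2.ebwt']
```

Working against a temporary directory keeps the sketch independent of the repository's `testdata` layout.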
2 changes: 1 addition & 1 deletion tiny/cwl/tools/bowtie-build.cwl
@@ -83,7 +83,7 @@ outputs:
   index_files:
     type: File[]
     outputBinding:
-      glob: $(inputs.ebwt_base).*.ebwt
+      glob: $(inputs.ebwt_base).*.ebwt*

   console_output:
     type: stdout
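
Reviewer note: the effect of widening the glob can be approximated with Python's `fnmatch`. This is a rough stand-in rather than the CWL runner's own matching, and the file names are illustrative, using the `ram1` prefix from the example data.

```python
from fnmatch import fnmatchcase

old_pattern = "ram1.*.ebwt"   # pattern before this change
new_pattern = "ram1.*.ebwt*"  # pattern after this change

# Illustrative output names: two small-index files and two large-index files
outputs = ["ram1.1.ebwt", "ram1.rev.1.ebwt", "ram1.1.ebwtl", "ram1.rev.2.ebwtl"]

old_matches = [f for f in outputs if fnmatchcase(f, old_pattern)]
new_matches = [f for f in outputs if fnmatchcase(f, new_pattern)]

print(old_matches)  # ['ram1.1.ebwt', 'ram1.rev.1.ebwt']
print(new_matches)  # all four file names
```

The trailing `*` is what lets `.ebwtl` outputs be collected as `index_files`, which the new `test_verify_bowtie_build_outputs` test relies on.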
6 changes: 1 addition & 5 deletions tiny/entry.py
@@ -100,11 +100,7 @@ def run(tinyrna_cwl_path: str, config_file: str) -> None:
     # Use the cwltool CWL runner via command line
     return_code = run_cwltool_subprocess(config_object, workflow, run_directory)

-    # If the workflow completed without errors, we want to update
-    # the Paths Sheet to point to the new bowtie index prefix
-    if config_object['run_bowtie_build'] and return_code == 0:
-        paths_sheet_filename = config_object.paths.inf
-        config_object.paths.write_processed_config(paths_sheet_filename)
+    config_object.execute_post_run_tasks(return_code)


@report_execution_time("Pipeline resume runtime")
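
Reviewer note: the body of `execute_post_run_tasks()` is not part of this diff. One plausible shape, assuming it keeps the condition from the removed `if` block and delegates to `verify_bowtie_build_outputs()`, is sketched below. `ConfigSketch` is a hypothetical stand-in class, and its method bodies are placeholders rather than the project's code.

```python
class ConfigSketch:
    """Hypothetical stand-in for the real Configuration class."""

    def __init__(self, run_bowtie_build):
        self.settings = {"run_bowtie_build": run_bowtie_build}
        self.tasks_run = []  # records which post-run tasks fired

    def verify_bowtie_build_outputs(self):
        # Placeholder: the real method is exercised by the unit test in this PR
        self.tasks_run.append("verify_bowtie_build_outputs")

    def execute_post_run_tasks(self, return_code):
        # Assumed behavior: act only after a successful run that built indexes,
        # mirroring the condition in the block removed from entry.py
        if self.settings["run_bowtie_build"] and return_code == 0:
            self.verify_bowtie_build_outputs()


ok = ConfigSketch(run_bowtie_build=True)
ok.execute_post_run_tasks(0)
failed = ConfigSketch(run_bowtie_build=True)
failed.execute_post_run_tasks(1)

print(ok.tasks_run)      # ['verify_bowtie_build_outputs']
print(failed.tasks_run)  # []
```

Moving the check behind a single method keeps `run()` free of index-specific details, which seems to be the motivation for the one-line replacement.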