diff --git a/README.md b/README.md index be6ee175..21dfe51c 100644 --- a/README.md +++ b/README.md @@ -50,13 +50,16 @@ This option provides the latest features but stable releases are more rigorously ./setup.sh ``` -## tiny-count installation +## tiny-count Standalone Installation Alternatively, you can install tinyRNA's precision counting tool by itself. Unlike the full tinyRNA suite, this option can be installed in existing conda environments and requires fewer dependencies. ```shell conda install -c bioconda -c conda-forge tiny-count ``` +If you'd like to jump right in and start using tiny-count, see our
+👉 [tutorial](START_HERE/tiny-count_TUTORIAL.md) 👈 + ## Usage The `tinyrna` conda environment must be activated before using the tinyRNA workflow. @@ -70,7 +73,7 @@ The `tinyrna` conda environment must be activated before using the tinyRNA workf conda deactivate ``` If you'd like to jump right in and start using tinyRNA, see our
-👉 [tutorial](START_HERE/TUTORIAL.md) 👈 +👉 [tutorial](START_HERE/tinyRNA_TUTORIAL.md) 👈 You can execute the workflow in its entirety for a full end-to-end analysis pipeline, or you can execute individual steps on their own. In most cases you will use the command `tiny` for pipeline operations. @@ -290,8 +293,8 @@ See the [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines. To see what is active ## Authors -* **Kristen Brown** - 2018-2019 - Colorado State University - [biokcb](https://github.com/biokcb) * **Alex Tate** - 01/2021-present - Colorado State University - [AlexTate](https://github.com/AlexTate) +* **Kristen Brown** - 2018-2019 - Colorado State University - [biokcb](https://github.com/biokcb) See also the list of [contributors](https://github.com/MontgomeryLab/tinyrna/contributors) who participated in this project. diff --git a/START_HERE/tiny-count_TUTORIAL.md b/START_HERE/tiny-count_TUTORIAL.md new file mode 100644 index 00000000..1a8b6072 --- /dev/null +++ b/START_HERE/tiny-count_TUTORIAL.md @@ -0,0 +1,55 @@ +# Getting Started With tiny-count + +tiny-count is a counting utility that allows for hierarchical assignment of small RNA reads to features based on user-defined selection rules. This tutorial offers an introductory procedure for setting up and running tiny-count using your own data files. + +If you instead want to use the tinyRNA workflow, where tiny-count execution is handled automatically, please see the [other tutorial](tinyRNA_TUTORIAL.md). + +## Installation +Standalone installation requires [conda](https://docs.conda.io/en/main/miniconda.html). If conda is already installed, you can install tiny-count from the bioconda channel. See the [tiny-count installation section in the README](../README.md#tiny-count-standalone-installation) for instructions. + +Alternatively, if you have already installed tinyRNA, you can use the `tiny-count` command within the `tinyrna` conda environment.
+ +## Your Data Files +Gather the following files for the analysis: +1. **SAM files** containing small RNA reads aligned to a reference genome, one file per sample +2. **GFF3 or GFF2/GTF file(s)** containing annotations for features that you want to assign reads to + +## Configuration Files +First, you'll need to obtain template copies of the configuration files. Start by activating the conda environment where tiny-count is installed, then run the following command: + +``` +tiny-count --get-templates +``` + +Next, fill out the configuration files that were copied: + +### 1. The Samples Sheet (samples.csv) +Edit this file to add the paths to your SAM files, and to define the group name, replicate number, etc. for each sample. + +### 2. The Paths File (paths.yml) +Edit this file to add the paths to your GFF annotation(s) under the `gff_files` key. You can leave the `alias` key as-is for now. All other keys in this file are used in the tinyRNA workflow. + +### 3. The Features Sheet (features.csv) +Edit this file to define the selection rules for assigning reads to features. For now, we'll add a fully permissive rule: + +| Select for... | with value... | Classify as... | Source Filter | Type Filter | Hierarchy | Strand | 5' End Nucleotide | Length | Overlap | +|----------------|---------------|----------------|---------------|-------------|-----------|--------|-------------------|--------|---------| +| Any | Any | Any | | | 0 | Both | Any | Any | Partial | + +## First Run +Now you're ready to run tiny-count. Make sure you've activated the conda environment where tiny-count is installed, then run the following command: + +``` +tiny-count --paths-file paths.yml +``` + +## Outputs +The primary output is feature_counts.csv, a table of classified counts per feature. You can read about the other file outputs in the [Counts and Pipeline Statistics section of the README](../README.md#counts-and-pipeline-statistics). 
+ +## Next Steps +Now that you've run tiny-count, you can edit the configuration files to customize the analysis. For example, you can increase the specificity of your selection rule, add more selection rules with the same or different hierarchy values, or add more GFF files to the Paths File. You can also add more samples to the Samples Sheet and run tiny-count again to include them in the output. + +### What to read next: +- [Feature selection rules and the selection process](../doc/tiny-count.md#feature-selection) +- [GFF aliases in the Features Sheet](../doc/Configuration.md#gff-files) +- [Command line options](../doc/Parameters.md#tiny-count) \ No newline at end of file diff --git a/START_HERE/TUTORIAL.md b/START_HERE/tinyRNA_TUTORIAL.md similarity index 95% rename from START_HERE/TUTORIAL.md rename to START_HERE/tinyRNA_TUTORIAL.md index 2c594eda..783ae5ff 100644 --- a/START_HERE/TUTORIAL.md +++ b/START_HERE/tinyRNA_TUTORIAL.md @@ -1,10 +1,10 @@ -# Getting Started +# Getting Started With the tinyRNA Workflow This folder (`START_HERE`) contains a working minimal configuration and a generated/simulated sample dataset. We've assembled this configuration to make it easy to start using tinyRNA, and to provide a basis for your own project configuration. ## Installation -See the [README](../README.md#installation) for installation instructions and tips. +See the [README](../README.md#tinyrna-installation) for installation instructions and tips. ## This folder diff --git a/tests/testdata/collapser/helpstring.txt b/tests/testdata/collapser/helpstring.txt index 5f42108d..c1c68e72 100644 --- a/tests/testdata/collapser/helpstring.txt +++ b/tests/testdata/collapser/helpstring.txt @@ -1,4 +1,4 @@ -usage: tiny-collapse -i FASTQFILE -o OUTPREFIX [-h] [-t THRESHOLD] [-c] +usage: tiny-collapse -i FASTQFILE -o OUTPREFIX [-t THRESHOLD] [-c] [--5p-trim LENGTH] [--3p-trim LENGTH] Collapse sequences from a fastq file to a fasta file. Headers in the output
Headers in the output @@ -16,7 +16,6 @@ Required arguments: {prefix}_collapsed_lowcounts.fa Optional arguments: - -h, --help show this help message and exit -t THRESHOLD, --threshold THRESHOLD Sequences <= THRESHOLD will be omitted from {prefix}_collapsed.fa and will instead be placed in diff --git a/tests/unit_tests_configuration.py b/tests/unit_tests_configuration.py index b2f9fe34..5eed377e 100644 --- a/tests/unit_tests_configuration.py +++ b/tests/unit_tests_configuration.py @@ -5,7 +5,7 @@ import unittest from unittest.mock import patch, mock_open, call -from tiny.rna.configuration import Configuration, SamplesSheet, PathsFile +from tiny.rna.configuration import Configuration, SamplesSheet, PathsFile, get_templates from unit_test_helpers import csv_factory, paths_template_file, make_paths_file from tiny.rna.util import r_reserved_keywords @@ -227,6 +227,7 @@ def test_validate_r_safe_sample_groups(self): with self.assertRaisesRegex(AssertionError, msg): SamplesSheet.validate_r_safe_sample_groups(dict.fromkeys(bad)) + class PathsFileTest(unittest.TestCase): @classmethod @@ -350,5 +351,52 @@ def test_pipeline_mapping(self): self.assertEqual(config['none_path'], None) +class ConfigurationTest(unittest.TestCase): + + """Does get_templates copy the expected number of files for each context?""" + + def test_get_templates_contexts(self): + context_file_count = { + 'tiny': 5, + 'tiny-count': 3, + 'tiny-plot': 1 + } + + with patch('tiny.rna.configuration.shutil.copyfile') as cf: + for context, count in context_file_count.items(): + cf.reset_mock() + get_templates(context) + # Check against unique calls to copyfile incase there are duplicate calls for some reason + self.assertEqual(len(set(call.args[0] for call in cf.call_args_list)), count) + + """Does get_templates properly handle cases where template files already exist in the CWD?""" + + def test_get_templates_conflicts(self): + tiny_count = ['paths.yml', 'samples.csv', 'features.csv'] + tiny_plot = 
['tinyrna-light.mplstyle'] + tiny = ['run_config_template.yml', *tiny_count, *tiny_plot] + + contexts = { + 'tiny': tiny, + 'tiny-count': tiny_count, + 'tiny-plot': tiny_plot + } + + for context, files in contexts.items(): + with patch('tiny.rna.configuration.os.path.exists') as pe: + pe.return_value = True + + # Fail, rather than silently pass, if no conflict is reported + with self.assertRaises(SystemExit) as cm: + get_templates(context) + + err_msg = cm.exception.args[0] + err_files = err_msg.splitlines()[1:-1] + exp_set = set(files) + act_set = set(map(str.strip, err_files)) + + self.assertSetEqual(exp_set, act_set) + + + if __name__ == '__main__': unittest.main() diff --git a/tests/unit_tests_entry.py b/tests/unit_tests_entry.py index b913200f..6552b46b 100644 --- a/tests/unit_tests_entry.py +++ b/tests/unit_tests_entry.py @@ -54,8 +54,8 @@ def setUpClass(self): def test_get_templates(self): test_functions = [ - helpers.LambdaCapture(lambda: entry.get_templates(self.templates_path)), # The pre-install invocation - helpers.ShellCapture("tiny get-templates") # The post-install command + helpers.LambdaCapture(lambda: entry.get_templates("tiny")), # The pre-install invocation + helpers.ShellCapture("tiny get-templates") # The post-install command ] template_files = ['run_config_template.yml', 'samples.csv', 'features.csv', 'paths.yml', 'tinyrna-light.mplstyle'] diff --git a/tests/unit_tests_plotter.py b/tests/unit_tests_plotter.py index 9939dde0..7bbbb9c5 100644 --- a/tests/unit_tests_plotter.py +++ b/tests/unit_tests_plotter.py @@ -204,7 +204,7 @@ def test_scatter_major_ticks(self): hi_bound = 2**x + x # Walk upper bound forward much faster vlim = np.array((lo_bound, hi_bound)) - # title = f"Range: 2^{int(np.log2(view_lims[0]))} .. 2^{np.log2(view_lims[1]):.1f}" + # title = f"Range: 2^{int(np.log2(view_lims[0])):.1f} .. 
2^{np.log2(view_lims[1]):.1f}" # ^ must be set within scatter_* functions in plotter.py, not worth refactoring to support plotter.scatter_by_dge(counts, dge, f'lim_{x:.2f}', vlim) diff --git a/tiny/entry.py b/tiny/entry.py index 848f7be2..50295e0b 100644 --- a/tiny/entry.py +++ b/tiny/entry.py @@ -20,7 +20,7 @@ from cwltool.utils import DEFAULT_TMP_PREFIX from pkg_resources import resource_filename -from tiny.rna.configuration import Configuration, ConfigBase +from tiny.rna.configuration import Configuration, ConfigBase, get_templates from tiny.rna.resume import ResumeCounterConfig, ResumePlotterConfig from tiny.rna.util import report_execution_time, SmartFormatter, add_transparent_help @@ -321,25 +321,6 @@ def furnish_if_file_record(file_dict): return 0 -def get_templates(templates_path: str) -> None: - """Copies all configuration file templates to the current working directory - - Args: - templates_path: The path to the project's templates directory. This directory - contains templates for the run configuration, sample inputs, feature selection - rules, the project's matplotlib stylesheet, and paths for all the above. 
- - Returns: None - - """ - - print("Copying template input files to current directory...") - - # Copy template files to the current working directory - for template in template_files: - shutil.copyfile(f"{templates_path}/{template}", f"{os.getcwd()}/{template}") - - def setup_cwl(tinyrna_cwl_path: str, config_file: str) -> None: """Retrieves the project's workflow files, and if provided, processes the run config file @@ -383,7 +364,7 @@ def main(): "replot": lambda: resume(cwl_path, args.config, "tiny-plot"), "recount": lambda: resume(cwl_path, args.config, "tiny-count"), "setup-cwl": lambda: setup_cwl(cwl_path, args.config), - "get-templates": lambda: get_templates(templates_path) + "get-templates": lambda: get_templates("tiny") } command_map[args.command]() diff --git a/tiny/rna/configuration.py b/tiny/rna/configuration.py index f53ef50b..e2cdb2db 100644 --- a/tiny/rna/configuration.py +++ b/tiny/rna/configuration.py @@ -403,6 +403,42 @@ def main(): config_object.write_processed_config(f"processed_{file_basename}") +def get_templates(context: str): + """Copies a context-based subset of configuration file templates to the CWD + + Args: + context: The command for which template files should be provided + (currently: "tiny" or "tiny-count" or "tiny-plot") + + Returns: None + """ + + tiny_count = ['paths.yml', 'samples.csv', 'features.csv'] + tiny_plot = ['tinyrna-light.mplstyle'] + tiny = ['run_config_template.yml', *tiny_count, *tiny_plot] + + files_to_copy = { + 'tiny': tiny, + 'tiny-count': tiny_count, + 'tiny-plot': tiny_plot + }.get(context, None) + + if files_to_copy is None: + raise ValueError(f"Invalid template file context: {context}") + else: + conflicts = [f for f in files_to_copy if os.path.exists(f)] + if conflicts: + sys.exit( + "The following files already exist in the current directory:\n\t" + + '\n\t'.join(conflicts) + "\nPlease remove or rename them and try again." 
+ ) + + # Copy template files to the current working directory + templates_path = resource_filename('tiny', 'templates') + for template in files_to_copy: + shutil.copyfile(f"{templates_path}/{template}", f"{os.getcwd()}/{template}") + + class PathsFile(ConfigBase): """A configuration class for managing and validating Paths Files. Relative paths are automatically resolved on lookup and list types are enforced. @@ -620,6 +656,7 @@ def get_sample_basename(filename): root, _ = os.path.splitext(filename) return os.path.basename(root) + class CSVReader(csv.DictReader): """A simple wrapper class for csv.DictReader @@ -733,5 +770,6 @@ def check_backward_compatibility(self, header_vals): if compat_errors: raise ValueError('\n\n'.join(compat_errors)) + if __name__ == '__main__': Configuration.main() \ No newline at end of file diff --git a/tiny/rna/counter/counter.py b/tiny/rna/counter/counter.py index d89512c1..a20dd3a1 100644 --- a/tiny/rna/counter/counter.py +++ b/tiny/rna/counter/counter.py @@ -15,7 +15,7 @@ from tiny.rna.counter.features import Features, FeatureCounter from tiny.rna.counter.statistics import MergedStatsManager from tiny.rna.util import report_execution_time, from_here, ReadOnlyDict, get_timestamp, add_transparent_help -from tiny.rna.configuration import CSVReader, PathsFile +from tiny.rna.configuration import CSVReader, PathsFile, get_templates # Global variables for multiprocessing counter: FeatureCounter @@ -70,7 +70,7 @@ def get_args(): args = arg_parser.parse_args() if args.get_templates: - get_templates() + get_templates("tiny-count") sys.exit(0) else: args_dict = vars(args) @@ -79,17 +79,6 @@ def get_args(): return ReadOnlyDict(args_dict) -def get_templates(): - """Copies config file templates required by tiny-count into the current directory""" - - templates_path = resource_filename('tiny', 'templates') - template_files = ['paths.yml', 'samples.csv', 'features.csv'] - - # Copy template files to the current working directory - for template in 
template_files: - shutil.copyfile(f"{templates_path}/{template}", f"{os.getcwd()}/{template}") - - def load_samples(samples_csv: str, is_pipeline: bool) -> List[Dict[str, str]]: """Parses the Samples Sheet to determine library names and alignment files for counting
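The context-based selection and conflict check performed by the new `get_templates()` in `tiny/rna/configuration.py` can be exercised outside the package with a standalone sketch. This is a hypothetical re-implementation, not tinyRNA's code. The names `select_templates` and `copy_templates` and their `templates_dir`/`dest_dir` parameters are illustrative. The destination directory is passed in, rather than taken from the CWD, so the behavior can be tried in a temporary directory. The file lists and the shape of the error message mirror the diff.

```python
import os
import shutil
import sys
import tempfile

# File subsets mirroring the diff's get_templates(); the constant names are illustrative
TINY_COUNT = ['paths.yml', 'samples.csv', 'features.csv']
TINY_PLOT = ['tinyrna-light.mplstyle']
TINY = ['run_config_template.yml', *TINY_COUNT, *TINY_PLOT]

CONTEXTS = {'tiny': TINY, 'tiny-count': TINY_COUNT, 'tiny-plot': TINY_PLOT}


def select_templates(context: str) -> list:
    """Return the template file names for a command context (hypothetical helper)."""
    try:
        return CONTEXTS[context]
    except KeyError:
        raise ValueError(f"Invalid template file context: {context}") from None


def copy_templates(context: str, templates_dir: str, dest_dir: str) -> list:
    """Copy a context's templates into dest_dir, refusing to overwrite existing files."""
    files = select_templates(context)

    # Check every destination path before copying anything
    conflicts = [f for f in files if os.path.exists(os.path.join(dest_dir, f))]
    if conflicts:
        sys.exit(
            "The following files already exist in the destination directory:\n\t"
            + '\n\t'.join(conflicts) + "\nPlease remove or rename them and try again."
        )

    for name in files:
        shutil.copyfile(os.path.join(templates_dir, name), os.path.join(dest_dir, name))
    return files


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        # Create placeholder files standing in for the packaged templates
        for name in TINY:
            with open(os.path.join(src, name), 'w') as fh:
                fh.write('# placeholder\n')

        print(copy_templates('tiny-count', src, dst))

        # A second copy into the same directory hits the conflict check
        try:
            copy_templates('tiny-count', src, dst)
        except SystemExit as e:
            print(e.code)
```

Because the conflict check runs before any file is copied, a rejected call leaves the destination untouched. The trade-off is that users must move existing files aside themselves; there is no overwrite option.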