Merged
9 changes: 6 additions & 3 deletions README.md
@@ -50,13 +50,16 @@ This option provides the latest features but stable releases are more rigorously
./setup.sh
```

## tiny-count installation
## tiny-count Standalone Installation
Alternatively, you can install tinyRNA's precision counting tool by itself. Unlike the full tinyRNA suite, this option can be installed in existing conda environments and requires fewer dependencies.

```shell
conda install -c bioconda -c conda-forge tiny-count
```

If you'd like to jump right in and start using tiny-count, see our<br>
👉 [tutorial](START_HERE/tiny-count_TUTORIAL.md) 👈

## Usage

The `tinyrna` conda environment must be activated before using the tinyRNA workflow.
@@ -70,7 +73,7 @@ The `tinyrna` conda environment must be activated before using the tinyRNA workf
conda deactivate
```
If you'd like to jump right in and start using tinyRNA, see our<br>
👉 [tutorial](START_HERE/TUTORIAL.md) 👈
👉 [tutorial](START_HERE/tinyRNA_TUTORIAL.md) 👈

You can execute the workflow in its entirety for a full end-to-end analysis pipeline, or you can execute individual steps on their own. In most cases you will use the command `tiny` for pipeline operations.

@@ -290,8 +293,8 @@ See the [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines. To see what is active

## Authors

* **Kristen Brown** - 2018-2019 - Colorado State University - [biokcb](https://github.com/biokcb)
* **Alex Tate** - 01/2021-present - Colorado State University - [AlexTate](https://github.com/AlexTate)
* **Kristen Brown** - 2018-2019 - Colorado State University - [biokcb](https://github.com/biokcb)

See also the list of [contributors](https://github.com/MontgomeryLab/tinyrna/contributors) who participated in this project.

55 changes: 55 additions & 0 deletions START_HERE/tiny-count_TUTORIAL.md
@@ -0,0 +1,55 @@
# Getting Started With tiny-count

tiny-count is a counting utility that allows for hierarchical assignment of small RNA reads to features based on user-defined selection rules. This tutorial offers an introductory procedure for setting up and running tiny-count using your own data files.

If you instead want to use the tinyRNA workflow, where tiny-count execution is handled automatically, please see the [other tutorial](tinyRNA_TUTORIAL.md).

## Installation
Standalone installation requires [conda](https://docs.conda.io/en/main/miniconda.html). If conda is already installed, you can install tiny-count from the bioconda channel. See the [tiny-count installation section in the README](../README.md#tiny-count-standalone-installation) for instructions.

Alternatively, if you have already installed tinyRNA, you can use the `tiny-count` command within the `tinyrna` conda environment.

## Your Data Files
Gather the following files for the analysis:
1. **SAM files** containing small RNA reads aligned to a reference genome, one file per sample
2. **GFF3 or GFF2/GTF file(s)** containing annotations for features that you want to assign reads to

## Configuration Files
First, you'll need to obtain template copies of the configuration files. Start by activating the conda environment where tiny-count is installed, then run the following command:

```shell
tiny-count --get-templates
```

Next, fill out the configuration files that were copied:

### 1. The Samples Sheet (samples.csv)
Edit this file to add the paths to your SAM files, and to define the group name, replicate number, etc. for each sample.

### 2. The Paths File (paths.yml)
Edit this file to add the paths to your GFF annotation(s) under the `gff_files` key. You can leave the `alias` key as-is for now. All other keys in this file are used in the tinyRNA workflow.

### 3. The Features Sheet (features.csv)
Edit this file to define the selection rules for assigning reads to features. For now, we'll add a fully permissive rule:

| Select for... | with value... | Classify as... | Source Filter | Type Filter | Hierarchy | Strand | 5' End Nucleotide | Length | Overlap |
|----------------|---------------|----------------|---------------|-------------|-----------|--------|-------------------|--------|---------|
| Any | Any | Any | | | 0 | Both | Any | Any | Partial |
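
If you would rather script this step than edit the sheet by hand, the following Python sketch writes a Features Sheet containing only the permissive rule above. It assumes that the template's header row uses exactly the column names shown in the table. Compare it against the `features.csv` copied by `--get-templates` before relying on it. `write_features_sheet` is a hypothetical helper for illustration, not part of tiny-count.

```python
import csv

# Column headers as they appear in the tutorial's Features Sheet table.
# ASSUMPTION: the template's header row uses exactly these names.
FEATURE_COLUMNS = [
    "Select for...", "with value...", "Classify as...", "Source Filter",
    "Type Filter", "Hierarchy", "Strand", "5' End Nucleotide", "Length", "Overlap",
]

# The fully permissive rule from the table; both filter columns are left empty
PERMISSIVE_RULE = {
    "Select for...": "Any",
    "with value...": "Any",
    "Classify as...": "Any",
    "Source Filter": "",
    "Type Filter": "",
    "Hierarchy": "0",
    "Strand": "Both",
    "5' End Nucleotide": "Any",
    "Length": "Any",
    "Overlap": "Partial",
}


def write_features_sheet(path: str) -> None:
    """Hypothetical helper: write a Features Sheet containing only the permissive rule."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FEATURE_COLUMNS)
        writer.writeheader()
        writer.writerow(PERMISSIVE_RULE)
```

If you call `write_features_sheet("features.csv")` from the directory where you ran `--get-templates`, it overwrites the template copy and discards any rows the template already contains.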

## First Run
Now you're ready to run tiny-count. Make sure you've activated the conda environment where tiny-count is installed, then run the following command:

```shell
tiny-count --paths-file paths.yml
```

## Outputs
The primary output is `feature_counts.csv`, a table of classified counts per feature. The other output files are described in the [Counts and Pipeline Statistics section of the README](../README.md#counts-and-pipeline-statistics).

## Next Steps
Now that you've run tiny-count, you can edit the configuration files to customize the analysis. For example, you can make your selection rule more specific, add more selection rules with the same or different hierarchy values, or add more GFF files to the Paths File. You can also add more samples to the Samples Sheet and run tiny-count again to include them in the output.

### What to read next:
- [Feature selection rules and the selection process](../doc/tiny-count.md#feature-selection)
- [GFF aliases in the Features Sheet](../doc/Configuration.md#gff-files)
- [Command line options](../doc/Parameters.md#tiny-count)
4 changes: 2 additions & 2 deletions START_HERE/TUTORIAL.md → START_HERE/tinyRNA_TUTORIAL.md
@@ -1,10 +1,10 @@
# Getting Started
# Getting Started With the tinyRNA Workflow

This folder (`START_HERE`) contains a working minimal configuration and a generated/simulated sample dataset. We've assembled this configuration to make it easy to start using tinyRNA, and to provide a basis for your own project configuration.

## Installation

See the [README](../README.md#installation) for installation instructions and tips.
See the [README](../README.md#tinyrna-installation) for installation instructions and tips.

## This folder

3 changes: 1 addition & 2 deletions tests/testdata/collapser/helpstring.txt
@@ -1,4 +1,4 @@
usage: tiny-collapse -i FASTQFILE -o OUTPREFIX [-h] [-t THRESHOLD] [-c]
usage: tiny-collapse -i FASTQFILE -o OUTPREFIX [-t THRESHOLD] [-c]
[--5p-trim LENGTH] [--3p-trim LENGTH]

Collapse sequences from a fastq file to a fasta file. Headers in the output
@@ -16,7 +16,6 @@ Required arguments:
{prefix}_collapsed_lowcounts.fa

Optional arguments:
-h, --help show this help message and exit
-t THRESHOLD, --threshold THRESHOLD
Sequences <= THRESHOLD will be omitted from
{prefix}_collapsed.fa and will instead be placed in
50 changes: 49 additions & 1 deletion tests/unit_tests_configuration.py
@@ -5,7 +5,7 @@
import unittest
from unittest.mock import patch, mock_open, call

from tiny.rna.configuration import Configuration, SamplesSheet, PathsFile
from tiny.rna.configuration import Configuration, SamplesSheet, PathsFile, get_templates
from unit_test_helpers import csv_factory, paths_template_file, make_paths_file
from tiny.rna.util import r_reserved_keywords

@@ -227,6 +227,7 @@ def test_validate_r_safe_sample_groups(self):
with self.assertRaisesRegex(AssertionError, msg):
SamplesSheet.validate_r_safe_sample_groups(dict.fromkeys(bad))


class PathsFileTest(unittest.TestCase):

@classmethod
@@ -350,5 +351,52 @@ def test_pipeline_mapping(self):
self.assertEqual(config['none_path'], None)


class ConfigurationTest(unittest.TestCase):

"""Does get_templates copy the expected number of files for each context?"""

def test_get_templates_contexts(self):
context_file_count = {
'tiny': 5,
'tiny-count': 3,
'tiny-plot': 1
}

with patch('tiny.rna.configuration.shutil.copyfile') as cf:
for context, count in context_file_count.items():
cf.reset_mock()
get_templates(context)
# Check against unique calls to copyfile in case there are duplicate calls for some reason
self.assertEqual(len(set(call.args[0] for call in cf.call_args_list)), count)

"""Does get_templates properly handle cases where template files already exist in the CWD?"""

def test_get_templates_conflicts(self):
tiny_count = ['paths.yml', 'samples.csv', 'features.csv']
tiny_plot = ['tinyrna-light.mplstyle']
tiny = ['run_config_template.yml', *tiny_count, *tiny_plot]

contexts = {
'tiny': tiny,
'tiny-count': tiny_count,
'tiny-plot': tiny_plot
}

for context, files in contexts.items():
with patch('tiny.rna.configuration.os.path.exists') as pe:
pe.return_value = True

try:
get_templates(context)
except SystemExit as e:
err_msg = e.args[0]
err_files = err_msg.splitlines()[1:-1]
exp_set = set(files)
act_set = set(map(str.strip, err_files))

self.assertSetEqual(exp_set, act_set)



if __name__ == '__main__':
unittest.main()
4 changes: 2 additions & 2 deletions tests/unit_tests_entry.py
@@ -54,8 +54,8 @@ def setUpClass(self):

def test_get_templates(self):
test_functions = [
helpers.LambdaCapture(lambda: entry.get_templates(self.templates_path)), # The pre-install invocation
helpers.ShellCapture("tiny get-templates") # The post-install command
helpers.LambdaCapture(lambda: entry.get_templates("tiny")), # The pre-install invocation
helpers.ShellCapture("tiny get-templates") # The post-install command
]
template_files = ['run_config_template.yml', 'samples.csv', 'features.csv',
'paths.yml', 'tinyrna-light.mplstyle']
2 changes: 1 addition & 1 deletion tests/unit_tests_plotter.py
@@ -204,7 +204,7 @@ def test_scatter_major_ticks(self):
hi_bound = 2**x + x # Walk upper bound forward much faster
vlim = np.array((lo_bound, hi_bound))

# title = f"Range: 2^{int(np.log2(view_lims[0]))} .. 2^{np.log2(view_lims[1]):.1f}"
# title = f"Range: 2^{int(np.log2(view_lims[0])):.1f} .. 2^{np.log2(view_lims[1]):.1f}"
# ^ must be set within scatter_* functions in plotter.py, not worth refactoring to support
plotter.scatter_by_dge(counts, dge, f'lim_{x:.2f}', vlim)

23 changes: 2 additions & 21 deletions tiny/entry.py
@@ -20,7 +20,7 @@
from cwltool.utils import DEFAULT_TMP_PREFIX
from pkg_resources import resource_filename

from tiny.rna.configuration import Configuration, ConfigBase
from tiny.rna.configuration import Configuration, ConfigBase, get_templates
from tiny.rna.resume import ResumeCounterConfig, ResumePlotterConfig
from tiny.rna.util import report_execution_time, SmartFormatter, add_transparent_help

@@ -321,25 +321,6 @@ def furnish_if_file_record(file_dict):
return 0


def get_templates(templates_path: str) -> None:
"""Copies all configuration file templates to the current working directory

Args:
templates_path: The path to the project's templates directory. This directory
contains templates for the run configuration, sample inputs, feature selection
rules, the project's matplotlib stylesheet, and paths for all the above.

Returns: None

"""

print("Copying template input files to current directory...")

# Copy template files to the current working directory
for template in template_files:
shutil.copyfile(f"{templates_path}/{template}", f"{os.getcwd()}/{template}")


def setup_cwl(tinyrna_cwl_path: str, config_file: str) -> None:
"""Retrieves the project's workflow files, and if provided, processes the run config file

@@ -383,7 +364,7 @@ def main():
"replot": lambda: resume(cwl_path, args.config, "tiny-plot"),
"recount": lambda: resume(cwl_path, args.config, "tiny-count"),
"setup-cwl": lambda: setup_cwl(cwl_path, args.config),
"get-templates": lambda: get_templates(templates_path)
"get-templates": lambda: get_templates("tiny")
}

command_map[args.command]()
38 changes: 38 additions & 0 deletions tiny/rna/configuration.py
@@ -403,6 +403,42 @@ def main():
config_object.write_processed_config(f"processed_{file_basename}")


def get_templates(context: str):
"""Copies a context-based subset of configuration file templates to the CWD

Args:
context: The command for which template files should be provided
(currently: "tiny" or "tiny-count" or "tiny-plot")

Returns: None
"""

tiny_count = ['paths.yml', 'samples.csv', 'features.csv']
tiny_plot = ['tinyrna-light.mplstyle']
tiny = ['run_config_template.yml', *tiny_count, *tiny_plot]

files_to_copy = {
'tiny': tiny,
'tiny-count': tiny_count,
'tiny-plot': tiny_plot
}.get(context, None)

if files_to_copy is None:
raise ValueError(f"Invalid template file context: {context}")
else:
conflicts = [f for f in files_to_copy if os.path.exists(f)]
if conflicts:
sys.exit(
"The following files already exist in the current directory:\n\t"
+ '\n\t'.join(conflicts) + "\nPlease remove or rename them and try again."
)

# Copy template files to the current working directory
templates_path = resource_filename('tiny', 'templates')
for template in files_to_copy:
shutil.copyfile(f"{templates_path}/{template}", f"{os.getcwd()}/{template}")


class PathsFile(ConfigBase):
"""A configuration class for managing and validating Paths Files.
Relative paths are automatically resolved on lookup and list types are enforced.
@@ -620,6 +656,7 @@ def get_sample_basename(filename):
root, _ = os.path.splitext(filename)
return os.path.basename(root)


class CSVReader(csv.DictReader):
"""A simple wrapper class for csv.DictReader

@@ -733,5 +770,6 @@ def check_backward_compatibility(self, header_vals):

if compat_errors: raise ValueError('\n\n'.join(compat_errors))


if __name__ == '__main__':
Configuration.main()
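
This PR moves context-based template selection and conflict checking into `tiny/rna/configuration.py`. The standalone Python sketch below lets you try that logic outside the package. It reuses the file lists from the diff. However, `copy_templates` and its `templates_dir`/`dest_dir` parameters are hypothetical: they stand in for `resource_filename('tiny', 'templates')` and the current working directory, so the sketch runs without tinyRNA installed.

```python
import os
import shutil
import sys
import tempfile

# File lists mirror those in get_templates() from tiny/rna/configuration.py
TINY_COUNT = ['paths.yml', 'samples.csv', 'features.csv']
TINY_PLOT = ['tinyrna-light.mplstyle']
TINY = ['run_config_template.yml', *TINY_COUNT, *TINY_PLOT]
CONTEXTS = {'tiny': TINY, 'tiny-count': TINY_COUNT, 'tiny-plot': TINY_PLOT}


def copy_templates(context: str, templates_dir: str, dest_dir: str) -> list:
    """Hypothetical stand-in for get_templates() with explicit source/destination dirs."""
    files_to_copy = CONTEXTS.get(context)
    if files_to_copy is None:
        raise ValueError(f"Invalid template file context: {context}")

    # Refuse to overwrite anything that already exists at the destination
    conflicts = [f for f in files_to_copy if os.path.exists(os.path.join(dest_dir, f))]
    if conflicts:
        sys.exit(
            "The following files already exist in the destination directory:\n\t"
            + '\n\t'.join(conflicts) + "\nPlease remove or rename them and try again."
        )

    for name in files_to_copy:
        shutil.copyfile(os.path.join(templates_dir, name), os.path.join(dest_dir, name))
    return files_to_copy


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        # Empty placeholder files stand in for the packaged templates
        for name in TINY:
            open(os.path.join(src, name), 'w').close()

        print(copy_templates('tiny-count', src, dst))  # copies the 3 tiny-count templates
        try:
            copy_templates('tiny-count', src, dst)  # second call finds all 3 as conflicts
        except SystemExit as e:
            print(e.code)
```

The sketch takes an explicit destination directory instead of relying on `os.getcwd()` as `get_templates` does. This makes it easy to exercise against a temporary directory.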
15 changes: 2 additions & 13 deletions tiny/rna/counter/counter.py
@@ -15,7 +15,7 @@
from tiny.rna.counter.features import Features, FeatureCounter
from tiny.rna.counter.statistics import MergedStatsManager
from tiny.rna.util import report_execution_time, from_here, ReadOnlyDict, get_timestamp, add_transparent_help
from tiny.rna.configuration import CSVReader, PathsFile
from tiny.rna.configuration import CSVReader, PathsFile, get_templates

# Global variables for multiprocessing
counter: FeatureCounter
@@ -70,7 +70,7 @@ def get_args():
args = arg_parser.parse_args()

if args.get_templates:
get_templates()
get_templates("tiny-count")
sys.exit(0)
else:
args_dict = vars(args)
@@ -79,17 +79,6 @@ def get_args():
return ReadOnlyDict(args_dict)


def get_templates():
"""Copies config file templates required by tiny-count into the current directory"""

templates_path = resource_filename('tiny', 'templates')
template_files = ['paths.yml', 'samples.csv', 'features.csv']

# Copy template files to the current working directory
for template in template_files:
shutil.copyfile(f"{templates_path}/{template}", f"{os.getcwd()}/{template}")


def load_samples(samples_csv: str, is_pipeline: bool) -> List[Dict[str, str]]:
"""Parses the Samples Sheet to determine library names and alignment files for counting
