MontgomeryLab · taimontgomery · Dec 14, 2022 · Dec 6, 2022
diff --git a/README.md b/README.md
@@ -85,7 +85,7 @@ The pipeline requires that you identify:
 
 For more information, please see the [configuration file documentation](doc/Configuration.md). The `START_HERE` directory demonstrates a working configuration using these files. You can also get a copy of them by running the command:
 ```shell
-tiny get-template
+tiny get-templates
 ```
 
 

diff --git a/START_HERE/paths.yml b/START_HERE/paths.yml
@@ -59,7 +59,7 @@ adapter_fasta:
 ######--------------------------------- tiny-plot -----------------------------------######
 #
 # Optional: override the styles used by tiny-plot by providing your own .mplstyle sheet
-# Run "tiny get-template" in your terminal to get a copy of the current style sheet
+# Run "tiny get-templates" in your terminal to get a copy of the current style sheet
 #
 ######-------------------------------------------------------------------------------######
 

diff --git a/START_HERE/run_config.yml b/START_HERE/run_config.yml
@@ -317,7 +317,7 @@ dir_name_plotter: plots
 #
 ###########################################################################################
 
-version: 1.2
+version: 1.2.1
 
 ######--------------------------- DERIVED FROM PATHS FILE ---------------------------######
 #

diff --git a/START_HERE/samples.csv b/START_HERE/samples.csv
@@ -1,4 +1,4 @@
-Input FastQ Files,Sample/Group Name,Replicate number,Control,Normalization
+FASTQ/SAM Files,Sample/Group Name,Replicate number,Control,Normalization
 ./fastq_files/cond1_rep1.fastq.gz,condition1,1,TRUE,
 ./fastq_files/cond1_rep2.fastq.gz,condition1,2,,
 ./fastq_files/cond1_rep3.fastq.gz,condition1,3,,
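
Note on the renamed column: the same Samples Sheet layout can now list FASTQ files (end-to-end runs) or SAM files (standalone tiny-count runs). The following is a minimal sketch of converting a sheet from one to the other. It assumes, hypothetically, that every SAM file sits in one directory and is named after its FASTQ file; the helper `to_sam_sheet` and the `./sam_files` directory are illustrative and not part of tinyRNA.

```python
import csv
import io
from pathlib import PurePosixPath


def to_sam_sheet(sheet_text, sam_dir="./sam_files"):
    """Return a copy of a Samples Sheet whose first column lists SAM files.

    Assumption (hypothetical): each SAM file lives in `sam_dir` and shares its
    FASTQ file's basename, with .fastq/.fastq.gz replaced by .sam.
    """
    rows = list(csv.reader(io.StringIO(sheet_text)))
    header, body = rows[0], rows[1:]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Use the new column name introduced by this change
    writer.writerow(["FASTQ/SAM Files"] + header[1:])

    for row in body:
        name = PurePosixPath(row[0]).name
        for suffix in (".fastq.gz", ".fastq"):
            if name.endswith(suffix):
                name = name[: -len(suffix)]
                break
        writer.writerow([f"{sam_dir}/{name}.sam"] + row[1:])
    return out.getvalue()


sheet = (
    "Input FastQ Files,Sample/Group Name,Replicate number,Control,Normalization\n"
    "./fastq_files/cond1_rep1.fastq.gz,condition1,1,TRUE,\n"
    "./fastq_files/cond1_rep2.fastq.gz,condition1,2,,\n"
)
converted = to_sam_sheet(sheet)
print(converted)
```

The remaining columns are copied unchanged, so group names, replicate numbers, and control flags carry over as-is.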

diff --git a/doc/Configuration.md b/doc/Configuration.md
@@ -9,7 +9,7 @@ The pipeline requires that you identify:
 
 The `START_HERE` directory demonstrates a working configuration using these files. You can also get a copy of them (and other optional template files) with:
 ```
-tiny get-template
+tiny get-templates
 ```
 
 ## Overview
@@ -117,7 +117,7 @@ The final output directory name has three components:
 - The `run_directory` basename defined in your Paths File
 
 ## Samples Sheet Details
-|  _Column:_ | Input FASTQ Files   | Sample/Group Name | Replicate Number | Control | Normalization |
+|  _Column:_ | FASTQ/SAM Files     | Sample/Group Name | Replicate Number | Control | Normalization |
 |-----------:|---------------------|-------------------|------------------|---------|---------------|
 | _Example:_ | cond1_rep1.fastq.gz | condition1        | 1                | True    | RPM           |
 
@@ -151,4 +151,4 @@ Rules that match features in the first stage of selection will be used in a seco
 See [tiny-count's documentation](tiny-count.md#feature-selection) for an explanation of each column.
 
 ## Plot Stylesheet Details
-Matplotlib uses key-value "rc parameters" to allow for customization of its properties and styles, and one way these parameters can be specified is with a [matplotlibrc file](https://matplotlib.org/3.4.3/tutorials/introductory/customizing.html#a-sample-matplotlibrc-file), which we simply refer to as the Plot Stylesheet. You can obtain a copy of the default stylesheet used by tiny-plot with the command `tiny get-template`. Please keep in mind that tiny-plot overrides these defaults for a few specific elements of certain plots. Feel free to reach out if there is a plot style you wish to override but find you are unable to.
+Matplotlib uses key-value "rc parameters" to allow for customization of its properties and styles, and one way these parameters can be specified is with a [matplotlibrc file](https://matplotlib.org/3.4.3/tutorials/introductory/customizing.html#a-sample-matplotlibrc-file), which we simply refer to as the Plot Stylesheet. You can obtain a copy of the default stylesheet used by tiny-plot with the command `tiny get-templates`. Please keep in mind that tiny-plot overrides these defaults for a few specific elements of certain plots. Feel free to reach out if there is a plot style you wish to override but find you are unable to.
diff --git a/doc/Parameters.md b/doc/Parameters.md
@@ -63,22 +63,22 @@ Optional arguments:
 
 ## tiny-count
 
-### All Features
-| Run Config Key         | Commandline Argument   |
-|------------------------|------------------------|
-| counter_all_features:  | `--all-features`       |
+### Get Templates
+| Run Config Key | Commandline Argument |
+|----------------|----------------------|
+|                | `--get-templates`    |
 
-By default, tiny-count will only evaluate alignments to features which match a `Select for...` & `with value...` of at least one rule in your Features Sheet. It is this matching feature set, and only this set, which is included in `feature_counts.csv` and therefore available for analysis by tiny-deseq.r and tiny-plot. Switching this option "on" will include all features in every input GFF file, regardless of attribute matches, for tiny-count and downstream steps.
+Copies the template configuration files required by tiny-count into the current directory. This argument can't be combined with `--paths-file`. When it is provided, all other arguments are ignored and tiny-count exits once the templates have been copied.
 
 ### Normalize by Hits
- | Run Config Key             | Commandline Argument      |
+| Run Config Key             | Commandline Argument      |
 |----------------------------|---------------------------|
 | counter-normalize-by-hits: | `--normalize-by-hits T/F` |
 
 By default, tiny-count will divide the number of counts associated with each sequence, twice, before they are assigned to a feature. Each unique sequence's count is determined by tiny-collapse (or a compatible collapsing utility) and is preserved through the alignment process. The original count is divided first by the number of loci that the sequence aligns to, and second by the number of features passing selection at each locus. Switching this option "off" disables the latter normalization step.
 
 ### Decollapse
- | Run Config Key      | Commandline Argument   |
+| Run Config Key      | Commandline Argument   |
 |---------------------|------------------------|
 | counter_decollapse: | `--decollapse`         |
 
@@ -89,63 +89,64 @@ The SAM files produced by the tinyRNA pipeline are collapsed by default; alignme
 |--------------------|----------------------|
 | counter_stepvector | `--stepvector`       |
 
-A custom Cython implementation of HTSeq's StepVector is used for finding features that overlap each alignment interval. While the core C++ component of the StepVector is the same, we have found that our Cython implementation can result in runtimes up to 50% faster than HTSeq's implementation. This parameter allows you to use HTSeq's StepVector if you wish (for example, if the Cython StepVector is incompatible with your system)
-
-### Allow Features with Multiple ID Values
- | Run Config Key         | Commandline Argument |
-|------------------------|----------------------|
-| counter_allow_multi_id | `--multi-id`         |
-
-By default, an error will be produced if a GFF file contains a feature with multiple comma separated values listed under its ID attribute. Switching this option "on" instructs tiny-count to accept these features without error, but only the first listed value is used as the ID.
+A custom Cython implementation of HTSeq's StepVector is used for finding features that overlap each alignment interval. While the core C++ component of the StepVector is the same, we have found that our Cython implementation can result in runtimes up to 50% faster than HTSeq's implementation. This parameter allows you to use HTSeq's StepVector if you wish.
 
 ### Is Pipeline
- | Run Config Key | Commandline Argument |
+| Run Config Key | Commandline Argument |
 |----------------|----------------------|
 |                | `--is-pipeline`      |
 
 This commandline argument tells tiny-count that it is running as a workflow step rather than a standalone/manual run. Under these conditions tiny-count will look for all input files in the current working directory regardless of the paths defined in the Samples Sheet and Features Sheet.
 
 ### Report Diags
- | Run Config Key | Commandline Argument |
+| Run Config Key | Commandline Argument |
 |----------------|----------------------|
 | counter_diags: | `--report-diags`     |
 
 Diagnostic information will include intermediate alignment files for each library and an additional stats table with information about counts that were not assigned to a feature. See [the description of these outputs](../README.md#Diagnostics) for details.
 
 ### Full tiny-count Help String
 ```
-tiny-count -pf PATHS -o OUTPUTPREFIX [-h] [-nh T/F] [-dc]
-           [-sv {Cython,HTSeq}] [-a] [-p] [-d]
+tiny-count (-pf FILE | --get-templates) [-o PREFIX] [-nh T/F] [-dc]
+           [-sv {Cython,HTSeq}] [-p] [-d]
 
-This submodule assigns feature counts for SAM alignments using a Feature Sheet
-ruleset. If you find that you are sourcing all of your input files from a
-prior run, we recommend that you instead run `tiny recount` within that run's
-directory.
+tiny-count is a precision counting tool for hierarchical classification and
+quantification of small RNA-seq reads
 
 Required arguments:
-  -pf PATHS, --paths-file PATHS
-                        your Paths File
-  -o OUTPUTPREFIX, --out-prefix OUTPUTPREFIX
-                        output prefix to use for file names
+  You must either provide a Paths File or request templates for detailing
+  your configuration.
+
+  -pf FILE, --paths-file FILE
+                        your Paths File (default: None)
+  --get-templates       Copies the template configuration files required by
+                        tiny-count into the current directory. (default:
+                        False)
 
 Optional arguments:
-  -h, --help            show this help message and exit
+  These options can be used in conjunction with the Paths File (-pf)
+  argument mentioned above.
+
+  -o PREFIX, --out-prefix PREFIX
+                        The output prefix to use for file names. All
+                        occurrences of the substring {timestamp} will be
+                        replaced with the current date and time. (default:
+                        tiny-count_{timestamp})
   -nh T/F, --normalize-by-hits T/F
                         If T/true, normalize counts by (selected) overlapping
-                        feature counts. Default: true.
+                        feature counts. (default: T)
   -dc, --decollapse     Create a decollapsed copy of all SAM files listed in
                         your Samples Sheet. This option is ignored for non-
-                        collapsed inputs.
+                        collapsed inputs. (default: False)
   -sv {Cython,HTSeq}, --stepvector {Cython,HTSeq}
                         Select which StepVector implementation is used to find
-                        features overlapping an interval.
-  -a, --all-features    Represent all features in output counts table, even if
-                        they did not match a Select for / with value.
+                        features overlapping an interval. (default: Cython)
   -p, --is-pipeline     Indicates that tiny-count was invoked as part of a
                         pipeline run and that input files should be sourced as
-                        such.
+                        such. (default: False)
   -d, --report-diags    Produce diagnostic information about
-                        uncounted/eliminated selection elements.
+                        uncounted/eliminated selection elements. (default:
+                        False)
 ```
 ## tiny-deseq.r
 
@@ -198,36 +199,36 @@ Optional arguments:
 ## tiny-plot
 
 ### Plot Requests
- | Run Config Key | Commandline Argument         |
+| Run Config Key | Commandline Argument         |
 |----------------|------------------------------|
 | plot_requests: | `--plots PLOT PLOT PLOT ...` |
 
 tiny-plot will only produce the list of plots requested.
 
 ### P value
- | Run Config Key | Commandline Argument |
+| Run Config Key | Commandline Argument |
 |----------------|----------------------|
 | plot_pval:     | `--p-value VALUE`    |
 
 Feature expression levels are considered significant if their P value is less than this value, with a default of 0.05. Non-differentially expressed features are plotted as gray points, and in `sample_avg_scatter_by_dge_class`, these points are not colored by feature class.
 
 ### Style Sheet
- | Run Config Key | Paths File Key    | Commandline Argument     |
+| Run Config Key | Paths File Key    | Commandline Argument     |
 |----------------|-------------------|--------------------------|
 |                | plot_style_sheet: | `--style-sheet MPLSTYLE` |
 
 The plot style sheet can be used to override the default Matplotlib styles used by tiny-plot. Unlike the other parameters, this option is found in the Paths File. See the [Plot Stylesheet documentation](Configuration.md#plot-stylesheet-details) for more information.
 
 ### Vector Scatter
- | Run Config Key      | Commandline Argument |
+| Run Config Key      | Commandline Argument |
 |---------------------|----------------------|
 | plot_vector_points: | `--vector-scatter`   |
 
 The scatter plots produced by tiny-plot have rasterized points by default. This allows for faster plot generation, smaller file sizes, and files that are more easily handled by PDF readers. Plots are produced in 300 dpi by default, so in most cases this rasterization is seldom noticeable under normal zoom levels. Switching this option "on" will cause points to be vectorized allowing for zooming without pixelation.
 >**Note**: only scatter points are rasterized with this option switched "off"; all other elements are vectorized in every plot type.
 
 ### Bounds for len_dist Charts
- | Run Config Key     | Commandline Argument   |
+| Run Config Key     | Commandline Argument   |
 |--------------------|------------------------|
 | plot_len_dist_min: | `--len-dist-min VALUE` | 
 | plot_len_dist_max: | `--len-dist-max VALUE` |
@@ -247,8 +248,8 @@ The labels that should be used for special groups in `class_charts` and `sample_
 tiny-plot [-rc RAW_COUNTS] [-nc NORM_COUNTS] [-uc RULE_COUNTS]
           [-ss STAT] [-dge COMPARISON [COMPARISON ...]]
           [-len 5P_LEN [5P_LEN ...]] [-h] [-o PREFIX] [-pv VALUE]
-          [-s MPLSTYLE] [-v] [-ldi VALUE] [-lda VALUE] -p PLOT
-          [PLOT ...]
+          [-s MPLSTYLE] [-v] [-ldi VALUE] [-lda VALUE] [-una LABEL]
+          [-unk LABEL] -p PLOT [PLOT ...]
 
 This script produces basic static plots for publication as part of the tinyRNA
 workflow. Input file requirements vary by plot type and you are free to supply
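
Note on the Normalize by Hits description in `doc/Parameters.md` above: the two divisions it describes can be illustrated with a little arithmetic. This is a sketch of the description only, not tiny-count's implementation; the function and parameter names are hypothetical.

```python
def normalized_count(seq_count, n_loci, n_features_at_locus, normalize_by_hits=True):
    """Count contributed to each selected feature at a single locus.

    seq_count:           collapsed read count for the unique sequence
    n_loci:              number of loci the sequence aligns to
    n_features_at_locus: number of features passing selection at this locus
    """
    # First division: spread the count evenly across all alignment loci
    per_locus = seq_count / n_loci
    # Second division: only applied when normalize-by-hits is switched "on"
    if normalize_by_hits and n_features_at_locus > 0:
        return per_locus / n_features_at_locus
    return per_locus


# A sequence seen 12 times, aligning to 3 loci, with 2 selected features at this locus
count_on = normalized_count(12, 3, 2)
count_off = normalized_count(12, 3, 2, normalize_by_hits=False)
print(count_on, count_off)
```

With the option "on", each of the two features receives 12 / 3 / 2 = 2 counts from this locus; with it "off", only the division by loci is applied, giving 4.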

diff --git a/doc/Pipeline.md b/doc/Pipeline.md
@@ -3,7 +3,7 @@ The following commands deal with pipeline operations for carrying out end-to-end
 
 ```shell
 # Retrieving config files
-tiny get-template
+tiny get-templates
 tiny setup-cwl
 
 # End-to-end analysis

diff --git a/doc/tiny-count.md b/doc/tiny-count.md
@@ -4,19 +4,16 @@
 For an explanation of tiny-count's parameters in the Run Config and by commandline, see [the parameters documentation](Parameters.md#tiny-count).
 
 ## Resuming an End-to-End Analysis
-tiny-count offers a variety of options for refining your analysis. You might find that repeat analyses are required while tuning these options to your goals. However, the earlier pipeline steps are resource and time intensive, so it is inconvenient to rerun an end-to-end analysis to test new selection rules. Using the command `tiny recount`, tinyRNA will run the workflow starting at the tiny-count step using inputs from a prior end-to-end run. See the [pipeline resume documentation](Pipeline.md#resuming-a-prior-analysis) for details and prerequesites.
+tiny-count offers a variety of options for refining your analysis. You might find that repeat analyses are required while tuning these options to your goals. To save time, the command `tiny recount` runs the workflow starting at the tiny-count step using inputs from a prior end-to-end run. See the [pipeline resume documentation](Pipeline.md#resuming-a-prior-analysis) for details and prerequisites.
 
 ## Running as a Standalone Tool
-If you would like to run tiny-count as a standalone tool, not as part of an end-to-end or resumed analysis, you can do so with the command `tiny-count`. The command requires that you specify the paths to your Samples Sheet and Features Sheet, and a filename prefix for outputs. [All other arguments are optional](Parameters.md#full-tiny-count-help-string). You will need to make a copy of your Samples Sheet and modify it so that the `Input FASTQ Files` column instead contains paths to the corresponding SAM files from a prior end-to-end run. SAM files from third party sources are also supported, and can be produced from reads collapsed by tiny-collapse or fastx, or from non-collapsed reads.
-
->**Important:** reusing the same output filename prefix between standalone runs will result in prior outputs being overwritten.
+If you would like to run tiny-count as a standalone tool, not as part of an end-to-end or resumed analysis, you can do so with the command `tiny-count`. The command has [one required argument](Parameters.md#full-tiny-count-help-string): your Paths File. Your Samples Sheet will need to list SAM files rather than FASTQ files in the `FASTQ/SAM Files` column. SAM files from third-party sources are also supported, and if they have been produced from reads collapsed by tiny-collapse or fastx, tiny-count will honor the reported read counts.
 
 #### Using Non-collapsed Sequence Alignments
-While third-party SAM files from non-collapsed reads are supported, there are some caveats. These files will result in substantially higher resource usage and runtimes; we strongly recommend collapsing prior to alignment. Additionally, the sequence-related stats produced by tiny-count will no longer represent _unique_ sequences. These stats will instead refer to all sequences with unique QNAMEs (that is, multi-alignment bundles still cary a sequence count of 1.)
+While third-party SAM files from non-collapsed reads are supported, there are some caveats. These files will result in substantially higher resource usage and runtimes; we strongly recommend collapsing prior to alignment. Additionally, the sequence-related stats produced by tiny-count will no longer represent _unique_ sequences. These stats will instead refer to all sequences with unique QNAMEs (that is, multi-alignment bundles still carry a sequence count of 1).
 
 
 # Feature Selection
-![Feature Selection Diagram](../images/tiny-count_selection.png)
 
 We provide a Features Sheet (`features.csv`) in which you can define selection rules to more accurately capture counts for the small RNAs of interest. The parameters for these rules include attributes commonly used in the classification of small RNAs, such as length, strandedness, and 5' nucleotide.
 
@@ -26,6 +23,8 @@ Selection occurs in three stages, with the output of each stage as input to the
 1. Features are matched to rules based on their attributes defined in GFF files
 2. At each alignment locus, overlapping features are selected based on the overlap requirements of their matched rules. Selected features are sorted by hierarchy value so that smaller values take precedence in the next stage.
 3. Finally, features are selected for read assignment based on the small RNA attributes of the alignment locus. Once reads are assigned to a feature, they are excluded from matches with larger hierarchy values.
+
+![Feature Selection Diagram](../images/tiny-count_selection.png)
 
 ## Stage 1: Feature Attribute Parameters
 | _features.csv columns:_ | Select for... | with value... | Classify as... | Source Filter | Type Filter |
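
Note on the standalone usage described in `doc/tiny-count.md` above: the updated help string shows the Paths File and `--get-templates` as a required either/or pair, `(-pf FILE | --get-templates)`. One way such an interface could be expressed is with an `argparse` mutually exclusive group. This is a sketch under that assumption, not the project's actual parser; the program name `tiny-count-sketch` is hypothetical and only a few of the options are reproduced.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="tiny-count-sketch")

    # Exactly one of these two must be given, and they cannot be combined
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-pf", "--paths-file", metavar="FILE",
                        help="your Paths File")
    source.add_argument("--get-templates", action="store_true",
                        help="copy template configuration files and exit")

    # Optional argument with the default shown in the help string
    parser.add_argument("-o", "--out-prefix", metavar="PREFIX",
                        default="tiny-count_{timestamp}",
                        help="output prefix to use for file names")
    return parser


args = build_parser().parse_args(["-pf", "paths.yml"])
print(args.paths_file, args.get_templates, args.out_prefix)
```

Passing both `-pf` and `--get-templates`, or neither, makes `argparse` print a usage error and exit.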

diff --git a/setup.py b/setup.py
@@ -14,7 +14,7 @@
 AUTHOR = 'Kristen Brown, Alex Tate'
 PLATFORM = 'Unix'
 REQUIRES_PYTHON = '>=3.9.0'
-VERSION = '1.2'
+VERSION = '1.2.1'
 REQUIRED = []  # Required packages are installed via Conda's environment.yml
 
 

diff --git a/tests/testdata/config_files/features.csv b/tests/testdata/config_files/features.csv
@@ -0,0 +1,7 @@
+Select for...,with value...,Classify as...,Source Filter,Type Filter,Hierarchy,Strand,5' End Nucleotide,Length,Overlap
+Class,mask,,,,1,both,all,all,Partial
+Class,miRNA,,,,2,sense,all,16-22,Full
+Class,piRNA,5pA,,,2,both,A,24-32,Full
+Class,piRNA,5pT,,,2,both,T,24-32,Full
+Class,siRNA,,,,2,both,all,15-22,Full
+Class,unk,,,,3,both,all,all,Full
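
Note on the new test Features Sheet: the documentation states that smaller hierarchy values take precedence during selection. The sketch below only reads the sheet and orders its rules by the `Hierarchy` column to make that precedence visible; it does not reproduce tiny-count's selection logic, and the variable names are illustrative.

```python
import csv
import io

# Contents of tests/testdata/config_files/features.csv as added in this change
FEATURES_CSV = """\
Select for...,with value...,Classify as...,Source Filter,Type Filter,Hierarchy,Strand,5' End Nucleotide,Length,Overlap
Class,mask,,,,1,both,all,all,Partial
Class,miRNA,,,,2,sense,all,16-22,Full
Class,piRNA,5pA,,,2,both,A,24-32,Full
Class,piRNA,5pT,,,2,both,T,24-32,Full
Class,siRNA,,,,2,both,all,15-22,Full
Class,unk,,,,3,both,all,all,Full
"""

rules = list(csv.DictReader(io.StringIO(FEATURES_CSV)))

# sorted() is stable, so rules sharing a hierarchy value keep their sheet order
by_precedence = sorted(rules, key=lambda rule: int(rule["Hierarchy"]))

for rule in by_precedence:
    print(rule["Hierarchy"], rule["with value..."], rule["Strand"],
          rule["5' End Nucleotide"], rule["Length"], rule["Overlap"])
```

Here the `mask` rule (hierarchy 1) comes first, the four hierarchy-2 rules follow in sheet order, and `unk` (hierarchy 3) comes last.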