START_HERE/tinyRNA_TUTORIAL.md (1 addition, 1 deletion)
@@ -32,7 +32,7 @@ And when you're done, you can close your terminal or use `conda deactivate` to r
 The output you see on your terminal is from `cwltool`, which coordinates the execution of the workflow CWL. The terminal output from individual steps is redirected to a logfile for later reference.

 ### File outputs
-When the analysis is complete you'll notice a new folder has appeared whose name contains the date and time of the run. Inside you'll find subdirectories containing the file and terminal outputs for each step, and the processed Run Config file for auto-documentation of the run.
+When the analysis is complete you'll notice a new timestamped folder has appeared. Inside you'll find subdirectories containing the file outputs for each step, and processed copies of your configuration files which serve as auto-documentation of the run. These configuration copies also allow for repeat analyses using the existing file outputs.

 ### Bowtie indexes
 Bowtie indexes were built during this run because `paths.yml` didn't define an `ebwt` prefix. Now, you'll see the `ebwt` points to the freshly built indexes in your run directory. This means that indexes won't be rebuilt during any subsequent runs that use this `paths.yml` file. If you need to rebuild your indexes, simply delete the value to the right of `ebwt` in paths.yml
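If you'd like to check whether the next run will build indexes, a small shell sketch along these lines might help. The helpers `ebwt_value` and `index_status` are hypothetical (they are not tinyRNA commands), and the sketch assumes `ebwt` sits on a single line of the form `ebwt: <value>` with no trailing comment.

```shell
# Hypothetical helper (not part of tinyRNA): print the value to the right of
# "ebwt:" in a Paths File, with surrounding quotes and whitespace removed.
# Assumes the key and its value share one line and no comment follows.
ebwt_value() {
    sed -n 's/^[[:space:]]*ebwt:[[:space:]]*//p' "$1" \
        | tr -d "\"'" \
        | sed 's/[[:space:]]*$//'
}

# Report whether the next run using this Paths File would build indexes.
index_status() {
    if [ -z "$(ebwt_value "$1")" ]; then
        echo "ebwt is empty: indexes will be built on the next run"
    else
        echo "ebwt is set: existing indexes will be reused"
    fi
}
```

After sourcing these functions, you could call `index_status paths.yml` from the directory that holds your Paths File.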
 The `tiny run` command performs a comprehensive analysis of your [input files](../README.md#requirements-for-user-provided-input-files) according to the preferences defined in your [configuration files](Configuration.md).

 ## Resuming a Prior Analysis
-The tiny-count and tiny-plot steps offer a wide variety of options for refining your analysis. You might find that repeat analyses are required while tuning these options to your goals. However, the earlier pipeline steps (fastp, tiny-collapse, and bowtie) handle the largest volume of data and are resource intensive, so you can save time by reusing their outputs for subsequent analyses. One could do so by running the later steps individually (e.g. using commands `tiny-count`, `tiny-deseq.r`, and `tiny-plot`), but assembling their commandline inputs by hand is labor-intensive and prone to spelling mistakes.
+The tiny-count and tiny-plot steps offer many options for refining your analysis. You might find that repeat analyses are required while tuning these options to your goals. However, the earlier pipeline steps (fastp, tiny-collapse, and bowtie) handle the largest volume of data and are resource intensive, so you can save time by reusing their outputs for subsequent analyses.
+
+The commands `tiny recount` and `tiny replot` allow the workflow to be resumed using outputs from a prior run. The Run Directory for each end-to-end analysis will contain the run's four primary configuration files, and these files can be freely edited to change the resume run's behavior without sacrificing auto-documentation.

-The commands `tiny recount` and `tiny replot` seek to solve this problem. As discussed in the [Run Config documentation](Configuration.md#the-processed-run-config), the Run Directory for each end-to-end analysis will contain a processed Run Config, and this is the file that determines the behavior of a resume run.

 <figure align="center">
 <figcaption><b>tiny recount</b></figcaption>
@@ -29,25 +30,19 @@ The commands `tiny recount` and `tiny replot` seek to solve this problem. As dis
-You can modify the behavior of a resume run by changing settings in:
-- The **processed** Run Config
-- The **original** Features Sheet that was used for the end-to-end run (as indicated by `features_csv` in the processed Run Config)
-- The **original** Paths File (as indicated by `paths_config` in the processed Run Config)
-
 ### The Steps
-1. Make and save the desired changes in the files above
-2. In your terminal, `cd` to the Run Directory of the end-to-end run you wish to resume
+1. Make and save changes to the configuration files within the target Run Directory
+2. In your terminal, `cd` to the target Run Directory
 3. Run the desired resume command
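The steps above can be sketched as a small shell function. `resume_run` is a hypothetical wrapper written for illustration, not something tinyRNA provides; it assumes `tiny` is available on your `PATH` (for example, inside the activated conda environment) and that you have already saved your configuration changes.

```shell
# Hypothetical wrapper (not part of tinyRNA): run a resume command from
# inside the target Run Directory.
resume_run() {
    run_dir="$1"    # Run Directory of the end-to-end run to resume
    step="$2"       # "recount" or "replot"

    # Only the two resume commands named in this tutorial are accepted.
    case "$step" in
        recount|replot) ;;
        *) echo "usage: resume_run <run_dir> recount|replot" >&2; return 2 ;;
    esac

    if [ ! -d "$run_dir" ]; then
        echo "Run Directory not found: $run_dir" >&2
        return 1
    fi

    # Resume commands are run from within the Run Directory; the subshell
    # leaves the caller's working directory unchanged.
    ( cd "$run_dir" && tiny "$step" )
}
```

For example, `resume_run path/to/your_run_directory recount` would change into that directory and run `tiny recount`; the path shown is a placeholder for your own Run Directory.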

-### A Note on File Inputs
-File inputs are sourced from the **original** output subdirectories of prior steps in the target Run Directory. For `tiny replot`, this means that files from previous executions of `tiny recount` will **not** be used as inputs; only the original end-to-end outputs are used.
+### Auto-Documentation
+Among the subdirectories produced in your Run Directory after an end-to-end run, you'll find a directory named "config" which holds a copy of the run's four primary configuration files. These files serve as documentation for the run and, unlike those found at the root of the Run Directory, they should not be modified. A timestamped "config" directory is created after each resume run to similarly document the configurations that were used.

-### Where to Find Outputs from Resume Runs
+### Resume Run Outputs
 Output subdirectories for resume runs can be found alongside the originals, and will have a timestamp appended to their name to differentiate them.

-### Auto-Documentation of Resume Runs
-A new processed Run Config will be saved in the Run Directory at the beginning of each resume run. It will be labelled with the same timestamp used in the resume run's other outputs to differentiate it. It includes the changes to your Paths File and Run Config. A copy of your Features Sheet is saved to the timestamped tiny-count output directory during `tiny recount` runs.
+### Repeated Analyses
+If a `recount` run is performed and a `replot` is performed later in the same Run Directory, then only the outputs of the `recount` run are used for generating the plots. If multiple `recount` runs precede the `replot` then the most recent outputs are used.

 ## Parallelization
 Most steps in the pipeline run in parallel to minimize runtimes. This is particularly advantageous for multiprocessor systems like server environments. However, parallelization isn't always beneficial. If your computer doesn't have enough free memory, or if you have a large sample file set and/or reference genome, parallel execution might push your machine to its limits. When this happens you might see memory errors or your computer may become unresponsive. In these cases it makes more sense to run resource intensive steps one at a time, in serial, rather than in parallel. To do so, set `run_parallel: false` in your Run Config. This will affect fastp, tiny-collapse, and bowtie since these steps typically handle the largest volumes of data.
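Editing the Run Config by hand is the simplest way to change this setting. If you prefer to script it, the sketch below shows one possible approach. `set_run_parallel` is a hypothetical helper, not a tinyRNA command, and it assumes the Run Config already contains a single-line `run_parallel:` entry; anything after the key on that line, including a trailing comment, is replaced.

```shell
# Hypothetical helper (not part of tinyRNA): set run_parallel to true or false
# in a Run Config. Assumes a "run_parallel:" key already exists on its own line.
set_run_parallel() {
    config="$1"    # path to your Run Config file
    value="$2"     # "true" or "false"

    case "$value" in
        true|false) ;;
        *) echo "usage: set_run_parallel <run_config> true|false" >&2; return 2 ;;
    esac

    if [ ! -f "$config" ]; then
        echo "Run Config not found: $config" >&2
        return 1
    fi

    # Rewrite only the run_parallel line, keeping its indentation, then
    # replace the original file with the edited copy.
    edited="${config}.edited"
    sed "s/^\([[:space:]]*run_parallel:\).*/\1 $value/" "$config" > "$edited" \
        && mv "$edited" "$config"
}
```

Calling `set_run_parallel path/to/your_run_config.yml false` (a placeholder path) would switch the resource-intensive steps to serial execution on the next run.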