Plotter: outputs are organized into subdirectories by type by AlexTate · Pull Request #239 · MontgomeryLab/tinyRNA

AlexTate · 2022-10-15T18:13:04Z

Outputs of tiny-plot are now organized into subdirectories. Directories are created only for the plots requested, and only once a plot is complete and ready to be saved.

PCA plots are placed in the root level of the tiny-plot output directory rather than their own subdirectory. This is because there will always be just one PCA plot, whereas all other plot types will have multiple outputs with a sufficient number of groups and replicates. These changes also open up opportunities for downstream steps to further process tiny-plot outputs.
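A minimal sketch of the layout described above, not the actual tiny-plot code. The function name `output_path`, the `ROOT_LEVEL_PLOTS` set, and the plot-type strings are hypothetical. The point is that the directory is created at save time, so a plot that was not requested (or never finished) leaves no empty directory behind.

```python
from pathlib import Path

# Hypothetical: plot types that are saved at the root of the output directory.
ROOT_LEVEL_PLOTS = {"pca"}


def output_path(out_dir, plot_type, filename):
    """Return the save path for a finished plot, creating its directory lazily."""
    base = Path(out_dir)
    # PCA produces a single file, so it stays at the root level.
    target_dir = base if plot_type in ROOT_LEVEL_PLOTS else base / plot_type
    # Only called once a plot is ready to be saved.
    target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / filename
```

A caller would request the path right before saving, e.g. `fig.savefig(output_path(out_dir, "len_dist", "sample_1.pdf"))`, where `fig` and the file name are placeholders.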

Closes #233

…tion from tiny-count. If multiple ID values are listed, they are now concatenated rather than selecting the first. I think this will be much more intuitive and it also releases ReferenceTables.get_figure_id() so that it can be used without a constructed ReferenceTables object. I've also converted the argparse output in tiny-count to a read-only dictionary. This prefs object is being passed around to a LOT of classes in tiny-count, and in doing so we risk accidentally changing preferences. This "bug" was previously leveraged by the StepVector routine; it has been refactored to no longer rely on the mutability of prefs.
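For illustration, one way to get a read-only view of argparse output with the standard library is `types.MappingProxyType`. This is a sketch under that assumption; the mechanism used in the commit is not shown here, and `get_prefs` and the `--normalize-by-hits` flag are hypothetical.

```python
import argparse
from types import MappingProxyType


def get_prefs(argv):
    """Parse arguments and return them as a read-only mapping."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag, for demonstration only.
    parser.add_argument("--normalize-by-hits", action="store_true")
    args = parser.parse_args(argv)
    # Copy into a plain dict, then wrap it so callers cannot modify it.
    return MappingProxyType(dict(vars(args)))
```

Any class that receives the result can read `prefs["normalize_by_hits"]`, but an assignment such as `prefs["normalize_by_hits"] = True` raises `TypeError`.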

…tup (configuration.py) and tiny-count startup (counter.py). GFF validation is treated as an optional step that must be specifically requested in configuration.py. This is because we will assume that resume runs are using inputs that have already been validated. GFF validation is skipped in tiny-count during pipeline runs. This is because we will assume that both end-to-end runs and resume runs are using inputs that have already been validated.

…cs will now read up to 50,000 lines of each SAM file (while checking every 10,000 lines for chromosome matches) because it is quite a bit faster than I assumed. For 9 library files this represents only ~0.4s of runtime
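A rough sketch of the sampling approach described above, assuming the heuristic compares the SAM RNAME column against the chromosome names found in the GFF. The function name, its parameters, and the choice to count header lines toward the limit are assumptions, not the validator's actual code.

```python
def sam_chroms_overlap(sam_lines, gff_chroms, max_lines=50_000, check_every=10_000):
    """Return True if any sampled SAM chromosome name also appears in the GFF."""
    gff_chroms = set(gff_chroms)
    seen = set()
    for n, line in enumerate(sam_lines, start=1):
        if n > max_lines:
            break
        if not line.startswith("@"):  # skip header lines
            fields = line.split("\t")
            if len(fields) > 2 and fields[2] != "*":
                seen.add(fields[2])  # RNAME is the third SAM column
        # Periodic check lets us stop early once a shared name is found.
        if n % check_every == 0 and seen & gff_chroms:
            return True
    return bool(seen & gff_chroms)
```

Passing an open file handle as `sam_lines` keeps memory use flat, since lines are consumed one at a time.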

…all 3 keys were queried with every function call, in reverse order from lowest to highest priority, even if the preferred key was present. Now the chain will check the highest priority keys first, and continue as soon as a match is found
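The short-circuiting lookup can be sketched as below. The key names and their order (ID, then gene_id, then Parent) are an assumption based on the other commit messages in this PR, and `first_id_value` is a hypothetical name.

```python
# Assumed priority order, highest first.
ID_KEYS = ("ID", "gene_id", "Parent")


def first_id_value(attrs):
    """Return the value of the highest-priority key present, or None."""
    for key in ID_KEYS:
        if key in attrs:
            return attrs[key]  # stop at the first match; lower keys are never queried
    return None
```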

…s mapped from True/False to +/-. Now, if a feature's strand is anything but +/-, it is mapped to None. The GFFValidator produces a warning about this but no longer treats it as a hard error. Per Tai, a strand type of None matches strand selectors for "sense", "antisense", and "both." 5' and 3' anchored selectors can also evaluate these features, but evaluation does not distinguish between 5' and 3' ends.
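To make the unstranded behavior concrete, here is a small sketch. It assumes "+" and "-" are represented internally as True and False, and that selectors are the strings "sense", "antisense", and "both"; both function names are hypothetical.

```python
def parse_strand(symbol):
    """Map '+' to True and '-' to False; any other value becomes None."""
    return {"+": True, "-": False}.get(symbol)


def strand_matches(selector, feature_strand, alignment_strand):
    """Evaluate a strand selector; unstranded features match every selector."""
    if feature_strand is None or selector == "both":
        return True
    same = feature_strand == alignment_strand
    return same if selector == "sense" else not same
```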

…and values after they have been parsed. Note that this happens after comma separated values have been split. This means that value list items can contain URL encoded commas which are then preserved as part of the value (rather than being split on the encoded comma)
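The ordering matters, so here is a short sketch of split-then-decode using `urllib.parse.unquote` from the standard library. The function name `parse_attr_values` is hypothetical, and whether the commit uses `unquote` specifically is an assumption.

```python
from urllib.parse import unquote


def parse_attr_values(raw):
    """Split a GFF attribute value list on literal commas, then URL-decode each item."""
    # Decoding after the split keeps an encoded comma (%2C) inside its item.
    return tuple(unquote(item) for item in raw.split(","))
```

With this order, `parse_attr_values("a%2Cb,c")` yields two items, `"a,b"` and `"c"`, rather than three.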

taimontgomery · 2022-10-16T04:36:36Z

Tested on ram1.

AlexTate added 23 commits October 6, 2022 18:08

If a feature lacks an ID/gene_id attribute, but has a Parent attribut…

d0de331

…e, then the value of Parent is used as the ID. It is no longer treated as an error.

The GFF parsing loop (and error handling) has been moved from Referen…

9f47f7c

…ceTables to its own standalone function. This allows parsing machinery to be shared with the new GFFValidation class.

Unit tests for the new GFFValidation class. Still needs tests for ali…

f83f2f1

…gnment_chroms_mismatch_heuristic()

Small corrections for configuration.py's usage of GFFValidator

71ac07a

Script termination via sys.exit() no longer results in a traceback be…

8743abb

…ing printed. Adding this exception so that we can call sys.exit() on validation failure and let the validation report speak for itself, rather than following the report with an unnecessary stacktrace

Final corrections for unit tests. Missing tests have been added and e…

3bd3ca3

…xisting tests have been updated.

Decided to add IntervalAnchorMatch to the list of available selectors…

f4e619d

… for the Overlap column. I think this makes it easier to explain how the 5'/3' anchored selectors behave with unstranded features

Updates for the input file requirements table. Removing stranded feat…

6a67b89

…ures and adding that Parent is now used as a fallback ID attribute

Adding the "anchored" overlap selector and updates for the behavior o…

08221c0

…f unstranded features. Also refined/simplified the Overlap explanation in Stage 2

Small correction/refinement of Stage 2 explanation

ba6659e

Update to support the new None strand type

6d21b23

Bugfix to avoid circular references in ReferenceTables.parents when a…

b384d1b

… feature has a Parent= but no ID/gene_id=. This was causing an infinite loop when ReferenceTables later tried to find the root ancestor of these features.
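A sketch of how such a loop can arise and be avoided, assuming `parents` is a plain dict mapping a feature ID to its parent ID. When Parent is used as the fallback ID, the feature's ID equals its parent's ID, which would create a self-referencing entry. The helper names are hypothetical and the actual fix in ReferenceTables may differ.

```python
def add_parent(parents, feature_id, parent_id):
    """Record a child-to-parent link, skipping self-references."""
    if feature_id != parent_id:
        parents[feature_id] = parent_id


def root_ancestor(parents, feature_id):
    """Follow parent links to the root; raise instead of looping on a cycle."""
    seen = {feature_id}
    while feature_id in parents:
        feature_id = parents[feature_id]
        if feature_id in seen:
            raise ValueError(f"circular Parent reference at {feature_id!r}")
        seen.add(feature_id)
    return feature_id
```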

Updating GFF file requirements to notify users that features listing …

12b3872

…multiple parents aren't supported.

Unrelated minor changes: correcting user facing error messages that r…

2bf00aa

…efer to tiny-collapse as Collapser

Added a test that downloads complete GFF/GTF genomes from Ensembl fo…

73ed931

…r C. elegans and Arabidopsis, then runs all four files through ReferenceTables.get(). Genomes need only be downloaded once. Nevertheless, this is a long-running test so I've set it for manual activation only

Outputs of tiny-plot are now organized into subdirectories. Directori…

7a927de

…es are created only for the plots requested, and only once a plot is complete and ready to be saved

AlexTate requested a review from taimontgomery October 15, 2022 18:13

taimontgomery merged commit 7f1f982 into master Oct 16, 2022

taimontgomery merged 23 commits into master from issue-233
