Plotter: outputs are organized into subdirectories by type #239

Merged
taimontgomery merged 23 commits into master from issue-233 on Oct 16, 2022
Conversation

@AlexTate
Member

Outputs of tiny-plot are now organized into subdirectories. Directories are created only for the plots requested, and only once a plot is complete and ready to be saved.

PCA plots are placed at the root level of the tiny-plot output directory rather than in their own subdirectory. This is because there will always be just one PCA plot, whereas every other plot type produces multiple outputs once there are enough groups and replicates. These changes also open up opportunities for downstream steps to further process tiny-plot outputs.
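The directory behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `save_output`, the `"pca"` and `"scatter"` labels, and the filenames are all hypothetical.

```python
from pathlib import Path


def save_output(root: Path, plot_type: str, filename: str, data: bytes) -> Path:
    """Write a finished plot, creating its directory only at save time.

    Hypothetical helper: PCA output lands at the root level, while every
    other plot type gets a subdirectory named after the plot type.
    """
    out_dir = root if plot_type == "pca" else root / plot_type
    # The directory is created lazily, so plot types that were never
    # requested (or never finished) leave no empty directory behind.
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    out_path.write_bytes(data)
    return out_path
```

Because the directory is created inside the save call, a plot that fails before it is ready to be saved never produces an empty subdirectory.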

Closes #233

…tion from tiny-count. If multiple ID values are listed, they are now concatenated rather than selecting the first. I think this will be much more intuitive and it also releases ReferenceTables.get_figure_id() so that it can be used without a constructed ReferenceTables object.

I've also converted the argparse output in tiny-count to a read-only dictionary. This prefs object is being passed around to a LOT of classes in tiny-count, and in doing so we risk accidentally changing preferences. This "bug" was previously leveraged by the StepVector routine; it has been refactored to no longer rely on the mutability of prefs.
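One way to get a read-only view of parsed arguments is the standard library's `types.MappingProxyType`. The sketch below is an assumption about the general approach, not the implementation used in tiny-count; the `--verbose` flag and the `parse_prefs` name are made up for illustration.

```python
import argparse
from types import MappingProxyType


def parse_prefs(argv):
    """Parse arguments and return them as a read-only mapping (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--verbose", action="store_true")  # hypothetical flag
    args = parser.parse_args(argv)
    # Copy into a fresh dict so the proxy is detached from the Namespace,
    # then wrap it; item assignment on the proxy raises TypeError.
    return MappingProxyType(dict(vars(args)))


prefs = parse_prefs(["--verbose"])
try:
    prefs["verbose"] = False
except TypeError:
    print("prefs is read-only")
```

Code that previously relied on mutating the shared object has to keep its own local state instead, which is the kind of refactor described for the StepVector routine.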
…e, then the value of Parent is used as the ID. It is no longer treated as an error.
…ceTables to its own standalone function. This allows parsing machinery to be shared with the new GFFValidation class.
…tup (configuration.py) and tiny-count startup (counter.py)

GFF validation is treated as an optional step that must be specifically requested in configuration.py. This is because we will assume that resume runs are using inputs that have already been validated.

GFF validation is skipped in tiny-count during pipeline runs. This is because we will assume that both end-to-end runs and resume runs are using inputs that have already been validated.
…ing printed. Adding this exception so that we can call sys.exit() on validation failure and let the validation report speak for itself, rather than following the report with an unnecessary stacktrace
…cs will now read up to 50,000 lines of each SAM file (while checking every 10,000 lines for chromosome matches) because it is quite a bit faster than I assumed. For 9 library files this represents only ~0.4s of runtime
…all 3 keys were queried with every function call, in reverse order from lowest to highest priority, even if the preferred key was present. Now the chain will check the highest priority keys first, and continue as soon as a match is found
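A highest-priority-first lookup that stops at the first match might look like the sketch below. The key names and their order are assumptions taken from the other commit messages in this PR (ID, gene_id, and Parent as a fallback), and `first_present` is a hypothetical name.

```python
def first_present(attrs, keys=("ID", "gene_id", "Parent")):
    """Return the value of the highest-priority key present in attrs.

    Keys are checked from highest to lowest priority and the search stops
    at the first hit, so lower-priority keys are never queried when a
    preferred key exists. Returns None if no key is present.
    """
    for key in keys:
        if key in attrs:
            return attrs[key]
    return None
```

For example, `first_present({"Parent": "gene1", "ID": "mRNA1"})` returns `"mRNA1"` without ever looking at `Parent`.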
…s mapped from True/False to +/-. Now, if a feature's strand is anything but +/-, it is mapped to None. The GFFValidator produces a warning about this but no longer treats it as a hard error.

Per Tai, a strand type of None matches strand selectors for "sense", "antisense", and "both." 5' and 3' anchored selectors can also evaluate these features, but evaluation does not distinguish between 5' and 3' ends.
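The strand selector rule above can be illustrated with a small sketch. The function names and the string representation of strands are assumptions made for this example and may not match how tiny-count stores strand internally.

```python
def parse_strand(raw):
    """Map a GFF strand column to '+', '-', or None for anything else."""
    return raw if raw in ("+", "-") else None


def strand_matches(selector, feature_strand, read_strand):
    """Evaluate a 'sense', 'antisense', or 'both' selector (hypothetical helper).

    A feature whose strand is None matches every selector.
    """
    if selector not in ("sense", "antisense", "both"):
        raise ValueError(f"unknown strand selector: {selector}")
    if selector == "both" or feature_strand is None:
        return True
    same = feature_strand == read_strand
    return same if selector == "sense" else not same
```

With this rule, a feature whose strand column is `.` parses to `None` and is accepted by all three selectors regardless of the read's strand.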
… for the Overlap column. I think this makes it easier to explain how the 5'/3' anchored selectors behave with unstranded features
…ures and adding that Parent is now used as a fallback ID attribute
…f unstranded features. Also refined/simplified the Overlap explanation in Stage 2
… feature has a Parent= but no ID/gene_id=. This was causing an infinite loop when ReferenceTables later tried to find the root ancestor of these features.
…and values after they have been parsed. Note that this happens after comma separated values have been split. This means that value list items can contain URL encoded commas which are then preserved as part of the value (rather than being split on the encoded comma)
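The ordering described here, split on commas first and decode afterwards, can be shown with `urllib.parse.unquote` from the standard library. This is a sketch of the idea only; `parse_attr_values` is a hypothetical name and the actual parsing code in this PR may differ.

```python
from typing import List
from urllib.parse import unquote


def parse_attr_values(raw: str) -> List[str]:
    """Split a comma separated value list, then URL-decode each item.

    Decoding after the split means an encoded comma (%2C) stays inside
    its item instead of creating an extra list entry.
    """
    return [unquote(value) for value in raw.split(",")]


print(parse_attr_values("a%2Cb,c"))  # ['a,b', 'c']
```

Decoding before the split would instead turn `a%2Cb,c` into three items, which is the behavior this ordering avoids.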
…r C. elegans and Arabidopsis, then runs all four files through ReferenceTables.get(). Genomes need only be downloaded once. Nevertheless, this is a long-running test so I've set it for manual activation only
…es are created only for the plots requested, and only once a plot is complete and ready to be saved
…n the root level of the tiny-plot output directory rather than their own subdirectory. This is because there will always be just one PCA plot, whereas all other plot types will have multiple outputs with a sufficient number of groups and replicates. These changes also open up opportunities for downstream steps to further process tiny-plot outputs
@taimontgomery
Collaborator

Tested on ram1.

@taimontgomery taimontgomery merged commit 7f1f982 into master Oct 16, 2022
Development

Successfully merging this pull request may close these issues.

tiny-plot: place outputs into subdirectories by plot type

2 participants