Plotter: outputs are organized into subdirectories by type #239
Merged
taimontgomery merged 23 commits into master on Oct 16, 2022
…tion from tiny-count. If multiple ID values are listed, they are now concatenated rather than selecting the first. I think this will be much more intuitive, and it also frees ReferenceTables.get_figure_id() to be used without a constructed ReferenceTables object. I've also converted the argparse output in tiny-count to a read-only dictionary. This prefs object is passed around to a LOT of classes in tiny-count, and in doing so we risk accidentally changing preferences. This "bug" was previously leveraged by the StepVector routine; it has been refactored to no longer rely on the mutability of prefs.
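For readers unfamiliar with the idea, here is a minimal sketch of turning argparse output into a read-only dictionary. The commit message does not say which mechanism tiny-count uses; `types.MappingProxyType` is one standard-library option, and the function name and the `--example-prefix` flag are hypothetical, chosen only for illustration.

```python
import argparse
from types import MappingProxyType


def read_only_prefs(argv=None):
    """Parse arguments and return them as a read-only mapping (illustrative only)."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag, not a real tiny-count option.
    parser.add_argument("--example-prefix", default="tiny")
    args = parser.parse_args(argv)
    # Copy the namespace into a plain dict first so that later changes to
    # `args` cannot leak through the read-only view.
    return MappingProxyType(dict(vars(args)))


prefs = read_only_prefs(["--example-prefix", "run1"])

try:
    prefs["example_prefix"] = "changed"  # any write attempt is rejected
except TypeError as err:
    print(f"prefs is read-only: {err}")
```

With a view like this, a class that receives prefs can still read any value, but an accidental assignment fails immediately instead of silently changing preferences for every other class.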
…e, then the value of Parent is used as the ID. It is no longer treated as an error.
…ceTables to its own standalone function. This allows parsing machinery to be shared with the new GFFValidation class.
…gnment_chroms_mismatch_heuristic()
…tup (configuration.py) and tiny-count startup (counter.py). GFF validation is treated as an optional step that must be specifically requested in configuration.py, because resume runs are assumed to use inputs that have already been validated. GFF validation is skipped in tiny-count during pipeline runs, because both end-to-end runs and resume runs are assumed to use inputs that have already been validated.
…ing printed. Adding this exception so that we can call sys.exit() on validation failure and let the validation report speak for itself, rather than following the report with an unnecessary stacktrace
…cs will now read up to 50,000 lines of each SAM file (while checking every 10,000 lines for chromosome matches) because it is quite a bit faster than I assumed. For 9 library files this represents only ~0.4s of runtime
…xisting tests have been updated.
…all 3 keys were queried with every function call, in reverse order from lowest to highest priority, even if the preferred key was present. Now the chain will check the highest priority keys first, and continue as soon as a match is found
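The short-circuiting lookup described in this commit can be sketched as follows. This is an illustration rather than the project's code: the key names and their order (`ID`, then `gene_id`, then `Parent`) are an assumption inferred from the other commit messages in this PR, and `first_present` is a hypothetical helper name.

```python
# Assumed priority order, highest priority first.
ID_KEYS = ("ID", "gene_id", "Parent")


def first_present(attrs, keys=ID_KEYS):
    """Return the value of the highest-priority key found in attrs, else None."""
    for key in keys:
        if key in attrs:
            # Stop at the first match; lower-priority keys are never queried.
            return attrs[key]
    return None


print(first_present({"gene_id": "geneA", "Parent": "parentB"}))
print(first_present({"Parent": "parentB"}))
```

Checking from highest to lowest priority means the common case (the preferred key is present) costs a single lookup.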
…s mapped from True/False to +/-. Now, if a feature's strand is anything but +/-, it is mapped to None. The GFFValidator produces a warning about this but no longer treats it as a hard error. Per Tai, a strand type of None matches strand selectors for "sense", "antisense", and "both." 5' and 3' anchored selectors can also evaluate these features, but evaluation does not distinguish between 5' and 3' ends.
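A rough sketch of the behavior described here, under stated assumptions: which of `+`/`-` corresponds to True or False is assumed, and both function names and the selector strings as function arguments are hypothetical. It only illustrates that a strand of None satisfies the "sense", "antisense", and "both" selectors.

```python
def parse_strand(symbol):
    """Map '+' to True and '-' to False (assumed); anything else becomes None."""
    return {"+": True, "-": False}.get(symbol)


def strand_matches(selector, feature_strand, read_strand):
    """Evaluate a strand selector ('sense', 'antisense', or 'both')."""
    if selector == "both" or feature_strand is None:
        # Unstranded features (None) match every strand selector.
        return True
    same_strand = feature_strand == read_strand
    return same_strand if selector == "sense" else not same_strand


unstranded = parse_strand(".")
print(unstranded)
print(strand_matches("antisense", unstranded, read_strand=True))
```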
… for the Overlap column. I think this makes it easier to explain how the 5'/3' anchored selectors behave with unstranded features
…ures and adding that Parent is now used as a fallback ID attribute
…f unstranded features. Also refined/simplified the Overlap explanation in Stage 2
… feature has a Parent= but no ID/gene_id=. This was causing an infinite loop when ReferenceTables later tried to find the root ancestor of these features.
…and values after they have been parsed. Note that this happens after comma-separated values have been split. This means that value list items can contain URL-encoded commas, which are then preserved as part of the value (rather than being split on the encoded comma).
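The ordering matters, and a small example makes it concrete. This is a sketch using the standard library's `urllib.parse.unquote`, not the parser from this PR; `parse_attr_values` is a hypothetical name.

```python
from urllib.parse import unquote


def parse_attr_values(raw):
    """Split a raw attribute value on literal commas, then URL-decode each item."""
    # Splitting first means an encoded comma (%2C) is never treated as a separator.
    return tuple(unquote(item) for item in raw.split(","))


print(parse_attr_values("alpha%2Cbeta,gamma"))
```

Here the input yields two items, and the first item keeps its decoded comma. Decoding before splitting would instead produce three items.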
…multiple parents aren't supported.
…efer to tiny-collapse as Collapser
…r C. elegans and Arabidopsis, then runs all four files through ReferenceTables.get(). Genomes need only be downloaded once. Nevertheless, this is a long-running test so I've set it for manual activation only
…es are created only for the plots requested, and only once a plot is complete and ready to be saved
…n the root level of the tiny-plot output directory rather than their own subdirectory. This is because there will always be just one PCA plot, whereas all other plot types will have multiple outputs with a sufficient number of groups and replicates. These changes also open up opportunities for downstream steps to further process tiny-plot outputs
Tested on ram1.
Outputs of tiny-plot are now organized into subdirectories. Directories are created only for the plots requested, and only once a plot is complete and ready to be saved.
PCA plots are placed in the root level of the tiny-plot output directory rather than their own subdirectory. This is because there will always be just one PCA plot, whereas every other plot type produces multiple outputs once there are enough groups and replicates. These changes also open up opportunities for downstream steps to further process tiny-plot outputs.
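The layout rule can be sketched as below. This is an illustration under assumptions, not the tiny-plot implementation: `plot_output_path`, the plot type strings, and the file names are hypothetical. The directory is created at save time, which is one way to ensure directories exist only for plots that were requested and completed.

```python
import tempfile
from pathlib import Path


def plot_output_path(out_dir, plot_type, filename):
    """Return the save path for a finished plot, creating its directory lazily."""
    root = Path(out_dir)
    # PCA always yields a single plot, so it stays at the root level;
    # every other plot type gets a subdirectory named after the type.
    target_dir = root if plot_type == "pca" else root / plot_type
    target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / filename


with tempfile.TemporaryDirectory() as tmp:
    pca_path = plot_output_path(tmp, "pca", "example_pca.pdf")
    dist_path = plot_output_path(tmp, "len_dist", "example_len_dist.pdf")
    print(pca_path.relative_to(tmp).as_posix())
    print(dist_path.relative_to(tmp).as_posix())
```

Because each plot type lives in a predictable subdirectory, a downstream step can collect all outputs of one type by listing a single directory.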
Closes #233