Merged
…(or any other pipeline step). This means the Samples Sheet will be subject to many of the same validation steps in tiny-count standalone runs as it is at pipeline startup. Additionally:
- The normalization column is now validated and stored by the class
- The samples sheet can contain .sam and .bam files if it is used in a standalone run context
- File paths are properly resolved when used in a pipeline step context
- tiny-deseq-related validations (control condition and DESeq2 compatibility checks) are skipped when used in a standalone run or pipeline step context. These checks are only performed at pipeline startup.
- The class now reports the filename and row number if any exception is raised during CSV parsing
…consistent, and more thorough! This means that the from_here() method can be removed from util.py, which is a good thing in the end because ConfigBase.from_here() (which SamplesSheet uses) is more developed
…he PathsFile class (and because it just makes more sense)
…roper ordering of alignments
…instead of the superclass' line_num when catching errors. This is more meaningful to the user since CSV records can technically span multiple lines
…row_num, so it is redundant to add that info to AssertionErrors. Also adding a more robust regex for matching normalization definitions
…n Paths File and Run Config
…et class and the AlignmentReader class
…ls that produce "unordered" outputs
…iles that were missed earlier
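The row-number and normalization-regex changes in these commits can be illustrated with a minimal Python sketch. It is not tinyRNA's SamplesSheet implementation: the column names (`Input Files`, `Normalization`), the `validate_samples()` and `SamplesSheetError` names, and the accepted normalization values (blank, a plain decimal number, or `RPM` in any case) are all assumptions made for illustration. It models the standalone context, where .sam and .bam inputs are allowed. The row number comes from enumerating records rather than from the reader's `line_num`, which counts physical lines and therefore drifts when a quoted field contains a newline.

```python
import csv
import io
import re

# Hypothetical rule: blank, a plain decimal number, or "RPM" (any case).
NORM_PATTERN = re.compile(r"\s*(?:rpm|\d+(?:\.\d+)?)?\s*", re.IGNORECASE)
# Standalone tiny-count context: alignment files are accepted as input.
ALIGNMENT_EXTS = (".sam", ".bam")


class SamplesSheetError(Exception):
    """Carries the sheet's filename and CSV record (row) number."""


def validate_samples(text, filename="samples.csv"):
    reader = csv.DictReader(io.StringIO(text, newline=""))
    rows = []
    # enumerate() counts records; reader.line_num counts physical lines,
    # which differ when a quoted field spans more than one line.
    for row_num, row in enumerate(reader, start=1):
        where = f"{filename}, row {row_num}"
        path = row.get("Input Files") or ""
        norm = row.get("Normalization") or ""
        if not path.lower().endswith(ALIGNMENT_EXTS):
            raise SamplesSheetError(f"{where}: unsupported file type {path!r}")
        if not NORM_PATTERN.fullmatch(norm):
            raise SamplesSheetError(f"{where}: invalid normalization {norm!r}")
        rows.append(row)
    return rows


# Row 2's quoted path spans two physical lines.
SHEET = (
    "Input Files,Normalization\n"
    "a.sam,RPM\n"
    '"multi\nline.bam",2.5\n'
    "c.bam,bogus\n"
)

if __name__ == "__main__":
    try:
        validate_samples(SHEET)
    except SamplesSheetError as err:
        print(err)  # samples.csv, row 3: invalid normalization 'bogus'
```

The bad value sits on record 3 but on physical line 5, so a message built from the record count points the user at the row they would see in a spreadsheet editor.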
Collaborator
Tested with Lib303, comparing SAM and BAM input files from tinyRNA output (collapsed or decollapsed). We might want to note in the documentation that the reported "Assigned Reads" count depends on genomic and feature hit normalization.
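To illustrate the point above, here is a hedged Python sketch of how two normalization toggles could change per-feature counts, and therefore any "Assigned Reads" total computed as their sum. It assumes that genomic hit normalization divides a read's count by its number of alignments and that feature hit normalization splits each alignment's share evenly across the features it overlaps; `count_features()` and its parameters are hypothetical and are not tiny-count's API.

```python
from collections import Counter, defaultdict


def count_features(alignments, by_genomic_hits=True, by_feature_hits=True):
    """Tally per-feature counts (hypothetical model).

    alignments: (read_id, read_copies, overlapped_features) tuples, one per
    genomic alignment. read_copies > 1 models a collapsed read.
    """
    n_hits = Counter(read_id for read_id, _, _ in alignments)
    counts = defaultdict(float)
    for read_id, copies, features in alignments:
        if not features:
            continue  # alignment overlaps no feature: unassigned
        share = float(copies)
        if by_genomic_hits:
            share /= n_hits[read_id]  # split across genomic loci
        if by_feature_hits:
            share /= len(features)  # split across overlapped features
        for feature in features:
            counts[feature] += share
    return dict(counts)


# One collapsed read (4 copies) with two alignments: the first overlaps
# features A and B, the second overlaps feature C.
ALIGNMENTS = [("r1", 4, ["A", "B"]), ("r1", 4, ["C"])]

if __name__ == "__main__":
    for genomic in (True, False):
        for feature in (True, False):
            total = sum(count_features(ALIGNMENTS, genomic, feature).values())
            print(genomic, feature, total)
```

Under these assumptions the same input yields a total of 4.0 with both toggles on, 6.0 with only genomic hit normalization, 8.0 with only feature hit normalization, and 12.0 with both off.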
BAM files are now supported. This PR also improves the routine that examines alignment file headers to ensure that they have a compatible ordering.
- `@HD` headers are `SO:queryname` or `GO:query`
- `@HD` headers are `SO:coordinate` or `GO:reference` (-> error)
- `@HD` headers that are `SO:unsorted`, `SO:unknown`, or `GO:none` will have the last `@PG` header inspected as a fallback. If it reports anything other than Bowtie, Bowtie2, or STAR, it is an error. This is because these tools follow the multi-mapping adjacency convention by default (though this list is incomplete).

Additionally, the Samples Sheet is more thoroughly validated during tiny-count standalone runs. The Normalization column is also validated at pipeline startup during end-to-end runs.
Closes #302
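The header rules listed above can be sketched in Python as follows. This is an illustrative approximation, not tinyRNA's AlignmentReader code: `check_alignment_order()`, `AlignmentOrderError`, and reading the program name from the `@PG` line's `PN` tag (falling back to `ID`) are assumptions. Missing `SO` and `GO` tags are treated as `unknown` and `none`, their defaults in the SAM specification, so a file with no `@HD` line also goes through the `@PG` fallback.

```python
# Hypothetical allowlist (compared case-insensitively); the PR notes the
# real list of adjacency-preserving tools is incomplete.
ADJACENCY_TOOLS = {"bowtie", "bowtie2", "star"}


class AlignmentOrderError(Exception):
    """Raised when a read's alignments may not be adjacent in the file."""


def _tags(line):
    """Parse a tab-delimited SAM header line into a {TAG: value} dict."""
    return dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)


def check_alignment_order(header_text):
    """Return what justified acceptance, or raise AlignmentOrderError."""
    hd, last_pg = {}, None
    for line in header_text.splitlines():
        if line.startswith("@HD\t"):
            hd = _tags(line)
        elif line.startswith("@PG\t"):
            last_pg = _tags(line)  # keep overwriting: we want the last one

    so = hd.get("SO", "unknown")  # SAM spec default
    go = hd.get("GO", "none")     # SAM spec default

    # Checking the accepted case first is a choice made here for headers
    # whose SO and GO tags disagree.
    if so == "queryname" or go == "query":
        return "@HD"
    if so == "coordinate" or go == "reference":
        raise AlignmentOrderError(f"incompatible order (SO:{so}, GO:{go})")

    # SO:unsorted, SO:unknown, or GO:none: fall back to the last @PG line.
    program = ""
    if last_pg is not None:
        program = last_pg.get("PN") or last_pg.get("ID", "")
    if program.lower() not in ADJACENCY_TOOLS:
        raise AlignmentOrderError(f"cannot confirm adjacency for {program!r}")
    return program


if __name__ == "__main__":
    header = "@HD\tVN:1.6\tSO:unsorted\n@PG\tID:bowtie2\tPN:bowtie2\n"
    print(check_alignment_order(header))  # bowtie2
```

One consequence of inspecting only the last `@PG` line: if another tool appends its own `@PG` entry after the aligner's, this fallback check fails even when the alignment order itself is unchanged.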