
tiny-count: BAM support #304

Merged

taimontgomery merged 18 commits into master from issue-302 on May 2, 2023

@AlexTate (Member) commented May 1, 2023

BAM files are now supported. This PR also improves the routine that examines alignment file headers to verify that alignments are in a compatible order.

  • Strictly compatible @HD headers are SO:queryname or GO:query.
  • Strictly incompatible @HD headers are SO:coordinate or GO:reference (these raise an error).
  • For files with ambiguous @HD headers (SO:unsorted, SO:unknown, or GO:none), the last @PG header is inspected as a fallback. If it reports any program other than Bowtie, Bowtie2, or STAR, an error is raised. These three aligners follow the multi-mapping adjacency convention by default (alignments of a multi-mapping read are written consecutively), though this list is not exhaustive.
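The ordering rules above can be sketched as a small classifier over SAM header text (for a BAM file, the text could come from `samtools view -H`). This is a minimal sketch, not tiny-count's implementation. The function names are hypothetical, and several choices are assumptions: incompatible tags are checked before compatible ones, a missing @HD line is treated as ambiguous, and the aligner is identified from the last @PG line's PN tag, falling back to its ID tag.

```python
from typing import Optional

# Assumed tag/value pairs, taken from the rules listed above
COMPATIBLE = {("SO", "queryname"), ("GO", "query")}
INCOMPATIBLE = {("SO", "coordinate"), ("GO", "reference")}
# Aligners assumed to keep multi-mapping alignments adjacent by default
ADJACENCY_ALIGNERS = {"bowtie", "bowtie2", "star"}


def parse_tags(line: str) -> dict:
    """Split a tab-delimited SAM header line into {TAG: value}."""
    fields = line.rstrip("\n").split("\t")[1:]
    return dict(f.split(":", 1) for f in fields if ":" in f)


def check_alignment_order(header_text: str) -> str:
    """Hypothetical helper: return 'compatible' or raise ValueError."""
    hd: dict = {}
    last_pg: Optional[dict] = None
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            hd = parse_tags(line)
        elif line.startswith("@PG"):
            last_pg = parse_tags(line)  # keep overwriting so the last @PG wins

    order = {(k, v) for k, v in hd.items() if k in ("SO", "GO")}
    if order & INCOMPATIBLE:
        raise ValueError(f"Incompatible @HD ordering: {sorted(order & INCOMPATIBLE)}")
    if order & COMPATIBLE:
        return "compatible"

    # Ambiguous (SO:unsorted, SO:unknown, GO:none, or no @HD): fall back to the last @PG
    pg = last_pg or {}
    program = pg.get("PN") or pg.get("ID", "")
    if program.lower() in ADJACENCY_ALIGNERS:
        return "compatible"
    raise ValueError(f"Ambiguous ordering and unrecognized aligner: {program!r}")
```

Checking incompatible tags first means a contradictory header (for example SO:coordinate alongside GO:query) is rejected rather than accepted, which errs on the side of not miscounting multi-mappers.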

Additionally, the Samples Sheet is now validated more thoroughly during tiny-count standalone runs. The Normalization column is also validated at pipeline startup during end-to-end runs.

Closes #302

AlexTate added 15 commits April 22, 2023 17:46
…(or any other pipeline step). This means the Samples Sheet will be subject to many of the same validation steps in tiny-count standalone runs as it is at pipeline startup.

Additionally:
- The normalization column is now validated and stored by the class
- The samples sheet can contain .sam and .bam files if it is used in a standalone run context
- File paths are properly resolved when used in a pipeline step context
- tiny-deseq-related validations (control condition and deseq2 compatibility checks) are skipped when used in a standalone run or pipeline step context. They are only validated at pipeline startup.
- The class now reports the filename and row number if any exception is raised during csv parsing
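The context-dependent validation and error reporting described in this commit could look roughly like the following. This is a hedged sketch using Python's `csv` module, not the SamplesSheet class itself; the `validate_samples_sheet()` function, the "File" column, the context names, and the per-context extension lists are all hypothetical.

```python
import csv
import io
from typing import Iterable, List

# Hypothetical: file extensions accepted in each run context
ALLOWED_EXTENSIONS = {
    "pipeline_start": (".fastq", ".fq", ".fastq.gz", ".fq.gz"),
    "standalone": (".sam", ".bam"),
}


def validate_samples_sheet(lines: Iterable[str], filename: str, context: str) -> List[dict]:
    """Parse a samples sheet, prefixing any validation error with filename and row number."""
    allowed = ALLOWED_EXTENSIONS[context]
    rows = []
    # Row numbers count CSV records; row 1 is the first record after the header
    for row_num, row in enumerate(csv.DictReader(lines), start=1):
        try:
            path = row.get("File") or ""  # a missing or empty cell fails the check below
            if not path.lower().endswith(allowed):
                raise ValueError(f"{path!r} is not a supported input in a {context} run")
            rows.append(row)
        except ValueError as e:
            raise ValueError(f"{filename}, row {row_num}: {e}") from e
    return rows


if __name__ == "__main__":
    sheet = "File,Group\nlib1.bam,wt\nlib2.sam,mutant\n"
    print(validate_samples_sheet(io.StringIO(sheet), "samples.csv", "standalone"))
```

The same sheet passed with `context="pipeline_start"` would fail on its first record with an error beginning `samples.csv, row 1:`.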
…consistent, and more thorough! This means that the from_here() method can be removed from util.py, which is a good thing in the end because ConfigBase.from_here() (which SamplesSheet uses) is more fully developed.
…he PathsFile class (and because it just makes more sense)
…instead of the superclass's line_num in error catching. This is more meaningful to the user because a single CSV record can span multiple lines.
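The difference between a record count and `line_num` can be seen with Python's standard `csv` module, where `reader.line_num` counts physical lines read from the source rather than records. A small illustration with a made-up two-column sheet whose first record contains a quoted newline:

```python
import csv
import io

# Made-up sheet: the first record's quoted "Note" field contains a newline
data = 'File,Note\nlib1.bam,"first line\nsecond line"\nlib2.bam,ok\n'

reader = csv.reader(io.StringIO(data))
next(reader)  # consume the header record
positions = []
for row_num, record in enumerate(reader, start=1):
    # reader.line_num counts physical lines consumed so far, not records
    positions.append((row_num, reader.line_num, record[0]))

print(positions)  # → [(1, 3, 'lib1.bam'), (2, 4, 'lib2.bam')]
```

An error in the first record would be reported at line 3 by `line_num`, while a record-based row number points to row 1.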
…row_num, so it is redundant to add that information to AssertionErrors. Also adds a more robust regex for matching normalization definitions.
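The commit does not show the new regex. As a hedged sketch only, assume the Normalization column accepts a blank cell, a positive number (integer, decimal, or scientific notation), or RPM in any letter case; these accepted forms, `NORM_PATTERN`, and `is_valid_normalization()` are assumptions for illustration, not taken from tiny-count's source.

```python
import re

# Hypothetical pattern: optional surrounding whitespace around a blank value,
# "rpm" (any case), or an unsigned number with an optional exponent
NORM_PATTERN = re.compile(
    r"\s*(?:rpm|(?:\d+(?:\.\d*)?|\.\d+)(?:e[+-]?\d+)?)?\s*",
    re.IGNORECASE,
)


def is_valid_normalization(value: str) -> bool:
    """Return True if value looks like an accepted normalization definition."""
    if NORM_PATTERN.fullmatch(value) is None:
        return False
    token = value.strip()
    if token == "" or token.lower() == "rpm":
        return True
    # Numeric factors must be positive, since counts would be divided by them
    return float(token) > 0
```

Using `fullmatch` rejects partial matches such as `1.2.3`, and the separate positivity check catches `0`, which the pattern alone would accept.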
@taimontgomery (Collaborator)

Tested with Lib303 by comparing SAM and BAM input files from tinyRNA output (collapsed or decollapsed). We might want to note in the documentation that the reported "Assigned Reads" count depends on genomic and feature hit normalization.
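To illustrate why the "Assigned Reads" total can change with these settings, here is a hedged sketch of one plausible counting scheme, not necessarily tiny-count's exact arithmetic. It assumes a sequence's count is split evenly across its genomic alignments (genomic hit normalization) and each alignment's share is split evenly across the features it overlaps (feature hit normalization). The `assign_counts()` function and its parameters are hypothetical.

```python
from typing import Dict, List


def assign_counts(
    read_count: float,
    loci: List[List[str]],
    by_genomic_hits: bool = True,
    by_feature_hits: bool = True,
) -> Dict[str, float]:
    """Distribute one sequence's count over the features its alignments overlap.

    loci: one entry per genomic alignment, each listing the features it overlaps.
    """
    per_locus = read_count / len(loci) if by_genomic_hits else read_count
    totals: Dict[str, float] = {}
    for features in loci:
        if not features:
            continue  # alignment overlaps no feature: nothing is assigned
        share = per_locus / len(features) if by_feature_hits else per_locus
        for feature in features:
            totals[feature] = totals.get(feature, 0.0) + share
    return totals


if __name__ == "__main__":
    loci = [["geneA", "geneB"], ["geneC"]]
    for genomic in (True, False):
        for feature in (True, False):
            total = sum(assign_counts(12, loci, genomic, feature).values())
            print(f"genomic={genomic} feature={feature} assigned={total}")
```

Under these assumptions, a count of 12 aligned to two loci (one overlapping two features, one overlapping one) yields an assigned total of 12 with both normalizations, 18 with genomic hit normalization only, 24 with feature hit normalization only, and 36 with neither.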

@taimontgomery taimontgomery merged commit 5aac71b into master May 2, 2023

Development

Linked issue: tiny-count: add BAM format support

2 participants