tiny-count: fix handling of gzipped tiny-collapse outputs for SummaryStats#222

Merged
taimontgomery merged 2 commits into master from issue-221
Aug 15, 2022

Conversation

@AlexTate AlexTate commented Aug 10, 2022

The SummaryStats class now properly handles gzipped tiny-collapse outputs when the user elects to have them produced. Previously, it assumed that all tiny-collapse outputs were plaintext.

Closes #221

…with a terminating wildcard. This will return the correct filename regardless of the user's compression settings for tiny-collapse.
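
A minimal sketch of a lookup with a terminating wildcard. The `_collapsed.fa` suffix and the helper name `find_collapsed_output` are assumptions made for illustration; the actual pattern used by tiny-count is not shown in this conversation.

```python
import glob
import os


def find_collapsed_output(prefix, directory="."):
    """Return the collapsed FASTA for `prefix`, whether or not it is gzipped.

    The "_collapsed.fa" suffix is hypothetical, chosen only for this example.
    """
    # The trailing "*" matches both "<prefix>_collapsed.fa" and
    # "<prefix>_collapsed.fa.gz".
    pattern = os.path.join(
        glob.escape(directory), glob.escape(prefix) + "_collapsed.fa*"
    )
    matches = sorted(glob.glob(pattern))
    return matches[0] if matches else None
```

Because the wildcard also matches any other file sharing the same leading name, a stricter caller may want to check that exactly one match was found.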

When a gzipped filename is detected, we have to take a more brute-force approach to parsing the unique sequence number from the last FASTA header: read the last 250 bytes from the file, split them into lines, remove trailing blank lines, then assume the second-to-last line is the final header.
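
The tail-parsing steps described in this commit message could be sketched roughly as follows. This is not the code from the PR: the function names are hypothetical, the header format `>N_count=M` is an assumption, and how the PR itself obtains the last 250 bytes is not shown here. This sketch streams the decompressed data and keeps only a trailing window.

```python
import gzip


def read_tail(path, n_bytes=250):
    """Return the last `n_bytes` of the (decompressed) content of `path`."""
    opener = gzip.open if path.endswith(".gz") else open
    tail = b""
    with opener(path, "rb") as f:
        # Stream through the file, keeping only the trailing window.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            tail = (tail + chunk)[-n_bytes:]
    return tail


def last_header_seq_number(path):
    """Parse the sequence number from the final FASTA header of `path`.

    Assumes headers look like ">123_count=4" (hypothetical format) and
    that the final record is one header line plus one sequence line.
    """
    lines = read_tail(path).decode("ascii").splitlines()
    # Drop trailing blank lines so the last entry is the final sequence.
    while lines and not lines[-1].strip():
        lines.pop()
    if len(lines) < 2 or not lines[-2].startswith(">"):
        raise ValueError(f"Could not locate the final FASTA header in {path}")
    header = lines[-2]
    return int(header[1:].split("_count=")[0])
```

The second-to-last-line assumption only holds when the final sequence occupies a single line and the last record fits within the trailing window; the explicit `ValueError` makes a violation of that assumption visible instead of returning a wrong number.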
@AlexTate AlexTate added the bug Something isn't working label Aug 10, 2022
@AlexTate AlexTate requested a review from taimontgomery August 10, 2022 03:31
@AlexTate AlexTate commented

Leaving this PR as a draft during final testing.

An issue remains: SummaryStats for standalone runs that source third-party inputs. Currently:

  • At worst: the user follows the same filename suffix conventions as tiny-collapse and places these files in their CWD when executing tiny-count, which leads to a crash. To help, we could reinstate the old policy of requiring the --is-pipeline flag before even searching for pipeline outputs in the CWD. This would only work as long as the user is honest with the flag...
  • At best: the above conditions aren't met, so tiny-count assumes non-pipeline mode and doesn't bother parsing tiny-collapse outputs in the CWD.
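
The flag-gated policy floated above might look something like the following sketch. Only the `--is-pipeline` flag name comes from this discussion; the function names and the `*_collapsed.fa*` pattern are hypothetical, and this is not tiny-count's actual argument handling.

```python
import argparse
import glob
import os


def build_parser():
    parser = argparse.ArgumentParser(prog="tiny-count")
    parser.add_argument(
        "--is-pipeline",
        action="store_true",
        help="Search the working directory for outputs of earlier pipeline steps.",
    )
    return parser


def pipeline_outputs(cwd, is_pipeline):
    """Return candidate tiny-collapse outputs, but only in pipeline mode."""
    if not is_pipeline:
        # Standalone run: never inspect the CWD, so third-party files that
        # happen to share the suffix convention are ignored.
        return []
    # Hypothetical suffix pattern, used only for illustration.
    return sorted(glob.glob(os.path.join(glob.escape(cwd), "*_collapsed.fa*")))
```

As noted in the comment, this only protects standalone users who leave the flag off; a user who passes `--is-pipeline` alongside third-party files would still hit the parsing path.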

@AlexTate AlexTate marked this pull request as ready for review August 10, 2022 17:55
@taimontgomery taimontgomery merged commit 40874c2 into master Aug 15, 2022
Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

tiny-count: gzipped tiny-collapse outputs crash SummaryStats' determination of unique sequence count

2 participants