
Pipeline: support for large bowtie indexes #238

Merged: taimontgomery merged 10 commits into master from issue-237 on Oct 19, 2022

@AlexTate (Member) commented Oct 14, 2022

Support for large bowtie indexes (*.ebwtl) has been folded into the existing routines that handle bowtie index files.
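
To illustrate what handling both index types involves, here is a minimal sketch, not the code from this PR. bowtie-build writes six files after the index prefix, ending in .ebwt for small indexes and .ebwtl for large ones. The helper name `index_extension` and the constant `INDEX_PARTS` are hypothetical.

```python
from pathlib import Path

# Parts that bowtie-build writes after the index prefix, e.g. "genome.rev.1.ebwt".
INDEX_PARTS = ("1", "2", "3", "4", "rev.1", "rev.2")


def index_extension(prefix):
    """Return "ebwt" or "ebwtl" if a complete index exists at prefix, else None.

    Hypothetical helper: an index counts as complete only when all six
    files are present with the same extension.
    """
    for ext in ("ebwt", "ebwtl"):
        if all(Path(f"{prefix}.{part}.{ext}").is_file() for part in INDEX_PARTS):
            return ext
    return None
```

A caller could use the returned extension to construct the full list of index file paths instead of assuming .ebwt.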

This PR also has some general improvements for the bowtie-build step:

  • The activation procedure has been simplified. Run Config's run_bowtie_build is now an automatically configured value that is triggered by an empty ebwt prefix in the Paths Sheet. Users only have to worry about the latter value.
  • If the user defines ebwt but index files can't be found, and they also provided their reference genome files, then the pipeline will automatically rebuild the indexes and update the Paths Sheet at the end of the end-to-end run.
  • A "post-end-to-end run" procedure has been added to the Configuration class. It verifies bowtie outputs before saving related updates to the Run Config and Paths Sheet. This addresses a long-running problem where end-to-end runs with index building would not save the updated ebwt in Paths Sheet if a downstream step produced an error. Now, the updated ebwt path is written to the Paths File at the end of any run in which bowtie-build, at a minimum, ran successfully.
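
The activation rules in the first two bullets can be sketched as a single decision function. This is a simplified illustration under stated assumptions, not the Configuration class code: the function name, its parameters, and the error raised when neither an index nor genome files are available are all hypothetical.

```python
def should_run_bowtie_build(ebwt, reference_genome_files, index_found):
    """Decide whether the bowtie-build step should run.

    ebwt: the ebwt prefix from the Paths Sheet ("" or None if left empty)
    reference_genome_files: list of genome paths, possibly with empty items
    index_found: True if index files exist at the ebwt prefix
    """
    # Ignore empty list items so a blank entry is not treated as a genome file.
    genomes = [g for g in (reference_genome_files or []) if g]

    if ebwt and index_found:
        return False  # a usable index already exists, nothing to build
    if genomes:
        return True   # empty prefix, or prefix defined but index files missing
    # Assumption: with no index and no genome files, the run cannot proceed.
    raise ValueError("No bowtie index found and no reference genome files to build one from")
```

For example, `should_run_bowtie_build("", ["genome.fa"], False)` returns `True`, matching the "empty ebwt prefix" trigger described above.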

Note: this branch was started from issue-235, so until that branch's PR is closed, this PR will also include its changes.
Closes #237

…fig's run_bowtie_build is now an automatically configured value that is triggered by an empty ebwt prefix in the Paths Sheet.
…ild activation procedure. If the user defines ebwt but index files can't be found, and they also provided their reference genome files, then the pipeline will automatically rebuild the indexes and update the Paths Sheet at the end of the end-to-end run.

setup_ebwt_idx() has also been significantly refactored and cleaned up. It has been bugging me for a long time and it feels good to see it in better shape.
… others that verify bowtie-build outputs were produced, updates index paths if long indexes were produced, then saves updates to the Paths Sheet and Run Config.

This addresses a long-running problem where end-to-end runs with a bowtie-build step would not save the updated ebwt in Paths Sheet if a downstream step produced an error. Now, the updated ebwt path is written to the Paths File at the end of any run in which bowtie-build, at a minimum, ran successfully.
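
A rough sketch of the idea behind this post-run procedure, assuming plain dicts stand in for the Paths Sheet and Run Config. The function names and the `bt_index_files` key are hypothetical, and writing the files to disk is omitted.

```python
from pathlib import Path

INDEX_PARTS = ("1", "2", "3", "4", "rev.1", "rev.2")


def built_index_files(prefix):
    """Return the complete set of index files at prefix, or [] if none is complete."""
    for ext in ("ebwt", "ebwtl"):
        files = [f"{prefix}.{part}.{ext}" for part in INDEX_PARTS]
        if all(Path(f).is_file() for f in files):
            return files
    return []


def run_end_to_end(steps, paths, run_config, built_prefix):
    """Run each step, then record the new index even if a later step failed."""
    try:
        for step in steps:
            step()
    finally:
        # Verify bowtie-build outputs before touching either configuration.
        files = built_index_files(built_prefix)
        if files:
            paths["ebwt"] = built_prefix
            run_config["bt_index_files"] = files  # hypothetical key
```

Because the check sits in a `finally` block, a downstream failure still propagates to the caller, but the verified index prefix is recorded first.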
…lity fix for cases where ["reference_genome_files"] contains empty list items
…d descriptions of the new activation procedure for the bowtie-build step.
@AlexTate AlexTate changed the base branch from master to issue-3 October 17, 2022 22:21
@AlexTate AlexTate changed the base branch from issue-3 to master October 17, 2022 22:21
@taimontgomery (Collaborator) commented

Minimal testing since I didn't have ebwtl index files to work with.

@taimontgomery taimontgomery merged commit 81ef5e8 into master Oct 19, 2022
