Skip to content

Use new sequence type in alignment.#741

Merged
tmooney merged 6 commits intogenome:masterfrom
tmooney:dna_pipelines_use_sequence_type
Aug 13, 2019
Merged

Use new sequence type in alignment.#741
tmooney merged 6 commits intogenome:masterfrom
tmooney:dna_pipelines_use_sequence_type

Conversation

@tmooney
Copy link
Member

@tmooney tmooney commented Jul 23, 2019

This is #740 plus an implementation of that concept in the pipelines that include the standard alignment subworkflows. I had to make an adapter subworkflow because Cromwell was not staging the file inputs when they were pulled as arguments directly leading to errors about files not being found.

I've tested this on the somatic_exome.yaml example but not run any full-scale pipelines yet.

@tmooney tmooney force-pushed the dna_pipelines_use_sequence_type branch from 2914f78 to b2b638d Compare July 23, 2019 21:26
Copy link
Member

@apaul7 apaul7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should sequence_align_and_tag_adapter.cwl be under definitions/subworkflows/ and not definitions/tools/?

@tmooney
Copy link
Member Author

tmooney commented Aug 1, 2019

My snarky answer would be "it shouldn't exist at all!", but I moved it under subworkflows 😄.

Copy link
Member

@apaul7 apaul7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

@jasonwalker80 jasonwalker80 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, +1.

One comment, the new type is sequence_data and it has two items, sequence and readgroup. In the workflows, the inputs are always named sequence even though they are sequence_data inputs. One suggestion for consideration is to use tumor_sequence_data instead of tumor_sequence to avoid the confusion between the sequence_data input vs. the actual sequence property, ie. BAM or FASTQ, and the readgroup property.

NOTE: I realize my differentiation between input vs. property may not be correct. I'm happy to discuss in slack in more detail if this is confusing.

@tmooney tmooney merged commit 60edaf6 into genome:master Aug 13, 2019
@tmooney tmooney deleted the dna_pipelines_use_sequence_type branch August 13, 2019 18:02
@chrisamiller chrisamiller mentioned this pull request Sep 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments