Merged
- missing plotting outputs
- missing summary table of contamination comparisons
- not tested
Pull Request Overview
This PR adds a contamination quantification step to the mutation preprocessing workflow by introducing a new COMPUTE_CONTAMINATION process and wiring it into the existing pipeline.
- Added `COMPUTE_CONTAMINATION` process in `modules/local/contamination/main.nf` to compute contamination metrics and produce TSV and optional PDF outputs.
- Imported and invoked `CONTAMINATION` in `subworkflows/local/mutationpreprocessing/main.nf`, with new channels for raw and somatic mutations.
Reviewed Changes
Copilot reviewed 2 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| subworkflows/local/mutationpreprocessing/main.nf | Imported CONTAMINATION and created raw_muts_all_samples channel before invoking contamination step |
| modules/local/contamination/main.nf | Defined the COMPUTE_CONTAMINATION process with inputs, outputs, script, and stub sections |
Comments suppressed due to low confidence (2)
modules/local/contamination/main.nf:10
- The variable name `meta2` is ambiguous; consider renaming it to something more descriptive like `somatic_meta` to clearly distinguish it from the primary `meta`.
tuple val(meta2), path(somatic_maf)
modules/local/contamination/main.nf:1
- There are no tests covering the new `COMPUTE_CONTAMINATION` process. Consider adding unit or integration tests to validate the TSV and PDF outputs under various data scenarios.
process COMPUTE_CONTAMINATION {
Add comparison of germline and somatic variants of the cohort to quantify potential intersample contamination
AI summary
This pull request introduces a new process for computing contamination in mutation data and integrates it into the mutation preprocessing workflow. The key changes include adding the `COMPUTE_CONTAMINATION` process, updating the workflow to use it, and modifying channels to handle the necessary data.

Addition of the `COMPUTE_CONTAMINATION` process:
- Added `COMPUTE_CONTAMINATION` in `modules/local/contamination/main.nf`. This process computes contamination using input mutation files (`maf` and `somatic_maf`), outputs contamination results as TSV files, and optionally generates contamination plots as PDFs. It also records the Python version used in a `versions.yml` file.

Integration into the mutation preprocessing workflow:
- Integrated the `COMPUTE_CONTAMINATION` process into the mutation preprocessing workflow by adding an `include` statement in `subworkflows/local/mutationpreprocessing/main.nf`.
- Updated the `MUTATION_PREPROCESSING` workflow to create a new channel `raw_muts_all_samples` by joining metadata with named mutation files, and passed this channel along with `muts_all_samples` to the `COMPUTE_CONTAMINATION` process.
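
The script behind `COMPUTE_CONTAMINATION` is not shown in this conversation, so the following is only a minimal sketch of one way a cohort-wide germline versus somatic comparison could flag intersample contamination. It assumes variants are already parsed into `(chrom, pos, ref, alt)` tuples per sample. The function name `contamination_matrix` and the scoring rule (the fraction of one sample's somatic calls that match another sample's germline variants) are hypothetical illustrations, not the PR's implementation.

```python
from itertools import permutations


def contamination_matrix(somatic, germline):
    """Score every ordered (recipient, donor) pair of samples.

    somatic / germline: dict mapping sample name -> set of
    (chrom, pos, ref, alt) tuples. Hypothetical input layout.

    The score is the fraction of the recipient's somatic calls that
    coincide with the donor's germline variants and are absent from
    the recipient's own germline set. A high score hints that donor
    DNA may be present in the recipient sample.
    """
    scores = {}
    for recipient, donor in permutations(sorted(somatic), 2):
        calls = somatic[recipient]
        # Skip pairs with nothing to compare.
        if not calls or donor not in germline:
            continue
        own_germline = germline.get(recipient, set())
        shared = {
            v for v in calls
            if v in germline[donor] and v not in own_germline
        }
        scores[(recipient, donor)] = len(shared) / len(calls)
    return scores


# Toy cohort: two of S1's four somatic calls are germline in S2.
somatic = {
    "S1": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"),
           ("chr3", 300, "C", "T"), ("chr4", 400, "T", "G")},
    "S2": {("chr5", 500, "A", "G")},
}
germline = {
    "S1": {("chr9", 900, "A", "C")},
    "S2": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
}

matrix = contamination_matrix(somatic, germline)
for (recipient, donor), score in sorted(matrix.items()):
    print(f"{recipient}\t{donor}\t{score:.2f}")
```

A pairwise table of this kind could also serve as the basis for the summary table of contamination comparisons listed as missing above, and small fixtures like this toy cohort are one option for the tests the review asks for.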