This commit adds the possibility of using nanoseq masks in deepCSA. New parameters are added to both nextflow_schema and the nextflow config. No other major changes are needed, as nanoseq masks use the same script as FILTEREXONS and FILTERPANELS.
These changes are copied from branch input-with-click, more specifically from commit 0af42a9.
In filterbed.py, positive mode filters the positions in the BED file, while negative mode filters the positions not in the BED file. This commit adjusts the parameters of the nanoseq filters to match this behaviour.
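A minimal, standard-library sketch of the positive/negative semantics described above. It assumes BED intervals are 0-based and half-open (the BED convention) while mutation positions are 1-based, and it represents mutations as hypothetical (chrom, pos, ...) tuples. The real filterbed.py works on dataframes and may differ in detail. Here the function returns the mutations the filter applies to.

```python
from typing import Iterable, List, Set, Tuple

Position = Tuple[str, int]


def bed_positions(bed_lines: Iterable[str]) -> Set[Position]:
    """Expand BED intervals (0-based start, exclusive end) into 1-based positions."""
    positions: Set[Position] = set()
    for line in bed_lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        positions.update((chrom, pos) for pos in range(int(start) + 1, int(end) + 1))
    return positions


def select_filtered(mutations: List[tuple], bed: Set[Position], positive: bool) -> List[tuple]:
    """Return the mutations the filter applies to.

    positive=True  -> mutations at positions IN the BED file
    positive=False -> mutations at positions NOT in the BED file
    """
    return [m for m in mutations if ((m[0], m[1]) in bed) == positive]


if __name__ == "__main__":
    mask = bed_positions(["chr1\t9\t12"])  # covers chr1:10-12 (1-based)
    muts = [("chr1", 10, "C>T"), ("chr1", 13, "G>A"), ("chr2", 10, "A>G")]
    print(select_filtered(muts, mask, positive=True))
    print(select_filtered(muts, mask, positive=False))
```

A single boolean flips which side of the BED file is affected. This is why one script can serve both panel-style filters and mask-style filters.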
The click implementation makes it possible to add the --positive flag for the BED files that have the positive = true parameter in modules.conf.
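A dependency-free sketch of how such a boolean flag behaves on the command line. It swaps click for the standard-library argparse so it runs anywhere. In click, the equivalent declaration would be an option with is_flag=True. The --maf and --bed option names are hypothetical and not taken from filterbed.py.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI for a BED-based filter; option names are illustrative."""
    parser = argparse.ArgumentParser(description="Filter mutations against a BED file")
    parser.add_argument("--maf", required=True, help="input mutations table")
    parser.add_argument("--bed", required=True, help="BED file with the mask or panel")
    # Absent -> False (negative mode: filter positions NOT in the BED file).
    # Present -> True (positive mode: filter positions IN the BED file).
    parser.add_argument("--positive", action="store_true",
                        help="filter positions in the BED file instead of outside it")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse an explicit argument list (or sys.argv when argv is None)."""
    return build_parser().parse_args(argv)
```

With this shape, only the BED files configured with positive = true need --positive appended to their command. All other calls keep the negative default.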
This commit moves the import from the main workflow to the MUTATION_PREPROCESSING subworkflow, which is cleaner and easier to maintain.
FerriolCalvet
left a comment
All looks good!
I left some comments, mainly related to the style I would follow, and one more suggestion on splitting the filtering of masks.
Improve clarity.
To avoid unintended output, filtername is now left empty when it is not defined, instead of defaulting to "covered".
The functions negative_filter_panel_regions and positive_filter_panel_regions have been unified into a single function, filter_panel, with exactly the same logic. A new function removes non-canonical chromosomes from the positions dataframe (built from the BED file). Non-canonical chromosomes were causing problems when merging with sample_maf because the "chr" prefix was not detected.
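A sketch of the two ideas in this commit, under stated assumptions:

* A single filter_panel driven by a positive flag.
* A helper that drops non-canonical contigs before matching.

The definition of "canonical" used here (chr1 to chr22, chrX, chrY, chrM/MT, with or without the chr prefix) is an assumption, as is the tuple representation, since the actual code operates on dataframes. The helper name remove_non_canonical is hypothetical.

```python
import re
from typing import List, Tuple

# Assumption: canonical = autosomes 1-22, X, Y and mitochondrial, optional "chr" prefix.
CANONICAL_CHROM = re.compile(r"(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)")


def remove_non_canonical(positions: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Drop positions on contigs such as chrUn_* or *_random before any merge."""
    return [p for p in positions if CANONICAL_CHROM.fullmatch(p[0])]


def filter_panel(mutations: List[tuple], positions: List[Tuple[str, int]],
                 positive: bool) -> List[tuple]:
    """One function instead of separate positive/negative variants."""
    panel = set(remove_non_canonical(positions))
    # positive=True selects mutations at panel positions; False selects those outside.
    return [m for m in mutations if ((m[0], m[1]) in panel) == positive]
```

Using fullmatch rather than match matters here. A contig such as chr1_KI270706v1_random starts with a canonical name and would otherwise slip through.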
In the last commit I have added some modifications to filterbed.py:
If you are not sure about the first change, we can remove it.
…ement The if statement for the nanoseq masks has been split so that each mask is handled individually, in case only one is provided. Adding "else" branches also avoids assigning a value to a channel twice.
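The channel logic itself is Nextflow, but the control flow can be sketched in Python. Each mask is applied on its own if it was provided, and the else branches bind every variable exactly once, mirroring the "no channel assigned twice" rule. The function names are hypothetical. The apply_mask argument stands in for a FILTERNANOSEQSNP/FILTERNANOSEQNOISE-style step, and drop_masked is only an illustrative mask function.

```python
from typing import Callable, List, Optional, Set, Tuple

Mutation = Tuple[str, int]
Mask = Set[Tuple[str, int]]
MaskFn = Callable[[List[Mutation], Mask], List[Mutation]]


def drop_masked(mutations: List[Mutation], mask: Mask) -> List[Mutation]:
    """Illustrative mask step: remove mutations at masked positions."""
    return [m for m in mutations if (m[0], m[1]) not in mask]


def apply_optional_masks(mutations: List[Mutation], snp_mask: Optional[Mask],
                         noise_mask: Optional[Mask],
                         apply_mask: MaskFn = drop_masked) -> List[Mutation]:
    # SNP mask: applied only if provided; otherwise pass the input through.
    if snp_mask is not None:
        after_snp = apply_mask(mutations, snp_mask)
    else:
        after_snp = mutations
    # Noise mask: independent of the SNP mask, so either one can be given alone.
    if noise_mask is not None:
        after_noise = apply_mask(after_snp, noise_mask)
    else:
        after_noise = after_snp
    return after_noise
```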
All done (in principle)! If there is anything else you think could be improved, tell me!! Regarding applying the nanoseq masks individually, we could make the same change in deepUMIcaller as well.
FerriolCalvet
left a comment
The Python update looks good, pending discussion of the chromosome filtering,
and I made some proposals for the nf script.
Takes into account whether nanoseq masks were applied.
Takes into account whether nanoseq masks were applied.
- fix paths in test_real
- update order of variables in schema
commit e7ace44
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Fri Oct 10 09:44:35 2025 +0200
v1.0.0 fixes (#380)
* fix syntax of optional - fix ambiguity in features list definition
* remove optional input definition

commit e409639
Merge: 14640cd 1a61fc9
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Wed Oct 8 15:09:55 2025 +0200
Merge pull request #377 from bbglab/dev
New release: v1.0.0 Ter

commit 1a61fc9
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Wed Oct 8 11:25:01 2025 +0200
update documentation tackling several issues

commit cc808de
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Wed Oct 8 10:23:45 2025 +0200
update naming of summary mutation plots

commit 2ab4e65
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Tue Oct 7 23:15:35 2025 +0200
fix typos and make inputs of expand regions optional

commit 6bb325e
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Tue Oct 7 23:03:33 2025 +0200
apply review suggestions

commit 872809d
Merge: 2ac1bdb 14640cd
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Sun Oct 5 16:07:02 2025 +0200
Merge branch 'main' into dev

commit 2ac1bdb
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Sat Oct 4 17:52:17 2025 +0200
add tools' explanation in docs - add adjusted mutation density explanation - rename subworkflow directory

commit 275bd68
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Fri Oct 3 23:16:31 2025 +0200
update features groups documentation

commit 67a902e
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Tue Sep 30 09:12:55 2025 +0200
add nanoseq masks to default filtering - add also gnomAD_SNP - add documentation on Nanoseq masks

commit b06e900
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Mon Sep 29 09:17:47 2025 +0200
add test setup and first tests (#375)
* first definition of tests - to be tested
* add first semi-working version of pipeline level tests
* add first module testing for EXPANDREGIONS - testing focussed in PPM1D gene - confirm preferred behaviour for this process if omega_withingene is true, but no option of subgenic element definition is activated it fails - stub mode set up pending
* tests working for EXPANDREGIONS
* update snapshot
* minor python fixes
* changes after PR review

commit 7caf653
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Thu Sep 25 16:43:38 2025 +0200
update deepCSA diagram

commit 8fa7998
Author: Marta Huertas <97596516+m-huertasp@users.noreply.github.com>
Date: Thu Sep 25 16:31:39 2025 +0200
Add Nanoseq masks as filters (#374)
* feature: add nanoseq masks to FILTERS
This commit adds the possibility of using nanoseq masks in deepCSA. New parameters are added both in nextflow_schema and nextflow config. No major other changes are made as nanoseq masks use the same script as FILTEREXONS and FILTERPANELS.
* feature: add click to handle inputs
These changes are copied from branch input-with-click, more specifically from commit 0af42a9.
* refactor: add positive parameter
When using filterbed.py, if using positive you filter positions in the bed file and when using negative you filter positions not in the bed file. This commit adjusts the parameters in nanoseq filters to adjust to this behaviour.
* refactor: implement with click and add positive parameter
The click implementation is useful to add the --positive flag for those bed files with the positive = true parameter in the modules.conf
* refactor: import nanoseq files in subworkflow
This commit moves the import from the main workflow to the MUTATION_PREPROCESSING subworkflow. This is cleaner and easier to maintain.
* docs: add nanoseq masks paths
* refactor: remove debug printing
* refactor: move publish dir instructions
Improve clarity.
* refactor: move nanoseq masks paths to cluster configuration
* refactor: simplify definitions and avoid non-intended output
To avoid non-intended output, we define filtername empty if not defined instead of "covered".
* refactor: unify filters into one and remove non canonical chromosomes
The functions negative_filter_panel_regions and positive_filter_panel_regions have been unified into one function: filter_panel. The logic is exactly the same. A new function is created to remove non canonical chromosomes in the positions dataframe (from the bed file). Non canonical chromosomes were giving problems when merging with sample_maf as "chr" was not detected.
* refactor: apply nanoseq masks individually with cleaner channel management
The if statement for the nanoseq masks has been divided to handle them individually, in case only one is provided. Also, assigning a value to a channel twice is avoided by adding "else" statements.
* refactor: add one liner to create filtered maf panels variable
Taking into account if nanoseq masks were applied.
* refactor: add one liner to create filtered maf panels variable
Taking into account if nanoseq masks were applied.
* minor update in mut preprocessing style - fix paths in test_real - update order of variables in schema
--------- Co-authored-by: FerriolCalvet <ferriolcalvet@gmail.com>

commit 42c85f6
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Thu Sep 25 12:37:10 2025 +0200
Update container for HDP signature extraction (#362)
* update hdp_wrapper container
* add ignore strategy to compare signatures step
* add tmp fixes configs

commit 5b9ed08
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Fri Sep 19 21:45:57 2025 +0200
fix bug in redefinition of panel with subgenic elements (#373)
* fix bug in redefinition of exons and domains - now if a subgenic element is partially covered, it is still included in the expanded file, before it was not Missing: -documentation
* add docs
* fix bug in end coordinate when matching
* Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
--------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

commit 3d8fe1f
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Wed Sep 17 20:00:41 2025 +0200
Add globalloc synonymous numbers QC (#370)
* add globalloc synonymous numbers qc - added all the plots and correlation computations of obs. vs estimated numbers of synonymous mutations
* update omega syn qc - working version with plots and tsv outputs

commit 03a69f6
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Tue Sep 16 10:39:30 2025 +0200
fix bug that outputted empty maf files (#367)
* fix remove creation of empty MAFs
* address #337

commit 3fb032d
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Sat Sep 13 11:14:47 2025 +0200
add minor fix to plotting needles for groups

commit 9adbe74
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Sat Sep 13 11:05:11 2025 +0200
Allow the option to plot selection and saturation at the level of groups (#366)
* init plotting groups
* needle plots and selection working for groups - add param to plot only cohort or all custom groups - update groups.json generation missing: - pass site comparison plots & test saturation
* fix saturation plots working for groups - fix domain selection plotting as png not pdf

commit 14640cd
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Mon Jul 28 18:01:31 2025 +0200
minor updates documentation related
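Commit 5b9ed08 changes how subgenic elements are matched against the panel, so that a partially covered element is now kept. The following is a minimal sketch of the difference between requiring full containment and accepting any overlap. It assumes BED-style half-open (chrom, start, end) tuples and checks each panel interval separately, which is a simplification. The function names are hypothetical and are not taken from the EXPANDREGIONS code.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style: 0-based, end exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def contained(element: Interval, region: Interval) -> bool:
    """True if the element lies entirely inside the region."""
    return element[0] == region[0] and region[1] <= element[1] and element[2] <= region[2]


def covered_elements(elements: List[Interval], panel: List[Interval],
                     allow_partial: bool = True) -> List[Interval]:
    """Keep subgenic elements (exons, domains, ...) that the panel covers.

    allow_partial=False approximates the earlier behaviour (full containment);
    allow_partial=True also keeps partially covered elements.
    """
    test = overlaps if allow_partial else contained
    return [e for e in elements if any(test(e, region) for region in panel)]
```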
* add profile concatenation and cosine sim plotting - not tested
* fix concat profiles working - separated plots for samples and groups
* Squashed commit of the following: the same commits listed above, from e7ace44 back to 14640cd
* minor updates in plotting settings
* fix naming
Description
This pull request introduces support for applying Nanoseq-based genomic masks (SNP and noise masks) during mutation preprocessing in the pipeline, specifically for human samples. It adds new parameters, updates the configuration and workflow logic, and modifies the FILTERBED process to support a positive selection flag.

Nanoseq mask integration:
- Added the parameters nanoseq_snp and nanoseq_noise to the pipeline configuration (nextflow.config, nextflow_schema.json, conf/test_real.config) to specify BED files of common SNPs and noisy genomic sites from the Nanoseq pipeline.
- Updated the mutationpreprocessing subworkflow to conditionally apply the Nanoseq SNP and noise masks through the new FILTERNANOSEQSNP and FILTERNANOSEQNOISE processes, only for human samples and when both files are provided.

Process and module updates:
- Configured FILTERNANOSEQSNP and FILTERNANOSEQNOISE in modules.config and ensured their outputs are not published.
- Updated the FILTERBED process to support a --positive flag, enabling the positive selection filtering required by the Nanoseq masks.

Related
Documentation about Nanoseq should be added. We can do it in this pull request or in a follow-up one.
Closes #355