
Add Nanoseq masks as filters #374

Merged
FerriolCalvet merged 17 commits into dev from feature/355-add-nanoseq-masks
Sep 25, 2025

Conversation

@m-huertasp
Collaborator

Description

This pull request introduces support for applying Nanoseq-based genomic masks (SNP and noise masks) to mutation preprocessing in the pipeline, specifically for human samples. It adds new parameters, updates configuration and workflow logic, and modifies the FILTERBED process to support a positive selection flag.

Nanoseq mask integration:

  • Added new parameters nanoseq_snp and nanoseq_noise to the pipeline configuration (nextflow.config, nextflow_schema.json, conf/test_real.config) to specify BED files for common SNPs and noisy genomic sites from the Nanoseq pipeline. [1] [2] [3]
  • Updated the mutationpreprocessing subworkflow to conditionally apply Nanoseq SNP and noise masks using new FILTERNANOSEQSNP and FILTERNANOSEQNOISE processes, only for human samples and when both files are provided. [1] [2]

Process and module updates:

  • Registered new process names FILTERNANOSEQSNP and FILTERNANOSEQNOISE in modules.config and ensured their outputs are not published. [1] [2]
  • Modified the FILTERBED process to support a --positive flag, enabling positive selection filtering as required by the Nanoseq masks.
  • Imported the new filter processes into the mutation preprocessing subworkflow.

Related

Documentation about the Nanoseq masks should be added. We can do it in this pull request or in follow-up ones.

Closes #355

This commit adds the possibility of using Nanoseq masks
in deepCSA.
New parameters are added to both nextflow_schema.json and
nextflow.config.
No other major changes are needed, as the Nanoseq masks use the same
script as FILTEREXONS and FILTERPANELS.

These changes are copied from branch input-with-click,
more specifically from commit 0af42a9.

When using filterbed.py in positive mode, you filter the
positions in the BED file; in negative mode, you filter
the positions not in the BED file. This commit
adjusts the parameters of the Nanoseq filters to match
this behaviour.
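The positive/negative semantics can be sketched in plain Python. This is a minimal illustration, not the code in filterbed.py: the function names (`load_intervals`, `in_bed`, `flag_by_bed`), the list-of-dicts mutation records, and recording a filter by appending a label to a `FILTER` list are all assumptions. It also assumes BED intervals are 0-based and end-exclusive while MAF positions are 1-based.

```python
from collections import defaultdict


def load_intervals(bed_lines):
    """Group BED intervals by chromosome as (start, end) tuples (0-based, end-exclusive)."""
    intervals = defaultdict(list)
    for line in bed_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        intervals[chrom].append((int(start), int(end)))
    return intervals


def in_bed(intervals, chrom, pos):
    """True if the 1-based position `pos` lies inside any BED interval on `chrom`."""
    # A 0-based half-open [start, end) interval covers 1-based positions start+1 .. end.
    return any(start < pos <= end for start, end in intervals.get(chrom, ()))


def flag_by_bed(mutations, intervals, filtername, positive=False):
    """Append `filtername` to the FILTER labels of the selected mutations, in place.

    positive=True:  select mutations inside the BED regions (e.g. Nanoseq SNP/noise sites).
    positive=False: select mutations outside the BED regions (e.g. calls off the panel).
    An empty filtername leaves every record untouched.
    """
    if not filtername:
        return mutations
    for mut in mutations:
        if in_bed(intervals, mut["CHROM"], mut["POS"]) == positive:
            mut["FILTER"].append(filtername)
    return mutations
```

For whole MAF and BED files, a real implementation would more likely use a sorted or indexed lookup, or a dataframe merge, rather than the linear scan in `in_bed`.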

The click implementation is useful for adding the
--positive flag for those BED files with the
positive = true parameter in modules.config.

This commit moves the import from the main
workflow to the MUTATION_PREPROCESSING subworkflow.
This is cleaner and easier to maintain.
Collaborator

@FerriolCalvet FerriolCalvet left a comment


All looks good!
I left some comments, mainly related to the style I would follow, and one more suggestion about splitting the filtering of the masks.

To avoid unintended output, filtername is now left empty when
it is not defined, instead of defaulting to "covered".

The functions negative_filter_panel_regions and positive_filter_panel_regions
have been unified into one function, filter_panel. The logic is exactly the
same.

A new function removes non-canonical chromosomes
from the positions dataframe (built from the BED file). Non-canonical
chromosomes were causing problems when merging with sample_maf,
as "chr" was not detected.
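A minimal sketch of the chromosome clean-up described in this commit, assuming "canonical" means chromosomes 1-22, X, Y and M, with or without the "chr" prefix. The regex, the helper names and the prefix harmonisation step are illustrative assumptions, not the function actually added to filterbed.py.

```python
import re

# Assumption: canonical human chromosomes are 1-22, X, Y and M, with or without a "chr" prefix.
CANONICAL_CHROM = re.compile(r"(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M)")


def keep_canonical(records, chrom_key="CHROM"):
    """Drop records on non-canonical contigs such as chr1_KI270706v1_random or chrUn_GL000220v1."""
    return [rec for rec in records if CANONICAL_CHROM.fullmatch(rec[chrom_key])]


def with_chr_prefix(chrom):
    """Harmonise a chromosome name so both sides of a merge use the "chr" prefix."""
    return chrom if chrom.startswith("chr") else "chr" + chrom
```

Filtering before harmonising the prefix means contig names like `GL000220.1` never reach the merge, so they cannot produce mismatched keys.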
@m-huertasp
Collaborator Author

In the last commit I have made some modifications to filterbed.py:

  • Unify negative_filter_panel_regions and positive_filter_panel_regions into filter_panel. It keeps exactly the same functionality in a single function, which may be easier to maintain.

  • Add a function to remove non-canonical chromosomes from the BED file. I hadn't noticed it before, but this was causing problems (mainly with the Nanoseq noise mask) when merging with the sample MAF.

If you are not sure about the first change, we can remove it.


The if statement for the Nanoseq masks has been split
to handle each mask individually, in case only one is provided.

Also, assigning a value to a channel twice is avoided by
adding "else" branches.
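The control flow described here lives in a Nextflow subworkflow; the following is only a Python analogue, written to show the pattern of applying each optional mask independently while binding every intermediate result exactly once. The `run_filter` callback, the `is_human` flag and the label strings are hypothetical stand-ins for the FILTERNANOSEQSNP/FILTERNANOSEQNOISE processes and the pipeline's species check; only the parameter names `nanoseq_snp` and `nanoseq_noise` come from the PR.

```python
def preprocess(mutations, is_human, run_filter, nanoseq_snp=None, nanoseq_noise=None):
    """Apply each Nanoseq mask only if it was provided and the samples are human."""
    # Each intermediate result is bound exactly once: the filtered output or the unchanged input.
    after_snp = (
        run_filter(mutations, nanoseq_snp, "nanoseq_snp")
        if is_human and nanoseq_snp
        else mutations
    )
    after_noise = (
        run_filter(after_snp, nanoseq_noise, "nanoseq_noise")
        if is_human and nanoseq_noise
        else after_snp
    )
    return after_noise
```

Because every branch has an `else`, providing only one of the two BED files still yields a well-defined result for the next step.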
@m-huertasp
Collaborator Author

All done (in principle). If there is anything else you think could be improved, tell me!

About applying the Nanoseq masks individually, we could make the same change in deepUMIcaller as well.

Collaborator

@FerriolCalvet FerriolCalvet left a comment


The Python update looks good, pending discussion of the chromosome filtering,

and I made some proposals for the nf script.

m-huertasp and others added 4 commits September 23, 2025 16:07
Taking into account if nanoseq masks were applied.
Taking into account if nanoseq masks were applied.
- fix paths in test_real
- update order of variables in schema
@FerriolCalvet FerriolCalvet merged commit 8fa7998 into dev Sep 25, 2025
@FerriolCalvet FerriolCalvet mentioned this pull request Sep 25, 2025
@FerriolCalvet FerriolCalvet deleted the feature/355-add-nanoseq-masks branch September 25, 2025 14:37
FerriolCalvet added a commit that referenced this pull request Oct 14, 2025
commit e7ace44
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Fri Oct 10 09:44:35 2025 +0200

    v1.0.0 fixes (#380)

    * fix syntax of optional

    - fix ambiguity in features list definition

    * remove optional input definition

commit e409639
Merge: 14640cd 1a61fc9
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Wed Oct 8 15:09:55 2025 +0200

    Merge pull request #377 from bbglab/dev

    New release: v1.0.0 Ter

commit 1a61fc9
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Wed Oct 8 11:25:01 2025 +0200

    update documentation tackling several issues

commit cc808de
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Wed Oct 8 10:23:45 2025 +0200

    update naming of summary mutation plots

commit 2ab4e65
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Tue Oct 7 23:15:35 2025 +0200

    fix typos and make inputs of expand regions optional

commit 6bb325e
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Tue Oct 7 23:03:33 2025 +0200

    apply review suggestions

commit 872809d
Merge: 2ac1bdb 14640cd
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Sun Oct 5 16:07:02 2025 +0200

    Merge branch 'main' into dev

commit 2ac1bdb
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Sat Oct 4 17:52:17 2025 +0200

    add tools' explanation in docs

    - add adjusted mutation density explanation
    - rename subworkflow directory

commit 275bd68
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Fri Oct 3 23:16:31 2025 +0200

    update features groups documentation

commit 67a902e
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Tue Sep 30 09:12:55 2025 +0200

    add nanoseq masks to default filtering

    - also add gnomAD_SNP
    - add documentation on Nanoseq masks

commit b06e900
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Mon Sep 29 09:17:47 2025 +0200

    add test setup and first tests (#375)

    * first definition of tests

    - to be tested

    * add first semi-working version of pipeline level tests

    * add first module testing for EXPANDREGIONS

    - testing focused on the PPM1D gene
    - confirm the preferred behaviour of this process: if omega_withingene is true but no subgenic element definition option is activated, it fails
    - stub mode set up pending

    * tests working for EXPANDREGIONS

    * update snapshot

    * minor python fixes

    * changes after PR review

commit 7caf653
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Thu Sep 25 16:43:38 2025 +0200

    update deepCSA diagram

commit 8fa7998
Author: Marta Huertas <97596516+m-huertasp@users.noreply.github.com>
Date:   Thu Sep 25 16:31:39 2025 +0200

    Add Nanoseq masks as filters (#374)

    * feature: add nanoseq masks to FILTERS

    This commit adds the possibility of using Nanoseq masks
    in deepCSA.
    New parameters are added to both nextflow_schema.json and
    nextflow.config.
    No other major changes are needed, as the Nanoseq masks use the same
    script as FILTEREXONS and FILTERPANELS.

    * feature: add click to handle inputs

    These changes are copied from branch input-with-click,
    more specifically from commit 0af42a9.

    * refactor: add positive parameter

    When using filterbed.py in positive mode, you filter the
    positions in the BED file; in negative mode, you filter
    the positions not in the BED file. This commit
    adjusts the parameters of the Nanoseq filters to match
    this behaviour.

    * refactor: implement with click and add positive parameter

    The click implementation is useful for adding the
    --positive flag for those BED files with the
    positive = true parameter in modules.config.

    * refactor: import nanoseq files in subworkflow

    This commit moves the import from the main
    workflow to the MUTATION_PREPROCESSING subworkflow.
    This is cleaner and easier to maintain.

    * docs: add nanoseq masks paths

    * refactor: remove debug printing

    * refactor: move publish dir instructions

    Improve clarity.

    * refactor: move nanoseq masks paths to cluster configuration

    * refactor: simplify definitions and avoid non-intended output

    To avoid unintended output, filtername is now left empty when
    it is not defined, instead of defaulting to "covered".

    * refactor: unify filters into one and remove non-canonical chromosomes

    The functions negative_filter_panel_regions and positive_filter_panel_regions
    have been unified into one function, filter_panel. The logic is exactly the
    same.
    A new function removes non-canonical chromosomes
    from the positions dataframe (built from the BED file). Non-canonical
    chromosomes were causing problems when merging with sample_maf,
    as "chr" was not detected.

    * refactor: apply nanoseq masks individually with cleaner channel management

    The if statement for the Nanoseq masks has been split
    to handle each mask individually, in case only one is provided.

    Also, assigning a value to a channel twice is avoided by
    adding "else" branches.

    * refactor: add one liner to create filtered maf panels variable

    Taking into account if nanoseq masks were applied.

    * refactor: add one liner to create filtered maf panels variable

    Taking into account if nanoseq masks were applied.

    * minor update in mut preprocessing style

    - fix paths in test_real
    - update order of variables in schema

    ---------

    Co-authored-by: FerriolCalvet <ferriolcalvet@gmail.com>

commit 42c85f6
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Thu Sep 25 12:37:10 2025 +0200

    Update container for HDP signature extraction (#362)

    * update hdp_wrapper container

    * add ignore strategy to compare signatures step

    * add tmp fixes configs

commit 5b9ed08
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Fri Sep 19 21:45:57 2025 +0200

    fix bug in redefinition of panel with subgenic elements (#373)

    * fix bug in redefinition of exons and domains

    - now, if a subgenic element is partially covered, it is still included in the expanded file; previously it was not
    Missing:
    - documentation

    * add docs

    * fix bug in end coordinate when matching

    * Apply suggestions from code review

    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

    ---------

    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

commit 3d8fe1f
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Wed Sep 17 20:00:41 2025 +0200

    Add globalloc synonymous numbers QC (#370)

    * add globalloc synonymous numbers qc
    - added all the plots and correlation computations of obs. vs estimated numbers of synonymous mutations

    * update omega syn qc
    - working version with plots and tsv outputs

commit 03a69f6
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Tue Sep 16 10:39:30 2025 +0200

    fix bug that outputted empty maf files (#367)

    * fix remove creation of empty MAFs

    * address #337

commit 3fb032d
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Sat Sep 13 11:14:47 2025 +0200

    add minor fix to plotting needles for groups

commit 9adbe74
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Sat Sep 13 11:05:11 2025 +0200

    Allow the option to plot selection and saturation at the level of groups (#366)

    * init plotting groups

    * needle plots and selection working for groups

    - add param to plot only cohort or all custom groups
    - update groups.json generation

    missing:
    - pass site comparison plots & test saturation

    * fix saturation plots working for groups

    - fix domain selection plotting as png not pdf

commit 14640cd
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Mon Jul 28 18:01:31 2025 +0200

    minor updates documentation related
FerriolCalvet added a commit that referenced this pull request Oct 23, 2025
* add profile concatenation and cosine sim plotting

- not tested

* fix concat profiles working
- separated plots for samples and groups

* Squashed commit of the following: e7ace44 back to 14640cd (the same commits listed in the Oct 14, 2025 commit above)

* minor updates in plotting settings

* fix naming

2 participants