add plotting of profiles similarity #384

Merged: FerriolCalvet merged 6 commits into dev from plot-profiles-similarity on Oct 23, 2025

FerriolCalvet (Collaborator) commented Oct 23, 2025

  • add generation of heatmaps to compare profiles between groups using cosine similarity
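
As a rough illustration of the comparison described above, the following standard-library sketch computes a pairwise cosine similarity table between group profiles. The function names, group names, and four-channel toy vectors are hypothetical and are not taken from the PR; the actual module may compute and store the table differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen for this sketch: an all-zero profile
        # is treated as having no similarity to anything.
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_table(profiles):
    """Build a symmetric {row: {column: similarity}} table."""
    names = sorted(profiles)
    return {
        row: {col: cosine_similarity(profiles[row], profiles[col]) for col in names}
        for row in names
    }


# Hypothetical toy profiles (real mutational profiles have many more channels).
profiles = {
    "group_A": [0.4, 0.3, 0.2, 0.1],
    "group_B": [0.1, 0.2, 0.3, 0.4],
    "group_C": [0.8, 0.6, 0.4, 0.2],  # same shape as group_A, different scale
}
table = similarity_table(profiles)

for row, cols in table.items():
    print(row, " ".join(f"{value:.3f}" for value in cols.values()))
```

Because cosine similarity ignores vector magnitude, group_A and group_C score as identical here even though their totals differ. A table of this shape could then be handed to a plotting function such as seaborn's `heatmap` or `clustermap` to produce the figures.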

AI summary

This pull request introduces a new process for concatenating mutational profiles and integrates it into the existing mutational profiling workflow. The changes enable the aggregation of mutation profiles with group information and update the main workflow to support this functionality.

New process integration and workflow updates:

  • Added a new CONCAT_PROFILES process in modules/local/concatprofiles/main.nf that aggregates mutation profiles and generates summary outputs, including heatmaps, clustermaps, cosine similarity tables, and compiled profiles.
  • Integrated the CONCAT_PROFILES process into the MUTATIONAL_PROFILE subworkflow by importing it and updating the workflow to pass the required all_groups parameter and emit the compiled profiles output.

Main workflow parameter and invocation changes:

  • Updated the DEEPCSA workflow in workflows/deepcsa.nf to pass the new TABLE2GROUP.out.json_allgroups parameter to all mutational profile subworkflows, ensuring group information is available for profile aggregation.

- separated plots for samples and groups
commit e7ace44
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Fri Oct 10 09:44:35 2025 +0200

    v1.0.0 fixes (#380)

    * fix syntax of optional

    - fix ambiguity in features list definition

    * remove optional input definition

commit e409639
Merge: 14640cd 1a61fc9
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Wed Oct 8 15:09:55 2025 +0200

    Merge pull request #377 from bbglab/dev

    New release: v1.0.0 Ter

commit 1a61fc9
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Wed Oct 8 11:25:01 2025 +0200

    update documentation tackling several issues

commit cc808de
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Wed Oct 8 10:23:45 2025 +0200

    update naming of summary mutation plots

commit 2ab4e65
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Tue Oct 7 23:15:35 2025 +0200

    fix typos and make inputs of expand regions optional

commit 6bb325e
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Tue Oct 7 23:03:33 2025 +0200

    apply review suggestions

commit 872809d
Merge: 2ac1bdb 14640cd
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Sun Oct 5 16:07:02 2025 +0200

    Merge branch 'main' into dev

commit 2ac1bdb
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Sat Oct 4 17:52:17 2025 +0200

    add tools' explanation in docs

    - add adjusted mutation density explanation
    - rename subworkflow directory

commit 275bd68
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Fri Oct 3 23:16:31 2025 +0200

    update features groups documentation

commit 67a902e
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Tue Sep 30 09:12:55 2025 +0200

    add nanoseq masks to default filtering

    - add also gnomAD_SNP
    - add documentation on Nanoseq masks

commit b06e900
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Mon Sep 29 09:17:47 2025 +0200

    add test setup and first tests (#375)

    * first definition of tests

    - to be tested

    * add first semi-working version of pipeline level tests

    * add first module testing for EXPANDREGIONS

    - testing focused on the PPM1D gene
    - confirm preferred behaviour for this process: if omega_withingene is true but no subgenic element definition option is activated, it fails
    - stub mode set up pending

    * tests working for EXPANDREGIONS

    * update snapshot

    * minor python fixes

    * changes after PR review

commit 7caf653
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Thu Sep 25 16:43:38 2025 +0200

    update deepCSA diagram

commit 8fa7998
Author: Marta Huertas <97596516+m-huertasp@users.noreply.github.com>
Date:   Thu Sep 25 16:31:39 2025 +0200

    Add Nanoseq masks as filters (#374)

    * feature: add nanoseq masks to FILTERS

    This commit adds the possibility of using nanoseq masks
    in deepCSA.
    New parameters are added in both nextflow_schema and the nextflow
    config.
    No other major changes are made, as nanoseq masks use the same
    script as FILTEREXONS and FILTERPANELS.

    * feature: add click to handle inputs

    These changes are copied from branch input-with-click,
    more specifically from commit 0af42a9.

    * refactor: add positive parameter

    When using filterbed.py, the positive option filters
    positions that are in the bed file, and the negative option
    filters positions that are not in the bed file. This commit
    sets the parameters of the nanoseq filters to match
    this behaviour.

    * refactor: implement with click and add positive parameter

    The click implementation is useful for adding the
    --positive flag for those bed files with the
    positive = true parameter in the modules.conf

    * refactor: import nanoseq files in subworkflow

    This commit moves the import from the main
    workflow to the MUTATION_PREPROCESSING subworkflow.
    This is cleaner and easier to maintain.

    * docs: add nanoseq masks paths

    * refactor: remove debug printing

    * refactor: move publish dir instructions

    Improve clarity.

    * refactor: move nanoseq masks paths to cluster configuration

    * refactor: simplify definitions and avoid non-intended output

    To avoid non-intended output, we define filtername empty if not
    defined instead of "covered".

    * refactor: unify filters into one and remove non canonical chromosomes

    The functions negative_filter_panel_regions and positive_filter_panel_regions
    have been unified into one function: filter_panel. The logic is exactly the
    same.
    A new function is created to remove non canonical chromosomes
    in the positions dataframe (from the bed file). Non canonical
    chromosomes were giving problems when merging with sample_maf
    as "chr" was not detected.

    * refactor: apply nanoseq masks individually with cleaner channel management

    The if statement for the nanoseq masks has been divided
    to handle them individually, in case only one is provided.

    Also, assigning a value to a channel twice is avoided by
    adding "else" statements.

    * refactor: add one liner to create filtered maf panels variable

    Taking into account if nanoseq masks were applied.

    * refactor: add one liner to create filtered maf panels variable

    Taking into account if nanoseq masks were applied.

    * minor update in mut preprocessing style

    - fix paths in test_real
    - update order of variables in schema

    ---------

    Co-authored-by: FerriolCalvet <ferriolcalvet@gmail.com>
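
The positive/negative behaviour and the removal of non-canonical chromosomes described in the commit above can be sketched as follows. This is not the code from filterbed.py: the function names, the choice of chr1 to chr22 plus chrX and chrY as "canonical", and the reading of "filter" as "flag the position" are all assumptions made for illustration.

```python
# Assumption: canonical chromosomes are chr1..chr22, chrX and chrY.
CANONICAL = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}


def keep_canonical(intervals):
    """Drop BED intervals that sit on non-canonical contigs."""
    return [iv for iv in intervals if iv[0] in CANONICAL]


def in_any_interval(chrom, pos, intervals):
    """True if a 1-based position lies inside any BED interval.

    BED intervals are 0-based and half-open, so a 1-based position
    is covered when start < pos <= end.
    """
    return any(c == chrom and start < pos <= end for c, start, end in intervals)


def flag_positions(positions, intervals, positive):
    """Return the positions that the filter would flag.

    positive=True  -> flag positions inside the BED regions
    positive=False -> flag positions outside the BED regions
    """
    intervals = keep_canonical(intervals)
    return [
        (chrom, pos)
        for chrom, pos in positions
        if in_any_interval(chrom, pos, intervals) == positive
    ]


# Hypothetical toy data.
intervals = [("chr1", 100, 200), ("chrUn_KI270742v1", 0, 50)]
positions = [("chr1", 150), ("chr1", 250), ("chr1", 100), ("chr1", 200)]

print(flag_positions(positions, intervals, positive=True))
print(flag_positions(positions, intervals, positive=False))
```

Using one function with a boolean switch mirrors the commit's point that the two original filters shared exactly the same logic and differed only in which side of the comparison they flagged.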

commit 42c85f6
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Thu Sep 25 12:37:10 2025 +0200

    Update container for HDP signature extraction (#362)

    * update hdp_wrapper container

    * add ignore strategy to compare signatures step

    * add tmp fixes configs

commit 5b9ed08
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Fri Sep 19 21:45:57 2025 +0200

    fix bug in redefinition of panel with subgenic elements (#373)

    * fix bug in redefinition of exons and domains

    - now if a subgenic element is partially covered, it is still included in the expanded file; before, it was not
    Missing:
    - documentation

    * add docs

    * fix bug in end coordinate when matching

    * Apply suggestions from code review

    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

    ---------

    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
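
The fix described in the commit above, where a partially covered subgenic element is still kept, comes down to an overlap test rather than a containment test. The sketch below is a hypothetical illustration, not the pipeline's code: the function names, the element names, and the use of closed intervals with inclusive end coordinates are assumptions.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position.

    Both end coordinates are treated as inclusive (an assumption),
    so intervals that only touch at an end still count as overlapping.
    """
    return a_start <= b_end and b_start <= a_end


def covered_elements(elements, panel_regions):
    """Names of elements that overlap the panel, even partially."""
    return [
        name
        for name, (start, end) in elements.items()
        if any(overlaps(start, end, p_start, p_end) for p_start, p_end in panel_regions)
    ]


# Hypothetical subgenic elements and panel regions on one chromosome.
elements = {
    "exon_1": (10, 20),  # partially covered by (15, 35)
    "exon_2": (30, 40),  # partially covered by (15, 35)
    "exon_3": (50, 60),  # touches (60, 70) only at its end coordinate
    "exon_4": (80, 90),  # not covered at all
}
panel_regions = [(15, 35), (60, 70)]

print(covered_elements(elements, panel_regions))
```

A containment test (`p_start <= start and end <= p_end`) would drop all four elements here. An off-by-one in the end comparison would also lose exon_3, which is the kind of edge case an end-coordinate bug tends to produce.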

commit 3d8fe1f
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Wed Sep 17 20:00:41 2025 +0200

    Add globalloc synonymous numbers QC (#370)

    * add globalloc synonymous numbers qc
    - added all the plots and correlation computations of obs. vs estimated numbers of synonymous mutations

    * update omega syn qc
    - working version with plots and tsv outputs

commit 03a69f6
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Tue Sep 16 10:39:30 2025 +0200

    fix bug that outputted empty maf files (#367)

    * fix remove creation of empty MAFs

    * address #337

commit 3fb032d
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Sat Sep 13 11:14:47 2025 +0200

    add minor fix to plotting needles for groups

commit 9adbe74
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Sat Sep 13 11:05:11 2025 +0200

    Allow the option to plot selection and saturation at the level of groups (#366)

    * init plotting groups

    * needle plots and selection working for groups

    - add param to plot only cohort or all custom groups
    - update groups.json generation

    missing:
    - pass site comparison plots & test saturation

    * fix saturation plots working for groups

    - fix domain selection plotting as png not pdf

commit 14640cd
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Mon Jul 28 18:01:31 2025 +0200

    minor updates documentation related

FerriolCalvet requested a review from Copilot on October 23, 2025, 08:25

Copilot AI (Contributor) left a comment

Pull Request Overview

This pull request adds functionality for plotting profile similarities and includes several improvements to the deepCSA pipeline configuration and documentation. The changes introduce new plotting capabilities, enhanced filtering options with Nanoseq masks, and improved process management.

  • Added profile similarity plotting with heatmaps and clustering analysis
  • Introduced Nanoseq mask filtering for SNPs and noisy genomic regions
  • Enhanced group-based plotting capabilities with configurable options

Reviewed Changes

Copilot reviewed 44 out of 60 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| workflows/deepcsa.nf | Added group key extraction logic, updated mutation profile and plotting subworkflow calls to support group-based analysis |
| workflows/tests/deepcsa.nf.test | Added new workflow test for minimal features |
| subworkflows/local/mutationprofile/main.nf | Added profile concatenation and similarity analysis |
| subworkflows/local/mutationpreprocessing/main.nf | Implemented Nanoseq mask filtering for human samples |
| subworkflows/local/plottingsummary/main.nf | Added group-based plotting with configurable all-samples-only mode |
| subworkflows/local/omega/main.nf | Added QC evaluation for omega global/local estimation |
| modules/local/concatprofiles/main.nf | New module for concatenating profiles and computing similarity metrics |
| modules/local/plot/qc/globalloc_synonymous/main.nf | New module for omega synonymous QC plotting |
| modules/local/filterbed/main.nf | Enhanced filtering with positive/negative flag support |
| modules/local/plot/saturation/main.nf | Updated input parameters by combining site comparison with results |
| nextflow.config | Added plot_only_allsamples parameter and Nanoseq mask parameters |
| conf/modules.config | Added Nanoseq filtering process configurations |
| conf/general_files_IRB.config | Added Nanoseq mask file paths |
| conf/tmp_quick_fixes.config | Added error handling for specific processes |
| docs/usage.md | Added Nanoseq masks documentation section |
| docs/tools.md | New documentation explaining adjusted mutation density and other tools |
| test_data/modules/*.bed | Added test data files for PPM1D exons and domains |

Comments suppressed due to low confidence (1)

docs/file_formatting.md:1

  • Corrected spelling of 'potenitally' to 'potentially'.
# File formats of inputs

FerriolCalvet requested a review from Copilot on October 23, 2025, 09:19

Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 3 out of 5 changed files in this pull request and generated 1 comment.


FerriolCalvet merged commit af02a12 into dev on Oct 23, 2025
FerriolCalvet deleted the plot-profiles-similarity branch on October 23, 2025, 14:14
2 participants