Add more complete docs #306

Merged
FerriolCalvet merged 21 commits into dev from complete-docs
Jul 7, 2025
Conversation

@FerriolCalvet
Collaborator

@FerriolCalvet FerriolCalvet commented Jun 17, 2025

The basics would be something similar to what is available for the nf-core pipelines; the minimum required would be an explanation of the usage and an explanation of the outputs.

  • Usage
    • Different pipeline modes
    • Different parameters for each run
  • Outputs
  • Explain different file formats required for the inputs
  • Explanation of internal logic and methods

- list params
- list structural parameters and files
@FerriolCalvet
Collaborator Author

We are missing a bit more explanation of how to download Ensembl VEP (i.e. version 111, which is the one we use everywhere).

@FerriolCalvet
Collaborator Author

We are missing a bit more explanation of how to download Ensembl VEP (i.e. version 111, which is the one we use everywhere).

@migrau can you add this whenever you have time? I am continuing with the rest.

@migrau
Member

migrau commented Jun 20, 2025

I don't think we need to repeat the instructions from the official VEP documentation; it is already quite good. I edited the link to point directly to the VEP cache section and specified which params to change.

On the other hand, and totally off-topic, is it worth testing the latest VEP version, v114?

@FerriolCalvet
Collaborator Author

I think that for a first round of complete documentation the current status of this branch is complete enough.

A more detailed explanation of the outputs and the methods will first be available in the supplementary material of the normal bladder paper, and will later be explained more extensively in a standalone article.

We can discuss if we want to keep the basic Nextflow parameters explanation in the usage document.
I would personally keep it there at the bottom of the file just in case people find it useful.

@FerriolCalvet FerriolCalvet changed the title [DRAFT] Add more complete docs Add more complete docs Jun 23, 2025
@FerriolCalvet FerriolCalvet requested a review from Copilot June 23, 2025 10:47

This comment was marked as outdated.

@FerriolCalvet FerriolCalvet requested a review from Copilot July 3, 2025 08:19
Contributor

Copilot AI left a comment

Pull Request Overview

This PR enriches and expands the project documentation and adjusts the pipeline configuration to support new parameters and clean up deprecated blocks.

  • Reorganized nextflow.config, adding new profiling flags, reordering parameters, and removing unused validation settings
  • Removed the standalone Ensembl VEP download module (meta.yml and main.nf) and its related config
  • Added and fleshed out detailed documentation in docs/ (usage, outputs, file formatting) and updated root README

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| nextflow.config | Reordered params, added new profiling flags, removed validation. |
| modules/nf-core/ensemblvep/download/meta.yml | Removed obsolete module metadata file. |
| modules/nf-core/ensemblvep/download/main.nf | Removed obsolete download process definition. |
| docs/usage.md | Expanded usage guide with command examples and run modes. |
| docs/output.md | Detailed pipeline output directory structure and step descriptions. |
| docs/file_formatting.md | New document describing required and optional input file formats. |
| docs/README.md | Updated documentation TOC and overview descriptions. |
| conf/modules.config | Removed deprecated ENSEMBLVEP_DOWNLOAD process settings. |
| assets/useful_scripts/deepcsa_maf2samplevcfs.py | Cleaned up comment blocks around usage instructions. |
| README.md | Refined project introduction, usage instructions, and warnings. |
| .markdownlint.json | Added custom markdownlint rules. |
Comments suppressed due to low confidence (5)

docs/file_formatting.md:32

  • The parameter use_custom_bedfile is documented here but does not exist in nextflow.config; reconcile the docs or add the missing config parameter.
    use_custom_bedfile          = false

docs/usage.md:157

  • The regressions parameter is referenced but isn't defined in nextflow.config; add the parameter or update the docs accordingly.
    regressions                 = true

docs/README.md:6

  • [nitpick] Nested list indentation contains an extra hyphen; use a single - for sub-items to improve readability.
  - An overview of how the pipeline works and how to run it.
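
The first two suppressed comments describe the same kind of drift: a parameter shown in the docs that has no counterpart in nextflow.config. A minimal Python sketch of how such a check could be automated is given here. It assumes parameters appear as simple `name = value` lines in both files; the function names and the sample texts are hypothetical, and the matching is deliberately naive (it also picks up non-param assignments such as those inside a `dag` block).

```python
import re

# Matches a simple `name = value` assignment at the start of a line.
ASSIGNMENT = re.compile(r"^[ \t]*([A-Za-z_]\w*)[ \t]*=", flags=re.MULTILINE)


def assigned_names(text: str) -> set[str]:
    """Return every name that appears on the left of a `name = value` line."""
    return set(ASSIGNMENT.findall(text))


def missing_from_config(doc_text: str, config_text: str) -> list[str]:
    """Names assigned in the docs that are never assigned in the config."""
    return sorted(assigned_names(doc_text) - assigned_names(config_text))


# Hypothetical excerpts, not the real file contents.
SAMPLE_CONFIG = """
params {
    plot_depths = true
    signatures  = true
}
"""

SAMPLE_DOCS = """
    use_custom_bedfile          = false
    regressions                 = true
    plot_depths                 = true
"""

if __name__ == "__main__":
    for name in missing_from_config(SAMPLE_DOCS, SAMPLE_CONFIG):
        print(f"documented but not in config: {name}")
```

In practice the two strings would be read from docs/*.md and nextflow.config; a check like this could run in CI next to the markdown linting.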

nextflow.config Outdated
dag {
    enabled = true
-   file = "${params.outdir}/pipeline_info/pipeline_dag_${new java.util.Date().format('yyyy-MM-dd_HH-mm-ss')}.html"
+   file = "${params.outdir}/pipeline_info/pipeline_dag_${new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')}.mmd"

Copilot AI Jul 3, 2025

Changing the DAG output extension to .mmd may break downstream viewers or expectations; consider using a more widely supported format (e.g., .html or .svg) or documenting this change.

Suggested change
- file = "${params.outdir}/pipeline_info/pipeline_dag_${new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')}.mmd"
+ file = "${params.outdir}/pipeline_info/pipeline_dag_${new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')}.svg"
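
To make the discussion about the DAG file name concrete, here is a small Python sketch that mirrors the naming pattern in the config line above (output directory, `pipeline_info`, a `yyyy-MM-dd_HH-mm-ss` timestamp, then the extension). The list of accepted extensions reflects my understanding that Nextflow infers the DAG format from the file extension; treat that list as an assumption and check it against the Nextflow documentation. The function name is hypothetical.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Assumption: extensions from which Nextflow can infer a DAG output format.
ASSUMED_DAG_FORMATS = {"dot", "gexf", "html", "mmd", "pdf", "png", "svg"}


def dag_report_path(outdir: str, extension: str, now: datetime) -> str:
    """Build a timestamped DAG path following the pattern used in the config."""
    ext = extension.lstrip(".").lower()
    if ext not in ASSUMED_DAG_FORMATS:
        raise ValueError(f"unsupported DAG extension: {extension!r}")
    # Java's 'yyyy-MM-dd_HH-mm-ss' corresponds to this strftime pattern.
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    return str(PurePosixPath(outdir) / "pipeline_info" / f"pipeline_dag_{stamp}.{ext}")


if __name__ == "__main__":
    print(dag_report_path("results", "mmd", datetime.now()))
```

Whichever extension is chosen, stating it in docs/output.md would address the concern about downstream expectations.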

docs/usage.md Outdated
@@ -137,23 +421,23 @@ They are loaded in sequence, so later profiles can overwrite earlier profiles.

If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended, since it can lead to different results on different machines depending on the computer environment.
Member

I would point to the nf-core documentation instead of adding this info here: https://nf-co.re/docs/usage/getting_started/configuration#basic-configuration-profiles

Collaborator Author

My only worry is that some definitions there might not be exactly the same as here, but it is also true that I did not curate the description of the profiles below very well...

If you want to give it a try and remove this whole last part on "Additional Nextflow documentation", I am happy to look at it. If that means removing all of it, I guess that should be fine: we can do that, and if someone starts using this (if anyone does) and gives feedback on what does not work, we can add more information then. Do you agree?

If you do, we remove this whole last section and add the link you mentioned here.
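
For readers following this thread, the rule quoted in the diff context ("They are loaded in sequence, so later profiles can overwrite earlier profiles") can be illustrated with a short Python sketch. It models each profile as a flat dictionary of settings; the profile names and settings are hypothetical and do not come from this pipeline's config.

```python
def resolve_profiles(profiles: dict[str, dict], requested: str) -> dict:
    """Merge the requested profiles left to right; later ones win on conflicts."""
    merged: dict = {}
    for name in requested.split(","):
        merged.update(profiles[name.strip()])
    return merged


# Hypothetical profiles, for illustration only.
EXAMPLE_PROFILES = {
    "docker": {"docker.enabled": True, "max_cpus": 4},
    "test": {"max_cpus": 2, "input": "test_samplesheet.csv"},
}

if __name__ == "__main__":
    # Same two profiles, different order, different final value for max_cpus.
    print(resolve_profiles(EXAMPLE_PROFILES, "docker,test")["max_cpus"])
    print(resolve_profiles(EXAMPLE_PROFILES, "test,docker")["max_cpus"])
```

The order given to `-profile` therefore matters whenever two profiles set the same option.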

docs/usage.md Outdated
### Resource requests

- Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with any of the error codes specified [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L18) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped.
+ Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with any of the error codes specified [here (example nf-core/rnaseq)](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L18) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped.
Member

The link points to rnaseq instead of deepCSA.

Collaborator Author

We are likely removing this part.
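
The retry rule in the paragraph under review (resubmit at 2 x, then 3 x the original request, and stop after the third attempt) amounts to multiplying the base request by the attempt number. A minimal Python sketch of that arithmetic follows; the function name and the 6 GB base value are illustrative assumptions, not values taken from the pipeline's config.

```python
def requests_per_attempt(base: float, max_attempts: int = 3) -> list[float]:
    """Resource request for each attempt: 1 x, 2 x, ... max_attempts x the base."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    return [base * attempt for attempt in range(1, max_attempts + 1)]


if __name__ == "__main__":
    # Hypothetical step that asks for 6 GB of memory on its first attempt.
    for attempt, memory_gb in enumerate(requests_per_attempt(6), start=1):
        print(f"attempt {attempt}: {memory_gb} GB")
```

With a 6 GB base this gives 6, 12 and 18 GB; a failure on the third attempt stops the run.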

docs/usage.md Outdated

### Initial run. Data exploration

* Definition of regions to analyze
Member

These are the things that you need to prepare for the initial run, right? It includes some params, e.g.:

params {
    plot_depths   = true
    signatures    = true
    profileall    = true
}

But does it also include some inputs, e.g. "Definition of regions to analyze"? That is not clear to me.
I think I would separate the two. For the params part, you can include a comment within the code example:

// Enables plotting depth per sample and/or per gene, mutational profile/signatures,
// and needle plots for somatic mutations
params {
    plot_depths   = true
    signatures    = true
    profileall    = true
}

Collaborator Author

The list (definition of regions to analyze and so on) describes the analyses you get if you run the pipeline with the params described below. It is a kind of summary of the outputs/goals that you can achieve with this run mode.
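
As a side note to this thread, the three params quoted in the review comment could also be supplied through a params file rather than a config block. The Python sketch below renders them as JSON; it assumes the file would then be passed with Nextflow's `-params-file` option, and the function names are hypothetical.

```python
import json


def initial_run_params() -> dict:
    """Params quoted in the review comment for the initial exploratory run."""
    return {"plot_depths": True, "signatures": True, "profileall": True}


def render_params_json(params: dict) -> str:
    """Serialize params as JSON, a format accepted by `-params-file`."""
    return json.dumps(params, indent=2)


if __name__ == "__main__":
    # Redirect this output to a file, e.g. initial_run.json.
    print(render_params_json(initial_run_params()))
```

Keeping one params file per run mode makes it easier to see which analyses a given run was meant to produce.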

@FerriolCalvet FerriolCalvet merged commit 628f282 into dev Jul 7, 2025
@FerriolCalvet FerriolCalvet deleted the complete-docs branch July 7, 2025 20:43
FerriolCalvet added a commit that referenced this pull request Jul 28, 2025
commit 10c12aa
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Thu Jul 24 12:35:40 2025 +0200

    update gnomAD threshold to 0.001

    - ignore errors in omega plot

commit 31e741e
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Tue Jul 22 18:34:11 2025 +0200

    update description and fix broken link

commit 7b0fd9b
Merge: 72dc4d9 cb18adc
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Sun Jul 20 22:34:46 2025 +0200

    Merge pull request #315 from bbglab/dev

commit cb18adc
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Sat Jul 19 12:40:39 2025 +0200

    fix bug in mut density & update omega container

commit 1b4a52d
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Tue Jul 15 15:52:07 2025 +0200

    fix broken path for test_real

commit db6d640
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Fri Jul 11 15:50:45 2025 +0200

    Allow gene selection in consensus (#316)

    * update consensus building to filter genes

    - add consensus compliance param
    - add list of genes param
    - NOT tested

    * tested gene filter implementation

    - consensus panels implemented in polars
    - allowing subset for specific genes

commit b92c2b9
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Thu Jul 10 23:56:47 2025 +0200

    add metro map

commit 11fb1c5
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Thu Jul 10 08:25:31 2025 +0200

    update description in main README

commit 628f282
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Mon Jul 7 19:14:21 2025 +0200

    Add more complete docs (#306)

    * first doc update

    * update in usage documentation

    - list params
    - list structural parameters and files

    * backbone of output docs

    * update usage description with custom sets of mutations

    * fix headers

    * docs: Update usage with vep information

    * update output description

    * update order of usage information

    * update distribution of information in the docs

    * fix typo in docs/output.md

    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

    * fix markdown linting

    * remove unnecessary validation params from config

    - REQUIRES TESTING

    * remove remaining references to download VEP cache

    * update dag format to mmd

    * update in usage

    * update documentation of file formatting and some params

    * add examples in file formatting docs

    * apply review comments

    * remove Nextflow parameters section

    * minor fix in nextflow.config

    ---------

    Co-authored-by: Miquel L. Grau <miguel.grau@irbbarcelona.org>
    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

commit 72dc4d9
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Mon Jul 7 15:53:47 2025 +0200

    temporary LICENSE definition

commit 14c5246
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Fri Jun 27 11:19:45 2025 +0200

    fix bug in panel_annotation (#313)

    reimplement it with click solved the problem
    - only_canonical boolean working
    - not tested

commit 764782a
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Thu Jun 26 23:23:05 2025 +0200

    Update mutation rate to mutation density (#307)

    * rename mutrate to mut density

    - reimplement with click
    - partial renaming
    - simplification of sample_name logic

    * full update of mutation rate to mutation density

    * define other_sample_SNP based on all VAF

    * update mutation density functions

    - clean code
    - add explanation on mutation density

    * apply review changes

commit aae8f0b
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Thu Jun 26 10:49:00 2025 +0200

    Ensure POSTPROCESSVEPPANEL in output (#311)

    * fix relative mutabilities output

    * explicitly define postprocessveppanel outdir

    * force outputting postprocesspanel

    * update storing fixes

commit dd25b9b
Merge: 05c80ee 26d5d9b
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Wed Jun 4 23:09:29 2025 +0200

    Merge pull request #303 from bbglab/dev

    First pre-release merge

commit 05c80ee
Merge: ea9a301 e06218a
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Tue Apr 29 10:30:23 2025 +0200

    Merge pull request #289 from bbglab/tmp-dev

    First release

commit e06218a
Merge: 7559d7f ea9a301
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date:   Tue Apr 29 10:11:05 2025 +0200

    Merge branch 'main' into tmp-dev

commit ea9a301
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date:   Thu Jul 25 16:05:10 2024 +0200

    update schema
FerriolCalvet added a commit that referenced this pull request Aug 1, 2025
* plotting wishlist

- subworkflow definition
- shortlisting plots to add

missing:
- nf scripts for the modules
- python scripts for the plots

* add raw version of supplementary figure plotting

* update omega plotting

* plotting update: omega & needles & stacked

* update plotting cohort plots working

* clean code
list inputs

* saturation data loading working

* gene saturation all tracks working with TP53

* tested additional complementary plots

missing:
- handle sample information input files
- handle reference datasets
- handle multiple genes

* update gene saturation inputs from pipeline

- not tested
- pending to decide creation of unique_splice_sites

* Squashed commit of the following: commits 10c12aa through ea9a301, listed in full under the Jul 28, 2025 commit above

* update stacked plot needles

* minor fixes after merge

- NOT WORKING

* update linewidth and size

- add mutation types

* fix plot saturation within pipeline
- collect sitecomparisons
- remove reference to ddg. requires external
- fix input unique to keep header and minimal information
- tested and works

* fix o3d logs output

* plot multiple genes, not only TP53

* apply review suggestions

- temporary solution to domain information loading

* update domain definition and plotting

- subset domains to in_panel ones
- update domain name definition
- plotting modules works
- missing autoexons plot
- update signatures output

* - separate generation of depth per exon
- add general error handling to all plotting modules
- generate exons bedfile within panel

* batch update of exon definitions

- use correct exons definition
- update custom bedfile name
- not tested

* fix exons panel generation

* add error handling to omega plot