Add more complete docs by FerriolCalvet · Pull Request #306 · bbglab/deepCSA

FerriolCalvet · 2025-06-17T23:38:35Z

The basics should be similar to what is available for the nf-core pipelines. At a minimum, the docs should explain the usage and the outputs.

Usage
- Different pipeline modes
- Different parameters for each run
Outputs
Explain different file formats required for the inputs
Explanation of internal logic and methods

FerriolCalvet · 2025-06-17T23:39:46Z

We are missing a bit more explanation on how to download Ensembl VEP (i.e. version 111, which is the one we are using everywhere).

FerriolCalvet · 2025-06-19T10:55:37Z

> We are missing a bit more explanation on how to download Ensembl VEP (i.e. version 111, which is the one we are using everywhere).

@migrau can you add this whenever you have time? I am continuing with the rest of the things.

migrau · 2025-06-20T05:25:55Z

I don't think we need to repeat the instructions from the official VEP documentation; it is already quite good. I edited the link to point directly to the VEP cache section and specified which params to change.

On the other hand, and totally off-topic, is it worth testing the latest VEP version, v114?

FerriolCalvet · 2025-06-23T10:28:21Z

I think that for a first round of complete documentation the current status of this branch is complete enough.

A more detailed explanation of the outputs and the methods will be available initially in the supplementary material of the normal bladder paper, and later in more depth in a standalone article.

We can discuss if we want to keep the basic Nextflow parameters explanation in the usage document.
I would personally keep it there at the bottom of the file just in case people find it useful.

Copilot

Pull Request Overview

This PR enriches and expands the project documentation and adjusts the pipeline configuration to support new parameters and clean up deprecated blocks.

- Reorganized nextflow.config, adding new profiling flags, reordering parameters, and removing unused validation settings.
- Removed the standalone Ensembl VEP download module (meta.yml and main.nf) and its related config.
- Added and fleshed out detailed documentation in docs/ (usage, outputs, file formatting) and updated the root README.

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| nextflow.config | Reordered params, added new profiling flags, removed validation. |
| modules/nf-core/ensemblvep/download/meta.yml | Removed obsolete module metadata file. |
| modules/nf-core/ensemblvep/download/main.nf | Removed obsolete download process definition. |
| docs/usage.md | Expanded usage guide with command examples and run modes. |
| docs/output.md | Detailed pipeline output directory structure and step descriptions. |
| docs/file_formatting.md | New document describing required and optional input file formats. |
| docs/README.md | Updated documentation TOC and overview descriptions. |
| conf/modules.config | Removed deprecated `ENSEMBLVEP_DOWNLOAD` process settings. |
| assets/useful_scripts/deepcsa_maf2samplevcfs.py | Cleaned up comment blocks around usage instructions. |
| README.md | Refined project introduction, usage instructions, and warnings. |
| .markdownlint.json | Added custom markdownlint rules. |

Comments suppressed due to low confidence (5)

docs/file_formatting.md:32

The parameter use_custom_bedfile is documented here but does not exist in nextflow.config; reconcile the docs or add the missing config parameter.

    use_custom_bedfile          = false

docs/usage.md:157

The regressions parameter is referenced but isn't defined in nextflow.config; add the parameter or update the docs accordingly.

    regressions                 = true
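
A check like the two above (a parameter documented but absent from the config) could be automated. The following is only a minimal sketch: the function names are hypothetical, the example config text is made up for illustration and is not the real nextflow.config, and the regex is a crude line-based match rather than a real Groovy parser.

```python
import re

# Matches lines of the form `name = value` (crude; not a Groovy parser).
ASSIGNMENT = re.compile(r"^[ \t]*([A-Za-z_]\w*)[ \t]*=", re.MULTILINE)


def defined_params(config_text: str) -> set[str]:
    """Collect every name that is assigned somewhere in the config text."""
    return set(ASSIGNMENT.findall(config_text))


def missing_params(documented: list[str], config_text: str) -> list[str]:
    """Return documented parameter names that are never assigned in the config."""
    return sorted(set(documented) - defined_params(config_text))


# Hypothetical config excerpt, for illustration only.
example_config = """
params {
    plot_depths = true
    signatures  = true
    profileall  = true
}
"""

missing = missing_params(
    ["plot_depths", "use_custom_bedfile", "regressions"], example_config
)
print(missing)  # ['regressions', 'use_custom_bedfile']
```

Because the match is line-based, it also picks up assignments outside the `params` block, so it can miss a documented name that is only defined in another scope.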

docs/README.md:6

[nitpick] Nested list indentation contains an extra hyphen; use a single - for sub-items to improve readability.

  - An overview of how the pipeline works and how to run it.

Copilot · 2025-07-03T08:20:47Z

nextflow.config

 dag {
    enabled = true
-    file    = "${params.outdir}/pipeline_info/pipeline_dag_${new java.util.Date().format('yyyy-MM-dd_HH-mm-ss')}.html"
+    file    = "${params.outdir}/pipeline_info/pipeline_dag_${new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')}.mmd"


Changing the DAG output extension to .mmd may break downstream viewers or expectations; consider using a more widely supported format (e.g., .html or .svg) or documenting this change.

Suggested change

-    file = "${params.outdir}/pipeline_info/pipeline_dag_${new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')}.mmd"
+    file = "${params.outdir}/pipeline_info/pipeline_dag_${new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')}.svg"
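
To make the effect of the extension change concrete, here is a small Python sketch of the path this expression would produce. It is an illustration only: `dag_path` is a hypothetical helper, `results` stands in for `params.outdir`, and the `strftime` pattern is assumed to be the equivalent of the Java pattern `yyyy-MM-dd_HH-mm-ss`.

```python
from datetime import datetime
from pathlib import PurePosixPath


def dag_path(outdir: str, ext: str, now: datetime) -> str:
    """Build the timestamped DAG report path for a given file extension."""
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")  # mirrors 'yyyy-MM-dd_HH-mm-ss'
    return str(PurePosixPath(outdir) / "pipeline_info" / f"pipeline_dag_{stamp}.{ext}")


# Fixed timestamp so the output is reproducible.
fixed = datetime(2025, 7, 3, 8, 20, 47)
print(dag_path("results", "mmd", fixed))
# results/pipeline_info/pipeline_dag_2025-07-03_08-20-47.mmd
print(dag_path("results", "svg", fixed))
# results/pipeline_info/pipeline_dag_2025-07-03_08-20-47.svg
```

Only the suffix differs between the two variants; the directory and timestamp part of the name stay the same.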

docs/usage.md

README.md

migrau · 2025-07-03T08:54:55Z

docs/usage.md

@@ -137,23 +421,23 @@ They are loaded in sequence, so later profiles can overwrite earlier profiles.

 If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended, since it can lead to different results on different machines depending on the computer environment.


I would point to the nf-core documentation instead of adding this info here: https://nf-co.re/docs/usage/getting_started/configuration#basic-configuration-profiles

I am only worried that some definitions there might not be exactly the same as here, but it is also true that I did not curate the description of the profiles below very well...

If you want to give it a try and remove this whole last part on "Additional Nextflow documentation", I am happy to look at it. If this means removing it all, I guess that should be fine. We can do this, and if someone starts using the pipeline and gives feedback on what does not work, we can add some more information. Do you agree?

If you do, we remove this whole last section and add the link you mentioned here.

migrau · 2025-07-03T08:56:04Z

docs/usage.md

 ### Resource requests

-Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with any of the error codes specified [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L18) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped.
+Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with any of the error codes specified [here (example nf-core/rnaseq)](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L18) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped.


The link points to nf-core/rnaseq instead of deepCSA.

We are likely removing this part.
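
For reference, the retry behaviour described in the paragraph under review (resubmission with 2x and then 3x the original request, stopping after the third attempt) can be sketched as below. This is an illustration in Python, not the pipeline's actual config: the `Request` type, the base values, and the assumption that CPUs, memory and time all scale with the attempt number are hypothetical.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # taken from "after the third attempt" in the text


@dataclass(frozen=True)
class Request:
    cpus: int
    memory_gb: int
    time_h: int


def request_for_attempt(base: Request, attempt: int) -> Request:
    """Scale the base request linearly with the attempt number (1x, 2x, 3x)."""
    if not 1 <= attempt <= MAX_ATTEMPTS:
        raise ValueError(f"attempt must be between 1 and {MAX_ATTEMPTS}")
    return Request(
        cpus=base.cpus * attempt,
        memory_gb=base.memory_gb * attempt,
        time_h=base.time_h * attempt,
    )


# Hypothetical base request, for illustration only.
base = Request(cpus=2, memory_gb=6, time_h=4)
for attempt in range(1, MAX_ATTEMPTS + 1):
    print(attempt, request_for_attempt(base, attempt))
```

A fourth attempt raises an error here, which stands in for the pipeline stopping once the retries are exhausted.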

docs/usage.md

migrau · 2025-07-03T09:25:26Z

docs/usage.md

+
+### Initial run. Data exploration
+
+* Definition of regions to analyze


These are the things that you need to prepare for the initial run, right? It includes some params, e.g.:

    params {
        plot_depths = true
        signatures  = true
        profileall  = true
    }

but also some inputs, e.g. "Definition of regions to analyze"? It is not clear to me.
I think I would separate them. For the params part, you can include a comment within the code example:

    // Enables plotting depth per sample and/or per gene, mutational profile/signatures,
    // and needle plots for somatic mutations
    params {
        plot_depths = true
        signatures  = true
        profileall  = true
    }

The list with the definition of regions to analyze and all these things is the "analysis" you get if you run the pipeline with the params described below. It is a kind of summary of the outputs/goals that you can achieve with this run mode.

docs/output.md

commit 10c12aa
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Thu Jul 24 12:35:40 2025 +0200

    update gnomAD threshold to 0.001
    - ignore errors in omega plot

commit 31e741e
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Tue Jul 22 18:34:11 2025 +0200

    update description and fix broken link

commit 7b0fd9b
Merge: 72dc4d9 cb18adc
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Sun Jul 20 22:34:46 2025 +0200

    Merge pull request #315 from bbglab/dev

commit cb18adc
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Sat Jul 19 12:40:39 2025 +0200

    fix bug in mut density & update omega container

commit 1b4a52d
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Tue Jul 15 15:52:07 2025 +0200

    fix broken path for test_real

commit db6d640
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Fri Jul 11 15:50:45 2025 +0200

    Allow gene selection in consensus (#316)
    * update consensus building to filter genes
      - add consensus compliance param
      - add list of genes param
      - NOT tested
    * tested gene filter implementation
      - consensus panels implemented in polars
      - allowing subset for specific genes

commit b92c2b9
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Thu Jul 10 23:56:47 2025 +0200

    add metro map

commit 11fb1c5
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Thu Jul 10 08:25:31 2025 +0200

    update description in main README

commit 628f282
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Mon Jul 7 19:14:21 2025 +0200

    Add more complete docs (#306)
    * first doc update
    * update in usage documentation
      - list params
      - list structural parameters and files
    * backbone of output docs
    * update usage description with custom sets of mutations
    * fix headers
    * docs: Update usage with vep information
    * update output description
    * update order of usage information
    * update distribution of information in the docs
    * fix typo in docs/output.md
      Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
    * fix markdown linting
    * remove unnecessary validation params from config
      - REQUIRES TESTING
    * remove remaining references to download VEP cache
    * update dag format to mmd
    * update in usage
    * update documentation of file formatting and some params
    * add examples in file formatting docs
    * apply review comments
    * remove Nextflow parameters section
    * minor fix in nextflow.config
    ---------
    Co-authored-by: Miquel L. Grau <miguel.grau@irbbarcelona.org>
    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

commit 72dc4d9
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Mon Jul 7 15:53:47 2025 +0200

    temporary LICENSE definition

commit 14c5246
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Fri Jun 27 11:19:45 2025 +0200

    fix bug in panel_annotation (#313)
    reimplement it with click solved the problem
    - only_canonical boolean working
    - not tested

commit 764782a
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Thu Jun 26 23:23:05 2025 +0200

    Update mutation rate to mutation density (#307)
    * rename mutrate to mut density
      - reimplement with click
      - partial renaming
      - simplification of sample_name logic
    * full update of mutation rate to mutation density
    * define other_sample_SNP based on all VAF
    * update mutation density functions
      - clean code
      - add explanation on mutation density
    * apply review changes

commit aae8f0b
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Thu Jun 26 10:49:00 2025 +0200

    Ensure POSTPROCESSVEPPANEL in output (#311)
    * fix relative mutabilities output
    * explicitly define postprocessveppanel outdir
    * force outputting postprocesspanel
    * update storing fixes

commit dd25b9b
Merge: 05c80ee 26d5d9b
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Wed Jun 4 23:09:29 2025 +0200

    Merge pull request #303 from bbglab/dev
    First pre-release merge

commit 05c80ee
Merge: ea9a301 e06218a
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Tue Apr 29 10:30:23 2025 +0200

    Merge pull request #289 from bbglab/tmp-dev
    First release

commit e06218a
Merge: 7559d7f ea9a301
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Tue Apr 29 10:11:05 2025 +0200

    Merge branch 'main' into tmp-dev

commit ea9a301
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Thu Jul 25 16:05:10 2024 +0200

    update schema

* plotting wishlist
  - subworkflow definition
  - shortlisting plots to add
  missing:
  - nf scripts for the modules
  - python scripts for the plots
* add raw version of supplementary figure plotting
* update omega plotting
* plotting update: omega & needles & stacked
* update plotting cohort plots working
* clean code list inputs
* saturation data loading working
* gene saturation all tracks working with TP53
* tested additional complementary plots
  missing:
  - handle sample information input files
  - handle reference datasets
  - handle multiple genes
* update gene saturation inputs from pipeline
  - not tested
  - pending to decide creation of unique_splice_sites
* Squashed commit of the following: commits 10c12aa through ea9a301 (the same commit log as listed above)
* update stacked plot needles
* minor fixes after merge
  - NOT WORKING
* update linewidth and size
  - add mutation types
* fix plot saturation within pipeline
  - collect sitecomparisons
  - remove reference to ddg. requires external
  - fix input unique to keep header and minimal information
  - tested and works
* fix o3d logs output
* plot multiple genes, not only TP53
* apply review suggestions
  - temporary solution to domain information loading
* update domain definition and plotting
  - subset domains to in_panel ones
  - update domain name definition
  - plotting modules works
  - missing autoexons plot
  - update signatures output
* - separate generation of depth per exon
  - add general error handling to all plotting modules
  - generate exons bedfile within panel
* batch update of exon definitions
  - use correct exons definition
  - update custom bedfile name
  - not tested
* fix exons panel generation
* add error handling to omega plot

FerriolCalvet added 2 commits June 16, 2025 21:46

first doc update

ec314e9

update in usage documentation

be2b16c

- list params - list structural parameters and files

FerriolCalvet added 3 commits June 19, 2025 11:29

backbone of output docs

ce80234

update usage description with custom sets of mutations

955f6e6

fix headers

4d3f609

docs: Update usage with vep information

48c8f9f

FerriolCalvet added 2 commits June 20, 2025 22:45

update output description

f67899e

update order of usage information

897453b

FerriolCalvet changed the title from "[DRAFT] Add more complete docs" to "Add more complete docs" Jun 23, 2025

update distribution of information in the docs

739ef82

FerriolCalvet requested a review from Copilot June 23, 2025 10:47

FerriolCalvet and others added 8 commits June 23, 2025 14:26

fix typo in docs/output.md

3d5f750

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

fix markdown linting

e6a0481

remove unnecessary validation params from config

172c44c

- REQUIRES TESTING

remove remaining references to download VEP cache

977a3a8

update dag format to mmd

f627011

update in usage

cbc8e2d

update documentation of file formatting and some params

516342b

add examples in file formatting docs

cfd5fec

FerriolCalvet requested a review from Copilot July 3, 2025 08:19

Copilot AI reviewed Jul 3, 2025

migrau reviewed Jul 3, 2025: README.md (outdated, resolved)

migrau reviewed Jul 3, 2025: docs/usage.md (outdated, resolved)

migrau reviewed Jul 3, 2025: docs/usage.md (resolved)

migrau reviewed Jul 3, 2025: docs/output.md (resolved)

FerriolCalvet and others added 4 commits July 3, 2025 23:55

apply review comments

31fb03e

remove Nextflow parameters section

9c3d3c6

Merge branch 'dev' into complete-docs

aa1d532

minor fix in nextflow.config

10806b7

FerriolCalvet merged commit 628f282 into dev Jul 7, 2025

FerriolCalvet deleted the complete-docs branch July 7, 2025 20:43

3 participants
