We are missing a bit more explanation of how to download Ensembl VEP (i.e. version 111, which is the one we are using everywhere).
@migrau can you add this? Whenever you have time; I am continuing with the rest of the things.

I don't think we need to repeat the instructions from the official VEP documentation; it is already quite good. I edited the link to point directly to the VEP cache section and specified the params to change in a bit more detail. On the other hand, and totally off-topic, is it worth testing the latest VEP version, v114?

I think that, for a first round of complete documentation, the current status of this branch is complete enough. A more detailed explanation of the outputs and the methods will be available initially in the supplementary material of the normal bladder paper and then explained more extensively in a standalone article. We can discuss whether we want to keep the basic Nextflow parameters explanation in the usage document.
Pull Request Overview
This PR enriches and expands the project documentation and adjusts the pipeline configuration to support new parameters and clean up deprecated blocks.
- Reorganized `nextflow.config`, adding new profiling flags, reordering parameters, and removing unused validation settings
- Removed the standalone Ensembl VEP download module (`meta.yml` and `main.nf`) and its related config
- Added and fleshed out detailed documentation in `docs/` (usage, outputs, file formatting) and updated root README
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| nextflow.config | Reordered params, added new profiling flags, removed validation. |
| modules/nf-core/ensemblvep/download/meta.yml | Removed obsolete module metadata file. |
| modules/nf-core/ensemblvep/download/main.nf | Removed obsolete download process definition. |
| docs/usage.md | Expanded usage guide with command examples and run modes. |
| docs/output.md | Detailed pipeline output directory structure and step descriptions. |
| docs/file_formatting.md | New document describing required and optional input file formats. |
| docs/README.md | Updated documentation TOC and overview descriptions. |
| conf/modules.config | Removed deprecated ENSEMBLVEP_DOWNLOAD process settings. |
| assets/useful_scripts/deepcsa_maf2samplevcfs.py | Cleaned up comment blocks around usage instructions. |
| README.md | Refined project introduction, usage instructions, and warnings. |
| .markdownlint.json | Added custom markdownlint rules. |
Comments suppressed due to low confidence (5)
docs/file_formatting.md:32
- The parameter `use_custom_bedfile` is documented here but does not exist in `nextflow.config`; reconcile the docs or add the missing config parameter.
use_custom_bedfile = false
docs/usage.md:157
- The `regressions` parameter is referenced but isn't defined in `nextflow.config`; add the parameter or update the docs accordingly.
regressions = true
docs/README.md:6
- [nitpick] Nested list indentation contains an extra hyphen; use a single `-` for sub-items to improve readability.
- An overview of how the pipeline works and how to run it.
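
One way to reconcile the two undefined parameters flagged above is to declare them in the `params` scope of `nextflow.config`. The fragment below is only a sketch: the default values are assumptions chosen for illustration and should be set to whatever the pipeline actually intends.

```groovy
// nextflow.config (sketch): declare the parameters that the docs reference
params {
    // Assumed default; mirrors the value shown in docs/file_formatting.md
    use_custom_bedfile = false

    // Assumed default; docs/usage.md shows users switching it on with `regressions = true`
    regressions        = false
}
```

The alternative is to remove or rename the entries in the docs so that they match the parameters that already exist in the config.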
nextflow.config
Outdated
  dag {
      enabled = true
-     file = "${params.outdir}/pipeline_info/pipeline_dag_${new java.util.Date().format('yyyy-MM-dd_HH-mm-ss')}.html"
+     file = "${params.outdir}/pipeline_info/pipeline_dag_${new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')}.mmd"

Changing the DAG output extension to .mmd may break downstream viewers or expectations; consider using a more widely supported format (e.g., .html or .svg) or documenting this change.
Suggested change:
-     file = "${params.outdir}/pipeline_info/pipeline_dag_${new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')}.mmd"
+     file = "${params.outdir}/pipeline_info/pipeline_dag_${new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')}.svg"
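
For context on the trade-off in this thread: to my knowledge Nextflow picks the DAG renderer from the extension of `dag.file`, so `.mmd` writes Mermaid source, `.html` writes a page that can be opened in a browser, and `.svg`, `.png` and `.pdf` need Graphviz available on the machine running the pipeline. A minimal sketch of the block with the browser-viewable option, assuming `params.outdir` is defined elsewhere in the config:

```groovy
// nextflow.config (sketch): DAG report written next to the other pipeline_info files
dag {
    enabled = true
    // The extension selects the output format; swap .html for .mmd to get Mermaid source instead
    file    = "${params.outdir}/pipeline_info/pipeline_dag_${new java.util.Date().format('yyyy-MM-dd_HH-mm-ss')}.html"
}
```

Whichever extension is kept, a note in docs/output.md about the expected file type would cover the "documenting this change" half of the suggestion.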
docs/usage.md
Outdated
@@ -137,23 +421,23 @@ They are loaded in sequence, so later profiles can overwrite earlier profiles.

If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended, since it can lead to different results on different machines dependent on the computer enviroment.

I would point to the nf-core documentation instead of adding this info here: https://nf-co.re/docs/usage/getting_started/configuration#basic-configuration-profiles

I am only worried that some definitions there might not be exactly the same as here, but it is also true that I did not curate the description of the profiles below very well...
If you want to give it a try and remove all of this last part on "Additional Nextflow documentation", I am happy to look at it. If this means removing it all, I guess that should be fine: we can do this and, whenever someone starts using the pipeline (if anyone does), add more information based on their feedback about what does not work. Do you agree?
If you do, we remove this whole last section and add the link you mentioned here.
docs/usage.md
Outdated
### Resource requests

- Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with any of the error codes specified [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L18) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped.
+ Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with any of the error codes specified [here (example nf-core/rnaseq)](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L18) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped.

The link is to rnaseq instead of deepCSA.

We are likely removing this part.
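
If the paragraph is kept, the retry behaviour it describes could be illustrated with a deepCSA-specific fragment rather than a link to another pipeline. The sketch below follows the general nf-core `conf/base.config` pattern; the base resource values and the list of exit codes are placeholders, not the values deepCSA actually uses.

```groovy
// conf/base.config (sketch): scale resource requests with the attempt number
process {
    // task.attempt is 1 on the first try, so requests grow to 2x and then 3x on retries
    cpus   = { 1    * task.attempt }
    memory = { 6.GB * task.attempt }
    time   = { 4.h  * task.attempt }

    // Retry only on exit codes that usually indicate resource limits (placeholder list);
    // for any other failure, let running tasks finish and then stop the pipeline
    errorStrategy = { task.exitStatus in [104, 134, 137, 139, 143] ? 'retry' : 'finish' }

    // Two retries after the first attempt, i.e. three attempts in total
    maxRetries    = 2
}
```

Users who need different limits for a single step can override them from their own config with a `withName:` selector instead of editing this file.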
docs/usage.md
Outdated
### Initial run. Data exploration

* Definition of regions to analyze

These are the things that you need to prepare for the initial run, right? It includes some params, i.e.:
params {
    plot_depths = true
    signatures = true
    profileall = true
}
but also some inputs, e.g. "Definition of regions to analyze"? It is not clear to me.
I think I would separate them. For the params part, you can include a comment within the code example:
// Enables plotting depth per sample and/or per gene, mutational profile/signatures,
// and needle plots for somatic mutations
params {
    plot_depths = true
    signatures = true
    profileall = true
}

The list ("Definition of regions to analyze" and the items after it) describes the analyses you get if you run the pipeline with the params described below. It is a kind of summary of the outputs/goals that you can achieve with this run mode.
commit 10c12aa
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Thu Jul 24 12:35:40 2025 +0200
    update gnomAD threshold to 0.001 - ignore errors in omega plot

commit 31e741e
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Tue Jul 22 18:34:11 2025 +0200
    update description and fix broken link

commit 7b0fd9b
Merge: 72dc4d9 cb18adc
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Sun Jul 20 22:34:46 2025 +0200
    Merge pull request #315 from bbglab/dev

commit cb18adc
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Sat Jul 19 12:40:39 2025 +0200
    fix bug in mut density & update omega container

commit 1b4a52d
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Tue Jul 15 15:52:07 2025 +0200
    fix broken path for test_real

commit db6d640
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Fri Jul 11 15:50:45 2025 +0200
    Allow gene selection in consensus (#316)
    * update consensus building to filter genes - add consensus compliance param - add list of genes param - NOT tested
    * tested gene filter implementation - consensus panels implemented in polars - allowing subset for specific genes

commit b92c2b9
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Thu Jul 10 23:56:47 2025 +0200
    add metro map

commit 11fb1c5
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Thu Jul 10 08:25:31 2025 +0200
    update description in main README

commit 628f282
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Mon Jul 7 19:14:21 2025 +0200
    Add more complete docs (#306)
    * first doc update
    * update in usage documentation - list params - list structural parameters and files
    * backbone of output docs
    * update usage description with custom sets of mutations
    * fix headers
    * docs: Update usage with vep information
    * update output description
    * update order of usage information
    * update distribution of information in the docs
    * fix typo in docs/output.md
      Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
    * fix markdown linting
    * remove unnecessary validation params from config - REQUIRES TESTING
    * remove remaining references to download VEP cache
    * update dag format to mmd
    * update in usage
    * update documentation of file formatting and some params
    * add examples in file formatting docs
    * apply review comments
    * remove Nextflow parameters section
    * minor fix in nextflow.config
    ---------
    Co-authored-by: Miquel L. Grau <miguel.grau@irbbarcelona.org>
    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

commit 72dc4d9
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Mon Jul 7 15:53:47 2025 +0200
    temporary LICENSE definition

commit 14c5246
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Fri Jun 27 11:19:45 2025 +0200
    fix bug in panel_annotation (#313)
    reimplement it with click solved the problem - only_canonical boolean working - not tested

commit 764782a
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Thu Jun 26 23:23:05 2025 +0200
    Update mutation rate to mutation density (#307)
    * rename mutrate to mut density - reimplement with click - partial renaming - simplification of sample_name logic
    * full update of mutation rate to mutation density
    * define other_sample_SNP based on all VAF
    * update mutation density functions - clean code - add explanation on mutation density
    * apply review changes

commit aae8f0b
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Thu Jun 26 10:49:00 2025 +0200
    Ensure POSTPROCESSVEPPANEL in output (#311)
    * fix relative mutabilities output
    * explicitly define postprocessveppanel outdir
    * force outputting postprocesspanel
    * update storing fixes

commit dd25b9b
Merge: 05c80ee 26d5d9b
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Wed Jun 4 23:09:29 2025 +0200
    Merge pull request #303 from bbglab/dev
    First pre-release merge

commit 05c80ee
Merge: ea9a301 e06218a
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Tue Apr 29 10:30:23 2025 +0200
    Merge pull request #289 from bbglab/tmp-dev
    First release

commit e06218a
Merge: 7559d7f ea9a301
Author: Ferriol Calvet <38539786+FerriolCalvet@users.noreply.github.com>
Date: Tue Apr 29 10:11:05 2025 +0200
    Merge branch 'main' into tmp-dev

commit ea9a301
Author: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Thu Jul 25 16:05:10 2024 +0200
    update schema
* plotting wishlist - subworkflow definition - shortlisting plots to add missing: - nf scripts for the modules - python scripts for the plots
* add raw version of supplementary figure plotting
* update omega plotting
* plotting update: omega & needles & stacked
* update plotting cohort plots working
* clean code list inputs
* saturation data loading working
* gene saturation all tracks working with TP53
* tested additional complementary plots missing: - handle sample information input files - handle reference datasets - handle multiple genes
* update gene saturation inputs from pipeline - not tested - pending to decide creation of unique_splice_sites
* Squashed commit of the following: (commits 10c12aa to ea9a301, listed above)
* update stacked plot needles
* minor fixes after merge - NOT WORKING
* update linewidth and size - add mutation types
* fix plot saturation within pipeline - collect sitecomparisons - remove reference to ddg. requires external - fix input unique to keep header and minimal information - tested and works
* fix o3d logs output
* plot multiple genes, not only TP53
* apply review suggestions - temporary solution to domain information loading
* update domain definition and plotting - subset domains to in_panel ones - update domain name definition - plotting modules works - missing autoexons plot - update signatures output
* - separate generation of depth per exon - add general error handling to all plotting modules - generate exons bedfile within panel
* batch update of exon definitions - use correct exons definition - update custom bedfile name - not tested
* fix exons panel generation
* add error handling to omega plot
The basics would be similar to what is available for the nf-core pipelines: at a minimum, an explanation of the usage and an explanation of the outputs.