NextClone

NextClone is a Nextflow pipeline for rapid extraction and quantification of clonal barcodes from both DNA-seq and scRNAseq data. Here, DNA-seq refers to dedicated DNA barcoding data, in which next-generation sequencing targets only the synthetic lineage-tracing clone barcodes.

The pipeline comprises two distinct workflows, one for DNA-seq data and the other for scRNAseq data. Both workflows are modular: individual software components can be substituted as required, and parameters can be tailored through the nextflow.config file. The pipeline is optimised for use on high-performance computing (HPC) platforms.

Documentation

For instructions on how to use NextClone, please visit the user guide.

Modes

Whitelist mode (default)

Provide a list of known barcode sequences. Flexiplex maps all reads against the whitelist.

nextflow run main.nf --clone_barcodes_reference /path/to/barcodes.txt

Discovery mode

NextClone supports discovery mode, which identifies barcodes directly from the data without a pre-defined whitelist. This is useful when:

The exact barcode sequences are unknown
You are working with a new or custom clonal barcoding system
You want to validate or supplement a known barcode list

Discovery mode uses a two-pass approach powered by Flexiplex:

Pass 1 (Discovery): Run Flexiplex without a known-barcode list (that is, omitting its -k flag) and with strict flanking sequence matching (-f 0) to identify candidate barcodes.
Pass 2 (Mapping): Run Flexiplex again with the discovered barcodes as the barcode list, using the standard edit distance parameters.

nextflow run main.nf --discovery_mode true
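The two-pass idea can be illustrated with a small, self-contained Python sketch. This is not how Flexiplex is implemented; it is a toy model that assumes exact flank matching in Pass 1 and a plain Levenshtein distance in Pass 2. The function names (`edit_distance`, `discover_barcodes`, `map_barcode`) are hypothetical and exist only for this example.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def discover_barcodes(reads, flank_5, flank_3, barcode_length):
    """Pass 1: count sequences found between exactly matching flanks."""
    counts = Counter()
    for read in reads:
        start = read.find(flank_5)
        if start == -1:
            continue
        bc_start = start + len(flank_5)
        bc_end = bc_start + barcode_length
        if read[bc_end:bc_end + len(flank_3)] == flank_3:
            counts[read[bc_start:bc_end]] += 1
    return counts


def map_barcode(candidate, discovered, max_edit_distance):
    """Pass 2: return the unique closest discovered barcode, or None."""
    scored = sorted((edit_distance(candidate, bc), bc) for bc in discovered)
    if not scored or scored[0][0] > max_edit_distance:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two barcodes are equally close
    return scored[0][1]
```

Strict matching in the first pass keeps sequencing errors out of the candidate list, while the tolerant second pass recovers reads whose barcode differs from a candidate by a few edits.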

Barcode filtering in discovery mode

By default (filter_discovered_barcodes = false), all barcodes discovered in Pass 1 are passed to Pass 2, including singletons. This is recommended for lineage tracing experiments where rare clones are biologically meaningful.

Setting filter_discovered_barcodes = true applies flexiplex-filter knee-plot inflection filtering, which removes low-count barcodes. Use this only for noisy datasets, because it discards singleton and low-count clones:

nextflow run main.nf --discovery_mode true --filter_discovered_barcodes true
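To give a feel for what knee-plot filtering does, here is a minimal Python sketch of one generic heuristic: rank barcodes by count, plot log(rank) against log(count), and cut at the point that rises furthest above the straight line joining the first and last points. This is an assumption made for illustration, not the algorithm flexiplex-filter uses, and `knee_filter` is a hypothetical name.

```python
import math


def knee_filter(counts):
    """Keep barcodes ranked at or above the knee of the log-log rank/count curve."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 3:
        return dict(ranked)  # too few points to define a knee
    xs = [math.log10(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log10(count) for _, count in ranked]
    slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
    # Height of each point above the chord joining the first and last points.
    heights = [y - (ys[0] + slope * x) for x, y in zip(xs, ys)]
    knee = max(range(len(heights)), key=heights.__getitem__)
    if heights[knee] <= 1e-9:
        return dict(ranked)  # flat or straight curve: nothing to cut
    return dict(ranked[: knee + 1])
```

With five barcodes at 1000 counts and twenty singletons, the sketch keeps only the five abundant barcodes, which shows why this kind of filter is a poor fit when rare clones matter.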

Parameters

Parameter	Default	Description
`mode`	`"scRNAseq"`	Workflow mode: `"scRNAseq"` or `"DNAseq"`
`clone_barcodes_reference`	—	Path to known barcode whitelist (required when `discovery_mode = false`)
`discovery_mode`	`false`	Enable two-pass barcode discovery mode
`filter_discovered_barcodes`	`false`	Apply knee-plot filtering to discovered barcodes (see above)
`barcode_edit_distance`	`2`	Maximum edit distance for barcode matching
`adapter_edit_distance`	`6`	Maximum edit distance for flanking adapter matching
`adapter_5prime`	—	5′ flanking adapter sequence
`adapter_3prime`	—	3′ flanking adapter sequence
`barcode_length`	`20`	Expected barcode length (bp)
`n_chunks`	`2`	Number of read chunks for parallel processing
`publish_dir`	`output/`	Output directory
`report_title`	—	Custom title for the HTML report (defaults to date-stamped title)

Output Files

NextClone generates the following files in your publish_dir:

File	Description
`all_barcodes.txt`	All discovered barcodes with counts (no filtering). Header: `#barcode\tcount`
`filtered_barcodes.txt`	Barcodes after filtering. Same as `all_barcodes.txt` if `filter_discovered_barcodes=false`
`clone_barcodes.csv`	Final clone assignments to cells (for downstream analysis)
`nextclone_qc_report.html`	Interactive QC dashboard
`run_log.txt`	Run parameters and command line (for reproducibility)

Note: all_barcodes.txt contains ALL barcodes discovered in Pass 1, including singletons. This is useful for debugging and QC.
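For quick QC outside the HTML report, all_barcodes.txt can be summarised with a few lines of Python. The sketch below assumes that data rows follow the documented `#barcode\tcount` header, that is, one tab-separated barcode and integer count per line; `read_barcode_counts` and `summarise` are hypothetical helper names.

```python
def read_barcode_counts(handle):
    """Parse tab-separated barcode/count rows, skipping '#' header lines."""
    counts = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        barcode, count = line.split("\t")
        counts[barcode] = int(count)
    return counts


def summarise(counts):
    """Report how many barcodes were seen and how many are singletons."""
    return {
        "barcodes": len(counts),
        "total_count": sum(counts.values()),
        "singletons": sum(1 for c in counts.values() if c == 1),
    }
```

Typical use would be `with open("all_barcodes.txt") as fh: print(summarise(read_barcode_counts(fh)))`.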

HTML Reports

Standard report (auto-generated)

NextClone automatically generates an interactive HTML dashboard at the end of every run, saved to your publish_dir as nextclone_qc_report.html.

New in v2 (2026-04-09):

Clone overlap table — shared clones across samples at different thresholds (≥5, 10, 15, 20, 50, 100 cells)
Heterogeneity metrics — Gini coefficient and Shannon index for each sample
Clone size density plot — KDE-style curve showing clone size distribution
Reversed top 20 clones — largest clones now at top (easier to read)
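The two heterogeneity metrics can be computed from a list of clone sizes (cells per clone) as sketched below. The formulas are the textbook definitions; whether the report uses the natural logarithm for the Shannon index, or any normalisation, is an assumption here, and the function names are hypothetical.

```python
import math


def gini(sizes):
    """Gini coefficient: 0 for equal clone sizes, approaching 1 when one clone dominates."""
    xs = sorted(sizes)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def shannon(sizes):
    """Shannon index H = -sum(p * ln p) over clone frequencies p."""
    total = sum(sizes)
    if total == 0:
        return 0.0
    return -sum((x / total) * math.log(x / total) for x in sizes if x > 0)
```

For example, four clones of 5 cells each give a Gini coefficient of 0 and a Shannon index of ln 4, whereas sizes of 1, 1, 1 and 97 give a Gini coefficient of 0.72.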

All charts included:

Sample overview table (reads, cells, clones, Gini, Shannon)
Clone overlap across samples (new!)
Heterogeneity metrics summary (new!)
Ranked clone abundance (log scale, top 3 annotated)
Clone size density curve (new!)
Top 20 clones (horizontal bar, reversed, with % labels)
Edit distance QC (FlankEditDist & BarcodeEditDist)
Cross-sample clonality comparison
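As a rough illustration of the clone overlap table, the sketch below counts clones that reach a minimum number of cells in every sample, using the thresholds listed above. Treating "shared" as "present in all samples at or above the threshold" is an assumption, as is the in-memory layout (sample name mapped to clone sizes); the report script may define overlap differently.

```python
THRESHOLDS = (5, 10, 15, 20, 50, 100)


def shared_clones(samples, min_cells):
    """Clones with at least min_cells cells in every sample."""
    per_sample = [
        {clone for clone, n_cells in sizes.items() if n_cells >= min_cells}
        for sizes in samples.values()
    ]
    return set.intersection(*per_sample) if per_sample else set()


def overlap_table(samples, thresholds=THRESHOLDS):
    """Number of shared clones at each cell-count threshold."""
    return {t: len(shared_clones(samples, t)) for t in thresholds}
```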

To set a custom title:

nextflow run main.nf --report_title "My Experiment — ZR751 2026"

Manual report generation (CLI)

You can also generate reports manually from any clone_barcodes.csv file:

# Basic usage
cd /path/to/nextclone/output
python3 /path/to/NextClone/reports/generate_report.py clone_barcodes.csv

# Custom output and title
python3 reports/generate_report.py clone_barcodes.csv \
  --output my_report.html \
  --title "ZR751 Clonal Analysis — 2026-04-09"

Command-line options:

python3 generate_report.py <input_csv> [OPTIONS]

Positional:
  input_csv              Path to clone_barcodes.csv from NextClone output

Options:
  --output FILE          Output HTML file (default: report.html)
  --title TEXT           Report title (default: "NextClone Report")
  --help                 Show help message

For full documentation, see reports/README.md.

Output Management

Recommended Usage

Always use timestamped output directories to prevent overwriting previous runs:

# DNA-seq mode
nextflow run main.nf \
    --mode DNAseq \
    --dnaseq_fastq_files /path/to/fastq \
    --discovery_mode true \
    --filter_discovered_barcodes false \
    --publish_dir "results_DNAseq_$(date +%Y-%m-%d_%H-%M-%S)"

# scRNA-seq mode
nextflow run main.nf \
    --mode scRNAseq \
    --scrnaseq_bam_files /path/to/bams \
    --discovery_mode true \
    --filter_discovered_barcodes false \
    --publish_dir "results_scRNAseq_$(date +%Y-%m-%d_%H-%M-%S)"

Example output:

results_DNAseq_2026-04-10_11-45-22/
├── all_barcodes.txt          # All discovered barcodes
├── filtered_barcodes.txt     # Filtered barcodes (same as above if filter=false)
├── clone_barcodes.csv        # Final clone assignments
├── nextclone_qc_report.html  # Interactive QC dashboard
└── run_log.txt               # Run parameters + software versions
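If runs are launched from a Python wrapper rather than the shell, the same timestamped naming can be reproduced as sketched below. The flags are the ones shown in the commands above; `build_command` is a hypothetical helper, and nothing here is part of NextClone itself.

```python
from datetime import datetime


def build_command(mode, input_flag, input_path, now=None):
    """Assemble a nextflow command with a timestamped --publish_dir."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H-%M-%S")
    return [
        "nextflow", "run", "main.nf",
        "--mode", mode,
        input_flag, input_path,
        "--discovery_mode", "true",
        "--filter_discovered_barcodes", "false",
        "--publish_dir", f"results_{mode}_{stamp}",
    ]
```

The returned list can be passed straight to `subprocess.run(cmd, check=True)`; building it as a list avoids shell quoting problems with paths that contain spaces.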

When to Clear Work Directory

Clear the work/ directory only when:

Updating NextClone code (to avoid cached old results)
Conda environments are corrupted
Debugging unexpected behavior

# Clear work directory
rm -rf work/

# Clear conda cache (if needed)
rm -rf /path/to/nextflow_local/conda_cache/

For routine runs: Keep work/ to save compute time (Nextflow caches task results).

Comparison report (manual)

To compare two runs side by side (e.g. whitelist mode, labelled "Reference" below, vs discovery mode), run the comparison script after both runs are complete:

python3 reports/generate_comparison_report.py \
    /path/to/run_a/clone_barcodes.csv \
    /path/to/run_b/clone_barcodes.csv \
    --label-a "Reference" \
    --label-b "Discovery" \
    --output comparison_report.html \
    --title "Reference vs Discovery — My Experiment"

The comparison report shows:

Δ reads, cells, and clones between the two runs
Per-sample ranked abundance overlay (both modes, log-scale)
Clone size distribution side by side
Top clone overlap (concordance between modes)
Clonality metrics comparison (top1%, top3%, top10%)
Cell recovery validation across samples
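One plausible reading of the top1%, top3% and top10% clonality metrics is the percentage of cells that belong to the 1, 3 or 10 largest clones. The sketch below implements that reading only; it is an assumption about what the report computes, and `top_k_share` is a hypothetical name.

```python
def top_k_share(sizes, k):
    """Percentage of all cells that fall in the k largest clones."""
    total = sum(sizes)
    if total == 0:
        return 0.0
    largest = sorted(sizes, reverse=True)[:k]
    return 100.0 * sum(largest) / total
```

For clone sizes of 50, 30, 10, 5 and 5 cells, the top clone holds 50% of cells and the top three hold 90%.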

No pip installs required. Both report scripts use Python stdlib only, with Chart.js loaded via CDN.
