- This directory contains scripts used to analyse scRNAseq data from the Cambridge and TUM cohorts.
- Single cell sequencing data were generated using 10X 5' and VDJ single cell sequencing technology from CSF and PBMC samples from people with multiple sclerosis and from controls with other neurological diseases.
- All analyses were run on the Cambridge Slurm HPC.
- Unless specified, all scripts were run in R/4.0.3 or R/4.1.0.
- You can access the paper in Cell Reports Medicine here.
- Raw data (Seurat objects containing 5' gene expression and VDJ receptor sequencing data for B and T cells) can be downloaded via Zenodo here.
- Code contributors: Ben Jacobs and Christiane Gasperi
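Because the scripts were run in R/4.0.3 or R/4.1.0, it may be worth confirming which R version is on your PATH before submitting jobs. The following is a minimal sketch only: the function names `is_tested_r_version` and `check_r_version` are hypothetical helpers and are not part of this repository.

```shell
# Return success if the given version string is one the README reports using.
is_tested_r_version() {
  case "$1" in
    4.0.3|4.1.0) return 0 ;;
    *) return 1 ;;
  esac
}

# Warn (but do not abort) when the R found on PATH is a different version.
check_r_version() {
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH" >&2
    return 1
  fi
  # getRversion() is base R; as.character() gives e.g. "4.1.0".
  r_version=$(Rscript -e 'cat(as.character(getRversion()))')
  if is_tested_r_version "$r_version"; then
    echo "R $r_version matches a version listed in this README"
  else
    echo "Warning: R $r_version differs from R 4.0.3 / 4.1.0" >&2
  fi
}
```

Calling `check_r_version` in a login shell or at the top of a job script prints a warning rather than stopping, since other R versions may well work but were not the ones used here.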
sbatch gex_deconvolution_qc_step1.sh
This script runs Rscript gex_deconvolution_qc.R, which performs the following QC steps on each batch of GEX data:
- Filtering by RNA count & MT%
- Ambient RNA correction with SoupX
- Doublet identification
- Per-batch normalisation with SCTransform
sbatch integration_icelake.sh
This script runs Rscript integration.R, which integrates the post-QC datasets using Harmony.
sbatch umap_icelake.sh
This script runs Rscript umap.R which performs UMAP and Louvain clustering across a range of parameters.
sbatch cluster_biomarkers_icelake.sh
This script runs Rscript find_cluster_biomarkers_and_update_pheno.R, which does the following:
- Cleans phenotypes
- Makes some plots exploring clustering
- Compares annotations of cell types across different methods (SingleR, Azimuth, Celltypist)
- Calculates cluster-specific biomarkers
sbatch celltypist.sh
- Which splits clusters with Rscript celltypist_prep.R and then runs celltypist in each cluster.
sbatch update_clusters.sh
- Which runs Rscript update_cluster_labels.R to update cluster IDs
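The GEX steps above are listed as separate `sbatch` submissions. If you would rather queue them all at once, one option is to chain them with Slurm job dependencies. This is a sketch under assumptions, not the way the analyses were originally run: it assumes each step only needs the previous one to finish successfully, and `submit_chain` and the `SBATCH` variable are hypothetical names introduced for illustration. The `--parsable` and `--dependency=afterok:<jobid>` options are standard `sbatch` flags.

```shell
# Submission command; override (e.g. with a stub) to test without Slurm.
SBATCH="${SBATCH:-sbatch}"

# Submit each script in order, making every job wait until the previous
# one has finished successfully (Slurm "afterok" dependency).
submit_chain() {
  prev_job=""
  for script in "$@"; do
    if [ -z "$prev_job" ]; then
      job=$("$SBATCH" --parsable "$script") || return 1
    else
      job=$("$SBATCH" --parsable --dependency="afterok:${prev_job}" "$script") || return 1
    fi
    # --parsable prints "jobid" or "jobid;cluster"; keep only the ID.
    prev_job="${job%%;*}"
    echo "${script}: job ${prev_job}"
  done
}

# Example call (not executed here):
# submit_chain gex_deconvolution_qc_step1.sh integration_icelake.sh \
#   umap_icelake.sh cluster_biomarkers_icelake.sh celltypist.sh update_clusters.sh
```

With `afterok`, a failed step leaves the later jobs pending with an unsatisfied dependency instead of running on incomplete output, so check the queue with `squeue` if a step fails.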
To run DE and DA using the broad clusters:
sbatch de_icelake.sh
- Runs DE tests with Rscript de_da_tests_phenotypes.R
- Summary plots are then made with Rscript de_summary_plots.R
Then to run GSEA on the broad clusters with those DE results:
sbatch gsea.sh
sbatch pathway_analysis.sh
sbatch ccc_per_sample.sh
- Which runs LIANA on a per-sample basis
- Rscript ccc_overall.R then combines and explores these results
These scripts prepare the VDJ data for QC with dandelion
Rscript dandelion_preparation.R TCR
Rscript dandelion_preparation.R BCR
Rscript dandelion_preparation2.R TCR
Rscript dandelion_preparation2.R BCR
Rscript make_dandelion_metafile.R
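The preparation commands above can also be wrapped in a small loop that stops at the first failure. This is only a sketch: it assumes the scripts should run in the order listed and from the directory that contains them, and `prepare_vdj` and the `RSCRIPT` variable are hypothetical names that do not exist in this repository.

```shell
# Command used to launch R scripts; override (e.g. RSCRIPT=echo) for a dry run.
RSCRIPT="${RSCRIPT:-Rscript}"

# Run the dandelion preparation scripts in the order listed above,
# stopping at the first failure.
prepare_vdj() {
  for prep_script in dandelion_preparation.R dandelion_preparation2.R; do
    for receptor in TCR BCR; do
      echo "Running ${prep_script} for ${receptor}" >&2
      "$RSCRIPT" "$prep_script" "$receptor" || return 1
    done
  done
  # The metafile script takes no receptor argument in the README.
  "$RSCRIPT" make_dandelion_metafile.R
}
```

Setting `RSCRIPT=echo` before calling `prepare_vdj` prints the script names and arguments without running R, which is a quick way to check the order.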
And then to run dandelion pre-processing:
sbatch dandelion_bcr.sh
sbatch dandelion_tcr.sh
These scripts then filter the GEX data based on the QC'd VDJ data:
sbatch dandelion_filtering_tcr.sh
sbatch dandelion_filtering.sh
Rscript bcr_analysis.R
Rscript tcr_analysis.R
./eQTL_analysis/MasterScript.sh contains the eQTL pipeline and refers to scripts in ./eQTL_analysis/.