PreDigsR is an automated method for cell type assignment in the human digestive system using single-cell RNA sequencing data.
- Species: supports both human and mouse.
- Level: supports queries with either single cells or known cell group labels.
This is an R package.
You can install the released version of predigs from CRAN with:
install.packages("predigsr")

And the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("jymeng-monica/predigsr")

suppressMessages(library(Seurat))
suppressMessages(library(harmony)) # Version 0.1.0 is recommended
suppressMessages(library(dplyr))
suppressMessages(library(predigsr))
suppressMessages(library(ComplexHeatmap))
suppressMessages(library(ggalluvial)) # optional
suppressMessages(library(circlize))

The recommended query data format for predigs is a normalized Seurat object, i.e., one with seurat@assays$RNA@data.
load("./NormalizedSeuratObject.RData") # querydata

A Seurat object with only a count matrix is also supported; run the Seurat workflow on it first:
load("./RawSeuratObject.RData") # querydata
querydata <- get_seurat_workflow(querydata,stim = "batch_group_label")

The input includes querydata and, optionally, the column name of the cell group label in querydata@meta.data.
# 1.For human query data at single cell levels
query_df <- get_query_preprocess(querydata,species = "human",clusterlevel = FALSE)
# 2.For human query data at group levels
query_df <- get_query_preprocess(querydata,species = "human",cellgroup="cell_ontology_class",clusterlevel = TRUE)
# 3.For mouse query data at single cell levels
query_df <- get_query_preprocess(querydata,species = "mouse",clusterlevel = FALSE)
# 4.For mouse query data at group levels
query_df <- get_query_preprocess(querydata,species = "mouse",cellgroup="cell_ontology_class",clusterlevel = TRUE)
Users with multiple query datasets should generate both a querydata list and a query_df list:
querydata_list <- list(querydata1,querydata2)
query_df_list <- list(query_df_1,query_df_2)

Users are required to specify the Organ Type name and Tissue Type name for their scRNA-seq query data.
- Organ Type: intestine/pancrea/stomach/liver/esophagus
- Tissue Type: cancer/health
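Because the Organ Type and Tissue Type must match the values listed above exactly (note the spelling "pancrea"), it may help to validate them before looking up a reference dataset. The following is a minimal sketch in base R; the helper check_reference_inputs is hypothetical and is not part of predigsr.

```r
# Allowed values as listed in this README (spellings kept exactly as documented).
organ_types  <- c("intestine", "pancrea", "stomach", "liver", "esophagus")
tissue_types <- c("cancer", "health")

# Hypothetical helper (not provided by predigsr): stops with an informative
# error when a value is not one of the documented options.
check_reference_inputs <- function(organ, tissue) {
  if (!(organ %in% organ_types)) {
    stop("Unknown organ type '", organ, "'; expected one of: ",
         paste(organ_types, collapse = ", "))
  }
  if (!(tissue %in% tissue_types)) {
    stop("Unknown tissue type '", tissue, "'; expected one of: ",
         paste(tissue_types, collapse = ", "))
  }
  list(organ = organ, tissue = tissue)
}

inputs <- check_reference_inputs("pancrea", "health")
```

The validated values in inputs$organ and inputs$tissue could then be passed to get_reference_setname().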
Users can obtain the name of the corresponding internal reference dataset of predigs and load it in R:
reference_dataset_name <- get_reference_setname("pancrea","health")
> "ref_health_pancrea_cor_list"
data(ref_health_pancrea_cor_list,package="predigs")

# Score each query dataset against the reference
query_predigs_score_list <- lapply(seq_along(querydata_list),function(x){
  get_predigs_score_df(reference_cor_list = ref_health_pancrea_cor_list,
                       querydata_cor_list = querydata_list,i=x)
})

# For data query at single cell level
query_predigs_label_table_list <- get_predigs_label_table(query_predigs_score_list,clusterlevel = FALSE)
# For data query at group level
query_predigs_label_table_list <- get_predigs_label_table(query_predigs_score_list,clusterlevel = TRUE)

- Heatmap Plot
get_predigs_label_visualization(query_predigs_score_list[[1]])

- Sankey Diagram Plot
In this step, users are required to prepare a two-column data frame, with the predicted cell label in the first column and a grouping of interest, such as the ground truth of the cells, in the second. The number of rows should equal the number of cells in the query data.
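A data frame of this shape could be assembled as in the sketch below. The three values per column are placeholders for illustration only; in practice both vectors would come from your own results, with one entry per cell in the same cell order. The column names predicted_type and ground_truth follow the example output in this README, and n_cells is an illustrative variable name.

```r
# Placeholder vectors for illustration; replace with your own per-cell labels.
predicted_type <- c("Acinar.cell", "CD8T.cell", "Endothelial.cell")
ground_truth   <- c("pancreatic acinar cell", "t cell", "endothelial cell")

# Number of cells in the query data; hard-coded here as an assumption.
# With a Seurat object this would typically be ncol(querydata).
n_cells <- 3

# Both columns must have exactly one entry per cell.
stopifnot(length(predicted_type) == n_cells,
          length(ground_truth) == n_cells)

sankey_df <- data.frame(
  predicted_type = predicted_type,
  ground_truth = ground_truth,
  stringsAsFactors = FALSE
)
```

The stopifnot() call fails early if either column does not cover every cell, which is easier to diagnose than a mismatch surfacing during plotting.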
head(sankey_df)
#> predicted_type ground_truth
#> 1 Acinar.cell pancreatic acinar cell
#> 2 CD8T.cell t cell
#> 3 Endothelial.cell endothelial cell
# plot
get_predigs_label_visualization_sankey(sankey_df,GroupNumberBalance = FALSE,cols= NULL)

Users can browse the Sankey plot with a balanced group number like this:
get_predigs_label_visualization_sankey(sankey_df,GroupNumberBalance = TRUE,cols= NULL)

Users can specify cols if needed:
cols <- c("#E5D2DD","#53A85F","#F1BB72","#F3B1A0","#D6E7A3"....."#57C3F3")
#get_predigs_label_visualization_sankey(sankey_df,GroupNumberBalance = TRUE,cols= cols)
#get_predigs_label_visualization_sankey(sankey_df,GroupNumberBalance = FALSE,cols= NULL)

© 2022.12.31 Monica Meng. Zhang Lab. All rights reserved.



