BioMedBigDataCenter/predigsr

PreDigsR

Lifecycle: experimental. Project Status: Active - The project has reached a stable, usable state and is being actively developed.

PreDigsR is an automated method for assigning cell types in the human digestive system from single-cell RNA sequencing (scRNA-seq) data.

  • Species: supports both human and mouse query data.

  • Level: supports queries at either the single-cell level or the level of known cell group labels.

Installation

This is an R package.

You can install the released version of predigsr from CRAN with:

install.packages("predigsr")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("jymeng-monica/predigsr")

Library

suppressMessages(library(Seurat))
suppressMessages(library(harmony)) # Version 0.1.0 is recommended
suppressMessages(library(dplyr))
suppressMessages(library(predigsr))
suppressMessages(library(ComplexHeatmap))
suppressMessages(library(ggalluvial)) # optional
suppressMessages(library(circlize))

Load Query and Reference Data

Load Query Seurat Object

PreDigs expects the query data as a normalized Seurat object, i.e., one with seurat@assays$RNA@data populated.

load("./NormalizedSeuratObject.RData") # querydata

A Seurat object that contains only a raw count matrix is also supported; run it through the Seurat workflow first:

load("./RawSeuratObject.RData") # querydata
querydata <- get_seurat_workflow(querydata, stim = "batch_group_label")

Preprocess Query Data before Predigs Prediction

The inputs are querydata and, optionally, the name of the column in querydata@meta.data that holds the cell group labels (passed as cellgroup).

# 1. Human query data at the single-cell level
query_df <- get_query_preprocess(querydata, species = "human", clusterlevel = FALSE)

# 2. Human query data at the group level
query_df <- get_query_preprocess(querydata, species = "human", cellgroup = "cell_ontology_class", clusterlevel = TRUE)

# 3. Mouse query data at the single-cell level
query_df <- get_query_preprocess(querydata, species = "mouse", clusterlevel = FALSE)

# 4. Mouse query data at the group level
query_df <- get_query_preprocess(querydata, species = "mouse", cellgroup = "cell_ontology_class", clusterlevel = TRUE)

Users with multiple query datasets should build both a querydata list and a query_df list:

querydata_list <- list(querydata1, querydata2)
query_df_list <- list(query_df_1, query_df_2)
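
When several datasets are queried together, the two lists need to stay aligned element for element. The following is a minimal base-R sketch of such a check. It is not part of predigsr: check_query_lists() is a hypothetical helper, and the placeholder list elements stand in for real Seurat objects and preprocessed query data.

```r
# Hypothetical helper (not part of predigsr): verify that the raw and
# preprocessed query lists line up before scoring.
check_query_lists <- function(querydata_list, query_df_list) {
  if (length(querydata_list) == 0L) {
    stop("at least one query dataset is required")
  }
  if (length(querydata_list) != length(query_df_list)) {
    stop("querydata_list and query_df_list must have the same length")
  }
  invisible(TRUE)
}

# Placeholder objects standing in for Seurat objects and query_df results.
querydata_list <- list(batch1 = "seurat_object_1", batch2 = "seurat_object_2")
query_df_list  <- list(batch1 = data.frame(), batch2 = data.frame())

lists_ok <- check_query_lists(querydata_list, query_df_list)
```

Running a check like this before the scoring step makes a mismatch fail early with a clear message rather than partway through the lapply() loop.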

Load Preprocessed Reference data

Users must specify the organ type and tissue type of their scRNA-seq query data.

  • Organ Type : intestine/pancrea/stomach/liver/esophagus

  • Tissue Type : cancer/health

Users can then look up the name of the matching internal reference dataset and load it in R:

reference_dataset_name <- get_reference_setname("pancrea","health")
#> "ref_health_pancrea_cor_list"
data(ref_health_pancrea_cor_list, package = "predigsr")
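
The organ and tissue strings are easy to mistype (for example "pancreas" instead of "pancrea"). Below is a minimal base-R sketch that checks them against the values listed above before the lookup. validate_reference_args() is a hypothetical helper for illustration, not a predigsr function.

```r
# Hypothetical helper (not part of predigsr): check the organ and tissue
# strings against the values listed in this README.
validate_reference_args <- function(organ, tissue) {
  organ_choices  <- c("intestine", "pancrea", "stomach", "liver", "esophagus")
  tissue_choices <- c("cancer", "health")
  # match.arg() stops with an error on an unknown value; it also expands
  # an unambiguous prefix to the full choice.
  list(
    organ  = match.arg(organ, organ_choices),
    tissue = match.arg(tissue, tissue_choices)
  )
}

args_ok <- validate_reference_args("pancrea", "health")
```

The validated values in args_ok$organ and args_ok$tissue can then be passed on to get_reference_setname().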

Cell Type Assignment with PreDigs

Cell Similarity Score Calculation

query_predigs_score_list <- lapply(seq_along(querydata_list), function(x) {
  # Score the x-th query dataset against the loaded reference
  get_predigs_score_df(reference_cor_list = ref_health_pancrea_cor_list,
                       querydata_cor_list = querydata_list, i = x)
})

Final Predicted Cell Type Label Output

# For data query at single cell level
 query_predigs_label_table_list <- get_predigs_label_table(query_predigs_score_list,clusterlevel = FALSE)
 
# For data query at group level
 query_predigs_label_table_list <- get_predigs_label_table(query_predigs_score_list,clusterlevel = TRUE)

Visualization of the Assignment Result

  • Heatmap Plot
get_predigs_label_visualization(query_predigs_score_list[[1]])

  • Sankey Diagram Plot

For this plot, users must prepare a two-column data frame: the first column holds the predicted cell labels and the second holds a grouping of interest, such as the ground-truth cell types. The number of rows must equal the number of cells in the query data.
head(sankey_df)
#>          predicted_type           ground_truth
#> 1      Acinar.cell pancreatic acinar cell
#> 2        CD8T.cell                 t cell
#> 3 Endothelial.cell       endothelial cell


# plot 
get_predigs_label_visualization_sankey(sankey_df,GroupNumberBalance = FALSE,cols= NULL)
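
As a rough illustration of the expected input, the data frame can be assembled in base R as sketched here. The vectors are toy values copied from the output shown above; in practice the predicted labels would come from the label table and the second column from querydata@meta.data. The column names of the label table are not shown in this README, so none are assumed.

```r
# Toy inputs standing in for real results; one entry per query cell.
predicted_labels <- c("Acinar.cell", "CD8T.cell", "Endothelial.cell")
ground_truth     <- c("pancreatic acinar cell", "t cell", "endothelial cell")
n_cells          <- 3L  # in practice, e.g. ncol(querydata)

# Column order matters: predicted label first, group of interest second.
sankey_df <- data.frame(
  predicted_type = predicted_labels,
  ground_truth   = ground_truth,
  stringsAsFactors = FALSE
)

# The data frame must have two columns and one row per query cell.
stopifnot(ncol(sankey_df) == 2L, nrow(sankey_df) == n_cells)
```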

To draw the Sankey plot with balanced group numbers:

get_predigs_label_visualization_sankey(sankey_df,GroupNumberBalance = TRUE,cols= NULL)

Users can supply custom colors through cols if needed:

cols <- c("#E5D2DD", "#53A85F", "#F1BB72", "#F3B1A0", "#D6E7A3",
          # ... (further colors elided in the original)
          "#57C3F3")
get_predigs_label_visualization_sankey(sankey_df, GroupNumberBalance = TRUE, cols = cols)

© 2022.12.31 Monica Meng.Zhang Lab. All rights reserved.

About

Automated Cell Annotation of the scRNAseq Data
