Hi,
I have been trying to replicate the deconvolution results presented in your paper using the DT-Integration method. I ran the pipeline both with and without quality control (QC) steps and evaluated the output with the dt.count_cell_annotations function, but my results are not of comparable quality to those reported in your study.
Could you provide more detailed information on the data processing pipeline used in your research? Specifically, I am interested in understanding:
- The parameters and criteria used for QC.
- The specific settings and parameters applied during the deconvolution process.
- If available, the code or workflow that was utilized to generate Figure 2 from the dataset described in your paper.
This information would help me understand where my approach differs from the methods used in your study. Thank you very much for your time and assistance.
The script I ran is below:
import sys
import os
import random
import scanpy as sc
import squidpy as sq
import numpy as np
import pandas as pd
from anndata import AnnData
import pathlib
import matplotlib.pyplot as plt
import matplotlib as mpl
import skimage
import json
import seaborn as sns
import copy
from orderedset import OrderedSet
import DeepTalk_ST as dt
sc_data=sc.read_h5ad("./data/res/starmap/starmap_sc_adata.h5ad")
st_data=sc.read_h5ad("./data/res/starmap/starmap_st_adata.h5ad")
print(sc_data)
print(st_data)
# str.startswith takes a literal prefix, not a regex, so use str.match for the pattern
sc_data.var["mt"] = sc_data.var_names.str.match(r'^(mt-|MT-)')
sc.pp.calculate_qc_metrics(sc_data, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
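# Filter cells on QC metrics; copy at the end so later steps work on a real AnnData, not a view
sc_data = sc_data[sc_data.obs['pct_counts_mt'] < 5]
sc_data = sc_data[sc_data.obs['n_genes_by_counts'] < 12000]
sc_data = sc_data[sc_data.obs['total_counts'] < 3*1e6].copy()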
sc.pp.filter_cells(sc_data, min_genes=200)
sc.pp.filter_genes(sc_data, min_cells=3)
sc_data_raw = copy.deepcopy(sc_data)
sc.pp.normalize_total(sc_data)
sc.pp.log1p(sc_data)
st_data_raw = copy.deepcopy(st_data)
sc.pp.normalize_total(st_data)
sc.pp.log1p(st_data)
sc.tl.pca(sc_data, svd_solver='arpack')
sc.pp.neighbors(sc_data, n_neighbors=10, n_pcs=40)
sc.tl.umap(sc_data)
st_data.obs["cell_count"] = st_data.obs["cellCounts"]
sc.tl.rank_genes_groups(sc_data, groupby="celltype", use_raw=False)
markers_df = pd.DataFrame(sc_data.uns["rank_genes_groups"]["names"]).iloc[0:200, :]
markers = list(np.unique(markers_df.melt().value.values))
print(len(markers))
dt.pp_adatas(sc_data, st_data, genes=markers)
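# Store coordinates as a plain array, which is what scanpy/squidpy expect in obsm['spatial']
st_data.obsm['spatial'] = st_data.obs[['x', 'y']].to_numpy()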
ad_map = dt.map_cells_to_space(
    adata_sc=sc_data,
    adata_sp=st_data,
    learning_rate=0.0005,
    num_epochs=10000,
    #device='cpu',
    device='cuda:0',
)
dt.project_cell_annotations(ad_map, st_data, annotation="celltype")
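
A side note on the mitochondrial flag in the QC step, in case it matters for the comparison: as far as I understand, pandas `str.startswith` treats its argument as a literal prefix and does not accept regular expressions, so a regex pattern needs `str.match` (or `str.contains`) instead. Below is a minimal sketch on made-up gene names (the names are hypothetical and not taken from the STARmap data):

```python
import pandas as pd

# Hypothetical gene names mixing mouse- and human-style mitochondrial prefixes.
genes = pd.Index(["mt-Co1", "MT-ND1", "Actb", "Gapdh", "Mtor"])

# Literal prefix comparison: no gene name begins with the characters "^(mt-|MT-)".
literal = genes.str.startswith(r"^(mt-|MT-)")

# Regex comparison anchored at the start of each name.
by_regex = genes.str.match(r"^(mt-|MT-)")

print("flagged with startswith:", int(literal.sum()))
print("flagged with match:", int(by_regex.sum()))
```

If the literal version flags nothing, `pct_counts_mt` would be 0 for every cell and a `< 5` threshold would not remove anything, so this is one thing I would like to rule out when comparing against your QC settings.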
