Open
Labels: bug (Something isn't working)
Description
Hello,
When using subsetArchRProject to subset by cells, I noticed that the resulting ArrowFiles contained more cells than I expected. After investigating, I found that this is caused by 10X barcodes that are repeated across samples. These cells have unique cell names of the form [sample]#[barcode], but the sample prefix is stripped when the ArrowFiles are copied over (see this line of code).
Current behavior: each ArrowFile is subset to every cell whose barcode matches a requested barcode, even when the match comes from a different sample.
Expected behavior: the ArrowFiles are subset to exactly the cells you specify.
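To make the difference concrete, here is a minimal base-R sketch using made-up cell names (the sample names and barcodes are hypothetical, not taken from the tutorial data). It only contrasts matching on the bare barcode, which is the behavior described above, with matching on the full [sample]#[barcode] name; it does not call or reproduce ArchR's internal code.

```r
# Hypothetical cell names in the "sample#barcode" format used by ArchR.
allCells <- c("sampleA#AAACGAAAGCGCAATG-1",
              "sampleB#AAACGAAAGCGCAATG-1",  # same barcode, different sample
              "sampleA#TTTGTGTTCGATCCGA-1")

# The user requests a single cell from sampleA.
requested <- "sampleA#AAACGAAAGCGCAATG-1"

# Drop the "sample#" prefix, keeping only the barcode.
stripSample <- function(x) sub("^.*#", "", x)

# Barcode-only matching: the shared barcode also pulls in the sampleB cell.
keptByBarcode <- allCells[stripSample(allCells) %in% stripSample(requested)]

# Full-name matching: exactly the requested cells are kept.
keptByName <- allCells[allCells %in% requested]

print(length(keptByBarcode))  # 2
print(length(keptByName))     # 1
```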
See below for a reproducible example with the tutorial dataset.
library(ArchR)
library(parallel)
inputFiles <- getTutorialData("Hematopoiesis")
addArchRGenome("hg19")
addArchRThreads(threads = 16)
ArrowFiles <- createArrowFiles(
inputFiles = inputFiles,
sampleNames = names(inputFiles),
minTSS = 4,
minFrags = 1000,
addTileMat = FALSE,
addGeneScoreMat = FALSE
)
projHeme1 <- ArchRProject(
ArrowFiles = ArrowFiles,
outputDirectory = "HemeTutorial",
copyArrows = TRUE
)
# Barcode portion of each "sample#barcode" cell name
barcodes <- sapply(strsplit(getCellNames(projHeme1), '#'), `[`, 2)
cell_subset <- getCellNames(projHeme1)[barcodes %>% duplicated(fromLast = TRUE)]
print(length(cell_subset)) # 43
projSubset <- subsetArchRProject(
ArchRProj = projHeme1,
cells = cell_subset,
outputDirectory = "ArchRSubset",
dropCells = TRUE,
force = TRUE
)
print(nCells(projSubset)) # has the expected 43 cells
print(lapply(getArrowFiles(projSubset), nCells) %>% unlist %>% sum) # 70
# ArrowFiles of this project have 70 cells
test_subset <- ArchRProject(getArrowFiles(projSubset), outputDirectory = 'test_proj/')
print(nCells(test_subset))
# also has 70 cells
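One way to get the expected behavior is to filter each ArrowFile only with the barcodes requested for its own sample. The base-R sketch below illustrates that idea under stated assumptions: `splitCellsBySample` is a hypothetical helper (not an ArchR function), and the per-file barcode lists are made-up stand-ins for what an ArrowFile stores. It is not how ArchR currently implements the copy step.

```r
# Hypothetical helper (not part of ArchR): group requested barcodes by sample.
splitCellsBySample <- function(cells) {
  sampleOf  <- sub("#.*$", "", cells)  # text before "#"
  barcodeOf <- sub("^.*#", "", cells)  # text after "#"
  split(barcodeOf, sampleOf)
}

requested <- c("sampleA#AAAC-1", "sampleA#TTTG-1")
bySample  <- splitCellsBySample(requested)

# Made-up barcodes stored in each sample's ArrowFile (no sample prefix).
arrowBarcodes <- list(
  sampleA = c("AAAC-1", "TTTG-1", "GGGG-1"),
  sampleB = c("AAAC-1", "CCCC-1")  # shares "AAAC-1" with sampleA
)

# Filter each file only with its own sample's requested barcodes.
# bySample[["sampleB"]] is NULL, so nothing from sampleB is kept.
kept <- lapply(names(arrowBarcodes), function(s) {
  bc <- arrowBarcodes[[s]]
  bc[bc %in% bySample[[s]]]
})
names(kept) <- names(arrowBarcodes)

print(lengths(kept))  # sampleA = 2, sampleB = 0
```

Keying the filter on the sample name keeps a barcode collision in one sample from leaking cells into another sample's ArrowFile.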
sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /lila/home/godovid/miniconda3/envs/workshop_2024/lib/libopenblasp-r0.3.25.so; LAPACK version 3.11.0
Random number generation:
RNG: L'Ecuyer-CMRG
Normal: Inversion
Sample: Rejection
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] parallel stats4 grid stats graphics grDevices utils
[8] datasets methods base
other attached packages:
[1] BSgenome.Hsapiens.UCSC.hg19_1.4.3 BSgenome_1.70.2
[3] rtracklayer_1.62.0 BiocIO_1.12.0
[5] Biostrings_2.70.3 XVector_0.42.0
[7] rhdf5_2.46.1 SummarizedExperiment_1.32.0
[9] Biobase_2.62.0 RcppArmadillo_0.12.8.3.0
[11] Rcpp_1.0.13 Matrix_1.6-5
[13] GenomicRanges_1.54.1 GenomeInfoDb_1.38.8
[15] IRanges_2.36.0 S4Vectors_0.40.2
[17] BiocGenerics_0.48.1 sparseMatrixStats_1.14.0
[19] MatrixGenerics_1.14.0 matrixStats_1.3.0
[21] data.table_1.15.4 stringr_1.5.1
[23] plyr_1.8.9 magrittr_2.0.3
[25] ggplot2_3.5.1 gtable_0.3.5
[27] gtools_3.9.5 gridExtra_2.3
[29] devtools_2.4.5 usethis_2.2.2
[31] ArchR_1.0.3
loaded via a namespace (and not attached):
[1] bitops_1.0-7 remotes_2.4.2.1 rlang_1.1.4
[4] compiler_4.3.2 callr_3.7.3 vctrs_0.6.5
[7] profvis_0.3.8 pkgconfig_2.0.3 crayon_1.5.2
[10] fastmap_1.2.0 ellipsis_0.3.2 utf8_1.2.4
[13] Rsamtools_2.18.0 promises_1.3.0 sessioninfo_1.2.2
[16] ps_1.7.6 purrr_1.0.2 zlibbioc_1.48.2
[19] cachem_1.0.8 jsonlite_1.8.8 later_1.3.2
[22] rhdf5filters_1.14.1 DelayedArray_0.28.0 BiocParallel_1.36.0
[25] uuid_1.2-0 Rhdf5lib_1.24.2 prettyunits_1.2.0
[28] R6_2.5.1 stringi_1.8.4 pkgload_1.3.4
[31] IRkernel_1.3.2 base64enc_0.1-3 httpuv_1.6.15
[34] tidyselect_1.2.1 yaml_2.3.8 abind_1.4-5
[37] codetools_0.2-19 miniUI_0.1.1.1 processx_3.8.3
[40] pkgbuild_1.4.2 lattice_0.22-5 tibble_3.2.1
[43] shiny_1.8.1.1 withr_3.0.1 evaluate_0.23
[46] urlchecker_1.0.1 pillar_1.9.0 generics_0.1.3
[49] RCurl_1.98-1.14 IRdisplay_1.1 munsell_0.5.1
[52] scales_1.3.0 xtable_1.8-4 glue_1.7.0
[55] tools_4.3.2 GenomicAlignments_1.38.2 pbdZMQ_0.3-11
[58] fs_1.6.4 XML_3.99-0.16.1 colorspace_2.1-1
[61] GenomeInfoDbData_1.2.11 repr_1.1.7 restfulr_0.0.15
[64] cli_3.6.3 fansi_1.0.6 S4Arrays_1.2.1
[67] dplyr_1.1.4 digest_0.6.35 SparseArray_1.2.4
[70] rjson_0.2.21 htmlwidgets_1.6.4 memoise_2.0.1
[73] htmltools_0.5.8.1 lifecycle_1.0.4 mime_0.12