I'm excited to use this tool, but it's been a struggle to get it to work for me. I think the issue is my input files because I can run the example dataset.
I am working with a Seurat object. I've exported the markers, UMAP dimensions and cluster calls as tab-separated text. I call Comet from the command line and it looks like it's running; top certainly shows it working. It runs for about 3 hours and an output folder is created, but the folder is empty. It doesn't throw any specific errors, but it does show the following at run time:
Creating discrete expression matrix...
Insufficient floating point precision for calculating or reporting the exact XL-mHG test statistic; the true value is too small. Using "0" instead.(The XL-mHG p-value will also be reported as "0".)
Insufficient floating point precision for calculating or reporting the exact XL-mHG test statistic; the true value is too small. Using "0" instead.(The XL-mHG p-value will also be reported as "0".)
I am not sure what the problem is. Is there a stderr or log file to see what's going on?
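In case Comet does not write a log of its own, one workaround is to capture everything it prints. This is only a minimal sketch: run_logged and the log file name comet_run.log are hypothetical names of mine, not anything Comet provides. It uses plain shell redirection to save both stdout and stderr to one file and passes the command's exit status through.

```shell
#!/bin/sh
# run_logged is a hypothetical helper name, not part of Comet.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    log_file="$1"
    shift
    # Send stdout to the log, then point stderr (2) at the same place (1).
    # The function returns the exit status of the command itself.
    "$@" > "$log_file" 2>&1
}

# Example with the same arguments as my run (left commented out here):
# run_logged comet_run.log Comet markers2.txt vis.txt cluster.txt -C 16 -K 4 -Count true output/
# echo "Comet exit status: $?"
```

While the job runs, tail -f comet_run.log in a second terminal shows the output live. A non-zero exit status would at least confirm that the run ended with an error rather than finishing normally.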
Also, relatedly, the docs would benefit greatly from a tutorial showing how to get the input files out from a Seurat object, since that is such a common procedure.
Here is my code to get the input files out from Seurat and to the command line.
# matrix
matrix_cometsc <- GetAssayData(so) # so is Seurat Object
write.table(as.matrix(matrix_cometsc), file=here("data", "COMETSC", "markers.txt"), row.names=TRUE, col.names=TRUE, sep = "\t", quote = FALSE)
#UMAP embeddings
umap_cometsc <- Embeddings(so, reduction = "umap")
write.table(umap_cometsc, file=here("data", "COMETSC", "vis.txt"), row.names=TRUE, col.names=FALSE, sep = "\t", quote = FALSE)
#cluster IDs
cluster_cometsc <- noquote(as.matrix(Idents(so)))
write.table(cluster_cometsc, file=here("data", "COMETSC", "cluster.txt"), row.names=TRUE, col.names=FALSE, sep = "\t", quote = FALSE)
Part of the issue is with the marker (matrix) file: the header row is written without a leading tab above the row names, so it has one field fewer than the data rows. I had to add the tab manually like this:
sed '1s/.*/\t&/' markers.txt > markers2.txt
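As far as I know, the \t in a sed replacement is only understood by GNU sed, so the command above may not behave the same on macOS or BSD. Below is a sketch of an awk alternative, plus a check that every line ends up with the same number of tab-separated fields. Both function names (add_header_tab, same_field_count) are hypothetical helpers of mine.

```shell
#!/bin/sh
# add_header_tab: print FILE with a tab prepended to the first line only.
add_header_tab() {
    awk 'NR == 1 { print "\t" $0; next } { print }' "$1"
}

# same_field_count: succeed only if every line of FILE has the same
# number of tab-separated fields as the first line.
same_field_count() {
    awk -F'\t' 'NR == 1 { n = NF } NF != n { bad = 1 } END { exit bad + 0 }' "$1"
}

# Example (left commented out here):
# add_header_tab markers.txt > markers2.txt
# same_field_count markers2.txt && echo "header and data rows line up"
```

The check only confirms that the header and data rows have matching widths. It says nothing about whether the values themselves are what Comet expects.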
Also, my command to Comet is the following:
#! /bin/bash
source ~/comet/bin/activate
Comet markers2.txt vis.txt cluster.txt -C 16 -K 4 -Count true output/
For reference, here is a sample of markers2.txt, with tabs shown as ^I:
^ID1_TTCAGGATCAAGCCAT^ID1_GTGGAGATCTGCTTAT^ID1_GCACGGTCACTCAGAT^ID1_TATACCTGTCTTACTT
MIR1302-2HG^I0^I0.0766241526725224^I0^I0
FAM138A^I0^I0^I0^I0
OR4F5^I0^I0^I0^I0
AL627309.1^I0.103146952196364^I0.0766241526725224^I0.0823802232731239^I0.0918193591402592
AL627309.3^I0^I0^I0^I0
AL627309.2^I0^I0^I0^I0
AL627309.4^I0^I0^I0^I0
AL732372.1^I0^I0^I0^I0
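Since I suspect my input files, another sanity check is whether the cell barcodes in the matrix header match the first column of vis.txt and cluster.txt. I have not confirmed that Comet requires identical names or order, so treat this as a check on my own export only. cells_match is a hypothetical helper name.

```shell
#!/bin/sh
# cells_match MATRIX OTHER
# Succeeds if the barcodes in the MATRIX header (after the leading empty
# field) equal the first column of OTHER, in the same order.
cells_match() {
    # Header line, minus the first (empty) field, one barcode per line
    matrix_cells="$(head -n 1 "$1" | cut -f 2- | tr '\t' '\n')"
    # First tab-separated column of the other file
    other_cells="$(cut -f 1 "$2")"
    [ "$matrix_cells" = "$other_cells" ]
}

# Example (left commented out here):
# cells_match markers2.txt vis.txt && echo "vis.txt cells match"
# cells_match markers2.txt cluster.txt && echo "cluster.txt cells match"
```

Run against the original markers.txt, without the leading tab, this fails because cut drops the first real barcode. That makes it a second way to spot the header problem.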