Hi! Love this tool and how easy it is to use. I've previously used GToTree to make large phylogenies of thousands of genomes, and as long as I had the compute capacity, I never had any issues. Recently I wanted to rerun an old analysis with a larger genome dataset (about 100,000 genomes). I ran GToTree on the genomes (all from NCBI RefSeq) on our HPC (which should be more than enough), and it completes in a reasonable time (36-48 hours). However, both times I tried, the individual alignments appear empty and the concatenated alignment is just a bunch of Xs, even though Genbank_genomes_summary_info.tsv reports gene hits and filtering info. This is the command I ran:
GToTree -g filenames.txt -H Bacteria -n 24 -j 10 -k -X -G 0.2 -N -o GToTree_output
My best guess is that a small number of genomes are trash (bad CDS or something) and are causing the issue, but I figured it would be good to check and see if you have any insight.
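In case it helps narrow things down, here is a minimal sketch of how the "all Xs" observation could be quantified per sequence. It is not part of GToTree: the function names (`read_fasta`, `placeholder_fraction`, `summarize`), the 0.99 threshold, and the choice to count `X` and `-` as placeholders are all assumptions made for illustration, and it only assumes the alignment is a plain FASTA file.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def placeholder_fraction(seq: str, placeholders: str = "Xx-") -> float:
    """Fraction of positions that are X or gap; an empty sequence counts as 1.0."""
    if not seq:
        return 1.0
    return sum(ch in placeholders for ch in seq) / len(seq)


def summarize(lines: Iterable[str], threshold: float = 0.99) -> Dict[str, object]:
    """Count sequences and list those that are (almost) entirely placeholders.

    The threshold is a hypothetical cutoff chosen for this sketch.
    """
    total = 0
    flagged: List[str] = []
    for header, seq in read_fasta(lines):
        total += 1
        if placeholder_fraction(seq) >= threshold:
            flagged.append(header)
    return {"total": total, "flagged": flagged}
```

Calling `summarize(open(path))` on the concatenated alignment (with `path` pointing at whatever that file is called in the output directory) would show whether every sequence is flagged or only a subset, which would help tell a global failure apart from a few bad genomes.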
Thanks!!