Phaseless is designed for genotype imputation and admixture inference from low-coverage sequencing data. First, the imputation model is in the spirit of the fastPHASE model but takes genotype likelihoods as input, just as STITCH works on raw reads. Second, the admixture inference is modeled on the haplotype cluster information from the fastPHASE model.
git clone https://github.com/Zilong-Li/phaseless
make -j6

Phaseless provides several subcommands. Run phaseless -h to list them.
The parallelism of phaseless impute is designed for imputing the whole genome at once: it runs multiple chunks in parallel, with each chunk handled by one thread. See the --chunksize option.
phaseless impute -g data/bgl.gz -c 10 -n 4 -s 100000

However, one might only be interested in imputing a single chunk. To switch the parallelism so that a single chunk is itself processed in parallel, use the --single-chunk option.
phaseless impute -g data/bgl.gz -c 10 -n 4 -S

With the binary file output by the impute command above, we can run admixture inference for different numbers of ancestries k.
phaseless admix -b impute.pars.bin -k 3 -n 4

In addition, we can inspect and manipulate the parameters of the fastPHASE model using the binary file output by the impute command.
phaseless parse -b impute.pars.bin -c 0 ## single chunk, all samples
phaseless parse -b impute.pars.bin -c -1 -s samples.txt ## all chunks, specific samples

Now we can do some interesting plotting.
./misc/plot_haplotype_cluster.R

If the output prefix -o is not specified, the output filenames of the above commands are as follows:
❯ tree -L 1
.
├── admix.Q
├── admix.log
├── parse.haplike.bin
├── parse.log
├── impute.recomb
├── impute.pi
├── impute.vcf.gz
├── impute.pars.bin
└── impute.log

Check out the news file.
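
As a minimal sketch of running admixture inference for several values of k, the loop below reuses only the options shown above (-b, -k, -n, -o). The k values and the admix.k<k> prefixes are illustrative choices, not defaults, and the run helper is hypothetical: it prints each command by default so the loop can be checked before anything is executed. It assumes that -o sets the output prefix for the admix subcommand, so that each run writes its own files instead of overwriting admix.Q.

```shell
#!/bin/sh
# Sketch: run admixture inference for several values of k.
# Dry run by default (commands are only printed); set RUN=1 to execute them.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$@"
  fi
}

# The k values and the "admix.k<k>" prefixes are illustrative assumptions.
for k in 2 3 4; do
  run phaseless admix -b impute.pars.bin -k "$k" -n 4 -o "admix.k${k}"
done
```

Save the loop to a file of your choosing and run it with RUN=1 set in the environment to execute the commands once the printed lines look right.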
