I ran a multiple sequence alignment (MSA) of ~30 species from two closely related genera using Progressive Cactus, converted the .hal to a .maf file (using --noAncestors, --dupeMode single, --filterGapCausingDupes), and split the .maf file into 1 Mb windows along the reference genome using PHAST msa_split (the output was .fasta files). Then I ran phyloFit (specifying --EM) on each 1 Mb .fasta file and ran phastCons on the output (--msa-format FASTA --target-coverage 0.3 --expected-length 45 --rho 0.3 --viterbi).

The most-conserved .bed file produced by --viterbi covers most of the reference, which is unexpected. Because the most-conserved regions are predicted with a hidden Markov model, and so depend on the surrounding sequence context, would the most-conserved .bed file contain fewer regions if I ran phastCons on the full alignment? Can I rely on the results I get from running it on 1 Mb windows?
My most-conserved .bed files identify short regions (a couple hundred kb) that overlap.
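
To put a number on "covers most of the reference" without double-counting the overlapping regions, the intervals can be merged before summing their lengths. Below is a minimal Python sketch of that check. It assumes the .bed files use standard 0-based, half-open coordinates in their first three columns; the function names (`parse_bed`, `merge_intervals`, `covered_fraction`) and the genome length passed in are hypothetical and only for illustration, not part of PHAST.

```python
from collections import defaultdict


def parse_bed(lines):
    """Yield (chrom, start, end) from BED-formatted lines, skipping headers."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        yield fields[0], int(fields[1]), int(fields[2])


def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals, per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start <= cur_end:
                # Overlaps (or abuts) the current block: extend it.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        if cur_start is not None:
            merged.append((chrom, cur_start, cur_end))
    return merged


def covered_fraction(intervals, genome_length):
    """Fraction of genome_length covered after merging overlaps."""
    covered = sum(end - start for _, start, end in merge_intervals(intervals))
    return covered / genome_length
```

For example, `covered_fraction(parse_bed(open("window1.bed")), 1_000_000)` would give the covered fraction of one 1 Mb window, where `window1.bed` is a placeholder file name. Comparing that fraction with the --target-coverage value of 0.3 shows how far the predicted elements exceed the coverage I was aiming for.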