MRGV: Mouse Reference Gut Virome
MRGV provides 109,778 high-confidence viral genomes representing 28,824 species-level vOTUs. Over 46% of its 1.3 million non-redundant viral protein sequences are annotated using structure-informed PHROG assignments.
You can access and browse all MRGV data and information at https://www.decodebiome.org/MRGV/
Kim, H.J. et al. (2026). Incorporating viral genome binning in a mouse gut virome catalog enables accurate age prediction. In preparation.
MRGV consists of 109,778 high-confidence vMAGs, represented by 28,824 species-level vOTUs
MRGV construction pipeline
03. MRGV Protein clusters
| Data | Description | Link |
|---|---|---|
| MRGV_PC_ID100.tar.gz | A total of 1,376,499 CDS and metadata, clustered at 100% AAI | Click to download (223.4MB) |
| MRGV_PC_ID90 DB.tar.gz | A total of 954,585 CDS and metadata, clustered at 90% AAI | Click to download (147.5MB) |
| MRGV_PC_ID70 DB.tar.gz | A total of 746,733 CDS and metadata, clustered at 70% AAI | Click to download (115.9MB) |
| MRGV_PC_ID50 DB.tar.gz | A total of 652,176 CDS and metadata, clustered at 50% AAI | Click to download (102.0MB) |
| MRGV_PC_ID30 DB.tar.gz | A total of 625,774 CDS and metadata, clustered at 30% AAI | Click to download (97.3MB) |
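After downloading one of the protein cluster archives, it can be inspected and unpacked with Python's standard library. The following is a minimal sketch: the helper names `list_archive_members` and `extract_archive` and the output directory are hypothetical, and the internal layout of the archives is not documented here, so list the members before relying on any particular file name.

```python
import tarfile
from pathlib import Path


def list_archive_members(archive_path):
    """Return the member names stored in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


def extract_archive(archive_path, out_dir):
    """Extract the archive into out_dir and return the top-level entry names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(out_dir)
    return sorted(p.name for p in out_dir.iterdir())


if __name__ == "__main__":
    # Assumes the archive was already downloaded into the working directory.
    archive = Path("MRGV_PC_ID100.tar.gz")
    if archive.exists():
        print(list_archive_members(archive)[:10])  # peek at the contents first
        print(extract_archive(archive, "MRGV_PC_ID100"))
    else:
        print(f"{archive} not found; download it first.")
```

Listing the members first avoids assumptions about how the CDS sequences and metadata are organized inside each archive.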
Scripts for MRGV analysis (Directory: Codes)
HumanDecontamination.py : Removing human reads using Bowtie2
Trimmomatic.py : Trimming adaptors and filtering low-quality reads using Trimmomatic
MEGAHIT.py : Running MEGAHIT for read assembly
MetaSPAdes.py : Running metaSPAdes for read assembly
DeepVirFinder.py : Running DeepVirFinder and filtering confident viral contigs
Phigaro.py : Running Phigaro to predict prophages from assemblies
VIBRANT.py : Running VIBRANT to predict viral contigs and lifestyle
Vclust.py : Running VClust for sample-wise deduplication of viral contigs from DeepVirFinder, Phigaro and VIBRANT, using UCLUST
GeNomad.py : Running geNomad on the deduplicated viral contigs for revalidation
VirRep.py : Running VirRep on the deduplicated viral contigs for revalidation
GetCoverage.py : Computing sample-wise read coverage profiles using Bowtie2
GetMetabat2Depth.py : Generating MetaBAT2 depth-format tables
GenerateCovTable.py : Generating the vRhyme coverage table from the MetaBAT2 depth table
MetaBat2.py : Running MetaBAT2 for viral binning on viral contigs
Semibin2.py : Running SemiBin2 for viral binning on viral contigs
vRhyme.py : Running vRhyme for viral binning on viral contigs
BinConsolidate.py : Sample-wise consolidation of bins from MetaBAT2, SemiBin2 and vRhyme
Pharokka.py : Running Pharokka to generate initial annotated GenBank table
PholdPredict.py : Running Phold Predict to predict 3Di embeddings using the ProstT5 model
PholdCompare.py : Running Phold Compare to find hits using Foldseek
LinClust.py : Running Linclust in MMseqs2 to generate protein clusters
Minimap2.py : Running minimap2 to align short reads to viral genomes
CoverM.py : Running CoverM to calculate alignment coverage
UPGMA.rs : Conducting UPGMA clustering of genomes based on taxonomic rank delineation criteria
KendallTau.py : Computing Kendall's tau and p-values, and generating a Kendall distance matrix
Uniqueness.py : Calculating uniqueness based on the distance matrix, with/without cage mates
Maaslin2.R : Running MaAsLin2 to extract significantly differentially abundant viral taxa
XGBoostRegressor.py : Running XGBoostRegressor to predict mouse age from a viral genus abundance table
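To illustrate the kind of computation the Kendall distance step in the list above performs, here is a minimal, dependency-free sketch. It is not the repository's `KendallTau.py`: it uses Kendall's tau-a (no tie correction), omits p-values, and assumes the distance is defined as (1 - tau) / 2, which is one common convention. The function names and the toy abundance profiles are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    score = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            score += 1  # concordant pair
        elif prod < 0:
            score -= 1  # discordant pair (tied pairs contribute 0)
    n_pairs = len(x) * (len(x) - 1) // 2
    return score / n_pairs


def kendall_distance_matrix(profiles):
    """Build a symmetric distance matrix from {sample: abundance list}.

    Assumed distance: (1 - tau) / 2, so identical rankings give 0
    and fully reversed rankings give 1.
    """
    names = sorted(profiles)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = (1.0 - kendall_tau_a(profiles[a], profiles[b])) / 2.0
        dist[a][b] = dist[b][a] = d
    return names, dist


# Toy abundance profiles over the same four viral taxa (hypothetical values).
profiles = {
    "mouse_A": [5.0, 3.0, 1.0, 0.5],
    "mouse_B": [4.0, 2.5, 1.5, 0.2],
    "mouse_C": [0.1, 1.0, 2.0, 6.0],
}
names, dist = kendall_distance_matrix(profiles)
for a in names:
    print(a, [round(dist[a][b], 2) for b in names])
```

In this toy input, `mouse_A` and `mouse_B` rank the taxa identically while `mouse_C` reverses the order, so the first two are at distance 0 from each other and at distance 1 from the third. For real data with many ties (for example zero abundances), a tie-corrected variant such as tau-b is usually preferable.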