PRSedm

Overview

PRSedm (Polygenic Risk Score Extension for Diabetes Mellitus) is a flexible and extendable open-source package for efficient local and remote (All of Us, UK Biobank, etc.) generation of published Polygenic Risk Scores (PRS) for Diabetes Mellitus (DM) and related cardiometabolic phenotypes.

PRSedm introduces a new parallelized "one-liner" method to generate standardized PRS and partitioned polygenic scores (pPS) for DM that are robust to variables such as genotyping method, quality control, and imputation panel.

Updates (v1.3.0)

Improved multiallelic SNP handling and fixed related bugs
Significant performance improvements from optimized SNP batching and parallelism
Simplified command-line interface and argument structure
Fixed bugs and syntax issues in the SNP database backend
PRS metadata is now fetched automatically alongside the SNP database
Per-PRS variant log files are now generated, capturing metrics such as INFO/R², missing variants, and allele frequency
Renamed "imputation" feature to "estimate" to distinguish from genotype imputation
Added support for custom proxy variant substitution via --proxy

New proxy feature

PRSedm supports optional variant substitution via a user-supplied proxy file (--proxy). Required format (whitespace-delimited):

target_rsid target_contig_id target_position target_effect_allele sub_rsid sub_contig_id sub_position sub_effect_allele
rs12345     chr1             1234567         A                     rs54321  chr1           1234999      A
rs23456     chr2             7654321         G                     rs65432  chr2           7654000      G
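Before passing a proxy file to --proxy, it can help to check that it parses cleanly. The following is a minimal sketch, not part of PRSedm: the function name read_proxy_table is hypothetical, and it assumes the file begins with the header row shown above.

```python
import io

# Column names copied from the header row shown above.
PROXY_COLUMNS = [
    "target_rsid", "target_contig_id", "target_position", "target_effect_allele",
    "sub_rsid", "sub_contig_id", "sub_position", "sub_effect_allele",
]


def read_proxy_table(handle):
    """Parse a whitespace-delimited proxy table into a list of dicts.

    Hypothetical helper for sanity-checking a file; PRSedm does its own parsing.
    """
    rows = []
    header_seen = False
    for line_no, line in enumerate(handle, start=1):
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # ignore blank lines
        if not header_seen:
            if fields != PROXY_COLUMNS:
                raise ValueError(f"line {line_no}: unexpected header {fields}")
            header_seen = True
            continue
        if len(fields) != len(PROXY_COLUMNS):
            raise ValueError(f"line {line_no}: expected {len(PROXY_COLUMNS)} fields")
        row = dict(zip(PROXY_COLUMNS, fields))
        # Positions must be integers; int() raises ValueError otherwise.
        row["target_position"] = int(row["target_position"])
        row["sub_position"] = int(row["sub_position"])
        rows.append(row)
    return rows


SAMPLE = """\
target_rsid target_contig_id target_position target_effect_allele sub_rsid sub_contig_id sub_position sub_effect_allele
rs12345     chr1             1234567         A                     rs54321  chr1           1234999      A
rs23456     chr2             7654321         G                     rs65432  chr2           7654000      G
"""

rows = read_proxy_table(io.StringIO(SAMPLE))
print(len(rows), rows[0]["sub_rsid"])
```

For a real file, replace the io.StringIO wrapper with open("proxies.txt"), where the file name is a placeholder.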

Installation

Dependencies

PRSedm requires the following packages:

Python (>=3.9), Joblib (>=1.3.2), Pandas (>=2.2.3), Pysam (>=0.22.0), Numpy* (2.x/1.x)

*Build with 1.x when deploying to RAP platforms with 1.x dependencies.

User Installation

PIP: pip install prsedm
Anaconda: conda install sethsh7::prsedm
Build from source: python -m build

Usage

PRSedm can be called from the command line:

prsedm --vcf <path_to_vcf_file> [options]

PRSedm can also be called from Python:

import prsedm
df = prsedm.gen_dm(
    vcf=vcf,
    ... 
)

Options

--vcf (required): Path to an indexed VCF or BCF file, or a text file mapping one VCF/BCF per contig.
--col: Genotype column to score (default: GT, options: GT for WGS, GP for imputed data).
--build: Genome build to use (default: hg38, options: hg19, hg38).
--scores (required): Comma-separated list of PRS to generate, e.g., t1dgrs2-luckett25,t2dp-udler18.
--estimate (optional): Path to indexed reference VCF/BCF, or text file mapping one VCF/BCF per contig. Used to estimate missing variants and enable normalization when variants are absent.
--ntasks (optional): Number of tasks to use (default: 1).
--batch-size (optional): Number of variants per batch (default: 5000).
--output: Path to save the output file (default: results.csv).
--full: Include individual variant scores with PRS name prepended.
--getsql: Download or locate the PRS SQL database (variants.db) and metadata JSON (prs_meta.json) and exit.
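When PRSedm is driven from a pipeline script, the options above can be assembled into an argument list. This is a minimal sketch using only the flags documented in this section; the helper name build_prsedm_command and the file name cohort.vcf.gz are hypothetical.

```python
import shlex


def build_prsedm_command(vcf, scores, build="hg38", col="GT",
                         ntasks=1, output="results.csv", estimate=None):
    """Assemble a prsedm command line from the documented options.

    Hypothetical wrapper; defaults mirror those listed in the Options section.
    """
    cmd = [
        "prsedm",
        "--vcf", vcf,
        "--scores", ",".join(scores),  # comma-separated list of PRS flags
        "--build", build,
        "--col", col,
        "--ntasks", str(ntasks),
        "--output", output,
    ]
    if estimate is not None:
        cmd += ["--estimate", estimate]  # optional reference VCF/BCF
    return cmd


cmd = build_prsedm_command(
    "cohort.vcf.gz",  # placeholder path
    ["t1dgrs2-luckett25", "t2dp-udler18"],
    ntasks=4,
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once prsedm is installed and the VCF path is real.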

Single file per-chromosome loading

For --vcf and --estimate, you can instead point to a single text file that maps each contig to its own VCF/BCF file, formatted as follows (whitespace-delimited):

chr1   file1.chr1.vcf.gz
chr2   file2.chr2.vcf.gz
...
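A mapping file in this layout can be generated rather than typed by hand. The sketch below assumes a hypothetical naming pattern (cohort.{contig}.vcf.gz) and autosomes chr1 to chr22; adjust both to match your data.

```python
def contig_map_text(pattern, contigs):
    """Return mapping-file text: one 'contig<TAB>path' line per contig."""
    return "".join(f"{contig}\t{pattern.format(contig=contig)}\n" for contig in contigs)


contigs = [f"chr{i}" for i in range(1, 23)]  # chr1 .. chr22
text = contig_map_text("cohort.{contig}.vcf.gz", contigs)  # placeholder pattern
print(text.splitlines()[0])
```

Writing the text to disk with pathlib.Path("vcf_map.txt").write_text(text) gives a file that can be passed to --vcf or --estimate; the file name is a placeholder.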

PRS Database and metadata

The database containing PRS designs and metadata is hosted at https://zenodo.org/records/17903390 and downloaded automatically. PRSedm will first check the environment variables PRSEDM_SQL_PATH and PRSEDM_META_PATH for custom databases.
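The lookup order described above (environment variable first, then the automatic download) can be illustrated as follows. This is a sketch of the general pattern only, not PRSedm's internal code: resolve_resource and the fallback paths are hypothetical, and whether the variables should hold a file path or a directory is an assumption to confirm against the package.

```python
import os


def resolve_resource(env_name, default_path):
    """Return the path from an environment variable if set, else a default.

    Hypothetical helper mirroring the 'check the environment first' behaviour.
    """
    value = os.environ.get(env_name)
    return value if value else default_path


# Fallback names are the files mentioned for --getsql; locations are placeholders.
sql_path = resolve_resource("PRSEDM_SQL_PATH", "variants.db")
meta_path = resolve_resource("PRSEDM_META_PATH", "prs_meta.json")
print(sql_path, meta_path)
```

In practice, exporting PRSEDM_SQL_PATH and PRSEDM_META_PATH in the shell before running prsedm achieves the same effect without any wrapper code.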

Research Analysis Platforms (RAPs)

Deployment to remote Research Analysis Platforms (RAPs) is possible via notebook wrappers:

All of Us (WGS) - Notebook Here
UK Biobank (imputed WGS) - Notebook Here
UK Biobank (imputed array) - Notebook Here

List of available PRS

Type 1 Diabetes

Flag	Method	Variants	Description	PMID
`t1dgrs2-luckett25`	HLA Interaction + Partitioned	67	"GRS2x" updated PRS with widest compatibility and HLA-based risk pPS.	40267362
`t1dgrs2-qu22`	HLA Interaction + Partitioned	71	Original "GRS2" PRS with the addition of 4 African ancestry SNPs from Onengut, proposed in Qu et al and utilized in eMERGE.	34997821
`t1dgrs2-sharp21`	HLA Interaction + Partitioned	67	Version of "GRS2" PRS designed for "TOPMED-R2" from 2021 GitHub.	35312757
`t1d-onengut19-afr`	Additive	6	African-ancestry PRS proposed by Onengut in 2019, updated for modern compatibility.	30659077
`t1dgrs2-sharp19`	HLA Interaction + Partitioned	67	Original 1000 Genomes version of "GRS2" PRS as published, with limited modern compatibility.	30655379

Type 2 Diabetes

Flag	Method	Variants	Description	PMID
`t2d-suzuki24-prscsx-ma`	Additive + Partitioned	~1 M	Full genome-wide multi-ancestry PRS for Suzuki (PRS-CSx meta).	38374256
`t2d-suzuki24-prscsx-<ancestry>`	Additive + Partitioned	>500k	Full genome-wide ancestry-specific PRS for Suzuki (PRS-CSx), where `<ancestry>` is one of `eur`, `afr`, `eas`, `sas`, `safr`, or `his`.	38374256
`t2dp-suzuki24-ma`	Additive + Partitioned	1289	Multiancestry weighted Suzuki T2D index variant PRS, and pPS from hard-clustering analyses.	38374256
`t2dp-suzuki24-<ancestry>`	Additive + Partitioned	1128 - 1285	As above but weighted for specific ancestries `<eur/afr/safr/eas/sas/his>`.	38374256
`t2dp-smith24-ma`	Additive + Partitioned	353	Multiancestry cluster-weighted Smith T2D index variant PRS, and pPS from soft-clustering analyses.	38443691
`t2dp-smith24-<ancestry>`	Additive + Partitioned	25 - 490	As above but from ancestry-specific soft clustering `<eur/afr/eas/amr>`.	38443691
`t2d-mahajan22-ma`	Additive	338	Older PRS from Mahajan et al composed of multiancestry index variants.	35551307
`t2d-mahajan22-prscsx-eur`	Additive + Partitioned	>500k	Genome-wide European ancestry PRS for Mahajan (PRS-CSx).	38374256
`t2dp-udler18`	Additive + Partitioned	67	T2D pPS from first soft-clustering analysis.	30240442

Other

Flag	Phenotype	Method	Variants	Description	PMID
`cdgrs-sharp25`	Celiac Disease	HLA Interaction + Partitioned	42	Modernized Celiac disease PRS and pPS with similar model to "GRS2x", utilized for combined screening.	32790217

Features

HLA Interaction PRS (+GRS2x Update)

PRSedm features a complete algorithm for genetic risk scores (GRS) that incorporate HLA interaction terms, as previously published by us, such as T1D-GRS2 (or just GRS2). A number of advancements, collectively described as GRS2x, have been added to improve the generation of HLA interaction scores.

HLA Type Estimation and LD Tiebreak

HLA alleles can be estimated from proxy (or tag) single nucleotide polymorphisms alone, and the predictions are included in the output, e.g. DR3-DQ2.5/DR3-DQ2.5. Because proxy SNPs are imperfect, more than two HLA calls can be made for a single sample in interaction scores such as GRS2. A probabilistic tiebreaker algorithm using HLA reference frequencies (Klitz et al) now resolves these impossible numbers of calls without excluding any samples.
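To make the tiebreak idea concrete, here is one plausible reading of it, sketched under stated assumptions rather than taken from the PRSedm source: when more than two haplotypes are called, every pair of called haplotypes is ranked by its expected frequency under random mating, and the most probable pair is kept. The function name and the frequencies are hypothetical placeholders, not values from Klitz et al.

```python
from itertools import combinations


def resolve_hla_calls(called, ref_freqs):
    """Reduce >2 called haplotypes to the most probable pair.

    Assumption: a heterozygous pair (i, j) has expected frequency
    2 * p_i * p_j, so pairs are ranked by p_i * p_j.
    """
    called = sorted(set(called))
    if len(called) <= 2:
        return tuple(called)  # nothing to resolve
    best_pair = max(
        combinations(called, 2),
        key=lambda pair: ref_freqs[pair[0]] * ref_freqs[pair[1]],
    )
    return tuple(sorted(best_pair))


# Placeholder frequencies for illustration only.
ref_freqs = {"DR3-DQ2.5": 0.12, "DR4-DQ8": 0.10, "DR15-DQ6": 0.14}
calls = ["DR3-DQ2.5", "DR4-DQ8", "DR15-DQ6"]  # three calls: impossible for a diploid
print(resolve_hla_calls(calls, ref_freqs))
```

This variant always keeps the single most likely pair; a sampling-based tiebreak would be an equally valid interpretation of "probabilistic".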

Missing variant mean effect estimation (optional)

PRSedm can optionally use a reference VCF/BCF legend together with Hardy-Weinberg Equilibrium to estimate the mean effect size of missing SNPs. This handles missing variants and enables static normalization. The reference must have variant frequency coded as AF; genotypes are not required.

dbSNP hg38 - TOPMED Bravo Freeze 8, or NCBI (AF field added) are recommended.
dbSNP hg19 - 1000 Genomes, Haplotype Reference Consortium, NCBI GRCh37 recommended.
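The Hardy-Weinberg step can be sketched as follows. Under HWE, an effect allele with frequency p has genotype probabilities (1-p)^2, 2p(1-p), and p^2 for 0, 1, and 2 copies, so the expected dosage is 2p and the expected contribution of a missing SNP with weight beta is 2 * p * beta. The function names are hypothetical, and this covers a simple additive weight only; interaction terms are not modelled here.

```python
def hwe_genotype_probs(af):
    """Probabilities of carrying 0, 1, or 2 effect alleles under HWE."""
    q = 1.0 - af
    return (q * q, 2.0 * af * q, af * af)


def expected_effect(beta, af):
    """Mean score contribution of one variant with weight beta and frequency af."""
    p0, p1, p2 = hwe_genotype_probs(af)
    expected_dosage = 0 * p0 + 1 * p1 + 2 * p2  # simplifies to 2 * af
    return beta * expected_dosage


# Example values are illustrative, not taken from any PRS.
print(expected_effect(beta=0.5, af=0.2))
```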

Minimum and Maximum Normalization

PRSedm applies hardcoded static min-max normalization: each score is rescaled between its minimum and maximum potential risk contribution (no risk alleles vs. all risk alleles), giving a scale of 0-1. Static normalization with estimation ensures that PRS values translate to a common relative risk scale across datasets. If variants are missing and no estimation reference is supplied, normalization is skipped automatically and a warning is emitted.
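For a purely additive score, the normalization described above reduces to a fixed min-max rescaling. The sketch below is an illustration under that assumption and does not reproduce PRSedm's handling of HLA interaction or partitioned scores; the function names and weights are hypothetical.

```python
def score_bounds(betas):
    """Lowest and highest possible additive score given per-allele weights.

    Each variant contributes 0, beta, or 2 * beta, so the extremes are
    reached at 0 or 2 copies depending on the sign of beta.
    """
    lowest = sum(min(0.0, 2.0 * b) for b in betas)
    highest = sum(max(0.0, 2.0 * b) for b in betas)
    return lowest, highest


def normalize(raw_score, betas):
    """Rescale a raw additive score onto 0-1 using its theoretical bounds."""
    lowest, highest = score_bounds(betas)
    if highest == lowest:
        raise ValueError("all weights are zero; score cannot be normalized")
    return (raw_score - lowest) / (highest - lowest)


betas = [0.5, 0.25, 0.25]  # illustrative weights only
print(normalize(1.0, betas))
```

Because the bounds depend only on the weights and not on the cohort, the same raw score maps to the same normalized value in every dataset, which is what makes the scale comparable across studies.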

Development

Developed and maintained by Seth A. Sharp (ssharp@stanford.edu) at the Translational Genomics of Diabetes, Stanford University, with collaboration from colleagues at the University of Exeter and MGH/Broad Institute. Lu Zhang and Han Sun contributed to v1.1.0 onwards.

Citation

If you use PRSedm in your research, please cite both the software release (10.5281/zenodo.17903985) and the accompanying article (10.2337/dc25-0142).

License

This project is licensed under the MIT License (Non-Commercial).

Academic, research, and personal use are allowed.
Commercial use is prohibited without prior permission.

See the LICENSE file for full details.
