Genetic Ancestry Prediction (GAP)

Overview

GAP is a Python package for predicting genetic ancestry from genotyping data using machine learning. It was developed at the Translational Genomics Lab, led by Dr. Anna Gloyn at Stanford University. It provides tools for data preprocessing, model training, and evaluation to support accurate ancestry inference. When benchmarked against self-reported race in the Integrated Islet Distribution Program (IIDP) and the Human Pancreas Analysis Program (HPAP), it outperformed the existing methods in the ADMIXTURE and KING packages.

GAP has been supported by both IIDP and the Stanford Accelerate Innovation in Diabetes LeVeraging Unique PAthways iN Asians (ADVANCE) Program.

Pipeline

Installation

Using conda:

git clone git@github.com:HaniceSun/gap.git
cd gap
conda env create -f environment.yml
conda activate gap

Quick Start

input_vcf='INPUT.vcf.gz'

# only needed the first time
gap get-reference-data --output_dir=data

gap merge-dataset-with-reference --dataset "$input_vcf"
gap feature-engineering

gap add-labels
gap split-train-test --test_size 0.2

gap train-model --task Superpopulation
gap train-model --task Population --conditional true

gap eval-model --task Superpopulation
gap eval-model --task Population --conditional true

gap predict --task Superpopulation
gap predict --task Population --conditional true

gap summarize --conditional true
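
The Quick Start steps can also be driven from a script. The following is a minimal sketch, not part of GAP itself: the helper names quick_start_commands and run_pipeline are hypothetical, and the subcommands and flags are copied verbatim from the commands listed above. It assumes the gap executable is on your PATH (for example, after conda activate gap) and defaults to a dry run that only prints each command.

```python
import shlex
import subprocess


def quick_start_commands(input_vcf, first_run=False):
    """Return the Quick Start commands as argv lists, in order.

    Hypothetical helper: it only mirrors the commands shown in the README.
    """
    commands = []
    if first_run:
        # Reference data only needs to be fetched the first time.
        commands.append(["gap", "get-reference-data", "--output_dir=data"])
    commands += [
        ["gap", "merge-dataset-with-reference", "--dataset", input_vcf],
        ["gap", "feature-engineering"],
        ["gap", "add-labels"],
        ["gap", "split-train-test", "--test_size", "0.2"],
    ]
    # Train, evaluate, and predict: Superpopulation first, then Population.
    for step in ("train-model", "eval-model", "predict"):
        commands.append(["gap", step, "--task", "Superpopulation"])
        commands.append(
            ["gap", step, "--task", "Population", "--conditional", "true"]
        )
    commands.append(["gap", "summarize", "--conditional", "true"])
    return commands


def run_pipeline(input_vcf, first_run=False, dry_run=True):
    """Print each command; run it too when dry_run is False."""
    for argv in quick_start_commands(input_vcf, first_run):
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, check=True)
```

Calling run_pipeline("INPUT.vcf.gz") prints the commands without executing them. Passing dry_run=False runs them in sequence and stops at the first step that exits with an error.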

Benchmark Results

Superpopulation Prediction vs Self-reported Race in the IIDP cohorts

Citation

If you use GAP in your research, please cite the DOI: 10.5281/zenodo.18157870

Author and License

Author: Han Sun

Email: hansun@stanford.edu

License: MIT License
