nps/FileFormats.md at master · sgchun/nps

Input files for NPS

To run NPS, you need the following set of files:

GWAS summary statistics. This is a tab-delimited text file with the following seven required columns:
- chr: chromosome name starting with "chr". Currently, NPS expects only chromosomes 1-22, designated "chr1", ..., "chr22".
- pos: base position of SNP.
- ref and alt: reference and alternative alleles of SNP, respectively.
- reffreq: allele frequency of reference allele in the discovery GWAS cohort.
- pval: p-value of association.
- effalt: estimated per-allele effect size of the alternative allele. For case/control GWAS, log(Odds Ratio) should be used. NPS will convert effalt to effect sizes relative to the standardized genotype internally using reffreq.
```
chr	pos	ref	alt	reffreq	pval	effalt
chr1	11008	C	G	0.9041	0.1126	-0.0251
chr1	11012	C	G	0.9041	0.1126	-0.0251
chr1	13116	T	G	0.8307	0.615	0.0071
chr1	13118	A	G	0.8307	0.615	0.0071
chr1	14464	A	T	0.8386	0.476	-0.0105
...
```
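The column rules above can be checked before running NPS. The following is a minimal Python sketch, not part of NPS: `check_sumstats` and `standardized_effect` are hypothetical helper names, and the range checks (positive positions, frequencies and p-values within [0, 1]) are assumptions about what counts as a sensible row. The scaling in `standardized_effect` is the common Hardy-Weinberg convention (genotype variance 2f(1-f)); NPS performs its own conversion internally, which may differ in detail.

```python
import csv
import io
import math

REQUIRED_COLUMNS = ["chr", "pos", "ref", "alt", "reffreq", "pval", "effalt"]
AUTOSOMES = {f"chr{n}" for n in range(1, 23)}  # "chr1" ... "chr22"


def check_sumstats(handle):
    """Return a list of (line_number, message) problems; empty if none found."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [(1, "missing columns: " + ", ".join(missing))]

    problems = []
    for row in reader:
        lineno = reader.line_num
        if row["chr"] not in AUTOSOMES:
            problems.append((lineno, f"unexpected chromosome {row['chr']!r}"))
        try:
            pos = int(row["pos"])
            reffreq = float(row["reffreq"])
            pval = float(row["pval"])
            float(row["effalt"])
        except (TypeError, ValueError):
            problems.append((lineno, "non-numeric pos, reffreq, pval, or effalt"))
            continue
        # Assumed sanity ranges; adjust if your data legitimately differ.
        if pos <= 0:
            problems.append((lineno, "pos must be positive"))
        if not 0.0 <= reffreq <= 1.0:
            problems.append((lineno, "reffreq outside [0, 1]"))
        if not 0.0 <= pval <= 1.0:
            problems.append((lineno, "pval outside [0, 1]"))
    return problems


def standardized_effect(effalt, reffreq):
    """Per-allele effect rescaled to a unit-variance genotype.

    Assumes Hardy-Weinberg equilibrium, so genotype variance is 2f(1-f).
    """
    return effalt * math.sqrt(2.0 * reffreq * (1.0 - reffreq))
```

To check a file on disk, open it in text mode and pass the handle, for example `check_sumstats(open("sumstats.txt", newline=""))`, where the file name is a placeholder.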
Training genotype files. Training genotype files should be in the qctool dosage format and named "chromN.DatasetID.dosage.gz", one file for each chromosome. Genotype files in bgen format can be converted to dosage files by running qctool with the -ofiletype dosage option. NPS allows only biallelic variants.
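As an illustration of the naming scheme, the Python sketch below lists which per-chromosome dosage files are missing from a directory. It is not part of NPS. The helper names are hypothetical, and it assumes that N is the chromosome number and that one file is expected for each of chromosomes 1-22.

```python
from pathlib import Path


def expected_dosage_files(directory, dataset_id):
    """Paths "chromN.DatasetID.dosage.gz" for N = 1..22 (assumed range)."""
    return [
        Path(directory) / f"chrom{n}.{dataset_id}.dosage.gz"
        for n in range(1, 23)
    ]


def missing_dosage_files(directory, dataset_id):
    """Return the expected dosage files that do not exist in `directory`."""
    return [
        path
        for path in expected_dosage_files(directory, dataset_id)
        if not path.is_file()
    ]
```

An empty list from `missing_dosage_files` means every expected file name is present; it says nothing about the file contents.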
Training sample file (.fam). Sample information for the training cohort should be provided in PLINK FAM format. The samples in the .fam file should appear in exactly the same order as in the genotype dosage files. The sex of each sample (fifth column) is optional ("0" or "-9" for missing; "1" for male; "2" for female). If the sex is provided, NPS will incorporate the sex covariate in the PRS model. The sixth column is for phenotype data, which can be specified here or in a separate phenotype file.
```
trainF2  trainI2  0  0  1 -9
trainF3  trainI3  0  0  2 -9
trainF39 trainI39 0  0  1 -9
trainF41 trainI41 0  0  2 -9
trainF58 trainI58 0  0  1 -9
```
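The six-column layout and the sex codes can be parsed as in the Python sketch below. This is only an illustration, not NPS code: `FamRecord` and `read_fam` are hypothetical names, and rejecting any sex code other than the four listed above is an assumption about how strict a check should be.

```python
import io
from typing import NamedTuple, Optional

# Sex codes from the description above; "0" and "-9" both mean missing.
SEX_LABELS = {"0": None, "-9": None, "1": "male", "2": "female"}


class FamRecord(NamedTuple):
    fid: str
    iid: str
    father: str
    mother: str
    sex: Optional[str]
    phenotype: str  # kept as text; it may be overridden by a .phen file


def read_fam(handle):
    """Parse whitespace-delimited .fam lines, preserving sample order."""
    records = []
    for lineno, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(f"line {lineno}: expected 6 columns, got {len(fields)}")
        fid, iid, father, mother, sex, phenotype = fields
        if sex not in SEX_LABELS:
            raise ValueError(f"line {lineno}: unknown sex code {sex!r}")
        records.append(FamRecord(fid, iid, father, mother, SEX_LABELS[sex], phenotype))
    return records
```

Because the returned list keeps the file order, it can be compared against the sample order of the genotype files.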
Training phenotype file (.phen). Phenotypes in the .fam file can be overridden by a .phen file (use the nps_init.R --train-phen option). This is a tab-delimited file with three columns: "FID", "IID", and "Outcome". FID and IID correspond to the family and individual IDs in the .fam file. The phenotype column must be named "Outcome". Binary phenotypes (case/control) are specified by "1" and "2", respectively; "0" and "-9" denote a missing phenotype. For quantitative phenotypes, "-9" represents a missing phenotype value.
```
FID   IID    Outcome
trainF2  trainI2  1
trainF39 trainI39 1
trainF3  trainI3  2
trainF41 trainI41 2
trainF58 trainI58 1
```
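The missing-value rules differ between binary and quantitative phenotypes, which the Python sketch below makes explicit. It is an illustration rather than NPS code: `read_phen` and its `binary` flag are hypothetical, and missing codes are matched as the exact strings "0" and "-9".

```python
import csv
import io

PHEN_COLUMNS = ["FID", "IID", "Outcome"]


def read_phen(handle, binary=True):
    """Map (FID, IID) to a numeric outcome, or None when it is missing.

    For binary phenotypes both "0" and "-9" are treated as missing;
    for quantitative phenotypes only "-9" is.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames != PHEN_COLUMNS:
        raise ValueError(f"expected header {PHEN_COLUMNS}, got {reader.fieldnames}")
    missing_codes = {"0", "-9"} if binary else {"-9"}
    outcomes = {}
    for row in reader:
        raw = row["Outcome"]
        key = (row["FID"], row["IID"])
        outcomes[key] = None if raw in missing_codes else float(raw)
    return outcomes
```

The resulting dictionary can be looked up by the FID and IID of each .fam row to replace the sixth-column phenotype.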
Validation genotype files. Validation genotypes can be in the dosage or .bgen format. If they are in .bgen format, the files should be named as "chromN.DatasetID.bgen".
Validation sample file (.fam). Similar to the training .fam file.
Validation phenotype file (.phen). Similar to the training .phen file.
