brakitsch edited this page Mar 16, 2015 · 37 revisions

Preprocessing

Before getting started, we have to compute the sample-to-sample genetic covariance matrix, assign the markers to windows and estimate the trait-to-trait covariance matrices on the null model.

Computing the Covariance Matrix

The covariance matrix can be pre-computed as follows:

 ./mtSet_preprocess --compute_covariance --plink_path plink_path  --bfile bfile  --cfile cfile

where

  • plink_path (default: plink) is the path to the plink executable (version 1.9 or later must be installed). If not set, a Python covariance reader is employed. We strongly recommend using the plink reader for large datasets.
  • bfile is the base name of the binary bed file (bfile.bed, bfile.bim and bfile.fam are required).
  • cfile is the base name of the output file. The relatedness matrix is written to cfile.cov and the identifiers of the individuals to cfile.cov.id. The eigenvalue decomposition of the matrix is saved in cfile.cov.eval (eigenvalues) and cfile.cov.evec (eigenvectors). If cfile is not specified, the files are exported to the current directory under the names bfile.cov, bfile.cov.id, bfile.cov.eval and bfile.cov.evec.
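
The file naming described above can be checked with a small helper before moving on to the later steps. This is a minimal sketch, not part of mtSet: the function name `covariance_output_paths` and the example paths are hypothetical, and it assumes that when cfile is omitted the base name of bfile (without its directory) is used in the current directory.

```python
import os

def covariance_output_paths(bfile, cfile=None):
    """Return the four output paths of --compute_covariance, keyed by content."""
    # Assumption: without cfile, the outputs take the base name of bfile
    # and are placed in the current directory, as described in the text.
    base = cfile if cfile is not None else os.path.basename(bfile)
    return {
        "covariance": base + ".cov",
        "sample_ids": base + ".cov.id",
        "eigenvalues": base + ".cov.eval",
        "eigenvectors": base + ".cov.evec",
    }

# Hypothetical paths, used only for illustration.
paths = covariance_output_paths("data/chr22", cfile="out/chr22")
missing = [p for p in paths.values() if not os.path.exists(p)]
print("missing files:", missing)
```

An empty `missing` list suggests that the covariance step has already been run for this cfile.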

Precomputing the Principal Components

The principal components can be pre-computed as follows:

 ./mtSet_preprocess --compute_PCs k --plink_path plink_path --ffile ffile  --bfile bfile

where

  • k is the number of top principal components that are saved.
  • plink_path (default: plink) is the path to the plink executable (version 1.9 or later must be installed). If not set, a Python genotype reader is employed. We strongly recommend using the plink reader for large datasets.
  • ffile is the name of the fixed effects file to which the principal components are written.
  • bfile is the base name of the binary bed file (bfile.bed, bfile.bim and bfile.fam are required).

Fitting the null model

To apply mtSet efficiently, it is necessary to fit the null model beforehand. This can be done with the following command:

 ./mtSet_preprocess --fit_null --bfile bfile --cfile cfile --nfile nfile --pfile pfile --ffile ffile --trait_idx trait_idx

where

  • bfile is the base name of the binary bed file (bfile.bed, bfile.bim and bfile.fam are required).
  • cfile is the base name of the covariance file and its eigen decomposition (cfile.cov, cfile.cov.eval and cfile.cov.evec). If cfile is not set, the relatedness component is omitted from the model.
  • nfile is the base name of the output file. The estimated parameters are saved in nfile.p0, the negative log likelihood ratio in nfile.nll0, the trait-to-trait genetic covariance matrix in nfile.cg0 and the trait-to-trait residual covariance matrix in nfile.cn0.
  • pfile is the base name of the phenotype file.
  • ffile is the name of the file containing the covariates. Each covariate is saved in one column.
  • trait_idx can be used to specify a subset of the phenotypes. If more than one phenotype is selected, the indices have to be separated by commas. For instance, --trait_idx 3,4 selects the phenotypes saved in the fourth and fifth columns (indexing starts at zero).

Notice that phenotypes are standardized prior to model fitting.
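
The zero-based column selection and the standardization step can be illustrated on a toy phenotype table. This is a minimal sketch and not mtSet's own code: the helpers `parse_trait_idx` and `standardize` are hypothetical, "standardized" is taken in its usual sense of zero mean and unit variance, and the choice of the population standard deviation is an assumption.

```python
import math

def parse_trait_idx(trait_idx):
    """Turn a string such as '3,4' into zero-based column indices [3, 4]."""
    return [int(token) for token in trait_idx.split(",")]

def standardize(column):
    """Scale a list of values to zero mean and unit variance."""
    n = len(column)
    mean = sum(column) / n
    # Assumption: population standard deviation (divide by n);
    # the actual implementation may divide by n - 1 instead.
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / sd for v in column]

# Toy phenotype table: rows are individuals, columns are traits.
phenotypes = [
    [0.1, 1.0, 5.0, 2.0, 10.0],
    [0.2, 2.0, 6.0, 4.0, 20.0],
    [0.3, 3.0, 7.0, 6.0, 30.0],
]

cols = parse_trait_idx("3,4")  # fourth and fifth columns
selected = [standardize([row[j] for row in phenotypes]) for j in cols]
print(cols, selected)
```

The sketch does not handle missing values or constant columns, which would need separate treatment.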

Precomputing the windows

To apply our set test, the markers have to be assigned to windows. We provide a method that splits the genome into windows of a fixed size:

./mtSet_preprocess --precompute_windows --bfile bfile --wfile wfile --window_size window_size --plot_windows 

where

  • bfile is the base name of the binary bed file (bfile.bim is required).
  • window_size is the size of the window (in base pairs). The default value is 30kb.
  • wfile is the base name of the output file. If not specified, the file is saved as bfile.window_size.wnd in the current folder. Each window is stored on one line in the following format: index, chromosome, start position, stop position, index of the start position and number of SNPs.
  • plot_windows is a flag. If it is set, a histogram of the number of markers per window is generated and saved as wfile.pdf.
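
The window assignment can be pictured with a short sketch that groups sorted markers into fixed-size windows and emits the six fields listed above. It is an illustration under stated assumptions, not the mtSet implementation: the function `assign_windows` is hypothetical, windows are assumed to lie on a grid starting at position 0 of each chromosome, intervals are treated as half-open, and windows without markers are skipped.

```python
def assign_windows(markers, window_size=30000):
    """Group sorted (chromosome, position) markers into fixed-size windows.

    Returns rows of (index, chromosome, start, stop, index of first SNP,
    number of SNPs), mirroring the field order described in the text.
    """
    rows = []
    i = 0
    while i < len(markers):
        chrom, pos = markers[i]
        # Assumption: windows lie on a grid anchored at position 0.
        start = (pos // window_size) * window_size
        stop = start + window_size
        # Collect all following markers on the same chromosome below stop.
        j = i
        while j < len(markers) and markers[j][0] == chrom and markers[j][1] < stop:
            j += 1
        rows.append((len(rows), chrom, start, stop, i, j - i))
        i = j
    return rows

# Toy marker list, sorted by chromosome and then by position.
markers = [(1, 1000), (1, 25000), (1, 31000), (1, 95000), (2, 500)]
windows = assign_windows(markers)
for row in windows:
    print(row)
```

With the default size of 30kb, the first two markers share one window, while every other marker in this toy list falls into a window of its own.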

Merging the preprocessing steps

The commands above run the preprocessing operations individually. It is also possible to combine the steps in a single command:

./mtSet_preprocess --compute_covariance --fit_null --precompute_windows ...
