\name{snp.imputation}
\alias{snp.imputation}
\title{Calculate imputation rules}
\description{
Given two sets of SNPs typed in the same subjects, this function
calculates rules which can be used to impute one set from the other in a
subsequent sample.
}
\usage{
snp.imputation(X, Y, pos.X, pos.Y, phase=FALSE, try=50,
               stopping=c(0.95, 4, 0.05), use.hap=c(0.95, 0.1),
               em.cntrl=c(50,0.01), minA=5)
}
\arguments{
  \item{X}{An object of class \code{"snpMatrix"} or
    \code{"X.snp.matrix"} containing observations of the SNPs to be
    used for imputation ("regressor SNPs")}
  \item{Y}{An object of the same class as \code{X} containing
    observations of the SNPs to be imputed in a future sample
    ("target SNPs")}
  \item{pos.X}{The positions of the regressor SNPs}
  \item{pos.Y}{The positions of the target SNPs}
  \item{phase}{See "Details" below}
  \item{try}{The number of potential regressor SNPs to be considered in
    the stepwise regression procedure around each target SNP. The
    nearest \code{try} regressor SNPs to each target SNP will be
    considered}
  \item{stopping}{Parameters of the stopping rule for the stepwise
    regression (see "Details")}
  \item{use.hap}{Parameters to control use of the haplotype imputation
    method (see "Details")}
  \item{em.cntrl}{Parameters to control the test for convergence of the
    EM algorithm for fitting phased haplotypes (see "Details")}
  \item{minA}{A minimum data quantity measure for estimating pairwise
    linkage disequilibrium (see "Details")}
}
\details{
The routine first carries out a series of step-wise regression analyses
in which each Y SNP is regressed on the nearest \code{try} regressor (X)
SNPs. If \code{phase} is \code{TRUE}, the regressions will be calculated
at the chromosome (haplotype) level, with variances given simply by
\eqn{p(1-p)} and covariances estimated using the same algorithm as
\code{\link{ld.snp}} (this option is not yet implemented).
Otherwise, the analysis is carried out at the genotype level based on
conventional variance and covariance estimates using the
\code{"pairwise.complete.obs"} missing value treatment (see
\code{\link{cov}}). New SNPs are added to the regression until either
(a) the value of \eqn{R^2} exceeds the first parameter of
\code{stopping}, (b) the number of "tag" SNPs has reached the maximum
set in the second parameter of \code{stopping}, or (c) the change in
\eqn{R^2} does not achieve the target set by the third parameter of
\code{stopping}. If the third parameter of \code{stopping} is \code{NA},
this last test is replaced by a test for improvement in the Akaike
information criterion (AIC).

If the prediction, as measured by \eqn{R^2}, has not achieved a
threshold (the first parameter of \code{use.hap}) using more than one
tag SNP, then a second imputation method is tried. Phased haplotype
frequencies are estimated for the Y SNP plus the tag SNPs. The
\eqn{R^2} for prediction of the Y SNP using these haplotype frequencies
is then calculated. If \eqn{(1-R^2)} is reduced by a proportion
exceeding the second parameter of \code{use.hap}, then the haplotype
imputation rule is saved in preference to the faster regression rule.

The argument \code{em.cntrl} controls convergence testing for the EM
algorithm for fitting haplotype frequencies. The first parameter is the
maximum number of iterations, and the second parameter is the threshold
for the change in log likelihood below which the iteration is judged to
have converged.

All SNPs selected for imputation must have sufficient data for
estimating pairwise linkage disequilibrium with each other and with the
target SNP. The statistic chosen is based on the four-fold tables of
two-locus haplotype frequencies. If the frequencies in such a table are
labelled \eqn{a}, \eqn{b}, \eqn{c} and \eqn{d}, then
\eqn{t = \min(a,d)}{t = min(a,d)} if \eqn{ad > bc}, and
\eqn{t = \min(b,c)}{t = min(b,c)} otherwise.
The cell frequency \eqn{t} must exceed \code{minA} for all pairwise
comparisons.
}
\value{
An object of class \code{"snp.reg.imputation"}.
}
\references{
Chapman J.M., Cooper J.D., Todd J.A. and Clayton D.G. (2003)
\emph{Human Heredity}, \bold{56}:18-31.
}
\note{The \code{phase=TRUE} option is not yet implemented.}
\author{David Clayton \email{david.clayton@cimr.cam.ac.uk}}
\seealso{\code{\link{snp.reg.imputation-class}}, \code{\link{ld.snp}},
  \code{\link{imputation.maf}}, \code{\link{imputation.r2}}}
\examples{
# Remove 5 SNPs from a dataset and derive imputation rules for them
library(snpMatrix)
data(for.exercise)
sel <- c(20, 1000, 2000, 3000, 5000)
to.impute <- snps.10[,sel]
impute.from <- snps.10[,-sel]
pos.to <- snp.support$position[sel]
pos.fr <- snp.support$position[-sel]
imp <- snp.imputation(impute.from, to.impute, pos.fr, pos.to)
}
\keyword{models}
\keyword{regression}
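\section{Worked illustration of the thresholds}{
The numbers below are hypothetical and are not output from the package.
They are intended only as a sketch of how the \code{minA} and
\code{use.hap} tests described in "Details" might be evaluated by hand,
under the assumptions stated with each calculation.

Suppose, treating the entries as counts (an assumption), that the
two-locus haplotype table for one pair of SNPs has
\eqn{a = 40}, \eqn{b = 10}, \eqn{c = 5} and \eqn{d = 45}. Then
\deqn{ad = 1800 > bc = 50, \qquad t = \min(a, d) = 40,}{ad = 1800 > bc = 50, so t = min(a, d) = 40,}
which exceeds the default \code{minA = 5}, so this pair would be judged
to carry sufficient data.

For the haplotype test, assume that "reduced by a proportion" refers to
the relative reduction in \eqn{(1-R^2)}. If a regression on two tag
SNPs gave \eqn{R^2 = 0.80} (below the default first parameter of
\code{use.hap}, 0.95) and the haplotype method gave \eqn{R^2 = 0.85},
then
\deqn{\frac{(1 - 0.80) - (1 - 0.85)}{1 - 0.80} = \frac{0.05}{0.20} = 0.25 > 0.1,}{((1 - 0.80) - (1 - 0.85)) / (1 - 0.80) = 0.05 / 0.20 = 0.25 > 0.1,}
so, under this reading, the haplotype imputation rule would be saved in
preference to the regression rule.
}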