\name{diseq}
\alias{diseq}
\alias{diseq.ci}
\alias{diseq.inner}
\title{Estimate or Compute Confidence Interval for the Single-Marker Disequilibrium}
\description{
  Estimate or compute a confidence interval for single-marker
  disequilibrium.
}
\usage{
diseq.ci(object, marker, R = 1000, conf = 0.95, correct = TRUE,
         na.rm = TRUE, ...)
diseq.inner(object, marker, ...)
}
\arguments{
  \item{object}{ geneSet object }
  \item{marker}{ marker names }
  \item{R}{ Number of bootstrap iterations to use when computing the
    confidence interval. Defaults to 1000. }
  \item{conf}{ Confidence level to use when computing the confidence
    interval for D-hat. Defaults to 0.95; must lie in (0,1). }
  \item{correct}{ See details. }
  \item{na.rm}{ logical. Should missing values be removed? }
  \item{\dots}{ optional additional parameters passed }
}
\details{
  For a single-gene marker, \code{diseq} computes the Hardy-Weinberg
  (dis)equilibrium statistics D, D', r (the correlation coefficient),
  and \eqn{r^2} for each pair of allele values, as well as an overall
  summary value for each measure across all alleles.
  \code{print.diseq} displays the contents of a \code{diseq} object.
  \code{diseq.ci} computes a bootstrap confidence interval for this
  estimate.

  For consistency, I have applied the standard definitions for D, D',
  and r from the Linkage Disequilibrium case, replacing all marker
  probabilities with the appropriate allele probabilities.
  Thus, for each allele pair,
  \itemize{
    \item{D}{ is defined as half of the raw difference between the
      observed and expected frequencies of heterozygotes:

      \deqn{%
        D = \frac{1}{2} ( p_{ij} + p_{ji} ) - p_i p_j  %
      }{%
        D = 1/2 * ( p(ij) + p(ji) ) - p(i)*p(j)  %
      }
    }
    \item{D'}{ rescales D to span the range [-1,1]:

      \deqn{D' = \frac{D}{D_{max}}}{D' = D / Dmax}

      where, if D > 0:
      \deqn{%
        D_{max} = \min( p_i p_j, p_j p_i ) = p_i p_j  %
      }{%
        Dmax = min( p(i)p(j), p(j)p(i) ) = p(i)p(j)  %
      }
      or if D < 0:
      \deqn{%
        D_{max} = \min( p_i (1 - p_j), p_j (1 - p_i) )  %
      }{%
        Dmax = min( p(i) * (1 - p(j)), p(j) * (1 - p(i)) )  %
      }
    }
    \item{r}{ is the correlation coefficient between two alleles,
      %ignoring all other alleles,
      and can be computed by

      \deqn{%
        r = \frac{-D}{\sqrt{ p_i (1-p_i) p_j (1-p_j) }}  %
      }{%
        r = -D / sqrt( p(i)*(1-p(i)) * p(j)*(1-p(j)) )  %
      }
    }
  }
  where
  \itemize{
    \item{-}{ \eqn{p_i}{p(i)} is the observed probability of
      allele 'i', }
    \item{-}{ \eqn{p_j}{p(j)} is the observed probability of
      allele 'j', and }
    \item{-}{ \eqn{p_{ij}}{p(ij)} is the observed probability of
      the allele pair 'ij'. }
  }

  When there are more than two alleles, the summary values for these
  statistics are obtained by computing a weighted average of the
  absolute value of the statistic for each allele pair, where the
  weight is determined by the expected frequency. For example:

  \deqn{%
    D_{overall} = \sum_{i \ne j} |D_{ij}| p_{ij}  %
  }{%
    D.overall = sum |D(ij)| * p(ij)  %
  }

  Bootstrapping is used to generate the confidence interval in order to
  avoid reliance on parametric assumptions, which will not hold for
  alleles with low frequencies (e.g. \eqn{D'} following a Chi-square
  distribution).

  See the function \code{HWE} from the "genetics" package for testing
  Hardy-Weinberg Equilibrium, \eqn{D=0}.
}
\author{
  Gregory R. Warnes \email{warnes@bst.rochester.edu} and
  Nitin Jain \email{nitin.jain@pfizer.com}
}
\examples{
}
\keyword{ misc}
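\section{Worked illustration}{
  The following is a small hand calculation that applies the definitions
  given in the details section. It is a sketch under stated assumptions,
  not output from \code{diseq}: the frequencies are hypothetical, and
  \eqn{p_{ij}}{p(ij)} is assumed to be an ordered genotype probability,
  so that each heterozygote ordering carries half of the heterozygote
  frequency.

  Consider a marker with two alleles A and B and genotype probabilities
  \eqn{p_{AA} = 0.5}{p(AA) = 0.5},
  \eqn{p_{AB} = p_{BA} = 0.1}{p(AB) = p(BA) = 0.1}, and
  \eqn{p_{BB} = 0.3}{p(BB) = 0.3}, which give allele probabilities
  \eqn{p_A = 0.6}{p(A) = 0.6} and \eqn{p_B = 0.4}{p(B) = 0.4}. Then

  \deqn{%
    D = \frac{1}{2} ( 0.1 + 0.1 ) - 0.6 \times 0.4 = -0.14  %
  }{%
    D = 1/2 * ( 0.1 + 0.1 ) - 0.6 * 0.4 = -0.14  %
  }

  Because D < 0, the second form of \eqn{D_{max}}{Dmax} applies:

  \deqn{%
    D_{max} = \min( 0.6 \times 0.6, 0.4 \times 0.4 ) = 0.16, \qquad
    D' = \frac{-0.14}{0.16} = -0.875  %
  }{%
    Dmax = min( 0.6 * 0.6, 0.4 * 0.4 ) = 0.16,  D' = -0.14 / 0.16 = -0.875  %
  }

  and the correlation coefficient is

  \deqn{%
    r = \frac{0.14}{\sqrt{ 0.6 \times 0.4 \times 0.4 \times 0.6 }}
      = \frac{0.14}{0.24} \approx 0.583, \qquad r^2 \approx 0.340  %
  }{%
    r = 0.14 / sqrt( 0.6 * 0.4 * 0.4 * 0.6 ) = 0.14 / 0.24 = 0.583 (approx.),  r^2 = 0.340 (approx.)  %
  }

  The negative D reflects fewer heterozygotes than Hardy-Weinberg
  equilibrium would predict (0.2 observed against 0.48 expected).
  Values reported by the functions documented here may differ from this
  hand calculation, for example depending on the \code{correct}
  argument.
}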