\name{roast}
\alias{roast}
\alias{mroast}
\alias{Roast-class}
\alias{show,Roast-method}
\title{Rotation Gene Set Tests}
\description{
Rotation gene set testing for linear models.
}
\usage{
roast(iset=NULL, y, design, contrast=ncol(design), set.statistic="mean",
      gene.weights=NULL, array.weights=NULL, block=NULL, correlation,
      var.prior=NULL, df.prior=NULL, nrot=999)
mroast(iset=NULL, y, design, contrast=ncol(design), set.statistic="mean",
       gene.weights=NULL, array.weights=NULL, block=NULL, correlation,
       var.prior=NULL, df.prior=NULL, nrot=999, adjust.method="BH")
}
\arguments{
  \item{iset}{vector specifying the rows of \code{y} in the test set. This can be a vector of indices, a logical vector with one element per row of \code{y}, or any vector such that \code{y[iset,]} contains the values for the gene set to be tested. For \code{mroast}, a list of index vectors, one per gene set.}
  \item{y}{numeric matrix giving log-expression or log-ratio values. If either \code{var.prior} or \code{df.prior} is \code{NULL}, then \code{y} should contain values for all genes on the arrays. If both prior parameters are given, then only the \code{y} values for the test set are required.}
  \item{design}{design matrix.}
  \item{contrast}{contrast for which the test is required. Can be an integer specifying a column of \code{design}, or else a contrast vector of length equal to the number of columns of \code{design}.}
  \item{set.statistic}{summary set statistic. Possibilities are \code{"mean"}, \code{"floormean"}, \code{"mean50"} or \code{"msq"}.}
  \item{gene.weights}{optional numeric vector of weights for genes in the set. Can be positive or negative.}
  \item{array.weights}{optional numeric vector of array weights.}
  \item{block}{optional vector specifying the block to which each array belongs.}
  \item{correlation}{correlation between observations within the same block.}
  \item{var.prior}{prior value for residual variances. If not provided, this is estimated from all the data using \code{squeezeVar}.}
  \item{df.prior}{prior degrees of freedom for residual variances.
  If not provided, this is estimated from all the data using \code{squeezeVar}.}
  \item{nrot}{number of rotations used to estimate the p-values.}
  \item{adjust.method}{method used to adjust the p-values for multiple testing. See \code{\link{p.adjust}} for possible values.}
}
\value{
An object of class \code{"Roast"}.
This consists of a list with the following components:
\tabular{ll}{
\code{p.value}:\tab data.frame with columns \code{Active.Prop} and \code{P.Value}, giving the proportion of genes in the set contributing meaningfully to significance and the estimated p-values, respectively. Rows correspond to the alternative hypotheses mixed, up and down.\cr
\code{var.prior}:\tab prior value for residual variances.\cr
\code{df.prior}:\tab prior degrees of freedom for residual variances.
}
There is a \code{show} method for \code{"Roast"} objects.
}
\details{
This function tests whether any of the genes in the set are differentially expressed.
The function can be used for any microarray experiment that can be represented by a linear model.
The design matrix for the experiment is specified as for the \code{\link{lmFit}} function, and the contrast of interest is specified as for the \code{\link{contrasts.fit}} function.
This allows users to focus on differential expression for any coefficient or contrast in a linear model.
If \code{contrast} is not specified, the last coefficient in the linear model will be tested.

The arguments \code{array.weights}, \code{block} and \code{correlation} have the same meaning as they do for the \code{\link{lmFit}} function.
The arguments \code{df.prior} and \code{var.prior} have the same meaning as in the output of the \code{\link{eBayes}} function.
If these arguments are not supplied, they are estimated exactly as is done by \code{eBayes}.

The argument \code{gene.weights} allows directions or weights to be set for individual genes in the set.
The gene set statistics \code{"mean"}, \code{"floormean"}, \code{"mean50"} and \code{"msq"} are defined by Wu and Smyth (2010).
The different gene set statistics have different sensitivities to small numbers of differentially expressed genes.
If \code{set.statistic="mean"} then the set will be statistically significant only when the majority of the genes are differentially expressed.
\code{"floormean"} and \code{"mean50"} can detect sets in which as few as 25\% of the genes are differentially expressed.
\code{"msq"} is sensitive to even smaller proportions of differentially expressed genes, if the effects are reasonably large.

The output gives p-values for three possible alternative hypotheses:
\code{"Up"} to test whether the genes in the set tend to be up-regulated, with positive t-statistics,
\code{"Down"} to test whether the genes in the set tend to be down-regulated, with negative t-statistics,
and \code{"Mixed"} to test whether the genes in the set tend to be differentially expressed, without regard for direction.

\code{roast} estimates p-values by simulation, specifically by random rotations of the orthogonalized residuals, so p-values will vary slightly from run to run.
To get more precise p-values, increase the number of rotations \code{nrot}.
The strategy of random rotations is due to Langsrud (2005).
Following Monte Carlo hypothesis testing theory (Barnard, 1963), the p-value is computed as \code{(b+1)/(nrot+1)} where \code{b} is the number of rotations giving a more extreme statistic than that observed.
This means that the smallest possible p-value is \code{1/(nrot+1)}.

\code{mroast} performs roast tests for multiple gene sets and adjusts the p-values for multiple testing.
}
\seealso{
\code{roast} performs a \emph{self-contained} test in the sense defined by Goeman and Buhlmann (2007).
For a \emph{competitive} gene set test, see \code{\link{wilcoxGST}}.
For a competitive gene set enrichment analysis using a database of gene sets, see \code{\link{romer}}.

An overview of tests in limma is given in \link{08.Tests}.
}
\author{Gordon Smyth and Di Wu}
\references{
Barnard, GA (1963).
Discussion of The spectral analysis of point processes (by MS Bartlett).
\emph{Journal of the Royal Statistical Society} B 25, 294.

Goeman, JJ, and Buhlmann, P (2007).
Analyzing gene expression data in terms of gene sets: methodological issues.
\emph{Bioinformatics} 23, 980-987.

Langsrud, O (2005).
Rotation tests.
\emph{Statistics and Computing} 15, 53-60.

Wu, D, and Smyth, GK (2010).
ROAST: rotation gene set tests for complex microarray experiments.
Submitted.
}
\examples{
# Simulated data: 100 genes on 4 arrays, two groups of two arrays
y <- matrix(rnorm(100*4),100,4)
design <- cbind(Intercept=1,Group=c(0,0,1,1))

# Add a group effect to the first five genes and test them as a set
iset <- 1:5
y[iset,3:4] <- y[iset,3:4]+3
roast(iset,y,design,contrast=2)

# Test several sets in one call
iset2 <- 6:10
mroast(list(set1=iset,set2=iset2),y,design,contrast=2)

# Alternative approach useful if multiple gene sets are tested:
# estimate the prior parameters once from all genes, then reuse them
fit <- lmFit(y,design)
sv <- squeezeVar(fit$sigma^2,df=fit$df.residual)
iset1 <- 1:5
iset2 <- 6:10
roast(y=y[iset1,],design=design,contrast=2,var.prior=sv$var.prior,df.prior=sv$df.prior)
roast(y=y[iset2,],design=design,contrast=2,var.prior=sv$var.prior,df.prior=sv$df.prior)
}
\keyword{htest}