\name{summarizeByAnnotation}
\alias{summarizeByAnnotation}

\title{
  Summarize data based on genome annotation
}

\description{
  This function summarizes columns of the data using the specified SQLite functions,
  applying these summarization functions to regions defined in an annotation data frame.
}

\usage{
summarizeByAnnotation(expData, annoData,
  what = getColnames(expData, all = FALSE), fxs = c("TOTAL"),
  groupBy = NULL, splitBy = NULL, ignoreStrand = FALSE, bindAnno = FALSE,
  preserveColnames = TRUE, verbose = getOption("verbose"))
}

\arguments{
  \item{expData}{
    An object of class \code{ExpData}.
  }
  \item{annoData}{
    A data frame specifying the annotation regions of interest.  It must contain
    the columns \code{chr}, \code{start}, \code{end} and \code{strand}.
  }
  \item{what}{
    Vector of names of data columns to be summarized.
  }
  \item{fxs}{
    Vector of strings giving the names of SQLite functions to call on the data column(s).
  }
  \item{groupBy}{
    Character vector referring to a column in \code{annoData}.  Regions
    will be aggregated over distinct values of this column.  Setting this
    argument will set \code{bindAnno} to \code{TRUE}.  If \code{splitBy}
    is also set, \code{groupBy} takes precedence.
  }
  \item{splitBy}{
    String indicating column of \code{annoData} object on which to split results.
  }
  \item{ignoreStrand}{
    Logical indicating whether strand should be ignored in the aggregation.  If \code{TRUE}, strand is ignored.
  }
  \item{bindAnno}{
    Logical indicating whether annotation information should be included in the output.
  }
  \item{preserveColnames}{
    Logical indicating whether column names should be preserved.  Only possible when a single function is being applied.
  }
  \item{verbose}{
    Logical indicating whether details should be printed.
  }
}

\details{
  Most of the computation is done using SQLite.  Depending on the use
  case, this approach may be significantly faster and use much less
  memory than the alternative of using \code{splitByAnnotation} to retrieve
  a list with all the data and then summarizing over each element
  of the list in R.  It is (naturally) constrained to operations
  expressible in (SQLite) SQL.

  If \code{groupBy} is set to a column in \code{annoData}, all regions
  with the same value in that column will be joined together; a
  standard use case is labelling the exons of a gene.
}

\value{
  If \code{splitBy} is not specified, returns a data frame containing
  results of aggregation functions performed on each region 
  defined in \code{annoData}.  If \code{splitBy} is specified, returns a
  list of data frames with one entry for each unique value of the  
  column which was split on.
}

\references{
  The SQLite website \url{http://www.sqlite.org/lang_aggfunc.html} has
  details on which aggregate functions are implemented.
}

\author{
  James Bullard \email{bullard@berkeley.edu}, Kasper Daniel
  Hansen \email{khansen@jhsph.edu}
}
\seealso{
  See the \code{Genominator} vignette for more information, as well as \code{\link{ExpData-class}}.
}
\examples{
ed <- ExpData(system.file(package = "Genominator", "sample.db"),
              tablename = "raw")
data("yeastAnno")
summarizeByAnnotation(ed, yeastAnno[1:50,])
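
# A further sketch, assuming both the SQLite aggregates COUNT and TOTAL
# are wanted for every data column.  preserveColnames is set to FALSE
# because, per the argument description above, column names can only be
# preserved when a single function is applied; bindAnno = TRUE asks for
# the annotation information to be included in the output.
res <- summarizeByAnnotation(ed, yeastAnno[1:50,],
                             fxs = c("COUNT", "TOTAL"),
                             bindAnno = TRUE, preserveColnames = FALSE)
head(res)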
}
\keyword{manip}