\name{getLikelihoods}
\alias{getLikelihoods}
\alias{getLikelihoods.Dirichlet}
\alias{getLikelihoods.NB}
\alias{getLikelihoods.NBboot}
\alias{getLikelihoods.Pois}
%- Also NEED an '\alias' for EACH other topic documented here.
\title{Finds posterior likelihoods for each count as belonging to some hypothesis.}
\description{
These functions calculate posterior probabilities for each of the
counts in the \code{countData} object belonging to each of the groups
specified. The choice of function depends on the prior belief about the
underlying distribution of the data. It is essential that the method
used for calculating priors matches the method used for calculating the
posterior probabilities. For a comparison of the methods, see Hardcastle
& Kelly, 2010.
}
\usage{
getLikelihoods(cD, prs, pET = "BIC", subset = NULL, priorSubset = NULL,
               verbose = TRUE, ..., cl)
getLikelihoods.Dirichlet(cD, prs, pET = "BIC", subset = NULL,
                         priorSubset = NULL, verbose = TRUE, cl)
getLikelihoods.Pois(cD, prs, pET = "BIC", subset = NULL,
                    priorSubset = NULL, distpriors = FALSE,
                    verbose = TRUE, cl)
getLikelihoods.NB(cD, prs, pET = "BIC", subset = NULL, priorSubset = NULL,
                  bootStraps = 1, conv = 1e-4, nullData = FALSE,
                  returnAll = FALSE, verbose = TRUE, cl)
}
%- maybe also 'usage' for other objects documented here.
\arguments{
  \item{cD}{An object of type \code{\link{countData}}, or descending
    from this class.}
  \item{prs}{(Initial) prior probabilities for each of the groups in
    the \code{countData} object. Should sum to 1, unless
    \code{nullData} is TRUE, in which case it should sum to less than
    1.}
  \item{pET}{What type of prior re-estimation should be attempted?
    Defaults to "BIC"; "none" and "iteratively" are also available.}
  \item{subset}{Numeric vector giving the subset of counts for which
    posterior likelihoods should be estimated.}
  \item{priorSubset}{Numeric vector giving the subset of counts which
    may be used to estimate prior probabilities on each of the
    groups. See Details.}
  \item{distpriors}{Should the Poisson method use an empirically
    derived distribution on the prior parameters of the Poisson
    distribution, or use the mean of the maximum likelihood estimates
    (default)?}
  \item{bootStraps}{How many iterations of bootstrapping should be used
    in the (re)estimation of priors in the negative binomial method.}
  \item{conv}{If not null, bootstrapping iterations will cease if the
    mean squared difference between posterior likelihoods of
    consecutive bootstraps drops below this value.}
  \item{nullData}{If TRUE, looks for segments or counts with no true
    expression. See Details.}
  \item{returnAll}{If TRUE, and \code{bootStraps > 1}, the function
    returns a list of countData objects, one for each bootstrap,
    instead of a single countData object. Largely used for debugging
    purposes.}
  \item{verbose}{Should status messages be displayed? Defaults to TRUE.}
  \item{cl}{A SNOW cluster object.}
  \item{...}{Any additional information to be passed by the
    \code{getLikelihoods} wrapper function to the individual functions
    which calculate the likelihoods.}
}
\details{
These functions estimate, under the assumption of various
distributions, the (log) posterior likelihoods that each count belongs
to a group defined by the \code{@group} slot of the \code{countData}
object. The posterior likelihoods are stored on the natural log scale
in the \code{@posteriors} slot of the \code{\link{countData}} object
generated by this function. This is because the posterior likelihoods
are calculated in this form, and ordering of the counts is better done
on these log-likelihoods than on the likelihoods.
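For example, given a \code{countData} object \code{CDPost} returned by
one of these functions (the object name here is purely illustrative),
the posterior probabilities of the first few counts belonging to each
group might be examined by exponentiating the stored log values:

\preformatted{
# illustrative sketch: 'CDPost' is assumed to be the output of a
# getLikelihoods call; each row gives the probabilities of that
# count belonging to each group
exp(CDPost@posteriors[1:5,])
}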
If \code{pET = "none"}, no attempt is made to re-estimate the prior
likelihoods given in the \code{prs} variable. However, if \code{pET =
"BIC"}, the function will attempt to estimate the prior likelihoods by
using the Bayesian Information Criterion to identify the proportion of
the data best explained by each model, taking these proportions as
priors. Alternatively, an iterative re-estimation of priors is possible
(\code{pET = "iteratively"}), in which an initial estimate of the prior
likelihoods of the models is used to calculate the posteriors, and the
priors are then updated by taking the mean of the posterior likelihoods
for each model across all data. This often works well, particularly if
the "BIC" method is used (see Hardcastle & Kelly 2010 for
details). However, if the data are sufficiently non-independent, this
approach may substantially mis-estimate the true priors. If it is
possible to select a representative subset of the data that is
sufficiently independent by setting the variable \code{priorSubset},
then better estimates may be acquired.

The Dirichlet and Poisson methods produce almost identical results in
simulation. The negative binomial method produces results with much
lower false discovery rates, but takes considerably longer to run.

Filtering the data may be extremely advantageous in reducing run
time. This can be done by passing a numeric vector to \code{subset}
defining a subset of the data for which posterior likelihoods are
required.

If \code{nullData = TRUE}, the algorithm attempts to find those counts
or segments that have no true expression in any sample. This means that
there is another, implied group in which all samples are equal. The
prior likelihoods given in the \code{prs} object must therefore sum to
less than 1, with the residual going to this implied group.
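For instance, to allow for such an implied null group when using the
negative binomial method, the priors on the two specified groups might
be set to sum to 0.9, leaving a residual prior of 0.1 on the null group
(a sketch only; \code{CDP.NB} is a \code{countData} object carrying
negative binomial priors, as constructed in the Examples below):

\preformatted{
# sketch: the residual prior of 0.1 is implicitly assigned to the
# null (no true expression) group
CDPost.null <- getLikelihoods.NB(CDP.NB, prs = c(0.45, 0.45),
                                 nullData = TRUE, cl = NULL)
}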
See Hardcastle & Kelly (2010) for a full comparison of the methods.

A cluster object is strongly recommended in order to parallelise the
estimation of posterior likelihoods, particularly for the negative
binomial method. However, passing NULL to the \code{cl} variable will
allow the functions to run in non-parallel mode.

The \code{getLikelihoods} wrapper function will infer the correct
distribution to use from the information stored in the \code{@priors}
slot of the \code{\link{countData}} object \code{cD} and call the
appropriate function.
}
\value{
  A \code{\link{countData}} object.
}
\references{Hardcastle, T.J., and Kelly, K. (2010). Identifying
  Patterns of Differential Expression in Count Data. In submission.}
\author{Thomas J. Hardcastle}
\seealso{\code{\link{countData}}, \code{\link{getPriors}},
  \code{\link{topCounts}}, \code{\link{getTPs}}}
\examples{
library(baySeq)

# See vignette for more examples.

# Create a 'countData' object and estimate priors for the
# Poisson method.
data(simCount)
data(libsizes)

replicates <- c(1,1,1,1,1,2,2,2,2,2)
groups <- list(c(1,1,1,1,1,1,1,1,1,1), c(1,1,1,1,1,2,2,2,2,2))

CD <- new("countData", data = simCount, replicates = replicates,
          libsizes = libsizes, groups = groups)

CDP.Poi <- getPriors.Pois(CD, samplesize = 20, takemean = TRUE, cl = NULL)

# Get likelihoods for data with the Poisson method.
CDPost.Poi <- getLikelihoods.Pois(CDP.Poi, prs = c(0.5, 0.5),
                                  pET = "BIC", cl = NULL)

# Alternatively, get priors for the negative binomial method.
CDP.NB <- getPriors.NB(CD, samplesize = 10^5, estimation = "QL", cl = NULL)

# Get likelihoods for data with the negative binomial method
# (with bootstrapping).
CDPost.NB <- getLikelihoods.NB(CDP.NB, prs = c(0.5, 0.5), pET = "BIC",
                               bootStraps = 1, cl = NULL)

# Alternatively, if the 'snow' package is installed we can parallelise
# the functions. This will usually (though not always) offer a
# significant performance gain.
cl <- NULL
try(library(snow))
try(cl <- makeCluster(4, "SOCK"))
CDP.NB <- getPriors.NB(CD, samplesize = 10^5, estimation = "QL", cl = cl)
CDPost.NB <- getLikelihoods.NB(CDP.NB, prs = c(0.5, 0.5), pET = "BIC",
                               cl = cl)
}
\keyword{distribution}
\keyword{models}