\author{Hao Wu}

\name{read.madata}
\alias{read.madata}

\title{Read Microarray data}

\description{
  This is the function to read Microarray experiment data from a TAB 
  delimited text file or matrix object. 
}

\section{Preparing data file}{
  Before using the package, user need to prepare the input data file.

  1) The data file can be a matrix type R object, such as the output of
     exprs() from array or beadarray package. It is assumed that the intensity
     is started from the first column and row name is probe ID. Otherwise,
     column number containing probe ID and intensity should be
     specified.   

  2) The data file can be a TAB delimited text file. In this file, each
  row corresponds to a gene. In the columns, you can put some gene
  specific information, e.g., the Probe ID, Gene Bank ID, etc. and the
  grid location of the spot. But most importantly you need to put the
  intensity data after that. Most of the Microarray gridding software
  generate one file for each slide. At this point, you need to manually
  combine them into the data file. You need to decide which data you
  want to use in analysis, e.g., mean versus median, background
  subtracted or not, etc. For N-dye array, your intensity data should have N
  columns for each array.  These N columns need to
  be adjacent to each other. You can put the spot flag as a column after
  intensity data for each array. (Note that if you have flag, you will have N+1
  columns data for each array.) If you have replicates, replicated
  measurements of the same probe (clone) on the same array should appear in
  adjacent rows.   

  For example, for a 2-dye cDNA array,
  you have four slides scanned by Gene Pix and you get four
  files. First you open your favorite Spread Sheet editor, e.g., MS
  Excel. Copy your probe ID and Cluster ID to the first 2 columns. Then
  open one of the files generated by Gene Pix, copy the grid location
  into next 4 columns (you only need to do this once because they are
  all the same for four slides). Then for all four files, copy the two
  columns of foreground median value (if you want to use it) and one
  column of flag to the file in the order of Cy5, Cy3, flag. Then select
  the whole file and row sort it according to probe ID. Save the file as
  tab delimited text file and you are done.

  The data file must be "full", that is, all rows have to have the same
  number of fields. When you have missing data in your datafile, you need to
  check the data or use  \code{\link[maanova]{fill.missing}} to fill in
  missing variable. 

  Sometimes leading and trailing TAB in the text file
  will bring problems, depends on the operating system. So user need to
  be careful about that.
}
\section{Preparing design file}{
  Design file can be data.frame or matrix R object or TAB delimited text
  file. Number of rows of this file equals number of arrays times N (the
  number of dyes) (plus one for column header, if design file is a TAB
  delimited file and header = T). The row of design file *MUST* be organized by
  the order of datafile unless the matchDataToDesign parameter is set to TRUE.
  For example, if the datafile stores the intensity from
  array1, array11, array2,..., then the row of designfile must follow this
  order. Number of columns of this file depends on
  the experimental design. For example, you can have "Strain", "Diet", "Sex",
  etc. in your design file. You *MUST* have a column named "Array" in the
  design file. For two-color array, in addition to the "Array" column, you
  must have "Sample" and "Dye" columns (case sensitive) in the design
  file. "Sample" should be integers representing biological
  individuals. Reference samples should have Sample number to be
  zero(0). Reference sample will always be treated as fixed factor in mixed
  model and it will not be involved in any test. 
  
  You must NOT have "Spot", "Label" and "covM" columns. They are reserved
  for spotting, labeling and covariance effects. 

  Note that you DO NOT have to use all factors in design file.
  You can put all factors in design file but turn them on/off in
  formula in \code{\link[maanova]{fitmaanova}}.
}
\section{Preparing covariate file}{
  If you have array specific covariate, it should be included in the design
  matrix. If you have gene specific covariate, you need to prepare matrix
  type R object or TAB delimited text file, "covM". The size of "covM" equals
  to the size of intensity data (and TAB delimited text file must have column
  header if header = T, but NO row name). Specify covM only if you have gene
  specific covariate variable. Covariate variable must be a numeric value and
  need to be specified in the \code{\link[maanova]{fitmaanova}}.
}
\usage{
read.madata(datafile=datafile, designfile=designfile, covM = covM,
arrayType=c("oneColor", "twoColor"),header=TRUE, spotflag=FALSE, n.rep=1, avgreps=0,
  log.trans=FALSE, metarow, metacol, row, col, probeid, intensity, matchDataToDesign=FALSE, ...)
}

\arguments{
  \item{datafile}{Matrix R object or data file name with path 
     name as a string.}
  \item{designfile}{Matrix or data.frame R object or design file name
     with path as a string.}
  \item{covM}{Gene specific covariate matrix. Specify this only if you have
    gene specific covariate matrix.}
  \item{arrayType}{Specify if it is one or two color array. Default is
  one color.}
  \item{header}{A logical value indicating when input files (data file, design
  file or covariate matrix) are TAB delimited file, whether they have column header.}
  \item{spotflag}{A flag to indicate whether the input file contains the
    flag for bad spot or not.}
  \item{n.rep}{An integer to represent the number of replicates.}
  \item{avgreps}{An integer to indicate whether to average or collapse the
    replicates or not. 0 means no average; 1 means to take the mean of the
    replicates; 2 means to take the median of the replicates.}
  \item{log.trans}{A logical value to indicate whether to take log2
    transformation on the raw data or not. It is FALSE by default.If this is TRUE, \code{TransformMethod} field will be set to "log2".} 
  \item{metarow}{For 2-dye array. The column number for meta row. Default
  values are 1s.}
  \item{metacol}{For 2-day array. The column number for meta column. Default
  values are 1s.}
  \item{row}{For 2-day array. The column number for row. Default value is NA.}
  \item{col}{For 2-day array. The column number for column. Default value is NA.}
  \item{probeid}{The column number storing probe (clone) id. When datafile is
  matrix R object, it assumes rowname of the data is probe id. If data does not
  have row name, then 1,2,... is used as a probe id. For TAB delimited file,
  if probeid is not provided, it assumes that the first column stores the
  probe id. If you do not have probe id then set probeid = 0.}
  \item{intensity}{The start column number of intensity. For the matrix R
  object, it assumes intensity starts from the first column and for TAB
  delimited file, it assumes intensity stars from the second column, as a default.}
  \item{matchDataToDesign}{Defaults to false. If set to TRUE
  then the datafile column headers (or colnames(datafile) in the case of a
  matrix) will be matched up to the design file's Array column. This allows you
  to ignore the input order of array data as long as the datafile's header
  values can be matched exactly to the designfile's Array values}
  \item{\dots}{Other gene information in the data file.}
}

\value{
  An object of class \code{madata}, which is a list of following
  components:
  \item{n.gene}{Total number of genes in the experiment.}
  \item{n.rep}{Number of replicates in the experiment, if .}
  \item{n.spot}{Number of spots for each gene.}
  \item{data}{data field. It is either the log2 transformed
    data (if log.trans=TRUE), or just the original data (if
    log.trans=FALSE).}
  \item{n.array}{Number of arrays in the experiment.}
  \item{n.dye}{Number of dyes.}
  \item{flag}{A matrix for spot flag. Each element corresponding to one
    spot. 0 means normal spot, all other values mean bad spot.}
  \item{metarow}{Meta row for each spot.}
  \item{metacol}{Meta column for each spot.}
  \item{row}{Row for each spot.}
  \item{col}{Column for each spot.}
  \item{ArrayName}{A list of strings to represent the names of intensity
    data.}
  \item{design}{An object to represent the experimental design.}
  \item{Others}{Other experiment information listed in the data file and
     specified by user.}
}

\examples{
# note that .CEL files are not distributed with the package, thus following
# code does not work. This shows how to read data from affy (or beadarray)
# package, when TAB delimited design file is ready.

\dontrun{
library(affy)
beforeRma <- ReadAffy()
rmaData <- rma(beforeRma)
datafile <- exprs(rmaData)
abf1 <- read.madata(datafile=datafile,designfile="design.txt")

# make and read designfile (data.frame type R object) from R
design.table <- data.frame(Array=row.names(pData(beforeRma)));
Strain <- rep(c('Aj', 'B6', 'B6xAJ'), each=6)
Sample <- rep(c(1:9), each=2)
designfile <- cbind(design.table, Strain, Sample)
abf1 <- read.madata(datafile, designfile=designfile)

# read in a TAB delimited file with spot flag - for two color array
# HAVE TO SPECIFY that the data is from two color array
 kidney.raw <- read.madata("kidney.txt", designfile="kidneydesign.txt", 
	metarow=1, metacol=2, col=3, row=4, probeid=6,
	intensity=7, arrayType='twoColor',log.trans=T, spotflag=T)
}}

\keyword{IO}