%\VignetteIndexEntry{CodelinkSet}
%\VignetteKeywords{Preprocessing, Codelink}
%\VignetteDepends{codelink}
%\VignettePackage{codelink}
\documentclass{article}

\usepackage[latin1]{inputenc}
\usepackage[english]{babel}

\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}

\begin{document}

\title{CodelinkSet}
\author{Diego Diez}
\maketitle

\section{Introduction}

The \Robject{CodelinkSet} class is an extension of the \Robject{ExpressionSet} class that improves the user experience by integrating better with other Bioconductor packages. Because this class is derived from \Robject{ExpressionSet}, most functions that work on \Robject{ExpressionSet} objects will also work on \Robject{CodelinkSet} objects. Furthermore, extending existing methods and functions is easier.

\section{Loading of data}

The new function \Rfunction{readCodelinkSet} loads data in the old format and converts it into the \Robject{CodelinkSet} format. This function will replace \Rfunction{readCodelink} in the future. It is possible to include \Robject{phenoData} and \Robject{featureData} when calling \Rfunction{readCodelinkSet}. In addition, the correct annotation package is guessed from the first file and assigned to the \Robject{annotation} slot. Instead of a list of files to read, the function also accepts the location of a targets file. This is the preferred approach, and the information in the targets file is used to fill the \Robject{phenoData} slot.

<<>>=
cset <- readCodelinkSet("targets.txt")
@

\subsection{Feature data}

The \Robject{CodelinkSet} class stores intensity data in the \Robject{exprs} slot, much the same as Affymetrix-derived data. It also contains a \Robject{background} slot to accommodate background intensities. The \Robject{featureData} object contains further information such as ProbeName, ProbeType, meanSNR (computed when loading data), Row and Col locations in the chip, etc.
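
The slots described above can be inspected with the accessor functions. The following is a minimal sketch, assuming that the example dataset \Robject{codelink.exprset} shipped with the package is a \Robject{CodelinkSet} object, as its use elsewhere in this document suggests. \Rfunction{fData} is the generic \Rpackage{Biobase} accessor for the \Robject{featureData} table.

<<>>=
library(codelink)
data(codelink.exprset)
# Intensities (exprs slot) and background values, first rows only.
head(getInt(codelink.exprset))
head(getBkd(codelink.exprset))
# Per-probe information stored in featureData.
head(fData(codelink.exprset))
@
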
The FeatureID is used to name rows, although the ProbeName is the useful information when accessing annotation data.

\subsection{Annotations}

The \Robject{CodelinkSet} class supports new-style (\Rpackage{AnnotationDbi}-based) annotation packages by default, i.e. all annotation package names carry the .db suffix. To switch to the old style, assign the annotation package by hand:

<<>>=
library(codelink)
data(codelink.exprset)
annotation(codelink.exprset)
annotation(codelink.exprset) <- "rwgcod"
annotation(codelink.exprset)
@

NOTE: Old-style annotation packages will be deprecated for BioC 2.3.

\section{The User Interface}

The user interface has changed to be more consistent and easier to use. All user methods for \Robject{CodelinkSet} objects have the prefix ``cod'' followed by the corresponding method name.

\subsection{Data accession}

\Rfunction{getInt}, \Rfunction{getBkd}, \Rfunction{getSNR}, \Rfunction{featureNames}, \Rfunction{probeNames} and \Rfunction{sampleNames}.

\subsection{Preprocessing}

\Rfunction{codCorrect} is used for background correction and \Rfunction{codNormalize} for normalization. A convenience function called \Rfunction{codPreprocess} is also available to combine background correction and normalization in one step. The default values of the older functions are preserved, i.e. \emph{half} for \Rfunction{codCorrect} and \emph{quantile} for \Rfunction{codNormalize}.

<<>>=
cset <- codPreprocess(codelink.exprset)
@

\subsection{Plots}

\Rfunction{codPlotMA}, \Rfunction{codPlotDensities}, \Rfunction{codPlotCorrelation} and \Rfunction{codPlotImage} are the corresponding new functions. A convenience function \Rfunction{codPlot} is also provided; the type of plot is chosen with the parameter \emph{what}.

<<>>=
codPlot(cset, what = "density")
codPlot(cset, what = "ma")
codPlot(cset, what = "image")
@

\section{Analysis with limma}

Analysis using \Rpackage{limma} is straightforward. The design matrix can be specified using the \Robject{phenoData} information.
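
As a self-contained illustration of how such a design matrix is built, the sketch below uses a hypothetical vector \Robject{treatment} in place of a \Robject{phenoData} column; the group labels are invented for the example. Removing the intercept (the \texttt{-1} term in the formula) gives one indicator column per group.

<<>>=
# Hypothetical treatment labels standing in for a phenoData column.
treatment <- c("control", "control", "treated", "treated")
# No intercept: one indicator column per treatment level.
design <- model.matrix(~ -1 + factor(treatment))
colnames(design) <- levels(factor(treatment))
design
@
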
In the fit step, the \Robject{CodelinkSet} object can be passed directly to \Rfunction{lmFit}, making life easier for users of the Codelink platform.

<<>>=
design <- model.matrix(~ -1 + factor(cset$Treatment))
fit <- lmFit(cset, design)
@

\section{Exporting data to a file}

You may want to export your normalized data for use with other analysis tools. The function \Rfunction{writeCodelink} does this. By default the index, probe names, accession numbers, Entrez Gene identifiers, intensity and SNR values are written. If flag = TRUE, the flags are also written.

<<>>=
writeCodelink(codelink.exprset, file = "intensities.txt")
@

\section{Creating \Robject{CodelinkSetUnique} objects}

\Robject{CodelinkSet} objects use the feature id of each probe as a unique identifier. This feature id is related to the physical location of the probe on the array. This means that calling \Rfunction{featureNames} on a \Robject{CodelinkSet} object returns these feature ids; to get the probe ids we need to use \Rfunction{probeNames} instead. This can be inconvenient when we want to use our data with other packages that call \Rfunction{featureNames} to obtain the probe names and feed these ids to the annotation package. Because the annotation packages use probe names, the probes would not be found. To solve this problem, a \Robject{CodelinkSet} with unique probe names as \Rfunction{featureNames} is needed. Such an object can be constructed with the function \Rfunction{averageProbes}, which takes a \Robject{CodelinkSet} object and computes the \Rfunction{mean} of the intensities for replicated probes. The results are stored in a \Robject{CodelinkSetUnique} object.

<<>>=
foo.avg <- averageProbes(codelink.exprset)
@

This computation can take a long time. It is possible to use a parallelized version (using the package \Rpackage{multicore}) by setting \texttt{parallel=TRUE} in \Rfunction{averageProbes}. See the help for \Rfunction{averageProbes} for more details on this approach.
The package \Rpackage{multicore} needs to be loaded in advance by typing \Robject{library(multicore)}.

\section{Session Info}

<<>>=
sessionInfo()
@

\end{document}