\documentclass{article}
%\VignetteIndexEntry{AffyRNADegradation Example}
\usepackage{amsmath}
\usepackage{amscd}
\usepackage[tableposition=top]{caption}
\usepackage{ifthen}
\usepackage[utf8]{inputenc}
\usepackage{enumerate}
\usepackage{hyperref}
\usepackage[authoryear,round]{natbib}

\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\textit{#1}}}

\begin{document}

\title{The AffyRNADegradation Package}
\author{Mario Fasold}
\maketitle

Affymetrix 3' expression arrays employ a specific experimental protocol and a specific probe design that allow assessment of RNA integrity based on probe signal data. Problems of RNA integrity are primarily caused by the degradation of the target transcripts. We have shown in \cite{Fasold2012a} that
%\begin{enumerate}
\begin{enumerate}[(i)]% lowercase roman numerals
\item degradation leads to a probe positional bias that needs to be corrected in order to compare the expression of samples with varying degrees of degradation, and
\item it is possible to estimate a robust and accurate measure of RNA integrity from the probe signals that can be used, for example, to study degradation in the large body of available microarray data.
\end{enumerate}
The rationale and further analysis are described in the accompanying publication by Fasold and Binder. Here we show how to use this package for both problems.

\section{Basic RNA Degradation Analysis}

We first show how to use the package for the analysis of RNA degradation. Let us load the example data provided by the \Rpackage{AmpAffyExample} package into the environment.

<<>>=
library(AffyRNADegradation)
library(AmpAffyExample)
data(AmpData)
AmpData
@

Every transcript is measured by a set of 11--16 probes. The log-average intensity difference between probes located closer to the 3' end of the target transcripts and those located further away constitutes the probe positional bias. It can be visualized using the {\it tongs plot}.
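To make this definition concrete, the following toy calculation computes the positional bias of a single probeset. It is only a sketch: the intensities are made up, the split into 3' and 5' subsets is chosen for illustration and need not match the subsets used internally by \Rfunction{GetTongs}, and index 1 is assumed to be the probe nearest the 3' end.

<<>>=
# Hypothetical log2 intensities of one 11-probe probeset, ordered by probe
# index (assumption: index 1 is the probe closest to the 3' end).
log.int <- c(10.2, 10.1, 9.9, 9.8, 9.6, 9.5, 9.3, 9.2, 9.0, 8.9, 8.7)

# Illustrative subsets: the five probes nearest each end of the probeset.
three.prime <- log.int[1:5]
five.prime <- log.int[7:11]

# Average log intensity over all probes of the probeset.
sigma <- mean(log.int)

# Positional bias: difference of the subset log-averages.
bias <- mean(three.prime) - mean(five.prime)
bias
@

In this made-up probeset the 3' probes are on average 0.9 log2 units brighter than the 5' probes. The tongs plot for the example data set is produced as follows.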
<<tongs>>=
tongs <- GetTongs(AmpData, chip.idx = 4)
PlotTongs(tongs)
@

\begin{figure}
\begin{center}
<<fig=TRUE,echo=FALSE>>=
<<tongs>>
@
\end{center}
\caption{The tongs plot shows that the intensity difference between 3' and 5' probes increases with $\Sigma=\langle \log I \rangle$. Here, $\langle \cdot \rangle$ denotes averaging either over all probes within the probeset (for $\Sigma$) or over the 3' or 5' subset of probes (for $\Sigma_{\mathrm{subset}}$).}
\label{fig:tongs}
\end{figure}

Figure~\ref{fig:tongs} shows that the bias depends on the expression level of the transcripts. Because expression levels vary from sample to sample, this dependence must be taken into account when estimating RNA degradation.

The function \Rfunction{RNADegradation} performs the basic analysis of RNA degradation based on the raw probe intensities stored in an \Robject{AffyBatch} object. The result is an \Robject{AffyDegradationBatch} object that contains the corrected probe intensities as well as several statistical parameters.

<<>>=
rna.deg <- RNADegradation(AmpData, location.type = "index")
@

We can visualize the probe positional bias using the \Rfunction{PlotDx} function.

<<plotDx>>=
PlotDx(rna.deg)
@

\begin{figure}
\begin{center}
<<fig=TRUE,echo=FALSE>>=
<<plotDx>>
@
\end{center}
\caption{Probe degradation plot. The points show the average probe intensity of expressed genes for each index $x=1,\ldots,11$ relative to the average intensity at position $x=1$. The lines show a fitted decay function.}
\label{fig:one}
\end{figure}

Figure~\ref{fig:one} shows the results. The degree of degradation differs between samples. To access the parameter $d$, which provides a robust, sample-wise measure of the degree of RNA degradation, one can use the function \Rfunction{d}:

<<>>=
d(rna.deg)
@

\section{Using Absolute Probe Locations}

Instead of using the probe index within the probeset as the position variable, one can use the actual probe locations within the transcript. We have pre-computed the distance of each probe to the 3' end of its target transcript for all Affymetrix 3' expression arrays.
These probe location files are available at \url{http://www.izbi.uni-leipzig.de/downloads_links/programs/rna_integrity.php}.

To perform the analysis and correction using absolute probe locations, first download the probe location file for the chip type in use. You can then start the analysis using \Rfunction{RNADegradation}, as above, but selecting \texttt{absolute} as \texttt{location.type}. The parameter \texttt{location.file.dir} must specify the directory containing the downloaded probe location file.

% <<>>=
% # do not run as additional file needed
% rna.deg = RNADegradation(AmpData, location.type = "absolute", location.file.dir = "[SOME_DIR]")
% @

\section{Creating Custom Probe Location Files}

It is possible to use custom probe locations, for example to analyze custom-built microarrays or to work with alternative probe annotations. For this, one has to create a probe location file similar to the pre-built ones used in the previous section. Here is how to generate such a file.

First, create a data frame named \texttt{probeDists} containing the five columns \texttt{Probe.Set.Name}, \texttt{Probe.X}, \texttt{Probe.Y}, \texttt{Probe.Distance} and \texttt{Target.Length}. \texttt{Probe.Set.Name} is of class character and contains the Affymetrix probe set ID. The remaining variables are of class integer. \texttt{Probe.X} and \texttt{Probe.Y} denote the coordinates of the probe on the microarray. They are important because the coordinates are used to map the probes to the \Robject{AffyBatch} object using the \Rfunction{xy2indices} function from the \Rpackage{affy} package. Because the mapping relies on the coordinates, the table rows may be in any order. It is, however, important that this information can be mapped to every probe pair in the \Robject{AffyBatch} intensity array (as accessed, for example, with the \Rfunction{affy::pm} function).
\texttt{Probe.Distance} contains the probe location: the number of nucleotides between the designated 3' end of the transcript and the first (i.e. nearest) base of the 25-mer probe sequence. The last column, \texttt{Target.Length}, contains the length of the target in base pairs; it is not used in this package and can be set to any value. The following table shows an example of the \texttt{probeDists} data frame:

% latex table generated in R 2.15.0 by xtable 1.7-0 package
% Wed Sep 26 15:05:00 2012
\begin{table}[ht]
\begin{center}
\begin{tabular}{rlrrrr}
  \hline
 & Probe.Set.Name & Probe.X & Probe.Y & Probe.Distance & Target.Length \\
  \hline
1 & 1007\_s\_at & 467 & 181 & 608 & 3938 \\
2 & 1007\_s\_at & 531 & 299 & 495 & 3938 \\
3 & 1007\_s\_at & 86 & 557 & 426 & 3938 \\
... & ... & ... & ... & ... & ... \\
  \hline
\end{tabular}
\end{center}
\end{table}

This table is then stored in an R binary object file. The filename must consist of the chip type identifier, as given by the \Rfunction{affy::cdfName} function, followed by the file ending \texttt{.Rd}:

<<eval=FALSE>>=
filename = paste(cdfName(AmpData), ".Rd", sep="")
save(probeDists, file = filename)
@

To use the custom probe locations, start the analysis using \Rfunction{RNADegradation}, as above, with \texttt{location.type=absolute} and \texttt{location.file.dir} set to the directory containing the custom probe location file.

\section{Correction of the Bias and Integration into the Microarray Calibration Pipeline}

The correction of the probe positional bias is performed within the \Rfunction{RNADegradation} function. The result is a new \Robject{AffyBatch} object with corrected probe-level intensities. It can be accessed using the \Rfunction{afbatch} function:

<<>>=
afbatch(rna.deg)
@

It is possible to replace the original raw data with these data, corrected for probe positional bias, before performing further microarray normalization and summarization (e.g. using RMA).
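One way to do this is sketched below, under the assumption that the \Rfunction{rma} function from the \Rpackage{affy} package is used for normalization and summarization. The helper name \texttt{SummarizeCorrected} is hypothetical and not part of this package; it simply passes the corrected \Robject{AffyBatch} on to RMA in place of the raw data.

<<eval=FALSE>>=
# Hypothetical helper, not part of AffyRNADegradation: takes the result of
# RNADegradation() and summarizes the bias-corrected intensities with RMA.
SummarizeCorrected <- function(deg.batch) {
  # Extract the AffyBatch holding the corrected probe-level intensities.
  corrected <- AffyRNADegradation::afbatch(deg.batch)
  # Run RMA on the corrected data instead of the raw data.
  affy::rma(corrected)
}

# Usage with the object computed earlier in this document:
# expr.rma <- SummarizeCorrected(rna.deg)
@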
Alternatively, the correction can be performed after probe-level normalization. The following example shows how to first apply the VSN normalization method, then correct for the probe positional bias, and finally compute summarized expression measures:

<<>>=
library(vsn)
affydata.vsn <- do.call(affy:::normalize, c(alist(AmpData, "vsn"), NULL))
affydata.vsn <- afbatch(RNADegradation(affydata.vsn))
expr <- computeExprSet(affydata.vsn, summary.method="medianpolish", pmcorrect.method="pmonly")
@

\section{Citing AffyRNADegradation}

Please cite \citep{Fasold2012b} when using the package.

\section{Details}

This document was written using:

<<>>=
sessionInfo()
@

\bibliographystyle{plainnat}
\bibliography{AffyRNADegradation}

\end{document}