%\VignetteIndexEntry{An overview of yeastRNASeq}
%\VignetteDepends{yeastRNASeq,ShortRead}
%\VignettePackage{yeastRNASeq}
\documentclass[12pt]{article}
\usepackage{times}
\usepackage{hyperref}

\newcommand{\RPackage}[1]{\textsf{#1}}
\newcommand{\RCode}[1]{\texttt{#1}}
\newcommand{\RObject}[1]{\texttt{#1}}
\newcommand{\RFunction}[1]{\textsf{#1}}
\newcommand{\RClass}[1]{{\textsf{#1}}}

\SweaveOpts{eps=FALSE, echo=TRUE}

\title{An overview of yeastRNASeq}
\author{James Bullard and Kasper D.\ Hansen}
\date{Modified: 15 January 2010. Compiled: \today}

\begin{document}

\maketitle

<<>>=
options(width = 70)
@

<<>>=
require(yeastRNASeq)
@

This package contains data from
<<>>=
x <- citation(package = "yeastRNASeq")[[1]]
utils:::print.citation(x, bibtex = FALSE)
@
which describes some experiments in \emph{S.\ cerevisiae} comparing
various mutant strains to a wild-type strain. A full BibTeX entry can
be obtained by
<<>>=
citation("yeastRNASeq")
@

More specifically, this package contains a subset of the data, from a
wild-type and a single mutant yeast strain. For each condition
(mutant, wild-type) there are two lanes' worth of data, each lane
containing a sample of 500,000 raw (unaligned) reads. Each of the four
lanes has been aligned against the yeast genome using Bowtie. The raw
reads are contained in 4 FASTQ files and the Bowtie alignments are
contained in 4 Bowtie output files. There are 500,000 reads in each of
the FASTQ files and fewer reads in each of the Bowtie files, which
hold only the reads that aligned. The filenames are
<<>>=
list.files(file.path(system.file(package = "yeastRNASeq"), "reads"))
@

The reads were aligned to the yeast genome obtained from
\url{ftp://ftpmips.gsf.de/yeast/sequences} (which was the basis for
the Bowtie index available at the Bowtie website at the time of
alignment).

These files are ready to be parsed by the tools in the
\RPackage{ShortRead} package.
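
The raw reads can be loaded with \RFunction{readFastq} from
\RPackage{ShortRead}. The following is a minimal sketch that is not
evaluated when this vignette is built. It assumes that the four FASTQ
files in the \texttt{reads} directory can be selected with the pattern
\texttt{"fastq"}; the object names \RObject{fastqFiles} and
\RObject{reads} are hypothetical and chosen only for illustration. If
the description above holds, each file should yield 500,000 reads.
<<eval=FALSE>>=
library(ShortRead)
## Assumption: the FASTQ file names contain the string "fastq".
fastqFiles <- list.files(file.path(system.file(package = "yeastRNASeq"), "reads"),
                         pattern = "fastq", full.names = TRUE)
names(fastqFiles) <- basename(fastqFiles)
## Read each file into a ShortReadQ object (sequences plus qualities).
reads <- lapply(fastqFiles, readFastq)
## Number of reads per file.
sapply(reads, length)
@
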
As an example, we read the alignment files with
<<>>=
require(ShortRead)
files <- list.files(file.path(system.file(package = "yeastRNASeq"), "reads"),
                    pattern = "bowtie", full.names = TRUE)
names(files) <- gsub("\\.bowtie.*", "", basename(files))
names(files)
aligned <- lapply(files, readAligned, type = "Bowtie")
@
The constructed object \RObject{aligned} is a list with 4
elements. Each element corresponds to a lane and is an object of class
\RClass{AlignedRead}. The output from this operation has already been
stored as an R object and is accessible by
<<>>=
data(yeastAligned)
yeastAligned[["mut_1_f"]]
@

The fraction of reads that aligned in each lane is
<<>>=
round(sapply(aligned, length) / 500000, 2)
@

There are two additional objects available in the package, purely for
illustrative purposes (do not use them for analysis). The object
\RObject{yeastAnno} is a \RClass{data.frame} of annotation obtained
from Ensembl using \RPackage{biomaRt}:
<<>>=
data(yeastAnno)
dim(yeastAnno)
head(yeastAnno, n = 2)
table(yeastAnno$gene_biotype)
@
The other object is called \RObject{geneLevelData} and is a
\RClass{matrix} of counts per gene.
<<>>=
data(geneLevelData)
head(geneLevelData, n = 2)
@
Such a matrix may be constructed from \RObject{yeastAligned} and
\RObject{yeastAnno} using either the functionality in the
\RPackage{IRanges} and \RPackage{ShortRead} packages or the
functionality of the \RPackage{Genominator} package (which also
contains a vignette describing a simplified differential analysis of
this dataset).

Note that the data does not contain any biological replicates. In the
original publication this was addressed by also analyzing a set of
tiling microarrays.

\end{document}