%\VignetteIndexEntry{FANTOM3and4CAGE} %\VignetteKeywords{FANTOM3and4CAGE} %\VignettePackage{FANTOM3and4CAGE} \documentclass[12pt]{article} \usepackage{amsmath} \usepackage{hyperref} \usepackage[authoryear,round]{natbib} \textwidth=6.2in \textheight=8.5in \oddsidemargin=.1in \evensidemargin=.1in \headheight=-.3in \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\Rcode}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \author{Vanja Haberle \footnote{Department of Biology, University of Bergen, Bergen, Norway}} \title{\textsf{FANTOM3and4CAGE: an R data package with CAGE data from FANTOM3 and FANTOM4 projects}} \date{February 6, 2013} \begin{document} <>= options(width = 60) olocale=Sys.setlocale(locale="C") @ \maketitle \tableofcontents %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Introduction} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% This document briefly describes the content of the \Rpackage{FANTOM3and4CAGE} data package. \Rpackage{FANTOM3and4CAGE} is a Bioconductor-compliant R package that contains Cap Analysis of Gene Expression (CAGE) sequencing data produced by FANTOM consortium (\href{http://fantom.gsc.riken.jp/}{http://fantom.gsc.riken.jp/}). CAGE (\cite{Kodzius:2006}) is a high-throughput method for transcriptome analysis that utilizes "cap-trapping" (\cite{Carninci:1996}), a technique based on the biotinylation of the 7-methylguanosine cap of Pol II transcripts, to pulldown the 5'-complete cDNAs reversely transcribed from the captured transcripts. This enables the sequencing of short fragments from 5' ends, which can be mapped back to the referent genome to infer the exact position of the transcription start sites (TSSs) used for transcription of captured RNAs. Number of CAGE tags supporting each TSS gives the information on relative frequency of its usage and can be used as a measure of expression from that specific TSS. Thus, CAGE provides information on two aspects of capped transcriptome: genome-wide 1bp-resolution map of transcription start sites and transcript expression levels. This information can be used for various analyses, from 5' centered expression profiling (\cite{Takahashi:2012}) to studying promoter architecture (\cite{Carninci:2006}). This data package contains genomic coordinates of TSSs and number of CAGE tags supporting each TSS in various mouse and human samples analysed by CAGE in FANTOM3 and FANTOM4 projects. The data was originally published in main FANTOM publications (\cite{Carninci:2005}, \cite{Carninci:2006}, \cite{Suzuki:2009}, \cite{Faulkner:2009}), and represents a valuable resource of genome-wide TSSs for mouse and human. All of the data was downloaded from the FANTOM web resource (\cite{Kawaji:2010}, \href{http://fantom.gsc.riken.jp/4/download/}{http://fantom.gsc.riken.jp/4/download/}) and was organized into datasets by organism and tissue of origin. All human data is mapped to hg18 assembly, and mouse data to mm9 assembly of the genome. Figure~\ref{fig:FANTOM3and4CAGEpackageStructure} schematically describes the organization and the structure of the data in the \Rpackage{FANTOM3and4CAGE} package. \begin{figure}[h!] \centering \includegraphics[width = 105mm]{images/FANTOM3and4CAGEpackageStructure.pdf} \caption{Content and structure of data in \Rpackage{FANTOM3and4CAGE} data package} \label{fig:FANTOM3and4CAGEpackageStructure} \end{figure} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Getting started} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% To load the \Rpackage{FANTOM3and4CAGE} package into your R envirnoment type: <<>>= library(FANTOM3and4CAGE) @ \subsection{Listing available CAGE samples} As shown in Figure~\ref{fig:FANTOM3and4CAGEpackageStructure}, there are six datasets (shaded in blue) that can be loaded via call to \Rcode{data()} function. Two of them are \Rcode{data.frames} that describe the content of the remaining four datasets. These are \Rcode{FANTOMhumanSamples} and \Rcode{FANTOMmouseSamples}, for human and mouse, respectively. \newline To load the list of human samples type: <<>>= data(FANTOMhumanSamples) head(FANTOMhumanSamples, 10) @ The information is organized into three columns: \begin{itemize} \item \Rcode{dataset}: the name of the dataset that can be loaded using \Rcode{data()} function \item \Rcode{group}: the name of the group of samples that originate from the same tissue (\emph{e.g.} blood) \item \Rcode{sample}: the name of the specific sample \end{itemize} \subsection{CAGE datasets for various tissues} The \Rcode{FANTOMtissueCAGEhuman} and \Rcode{FANTOMtissueCAGEmouse} datasets contain CAGE data organized by tissue of origin: <<>>= data(FANTOMtissueCAGEhuman) names(FANTOMtissueCAGEhuman) @ It is a named list, where names correspond to entries in the \Rcode{group} column (in the \Rcode{data.frame} listing all the samples) and indicate tissue of origin. Each element of the list is a \Rcode{data.frame} with genomic coordinates of TSSs detected in that group of samples followed by columns with numbers of CAGE tags supporting each TSS in every individual sample. The names of columns correspond to entries in the \Rcode{sample} column (in the \Rcode{data.frame} listing all the samples) and describe individual samples. <<>>= lung_group <- FANTOMtissueCAGEhuman[["lung"]] head(lung_group) @ \subsection{CAGE timecourse datasets} In addition to CAGE data for various tissue types, there are timecourse datasets available in \Rpackage{FANTOM3and4CAGE} package. These are \Rcode{FANTOMtimecourseCAGEhuman} and \Rcode{FANTOMtimecourseCAGEmouse}, for human and mouse, respectively. <<>>= data(FANTOMtimecourseCAGEmouse) names(FANTOMtimecourseCAGEmouse) head(FANTOMtimecourseCAGEmouse[["adipogenic_induction"]]) @ They are organized in the same way as tissue datasets described above, \emph{i.e.} each element of the list is a \Rcode{data.frame} with CAGE detected TSSs for one timecourse. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Session Info} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% <<>>= sessionInfo() @ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \bibliographystyle{apalike} \bibliography{FANTOM3and4CAGE} \end{document}