%\VignetteIndexEntry{The TargetSearchData Package}
%\VignetteKeywords{preprocess, GC-MS, analysis, CDF data}
%\VignettePackage{TargetSearchData}
\documentclass[a4paper]{article}
\usepackage[dvipsnames]{xcolor}
\usepackage{setspace}
\usepackage{fancyvrb}
\usepackage{hyperref}
\usepackage{listings}
\usepackage{bera}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[sf,bf,compact]{titlesec}
\usepackage{natbib}

% definitions from BiocStyle
\definecolor{functioncolor}{RGB}{240, 112, 32}
\newcommand\Biocpkg[1]{{\href{https://bioconductor.org/packages/#1}{\textsl{#1}}}}
\newcommand{\Rfunction}[1]{\texttt{\textcolor{functioncolor}{#1}}}
\newcommand{\code}[1]{\colorbox{gray!10}{\texttt{\textcolor{black!50!red}{#1}}}}

% --------------------------------------------------------------------------
% page layout
\voffset=-15mm
\textwidth=159mm
\textheight=232mm
\oddsidemargin=0in
\evensidemargin=0in
\headheight=14pt
\footskip=15mm
\setlength\parindent{0pt}
\setlength{\parskip}{1.2ex}
\onehalfspacing

% --------------------------------------------------------------------------
% \usepackage{Sweave} % disable Sweave
\newenvironment{Schunk}{}{}

\DefineVerbatimEnvironment{File}{Verbatim}{%
  fontsize=\small, formatcom=\color{Black!80}, frame=single}

% style for listing
\lstdefinestyle{rstylein}{%
  backgroundcolor=\color{gray!10},
  basicstyle=\small\sl\ttfamily,
  frame=single,framerule=0pt % give a bit of margin
}

\lstdefinestyle{rstyleout}{%
  backgroundcolor=\color{gray!10},
  basicstyle=\small\ttfamily\color{black!80},
  frame=single,framerule=0pt % give a bit of margin
}

\lstnewenvironment{Sinput}{\lstset{style=rstylein}}{}
\lstnewenvironment{Soutput}{\lstset{style=rstyleout}}{}

% --------------------------------------------------------------------------
\title{\sf The TargetSearchData Package}
\author{\sf Álvaro Cuadros-Inostroza}
\date{\sf\today}

<<echo=FALSE>>=
require(TargetSearchData)
options(width=80)
@

\begin{document}
\maketitle

\begin{abstract}\sf
This package contains example
files for the package \Biocpkg{TargetSearch}. It includes raw NetCDF files from
an {\em E. coli} salt stress experiment, an extracted peak list for each CDF
file, a sample description file, a metabolite reference library, and a
retention index marker definition file.
\end{abstract}

\section{Package Contents}

The \Biocpkg{TargetSearchData} package contains a subset of metabolite data
collected from an {\em E. coli} salt stress experiment. This was part of a
stress response study described in \cite{Jozefczuk2010}. The samples were
measured in an Agilent 6890 series GC system with a 7683 series autosampler
injector coupled to a Leco Pegasus 2 time-of-flight mass spectrometer. For
further details, please refer to said publication.

In addition, this package \textbf{only provides} a couple of simple
{\em helper} functions, which are intended to be used by \Biocpkg{TargetSearch}
in its examples. They are described in the section \emph{Helper functions}.

The following subsections describe each type of file in detail.

\subsection{Sample File}

Tab-delimited text file of sample metadata. It lists the raw \code{CDF} files,
their respective measurement day (MD), and the time point of the salt stress
experiment. Samples with the same time point are biological replicates. The
time point is given in arbitrary units: 1 is the first sampling time, 3 is the
third one, and so on. Note that the measurement day corresponds to the first
four digits of the chromatogram file name.

\begin{File}[label=samples.txt]
CDF_FILE        MEASUREMENT_DAY  TIME_POINT
7235eg08.cdf    7235             1
7235eg11.cdf    7235             1
7235eg26.cdf    7235             1
7235eg04.cdf    7235             3
7235eg30.cdf    7235             3
...
\end{File}

\subsection{CDF files}

The \texttt{CDF} files, also known as \texttt{NetCDF}, are trimmed-down
versions of the original, {\em baseline corrected}, files that were exported by
{\em LECO Pegasus} in the context of the {\em E. coli} salt stress experiment.
The baseline correction was performed by {\em LECO} using default parameters.
For information on the experiment, please refer to \cite{Jozefczuk2010}.
To reduce the package size, the files were trimmed: the retention time was
restricted to between 200 and 400 seconds, and the mass-over-charge ratio
($m/z$) to between 85 and 320 daltons.

The file names follow a systematic nomenclature: \code{ydddaann.cdf}, where
\code{y} is the last digit of the year (starting from 2000), \code{ddd} is the
day of the year (from 1 to 365), \code{aa} is an arbitrary code (originally
this was connected to a specific mass spectrometer), and \code{nn} is the
measurement order of the sample in the specified year and day. Together, the
part \code{yddd} corresponds to the measurement day.

\subsection{RI files}

For each \code{CDF} file there is a corresponding retention time corrected peak
list file, the so-called \code{RI} file. These are tab-delimited text files
containing the retention time, retention index, and spectra. Each spectrum is a
list of \emph{m/z} and intensity pairs, with the two values of each pair
separated by a colon (:).

The text file format is the original format used in early
\Biocpkg{TargetSearch} versions. There is also a binary format, which is used
by default and is designed for fast reading. Binary files are not part of this
package, however (see the \Biocpkg{TargetSearch} documentation).

The file name convention is to prefix each \code{CDF} file name with
\code{RI\_} and change the file extension from \code{cdf} to \code{txt} (or
\code{dat} for the binary format).

\subsection{Library File}

This is a tab-delimited text file with the list of metabolite targets (library)
to search for in the chromatograms. Each row is a metabolite, while the columns
are the metabolite name, retention index (RI), time deviation (\code{Win\_1}),
selective masses (\code{SEL\_MASS}), and the metabolite spectrum. The time
deviation given in the \code{Win\_1} column represents the first search window.
See the \Biocpkg{TargetSearch} vignette for details.

\begin{File}[label=library.txt]
Name             RI      Win_1  SEL_MASS             SPECTRUM
Pyruvic acid     222767  4000   89;115;158;174;189   85:7 86:14 87:7 88:5 ...
Glycine (2TMS)   228554  4000   86;102;147;176;204   86:26 87:19 88:8 89:4 ...
Valine           271500  2000   100;144;156;218;246  85:8 86:14 87:6 88:5 ...
Glycerol (3TMS)  292183  2000   103;117;205;293      85:14 86:2 87:16 88:13 ...
Leucine          306800  1500   102;158;232;260      158:999 159:148 160:45 ...
...
\end{File}

\subsection{Retention Time Correction File}

This is a tab-delimited text file of retention time markers. The markers are
fatty acid methyl esters (FAME), which elute evenly distributed along the
retention time dimension and have a characteristic \emph{m/z} value.

\begin{File}[label=rimLimits.txt]
LowerLimit  UpperLimit  RIstandard
230         280         262320
290         340         323120
350         400         381020
\end{File}

A fourth column called \emph{Mass} can be specified if the \emph{m/z} value is
\textbf{not} the default of 87. See the \Rfunction{ImportFameSettings}
documentation from \Biocpkg{TargetSearch}.

\section{Helper functions}

Simple functions are provided to return the absolute file path of the files
contained in \Biocpkg{TargetSearchData}. These functions are used frequently in
\Biocpkg{TargetSearch} examples as a shorthand for function calls such as
\Rfunction{find.package()} and \Rfunction{file.path()}. The functions also
perform some basic checks internally. All these functions have a
\Rfunction{tsd\_} prefix.

The following gives a brief description and examples of these functions.

The function \Rfunction{tsd\_path()} locates the \Biocpkg{TargetSearchData}
installation path, where the example files are found. It is built on
\Rfunction{find.package()}.

<<>>=
path <- tsd_path()
path
@

The function \Rfunction{tsd\_data\_path()} returns the path where the example
files are. They are in a subdirectory called \code{gc-ms-data}, which can be
passed as an argument, though this is not needed.
<<>>=
path <- tsd_data_path()
dir(path)
@

The function \Rfunction{tsd\_file\_path()} returns the path of one or more
files contained in the directory \code{gc-ms-data}. It takes a character vector
as argument. If a file does not exist, it raises an error.

<<>>=
tsd_file_path(c('samples.txt','library.txt'))
@

Finally, the functions \Rfunction{tsd\_cdffiles()} and
\Rfunction{tsd\_rifiles()} return the list of \code{CDF} and \code{RI} files
contained in the data path, respectively.

<<>>=
cdf <- tsd_cdffiles()
ri <- tsd_rifiles()
@

\section{Session Info}

Output of \Rfunction{sessionInfo()} on the system on which this document was
compiled.

<<>>=
sessionInfo()
@

\bibliographystyle{unsrtnat}
\bibliography{biblio}

\end{document}

% vim: set ts=2 sw=2 et textwidth=80: