% -*- mode: noweb; noweb-default-code-mode: R-mode; -*- %\VignetteIndexEntry{cloudUtil primer} %\VignetteKeywords{} %\VignetteDepends{} %\VignettePackage{cloudUtil} %documentclass[12pt, a4paper]{article} \documentclass[12pt]{article} \usepackage{amsmath,pstricks} \usepackage{hyperref} \usepackage[numbers]{natbib} \usepackage{color} \usepackage[scaled]{helvet} \renewcommand*\familydefault{\sfdefault} \definecolor{NoteGrey}{rgb}{0.96,0.97,0.98} \textwidth=6.2in \textheight=9.5in @ \parskip=.3cm \oddsidemargin=.1in \evensidemargin=.1in \headheight=-1in \newcommand{\scscst}{\scriptscriptstyle} \newcommand{\scst}{\scriptstyle} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Robject}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} \newcommand{\code}[1]{{\texttt{#1}}} \author{Christian Panse \and Ermir Qeli} \begin{document} \title{\code{cloudUtil}: Cloud Utilization Visualizations } \maketitle %\fcolorbox{black}{NoteGrey} { %\begin{minipage}{13.5cm} %\begin{center} %\textbf{ Vignette for v.0.0.1 - first try.} %\end{center} %\end{minipage} %} \tableofcontents \newpage \section{Recent changes and updates} 'vignettes' directory has been migrated. \section{Introduction} \code{cloudUtil} is a package for creating comparison plots for Cluster, Grid and Cloud utilization data. Under utilization data we understand collected accounting data measuring the job execution time in the above mentioned environments. The idea behind this package is to create a single visualization of such data that has the following main features: \begin{itemize} \item gives an overview over the compute system utilization within a certain time frame \item allows the comparison of job lengths between different platforms giving thus hints on how well the respective job queues function e.g. how efficient the queue of Sun Grid Engine is performing \item allows the integration of replicates within the same visualization \item allows the comparison on both absolute and relative timescales \end{itemize} The functionality of \code{cloudUtilPlot} function was first used in \cite{publication-2386}. \section{Data preparation} \label{lab:datapreparation} The package includes sample accounting data for demonstration purposes. These data were collected by comparing the running times of several hundred compute jobs: each one of these jobs performs peptide-spectrum matching in proteomics (data published in \cite{pmid17450130}). The fragment below shows a random extract from the dataset provided in the package: <>= library(cloudUtil) data(cloudms2) cloudms2[sort(sample(nrow(cloudms2),10)),c(1,5,6,15)] @ The attributes of interest are \texttt{CLOUD}, \texttt{BEGIN\_PREPROCESS}, \texttt{END\_PREPROCESS}, and \texttt{id}. Additionally, it is also possible to use accounting data collected from other sources e.g. Sun Grid Engine accounting data \cite{gridscheduler}. \section{Analysis} The code extract below creates a plot of the data shown in Section \ref{lab:datapreparation}: <>= hist(cloudms2$END_PREPROCESS - cloudms2$BEGIN_PREPROCESS,100) ## boxplot((cloudms2$END_PROCESS-cloudms2$BEGIN_PROCESS)/3600~cloudms2$CLOUD, main="process time", ylab="time [hours]") ## throughput<-cloudms2$MZXMLFILESIZE*10^-6/ (cloudms2$END_COPYINPUT-cloudms2$BEGIN_COPYINPUT) boxplot(throughput~cloudms2$CLOUD, main="copy input network throughput", ylab="MBytes/s") ## cloudUtilPlot(begin=cloudms2$BEGIN_PROCESS, end=cloudms2$END_PROCESS, id=cloudms2$id, group=cloudms2$CLOUD) @ Transparency through alpha blending allows furthermore to compare several plots with each other. An example is given in the code fragment below: <>= #green col.amazon<-rgb(0.1,0.8,0.1,alpha=0.2) col.amazon2<-rgb(0.1,0.8,0.1,alpha=0.2) #blue col.fgcz<-rgb(0.1,0.1,0.8,alpha=0.2) col.fgcz2<-rgb(0.1,0.1,0.5,alpha=0.2) #red col.uzh<-rgb(0.8,0.1,0.1,alpha=0.2) col.uzh2<-rgb(0.5,0.1,0.1,alpha=0.2) cm<-c(col.amazon, col.amazon2, col.fgcz, col.fgcz2, col.uzh, col.uzh2) jpeg("cloudms2Fig.jpg", 640, 640) op<-par(mfrow=c(2,1)) cloudUtilPlot(begin=cloudms2$BEGIN_PROCESS, end=cloudms2$END_PROCESS, id=cloudms2$id, group=cloudms2$CLOUD, colormap=cm, normalize=FALSE, plotConcurrent=TRUE); cloudUtilPlot(begin=cloudms2$BEGIN_PROCESS, end=cloudms2$END_PROCESS, id=cloudms2$id, group=cloudms2$CLOUD, colormap=cm, normalize=TRUE, plotConcurrent=TRUE, plotConcurrentMax=TRUE) dev.off() @ The output of the above listed R session is shown in Figure \ref{fig:cloudms2}. \begin{figure} \centering \includegraphics[width=0.75\columnwidth]{cloudms2Fig.jpg} \caption{\code{cloudUtilPlot} visualization for the \code{cloudms2} data set. \label{fig:cloudms2} On the graphics each horizontal line indicates the start and the end of one single job. Color is used for classifying the different groups. On the upper plot the time of each group was not normalized. The visualization on the bottom on the other side uses normalized time scales whichs help to compare the compute systems. Tranparent colors were used to dial with the overplotting. The solid lines on the bottom plot show the total number of concurrently running jobs. The squares on the solid lines indicate the maxima on the respective system. The user can make use of all R graphic devices.} \end{figure} \bibliography{cloudUtil}{} \bibliographystyle{plain} \end{document}