% \VignetteIndexEntry{ A Short Introduction to the gpuR Package} % \VignettePackage{gpuR} % \VignetteEngine{knitr::knitr} % To compile this document % library('knitr'); rm(list=ls()); knit('gpuR.Rnw') \documentclass[12pt]{article} \usepackage[sc]{mathpazo} \usepackage[T1]{fontenc} \usepackage{geometry} \geometry{verbose,tmargin=2.5cm,bmargin=2.5cm,lmargin=2.5cm,rmargin=2.5cm} \setcounter{secnumdepth}{2} \setcounter{tocdepth}{2} \usepackage{url} \usepackage[unicode=true,pdfusetitle, bookmarks=true,bookmarksnumbered=true,bookmarksopen=true,bookmarksopenlevel=2, breaklinks=false,pdfborder={0 0 1},backref=false,colorlinks=false] {hyperref} \hypersetup{ pdfstartview={XYZ null null 1}} \newcommand{\pkg}[1]{{\fontseries{b}\selectfont #1}} \renewcommand{\pkg}[1]{{\textsf{#1}}} \newcommand{\Rpackage}[1]{\textsl{#1}} \newcommand\CRANpkg[1]{% {\href{http://cran.fhcrc.org/web/packages/#1/index.html}% {\Rpackage{#1}}}} \newcommand\Githubpkg[1]{\GithubSplit#1\relax} \def\GithubSplit#1/#2\relax{{\href{https://github.com/#1/#2}% {\Rpackage{#2}}}} \newcommand{\Rcode}[1]{\texttt{#1}} \newcommand{\Rfunction}[1]{\Rcode{#1}} \newcommand{\Robject}[1]{\Rcode{#1}} \newcommand{\Rclass}[1]{\textit{#1}} \begin{document} <>= library(knitr) opts_chunk$set( concordance=TRUE ) @ \title{A Short Introduction to the \Rpackage{gpuR} Package} \author{Dr. Charles Determan Jr. PhD\footnote{cdetermanjr@gmail.com}} \newpage \maketitle \section{Introduction} GPUs (Graphic Processing Units) were originally developed to perform graphic rendering and commonly referred to in the comuting gaming world. These devices are also able to be applied to numerical operations in parallel. Although there are a few different vendors, the two primary competitors are AMD and NVIDIA. NVIDIA GPUs depend upon the proprietary CUDA framework whereas AMD GPUs utilize the open source OpenCL framework. The upside of OpenCL is that 'kernels' are able to used on any GPU whereas CUDA kernels can only be used on NVIDIA GPUs. It is worth noting, however, that CUDA tends to edge out OpenCL performance likely a result of the highly specific framework to the NVIDIA GPUs. That said, programming either framework is often difficult for programmers unaccustomed to working with such a low-level interface. Creating bindings for high-level programming languages (such as R) make using GPUs much more accessible to a broader audience. Several R packages have been developed including \CRANpkg{gputools}, \CRANpkg{cudaBayesreg}, \CRANpkg{HiPLARM}, \CRANpkg{HiPLARb}, and \CRANpkg{gmatrix}. However, all of the packages depend upon a CUDA backend and therefore restrict the using to only using NVIDIA GPUs. The novelty of this package is the use of the ViennaCL library (http://viennacl.sourceforge.net/) which has been coveniently been repackaged in the \Githubpkg{cdeterman/RViennaCL} package to be used in other R packages. This allows the user to leverage the auto-tuned OpenCL kernels of the ViennaCL library on any GPU. It also allows for a CUDA backend for those who in fact have a NVIDIA GPU for improved performance, which will be provided in a companion \Githubpkg{cdeterman/gpuRcuda} package. Of the aforementioned packages, most contain a very limited set of functions available to the R user within the packages. The most extensive being the \Rpackage{gmatrix} package which contains most linear algebra operations. All of the packages (with the exception of \Rpackage{gmatrix}) don't store the data on the GPU. As such, there is the overhead of transferring data back and forth between the device and host. Similar to \Rpackage{gmatrix}, this package utilizes S4 classes to store an external pointer to the data on the GPU which mirror the base \Rcode{matrix} and \Rcode{vector} classes. However, given the interactive nature of R programming and the limited RAM available on GPU's this package provides intermediate classes that remove the object from GPU RAM to allow objects to be stored on the CPU but still utilize the GPU as needed. \maketitle \section{Install} Install \Rpackage{gpuR} using <>= # Stable version install.packages("gpuR") @ If a user is interested in using the current development version they should install directly from my github account. <>= # Dev version devtools::install_github("cdeterman/gpuR", ref = "develop") # Note this may require install of the RViennaCL from my github # if updates have been made #devtools::install_github("cdeterman/RViennaCL") @ \maketitle \section{Getting Help} If you find an error/bug while using this package I would like to know about it. Likewise, if there is a method you think would be useful for others feel free to request it's addition (or provide a contribution yourself to be added). Please submit the all such reports and requests on my github \href{https://github.com/cdeterman/gpuR/issues}{Issues}. \newpage \maketitle \section{Basic Use with \Rclass{gpuMatrix}} \pkg{gpuR} has most basic linear algebra operations. The user simply needs to create a \Rclass{gpuMatrix} object and the GPU methods will be used. Here is a minimal example demonstrating typical matrix multiplication. <>= library("gpuR") # verify you have valid GPUs detectGPUs() # create gpuMatrix and multiply set.seed(123) gpuA <- gpuMatrix(rnorm(16), nrow=4, ncol=4) gpuB <- gpuA %*% gpuA @ Most linear algebra methods have been created to be executed for the \Rclass{gpuMatrix} and \Rclass{gpuVector} objects. These methods include basic Arithmetic functions \Rcode{\%*\%}, \Rcode{+}, \Rcode{-}, \Rcode{*}, \Rcode{/}, \Rcode{t}, \Rcode{crossprod}, \Rcode{tcrossprod}, \Rcode{colMeans}, \Rcode{colSums}, \Rcode{rowMeans}, and \Rcode{rowSums}. Math functions include \Rcode{sin}, \Rcode{asin}, \Rcode{sinh}, \Rcode{cos}, \Rcode{acos}, \Rcode{cosh}, \Rcode{tan}, \Rcode{atan}, \Rcode{tanh}, \Rcode{exp}, \Rcode{log}, \Rcode{log10}, \Rcode{exp}, \Rcode{abs}, \Rcode{max}, and \Rcode{min}. Additional operations include some linear algebra routines such as \Rcode{cov} (Pearson Covariance) and \Rcode{eigen}. A few 'distance' routines have also been added with the \Rcode{dist} and \Rcode{distance} (for pairwise) functions. These currently include 'Euclidean' and 'SqEuclidean' methods. The objects may also be created specifying data type including int, float, and double. Float type was included to provide a smaller memory footprint and also increased throughput if the increased accuracy of double is not required. Both the \Rclass{gpuMatrix} and \Rclass{vclMatrix} objects return a pointer to the data. Given that working with GPU's implies that you are working with larger datasets, this prevents R from making unneccessary copies. However, this does require the user to exercise caution as any change made to a 'copy' (e.g. \Rcode{gpuB <- gpuA}) will result in changes to the original object and all others pointing to it as well. \maketitle \section{\Rclass{vclMatrix} Class} The \Rclass{vclMatrix} class was created to allow the user to put data directly on the GPU once and not need to continually push data back and forth between the host and device. Therefore, if multiple processes are to be applied to a given matrix, there will be significant savings by using \Rclass{vclMatrix} objects. It is important to remember though, different GPU's have different amounts of RAM. The interactive nature of R often has many objects existing simultaenously where you may exceed your GPU's RAM. As such, the \Rclass{gpuMatrix} class is provided. \maketitle \section{'block' \& 'slice' object} Sometimes it is useful to only refer to a subset of a matrix and apply some operations. This has been accomplished by providing the \Rcode{block} and \Rcode{slice} methods for matrix and vector classes respectively. The resulting object is a child class of the parent object (e.g. \Rcode{gpuMatrixBlock} from \Rcode{gpuMatrix}). As such, nearly all methods defined for the parent objects are also defined for the child class. <>= # create gpuMatrix set.seed(123) gpuA <- gpuMatrix(rnorm(16), nrow=4, ncol=4) # create block omitting the 1st row gpuB <- block(gpuA, rowStart = 2L, rowEnd = 4L, colStart = 1L, colEnd = 4L) @ The resulting object is a reference to the parent objects elements to avoid memory duplication. As such, any changes to the 'block' or 'slice' object will alter the parent elements. If the user wishes to make these objects distinct, they should use the \Rcode{deepcopy} function. \maketitle \section{GPU Memory Management} This package has been written to utilize the capabilities provided by \CRANpkg{Rcpp}. As such, the pointers referenced in the objects herein are handled by the 'XPtr' object. Once the object is deleted in the R session \textbf{and the garbage collector has run} the GPU memory will be freed. It is important for the user to respect the rate at which the R garbage collector is called. It has been shown that when some of these functions are called repeatedly in a loop the GPU processing goes too rapidly for the garbage collector to be called to release the memory resulting in the GPU becoming filled. Explicitly calling \Rcode{gc()} within loops following \Rcode{rm} of the temporary object should alleviate problems such as these. \newpage \maketitle \section{Contributing Back} Given the broad use for this package (i.e. many different GPUs) it would be useful if you could report success cases. A curated list of tested platforms would be valuable for future users. Please see report your Operating System, OpenCL Platform (can be seen with \Rcode{platformInfo}), and your GPU(s) also on my gitter \href{https://gitter.im/cdeterman/gpuR/Tested_GPUs}{Tested GPUs} page. \section{Crowd-source Testing} As with the above statements, there a many architecture possibilities that this package 'should' run on. That said, it is impossible for me to have access to all the different hardware. As such, I am hoping that users can continually test this package as well. \newline \newline 1. Test current release The simplest approach is to test the current CRAN version. Simply download the tar file from CRAN archive for \CRANpkg{gpuR}. Then you can just run: \newline \newline \Rcode{R CMD check gpuR\_version}. \newline \newline \textbf{NOTE} - you will likely receive 3 NOTES (checking repository dependencies, package size, and pandoc). You also will likely see a WARNING referring to the Date field being old. You can safely ignore these results. Any error should be reported to my github \href{https://github.com/cdeterman/gpuR/issues}{Issues}. Please include the OS, OpenCL Platform, \& GPU(s). Ideally you can at least report which test is failing if returned by \Rcode{R CMD check}. \newline \newline 2. Test development version. This requires a little bit more effort on the users part. First, you need to to clone my development branch from my git repository. \newline \newline \Rcode{git clone -b develop https://github.com/cdeterman/gpuR.git} \newline \newline Then you can enter the gpuR directory and run \Rcode{devtools::test()}. \newline \newline Ideally, users will try to at least diagnose at least which function the error is originating (or sequence of functions). It is important that if the user cannot solve the bug that it is reproducible so that I (or others) are able to try to approach the problem. \end{document}