% \VignetteIndexEntry{ A Short Introduction to the gpuR Package}
% \VignettePackage{gpuR}
% \VignetteEngine{knitr::knitr}

% To compile this document
% library('knitr'); rm(list=ls()); knit('gpuR.Rnw')

\documentclass[12pt]{article}
\usepackage[sc]{mathpazo}
\usepackage[T1]{fontenc}
\usepackage{geometry}
\geometry{verbose,tmargin=2.5cm,bmargin=2.5cm,lmargin=2.5cm,rmargin=2.5cm}
\setcounter{secnumdepth}{2}
\setcounter{tocdepth}{2}
\usepackage{url}
\usepackage[unicode=true,pdfusetitle,
 bookmarks=true,bookmarksnumbered=true,bookmarksopen=true,bookmarksopenlevel=2,
 breaklinks=false,pdfborder={0 0 1},backref=false,colorlinks=false]
 {hyperref}
\hypersetup{
 pdfstartview={XYZ null null 1}}

\newcommand{\pkg}[1]{{\fontseries{b}\selectfont #1}}
\renewcommand{\pkg}[1]{{\textsf{#1}}}

\newcommand{\Rpackage}[1]{\textsl{#1}}
\newcommand\CRANpkg[1]{%
  {\href{http://cran.fhcrc.org/web/packages/#1/index.html}%
    {\Rpackage{#1}}}}
\newcommand\Githubpkg[1]{\GithubSplit#1\relax}
\def\GithubSplit#1/#2\relax{{\href{https://github.com/#1/#2}%
    {\Rpackage{#2}}}}

\newcommand{\Rcode}[1]{\texttt{#1}}
\newcommand{\Rfunction}[1]{\Rcode{#1}}
\newcommand{\Robject}[1]{\Rcode{#1}}
\newcommand{\Rclass}[1]{\textit{#1}}


\begin{document}

<<setup, include=FALSE, cache=FALSE>>=
library(knitr)
opts_chunk$set(
concordance=TRUE
)
@

\title{A Short Introduction to the \Rpackage{gpuR} Package}
\author{Dr. Charles Determan Jr. PhD\footnote{cdetermanjr@gmail.com}}
\newpage

\maketitle
\section{Introduction}
GPUs (Graphics Processing Units) were originally developed for
graphics rendering and are best known from computer gaming.  
These devices can also be applied to numerical operations that run in parallel.
Although there are a few different vendors, the two primary competitors are
AMD and NVIDIA.

NVIDIA GPUs are typically programmed with the proprietary CUDA framework, whereas
AMD GPUs utilize the open OpenCL standard.  The upside of OpenCL is that its
`kernels' can be used on any GPU with an OpenCL driver, whereas CUDA kernels can
only be used on NVIDIA GPUs.  It is worth noting, however, that CUDA tends to
edge out OpenCL in performance, likely because the framework is tailored
specifically to NVIDIA GPUs.

That said, programming in either framework is often difficult for programmers
unaccustomed to working with such a low-level interface.  Creating bindings
for high-level programming languages (such as R) makes GPUs much more
accessible to a broader audience.

Several R packages have been developed for this purpose, including \CRANpkg{gputools}, 
\CRANpkg{cudaBayesreg}, \CRANpkg{HiPLARM}, \CRANpkg{HiPLARb}, and 
\CRANpkg{gmatrix}. However, all of these packages depend upon a CUDA backend and 
therefore restrict the user to NVIDIA GPUs.  The novelty of this 
package is the use of the ViennaCL library (\url{http://viennacl.sourceforge.net/}), 
which has been conveniently repackaged in the \Githubpkg{cdeterman/RViennaCL} package 
so that it can be used in other R packages.  This allows the user to leverage the 
auto-tuned OpenCL kernels of the ViennaCL library on any GPU.  A CUDA backend, 
for those who have an NVIDIA GPU and want improved performance, will be 
provided in a companion \Githubpkg{cdeterman/gpuRcuda} package.

Most of the aforementioned packages expose only a very limited set of 
functions to the R user.  The most extensive is the \Rpackage{gmatrix} package, 
which contains most linear algebra operations.
With the exception of \Rpackage{gmatrix}, none of these packages store the data
on the GPU.  As such, every call incurs the overhead of transferring data back 
and forth between the host and the device.  Similar to \Rpackage{gmatrix}, this 
package utilizes S4 classes that mirror the base \Rcode{matrix} and 
\Rcode{vector} classes and store an external pointer to the data on the GPU.  
However, given the interactive nature of R programming and the limited RAM 
available on GPUs, this package also provides intermediate classes that keep 
the object in CPU memory rather than GPU RAM while still utilizing the GPU as needed.

\section{Install}
Install \Rpackage{gpuR} using  
<<install, eval = FALSE>>=
# Stable version
install.packages("gpuR")
@

Users interested in the current development version should
install it directly from my GitHub account.

<<installDev, eval = FALSE>>=
# Dev version
devtools::install_github("cdeterman/gpuR", ref = "develop")

# Note this may require install of the RViennaCL from my github
# if updates have been made
#devtools::install_github("cdeterman/RViennaCL")
@

\section{Getting Help}
If you find an error/bug while using this package I would like to know about it.  
Likewise, if there is a method you think would be useful for others, feel free
to request its addition (or provide a contribution yourself to be added).  
Please submit all such reports and requests on my GitHub 
\href{https://github.com/cdeterman/gpuR/issues}{Issues}.


\newpage
\section{Basic Use with \Rclass{gpuMatrix}}
\pkg{gpuR} has most basic linear algebra operations.  The user simply needs
to create a \Rclass{gpuMatrix} object and the GPU methods will be used.
Here is a minimal example demonstrating typical matrix multiplication.

<<matMult, eval=FALSE>>=
library("gpuR")

# verify you have valid GPUs
detectGPUs()

# create gpuMatrix and multiply
set.seed(123)
gpuA <- gpuMatrix(rnorm(16), nrow=4, ncol=4)
gpuB <- gpuA %*% gpuA
@

Most linear algebra methods have been implemented for the 
\Rclass{gpuMatrix} and \Rclass{gpuVector} objects.  These methods include the basic 
arithmetic functions \Rcode{\%*\%}, \Rcode{+}, \Rcode{-}, \Rcode{*}, 
\Rcode{/}, \Rcode{t}, \Rcode{crossprod}, \Rcode{tcrossprod}, \Rcode{colMeans},
\Rcode{colSums}, \Rcode{rowMeans}, and \Rcode{rowSums}.  Math functions include 
\Rcode{sin}, \Rcode{asin}, \Rcode{sinh}, \Rcode{cos}, \Rcode{acos},
\Rcode{cosh}, \Rcode{tan}, \Rcode{atan}, \Rcode{tanh}, \Rcode{exp}, 
\Rcode{log}, \Rcode{log10}, \Rcode{abs}, \Rcode{max}, and
\Rcode{min}.  Additional operations include some linear algebra routines such 
as \Rcode{cov} (Pearson covariance) and \Rcode{eigen}. A few `distance' routines
have also been added with the \Rcode{dist} and \Rcode{distance} (for pairwise) 
functions.  These currently include `Euclidean' and `SqEuclidean' methods.  
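
As a brief sketch of how a few of the methods listed above might be combined,
the following chunk applies some arithmetic and math functions to a small
\Rclass{gpuMatrix}.  The object names are purely illustrative, and the use of
empty brackets (\Rcode{[]}) to copy a result back into a base R matrix is an
assumption about the installed version of the package.

<<arithOps, eval=FALSE>>=
library("gpuR")

# small illustrative matrix
set.seed(123)
gpuA <- gpuMatrix(rnorm(16), nrow=4, ncol=4)

# element-wise arithmetic and transpose
gpuSum <- gpuA + gpuA
gpuT <- t(gpuA)

# cross product, equivalent to t(gpuA) %*% gpuA
gpuCross <- crossprod(gpuA)

# element-wise math function and a column summary
gpuExp <- exp(gpuA)
gpuMeans <- colMeans(gpuA)

# copy a result back to a base R matrix (assumed accessor)
hostSum <- gpuSum[]
@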

The objects may also be created specifying data type including int, float, and 
double.  Float type was included to provide a smaller memory footprint and also 
increased throughput if the increased accuracy of double is not required.
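
A minimal sketch of selecting the data type at creation time is shown below.
It assumes the constructor accepts a \Rcode{type} argument taking the values
\Rcode{"float"} and \Rcode{"double"}; the object names are illustrative, and
double precision may not be supported by every device.

<<dataTypes, eval=FALSE>>=
library("gpuR")

set.seed(123)
x <- rnorm(16)

# single precision: smaller memory footprint
gpuF <- gpuMatrix(x, nrow=4, ncol=4, type="float")

# double precision: higher accuracy, if the device supports it
gpuD <- gpuMatrix(x, nrow=4, ncol=4, type="double")
@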

Both the \Rclass{gpuMatrix} and \Rclass{vclMatrix} objects contain a pointer
to the data rather than the data itself.  Given that working with GPUs implies 
that you are working with larger datasets, this prevents R from making 
unnecessary copies.  However, this does require the user to exercise caution, 
as any change made to a `copy' (e.g. \Rcode{gpuB <- gpuA}) will also change 
the original object and all others pointing to it.
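
The pointer behaviour described above can be sketched as follows.  This is only
an illustration: it assumes that single elements can be read and replaced with
the usual bracket syntax, and it uses the \Rcode{deepcopy} function to obtain
an independent copy.  The object names are hypothetical.

<<refSemantics, eval=FALSE>>=
library("gpuR")

set.seed(123)
gpuA <- gpuMatrix(rnorm(16), nrow=4, ncol=4)

# an independent copy with its own data
gpuC <- deepcopy(gpuA)

# a 'copy' by assignment shares the same underlying data
gpuB <- gpuA

# modifying gpuB is therefore expected to modify gpuA as well,
# whereas gpuC should be unaffected
gpuB[1, 1] <- 0
gpuA[1, 1]
gpuC[1, 1]
@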

\section{\Rclass{vclMatrix} Class}
The \Rclass{vclMatrix} class was created to allow the user to put data directly
on the GPU once and not need to continually push data back and forth between
the host and device.  Therefore, if multiple operations are to be applied to
a given matrix, there will be significant savings by using \Rclass{vclMatrix}
objects.  It is important to remember, though, that different GPUs have different
amounts of RAM.  The interactive nature of R often has many objects existing
simultaneously, so you may exceed your GPU's RAM.  As such, the 
\Rclass{gpuMatrix} class is also provided, which keeps objects in host memory.
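
A minimal sketch of this workflow is given below: the data are placed on the
GPU once, several operations are chained, and only the final result is copied
back to the host.  The object names are illustrative, and the empty bracket
accessor (\Rcode{[]}) used to retrieve the result is an assumption about the
installed version of the package.

<<vclChain, eval=FALSE>>=
library("gpuR")

# place the data on the GPU once
set.seed(123)
vclA <- vclMatrix(rnorm(16), nrow=4, ncol=4)

# chain several operations; intermediate results stay on the device
vclB <- vclA %*% vclA
vclC <- t(vclB) + vclA

# copy only the final result back to the host (assumed accessor)
hostC <- vclC[]
@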

\section{'block' \& 'slice' object}
Sometimes it is useful to only refer to a subset of a matrix and apply
some operations.  This has been accomplished by providing the \Rcode{block} and
\Rcode{slice} methods for matrix and vector classes respectively.  The resulting
object is a child class of the parent object (e.g. \Rcode{gpuMatrixBlock} from
\Rcode{gpuMatrix}).  As such, nearly all methods defined for the parent objects
are also defined for the child class.

<<matBlock, eval=FALSE>>=

# create gpuMatrix
set.seed(123)
gpuA <- gpuMatrix(rnorm(16), nrow=4, ncol=4)

# create block omitting the 1st row
gpuB <- block(gpuA, 
              rowStart = 2L, rowEnd = 4L, 
              colStart = 1L, colEnd = 4L)
@

The resulting object is a reference to the parent object's elements to avoid
memory duplication.  As such, any changes to the `block' or `slice' object will 
alter the parent elements.  If the user wishes to make these objects distinct,
they should use the \Rcode{deepcopy} function.
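
The following sketch illustrates a vector \Rcode{slice} alongside a matrix
\Rcode{block}, and detaches the block from its parent with \Rcode{deepcopy}.
The \Rcode{start} and \Rcode{end} argument names for \Rcode{slice} are assumed
by analogy with \Rcode{block}, and all object names are illustrative.

<<sliceCopy, eval=FALSE>>=
library("gpuR")

set.seed(123)
gpuA <- gpuMatrix(rnorm(16), nrow=4, ncol=4)
gpuV <- gpuVector(rnorm(10))

# block referring to rows 2-4 of the parent matrix
gpuBlk <- block(gpuA, 
                rowStart = 2L, rowEnd = 4L, 
                colStart = 1L, colEnd = 4L)

# slice referring to elements 2-5 of the parent vector
# (argument names are an assumption)
gpuSl <- slice(gpuV, start = 2L, end = 5L)

# independent copy that no longer refers to the parent elements
gpuBlkCopy <- deepcopy(gpuBlk)
@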

\section{GPU Memory Management}
This package has been written to utilize the capabilities provided by 
\CRANpkg{Rcpp}.  As such, the pointers referenced in the objects herein are
handled by the 'XPtr' object.  Once the object is deleted in the R session
\textbf{and the garbage collector has run} the GPU memory will be freed.

It is important for the user to respect the rate at which the R garbage 
collector is called.  When some of these functions are called repeatedly in a 
loop, the GPU processing can proceed faster than the garbage collector is 
invoked, so memory is not released in time and the GPU fills up.  Explicitly 
calling \Rcode{gc()} within loops following \Rcode{rm} of
the temporary object should alleviate problems such as these.
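
A minimal sketch of this pattern is shown below.  The loop body, the number of
iterations, and the object names are purely illustrative.

<<gcLoop, eval=FALSE>>=
library("gpuR")

set.seed(123)
gpuA <- gpuMatrix(rnorm(16), nrow=4, ncol=4)

for (i in seq_len(100)) {
  # temporary result that occupies GPU memory
  tmp <- gpuA %*% gpuA

  # ... use tmp here ...

  # remove the temporary object and trigger garbage collection
  # so that its GPU memory can be released before the next iteration
  rm(tmp)
  gc()
}
@

Calling \Rcode{gc()} in every iteration adds overhead of its own, so for long
loops it may be preferable to call it only every few iterations.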


\newpage
\section{Contributing Back}
Given the broad use for this package (i.e. many different GPUs),
it would be useful if you could report success cases.  A curated list of 
tested platforms would be valuable for future users.  Please report your
operating system, OpenCL platform (can be seen with \Rcode{platformInfo}),
and your GPU(s) on my Gitter 
\href{https://gitter.im/cdeterman/gpuR/Tested_GPUs}{Tested GPUs} page.
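
One possible way to collect this information is sketched below.  It assumes
that \Rcode{platformInfo} can be called without arguments to describe the
default platform; the variable name is illustrative.

<<reportInfo, eval=FALSE>>=
library("gpuR")

# operating system, R version, and loaded package versions
sessionInfo()

# number of GPUs detected
nGPUs <- detectGPUs()
nGPUs

# details of the OpenCL platform (default platform assumed)
platformInfo()
@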

\section{Crowd-source Testing}
As noted above, there are many architectures that this
package `should' run on.  That said, it is impossible for me to have access
to all the different hardware.  As such, I am hoping that users can continually
test this package as well.
\newline
\newline
1. Test current release.  
The simplest approach is to test the current CRAN version.  Simply download the
tar file from the CRAN archive for \CRANpkg{gpuR}.  Then you can just run:
\newline
\newline
\Rcode{R CMD check gpuR\_version}. 
\newline
\newline
\textbf{NOTE} - you will likely receive 3 NOTES (checking repository 
dependencies, package size, and pandoc).  You also
will likely see a WARNING referring to the Date field being old.  You can safely
ignore these results.  Any error should be reported to my github 
\href{https://github.com/cdeterman/gpuR/issues}{Issues}.  Please include the
OS, OpenCL Platform, \& GPU(s).  Ideally you can at least report which test
is failing if returned by \Rcode{R CMD check}.
\newline
\newline
2. Test development version.  
This requires a little more effort on the user's part.  First, you need to
clone my development branch from my git repository.
\newline
\newline
\Rcode{git clone -b develop https://github.com/cdeterman/gpuR.git}
\newline
\newline
Then you can enter the gpuR directory and run \Rcode{devtools::test()}.
\newline
\newline
Ideally, users will try to diagnose at least which function (or sequence of
functions) the error originates from.  If the user cannot solve the bug, it is
important that it is reproducible so that I (or others) are able
to approach the problem.

\end{document}