% \VignetteIndexEntry{Introduction to doMPI} % \VignetteDepends{doMPI} % \VignettePackage{doMPI} \documentclass[12pt]{article} \usepackage{amsmath} \usepackage[pdftex]{graphicx} \usepackage{color} \usepackage{xspace} \usepackage{fancyvrb} \usepackage{fancyhdr} \usepackage[ colorlinks=true, linkcolor=blue, citecolor=blue, urlcolor=blue] {hyperref} \usepackage{lscape} \usepackage{Sweave} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % define new colors for use \definecolor{darkgreen}{rgb}{0,0.6,0} \definecolor{darkred}{rgb}{0.6,0.0,0} \definecolor{lightbrown}{rgb}{1,0.9,0.8} \definecolor{brown}{rgb}{0.6,0.3,0.3} \definecolor{darkblue}{rgb}{0,0,0.8} \definecolor{darkmagenta}{rgb}{0.5,0,0.5} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \newcommand{\bld}[1]{\mbox{\boldmath $#1$}} \newcommand{\shell}[1]{\mbox{$#1$}} \renewcommand{\vec}[1]{\mbox{\bf {#1}}} \newcommand{\ReallySmallSpacing}{\renewcommand{\baselinestretch}{.6}\Large\normalsize} \newcommand{\SmallSpacing}{\renewcommand{\baselinestretch}{1.1}\Large\normalsize} \newcommand{\halfs}{\frac{1}{2}} \setlength{\oddsidemargin}{-.25 truein} \setlength{\evensidemargin}{0truein} \setlength{\topmargin}{-0.2truein} \setlength{\textwidth}{7 truein} \setlength{\textheight}{8.5 truein} \setlength{\parindent}{0.20truein} \setlength{\parskip}{0.10truein} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \pagestyle{fancy} \lhead{} \chead{Introduction to {\em doMPI}} \rhead{} \lfoot{} \cfoot{} \rfoot{\thepage} \renewcommand{\headrulewidth}{1pt} \renewcommand{\footrulewidth}{1pt} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \title{Introduction to {\tt doMPI}} \author{Steve Weston \\ stephen.b.weston@gmail.com} \begin{document} \maketitle \thispagestyle{empty} \section{Introduction} <>= library(foreach) # examples in this document will be executed sequentially registerDoSEQ() @ The \texttt{doMPI} package is what I call a ``parallel backend'' for the \texttt{foreach} package. Since the \texttt{foreach} package is not a parallel programming system, but a parallel programming framework, it needs a parallel programming system to do the actual work in parallel. The \texttt{doMPI} package acts as an adaptor to the \texttt{Rmpi} package, which in turn is an \texttt{R} interface to an implementation of \texttt{MPI}. \texttt{MPI}, or \emph{Message Passing Interface}, is a specification for an \texttt{API} for passing messages between different computers. There are a number of \texttt{MPI} implementations available that allow data to be moved between computers quite efficiently. Programming with \texttt{MPI} is rather difficult, however, since it is a rather large and complex \texttt{API}. For example, the \texttt{Rmpi} package defines about 110 \texttt{R} functions, only a few of which are only for internal use. And the \texttt{MPI} standard includes many more functions that aren't supported by \texttt{Rmpi}, such as the file functions. Of course, you only need to learn a small percentage of those functions in order to start using \texttt{MPI} effectively, but it can take awhile just to figure out which functions are really important, and which ones you can safely ignore. The \texttt{foreach} package is an attempt to make parallel computing much simpler by providing a parallel for-loop construct for \texttt{R}. As an adaptor to \texttt{MPI}, the \texttt{doMPI} package is an attempt to give you the best of both worlds: the ease of use of parallel for-loops, with the efficiency of \texttt{MPI}. Unfortunately, there are still a wide variety of problems that you may run into. First, you need to have \texttt{MPI} installed on your computers, and then you have to build and install \texttt{Rmpi} to use that \texttt{MPI} installation. That is where many people run into problems, particularly on Windows. However, on Debian / Ubuntu, it is very easy to install \texttt{Open MPI}, and \texttt{Rmpi} works quite well with it. Another thing to be aware of is that you have to run your \texttt{R} programs differently in order to execute on multiple computers, and the exact method varies depending on your \texttt{MPI} implementation. If you just start a normal \texttt{R} session, your workers will only run on your local machine. That's great for testing, but the point of using \texttt{doMPI} is to execute on multiple computers. To do that, you'll need to start your \texttt{R} session using a command such as \texttt{mpirun} or \texttt{mpiexec}, depending on your \texttt{MPI} installation. Hopefully, this document will help you to get started, but it primarily deals with programming issues, not configuring or administering computers. And for running \texttt{doMPI} scripts, I only discuss the use of \texttt{Open MPI}. Very likely you will have some problem that I couldn't help you with anyway, and so I highly recommend that you join the R-sig-hpc mailing list. It is a very friendly and helpful community, and there isn't even a lot of traffic on it. \section{Installing the software} Assuming that you already have \texttt{R} and \texttt{MPI} installed on your computers, you will need to install \texttt{doMPI} and the packages that it depends on. That can be done with a single ``install.packages'' command, but you might want to explicitly install \texttt{Rmpi} first, so you can clearly see if it installs correctly or not. On Debian / Ubuntu, you can actually install \texttt{Rmpi} by using \texttt{apt-get} to install the Debian \texttt{r-cran-rmpi} package. That's the most fool-proof way to install \texttt{Rmpi} on Ubuntu, for example, since \texttt{apt-get} will automatically install a compatible version of \texttt{MPI}, if necessary. But on other systems you can install it with: <>= install.packages("Rmpi") @ As I mentioned, if you have problems, I strongly recommend the R-sig-hpc mailing list if you don't have a local expert. Once \texttt{Rmpi} is installed, the rest should be easy: <>= install.packages("doMPI", dependencies=TRUE) @ That will also install packages such as \texttt{foreach} and \texttt{iterators}. \section{Getting Started} So if I haven't scared you off with my introduction, and you got all of the software installed, then it's time to see if it actually works! For testing purposes, we'll just run on a single computer. In that case, you just start a normal \texttt{R} session. I'll talk about running on multiple computers later, in section 8. Once you've started an \texttt{R} session, load the \texttt{doMPI} package: <>= library(doMPI) @ You'll no doubt get an error at this point if \texttt{Rmpi} isn't properly installed. If that happens, you should restart the \texttt{R} session, load \texttt{Rmpi}, make a copy of the error message, google around for the error, scratch your head for awhile, and then post a polite, well-thought-out request for help to R-sig-hpc that includes the entire error message. Very possibly, someone on the list has encountered that error before, and will offer advice on how to fix it. But don't be too surprised if someone suggests that you switch to Ubuntu. Once you can successfully load \texttt{doMPI}, the next step is to create an \texttt{MPI} cluster object. There are lots of options, but here's one way to do that\footnote{I'm using the \texttt{count} argument because we're in an interactive R session. However, in section 8, I'll explain why I don't recommend using the \texttt{count} argument for most other circumstances.}: <>= cl <- startMPIcluster(count=2) @ This starts two cluster workers, or slaves, as \texttt{MPI} calls them. Note that if you're using \texttt{Open MPI}, those two processes will immediately start using as much of your CPU as they can. This is a known issue with \texttt{Open MPI}. They are waiting for the master process to send them a message, but they are ``busy waiting'' for that message. That doesn't make much sense in this context, but it can give better performance in some contexts. At any rate, it is an issue that the \texttt{Open MPI} group is planning to address in a future release. The next step is to register the \texttt{MPI} cluster object with the \texttt{foreach} package. This is done with the ``registerDoMPI'' function: <>= registerDoMPI(cl) @ This tells \texttt{foreach} that you want to use \texttt{doMPI} as the parallel backend, and that you want to use the specified \texttt{MPI} cluster object to execute the tasks. If you don't register a cluster object, your tasks won't be executed in parallel, even though you use \texttt{foreach} with the \texttt{\%dopar\%} operator. When you are finished using the \texttt{MPI} cluster object, it is important to shut it down, otherwise you may leak processes on your cluster.\footnote{Shutting down the workers is particularly important with \texttt{Open MPI}, since they use CPU time even when they are ``idle''.} You do that with the \texttt{closeCluster} function: <>= closeCluster(cl) @ And finally, you should probably call \texttt{mpi.quit} to exit your \texttt{R} session when using \texttt{doMPI}. You could call \texttt{mpi.finalize} instead if you want to use \texttt{doMPI} interactively during testing. Both of these functions will ``finalize'' \texttt{MPI}, which is a requirement for \texttt{MPI} programs. I'm not sure what the consequences are of not finalizing \texttt{MPI}, but I have seen error messages from \texttt{mpirun} when I forgot to finalize. \section{A simple example} Now we're ready to run a little test that actually runs in parallel. I always do that with a very simple program that executes the \texttt{sqrt} function in parallel. It's not a practical use of \texttt{foreach}, because the \texttt{sqrt} function runs so fast that it makes no sense to execute it in parallel. But it makes a great test. <<>>= foreach(i=1:3) %dopar% sqrt(i) @ There are a lot of things to comment on in this little example. First of all, notice that a list containing three numbers is returned by this command. That is quite different from a for-loop. If we did this in a for-loop, we would execute \texttt{sqrt} three times, but the result would be thrown away. A for-loop normally uses assignments to save its results, but \texttt{foreach} works more like \texttt{lapply} in this respect, and that is what makes it work well for parallel execution. Secondly, notice that we're using a strange binary operator, named \texttt{\%dopar\%}. \texttt{R} doesn't really make it easy to define new control structures, but since arguments to functions are lazily evaluated, it is possible. You don't have to worry about that, however. Just keep in mind that you need to put \texttt{\%dopar\%} in between the \texttt{foreach} and the loop body. In a real program, you'll obviously want to save the results of the computations. You may also have several operations to perform, so the way you'll usually write the previous example is as: <<>>= x <- foreach(i=1:3) %dopar% { sqrt(i) } x @ The braces are a good practice since they force the correct operator precedence. And if you squint hard, it looks just like a for-loop. \section{The .combine option} But what if we want our results to be returned in a vector, rather than a list? We could convert it to a list afterwards, but there's another way. The \texttt{foreach} function takes a number of additional arguments, all of which start with a ``.'' to distinguish them from the ``variable'' arguments. The one we need in this case is named \texttt{.combine}. You can use it to specify a function that will be used to combine the results. This function must take at least two input arguments, and return a value which is a ``combined'' or possibly ``reduced'' version of its input arguments. To combine many numeric values into a single vector of numeric values, we can use the standard \texttt{c} function, which concatenates its arguments: <<>>= x <- foreach(i=1:3, .combine="c") %dopar% { sqrt(i) } x @ This works well for numeric, character, and logical values. I also use the \texttt{cbind} function as a \texttt{.combine} function quite often. I use it to turn vectors into matrices: <<>>= x <- foreach(seed=c(7, 11, 13), .combine="cbind") %dopar% { set.seed(seed) rnorm(3) } x @ \section{Computing the sinc function} Now let's try a slightly more realistic example. Let's evaluate the \texttt{sinc} function over a grid of values. We'd like to return the results in a matrix, as in the previous example, so we'll use \texttt{cbind} to combine the results: <>= x <- seq(-8, 8, by=0.5) v <- foreach(y=x, .combine="cbind") %dopar% { r <- sqrt(x^2 + y^2) + .Machine$double.eps sin(r) / r } persp(x, x, v) @ One thing to note in this example is that I'm doing an assignment to the variable \texttt{r} in the loop. You can do that with \texttt{foreach}, but if you want your code to work correctly in parallel, you need to be careful not to use variable assignments as a way of communicating between different iterations of the loop. That's what parallel programmers call a ``loop dependence''. In this case I'm only using it to communicate within a single iteration, so it's safe. Also note that I'm using the variable \texttt{x} inside the loop, which was defined before the loop. In many other parallel computing systems, you'd have to explicitly export that variable somehow or other. With \texttt{foreach}, it's done automatically. As a comparison, let's write this program using the standard \texttt{lapply} function: <>= x <- seq(-8, 8, by=0.5) sinc <- function(y) { r <- sqrt(x^2 + y^2) + .Machine$double.eps sin(r) / r } r <- lapply(x, sinc) v <- do.call("cbind", r) persp(x, x, v) @ As you can see, there is a similarity between these two versions. But it makes me very happy that the parallel version is actually shorter and easier to explain than the sequential version. I should also point out that in this \texttt{sinc} example, I've been very careful to make use of vector operations in the loop. The primary rule for getting good performance in \texttt{R} is to use vector operations whenever possible. I flagrantly broke that rule in the \texttt{sqrt} example. In the \texttt{sinc} example, I used vector operations to compute the columns, and I used \texttt{foreach} to execute those vector operations in parallel. The \texttt{sinc} example is still not really compute intensive enough to really merit executing it in parallel, but at least it uses much better \texttt{R} programming style. \section{Processing a directory of data files} Now let's try a different kind of example that you probably haven't seen in any parallel programming book, but simply reeks of practicality. Let's say that you have a directory full of CSV data files, and you want to read each of them, analyze the data in some way, and produce a plot of the results that you write to output files. Obviously, it's embarrassingly parallel, it can be time consuming, and it's a common task to perform. Let's perform that kind of operation using Andy Liaw's \texttt{randomForest} package: <>= ifiles <- list.files(pattern="\\.csv$") ofiles <- sub("\\.csv$", ".png", ifiles) foreach(i=ifiles, o=ofiles, .packages="randomForest") %dopar% { d <- read.csv(i) rf <- randomForest(Species~., data=d, proximity=TRUE) png(filename=o) MDSplot(rf, d$Species) dev.off() NULL } @ Note that we're iterating over two different vectors in this example: \texttt{ifiles} and \texttt{ofiles}, which are both the same length. The \texttt{foreach} function let's you specify any number of variables to iterate over, and it stops when one of them runs out of values. Also note the \texttt{NULL} at the end of the loop body. There is no value to return in this case, so I used a \texttt{NULL} to make sure that I wasn't needlessly sending anything large back to the master, which would hurt the performance somewhat. The real results of executing the loop are written to disk by the workers directly. \section{Running doMPI on a cluster} Now that we've run some interesting examples on a single computer, let's try running on multiple computers, which is the real purpose of the \texttt{doMPI} package. To execute \texttt{doMPI} scripts on multiple computers, you need to execute the \texttt{R} interpreter using a command such as \texttt{mpirun} or \texttt{mpiexec}. In this vignette, I'll only discuss the Open MPI version of \texttt{mpirun}, although it should be pretty simple to translate my instructions for other implementations. When you execute \texttt{mpirun}, you can specify the hosts to use as a comma-separated list using the \texttt{-H} or \texttt{-host} option, or by specifying a ``host file'' using the \texttt{-hostfile} option. You can also use the \texttt{-n} or \texttt{-np} option to specify how many copies of your script to run on those nodes. If you specify that only one copy should be executed, then \texttt{doMPI} will start workers for you using the \texttt{Rmpi} \texttt{mpi.comm.spawn} function when you execute \texttt{startMPIcluster}. If you specify that more than one copy of your script should be executed on those hosts, then \texttt{doMPI} will not spawn workers, assuming that you have started all of the workers that you wanted using \texttt{mpirun}\footnote{Actually, \texttt{startMPIcluster} will always spawn workers unless the \texttt{comm} argument is \texttt{0}, but \texttt{comm} defaults to \texttt{0} if \texttt{mpi.comm.size(0)} is greater than one. See the documentation on \texttt{startMPIcluster} for more information.}. That means that multiple instances of your script are going to start running, or in other words, the workers are going to execute your script. In that case, your script should call the \texttt{startMPIcluster} function near the beginning. The \texttt{startMPIcluster} function will notice when a ``worker'' calls it, and will execute the \texttt{workerLoop} function, in order to execute tasks submitted by the master, or ``rank 0'' process. Now let's try some examples. Here's an example that will run the script ``sincMPI.R'' using two workers: \begin{verbatim} $ mpirun -H localhost,n2,n3 R --slave -f sincMPI.R \end{verbatim} The \texttt{mpirun} command will run the \texttt{R} interpreter on ``localhost'', ``n2'', and ``n3'', and \texttt{R} will execute the script named ``sincMPI.R'' in the current directory. \texttt{sincMPI.R} is an example that comes in the \texttt{doMPI} package, so you can try this command out by going to the ``examples'' directory of the \texttt{doMPI} installation. The \texttt{sincMPI.R} script first loads \texttt{doMPI}, and then calls \texttt{startMPIcluster}. The copy of \texttt{sincMPI.R} running on the master, or ``rank 0'' process, will return from \texttt{startMPIcluster}, but the other two processes (which are running on hosts ``n2'' and ``n3'') will not return, since they will execute the \texttt{workerLoop} function. That is why it's a good idea to execute \texttt{startMPIcluster} near the beginning of the script, otherwise the workers may run into problems executing code that was intended to be executed by the master. The last example used what I call ``non-spawn'' or ``static'' mode. Now let's run in ``spawn'' or ``dynamic'' mode. We'll start the master on node ``n1'', but we'll also list hosts ``n2'' and ``n3''. That will include those nodes in the \texttt{MPI} ``universe'', so that \texttt{startMPIcluster} will be able to spawn processes to them: \begin{verbatim} $ mpirun -H n1,n2,n3 -n 1 R --slave -f sincMPI.R \end{verbatim} Because we specified \texttt{-n 1}, the \texttt{mpirun} command only starts one copy of the \texttt{R} interpreter running. \texttt{startMPIcluster} will then spawn two workers, since it notices that \texttt{mpi.universe.size()} is three. They will execute on hosts ``n2'' and ``n3'', since \texttt{MPI} knows that they are part of the ``universe''. So which of these two methods should you use? The \texttt{Open MPI} documentation advices using ``non-spawn'' mode, since that will give better performance. You probably also need to use ``non-spawn'' mode to run with a batch queueing system, such as ``slurm''. But if your script needs to perform other operations before \texttt{startMPIcluster} that would fail on the worker processes, or if you need to create multiple clusters, then you should use the ``spawn'' mode, by specifying \texttt{-n 1}. Here's a final example that starts multiple processes on the hosts using ``non-spawn'' mode: \begin{verbatim} $ mpirun -H localhost,n2,n3 -n 6 R --slave -f sincMPI.R \end{verbatim} This will start two processes on each of the hosts for a total of six processes. The master and one worker will run on ``localhost'', and ``n2'' and ``n3'' will both run two workers. In order to spawn five workers, you would use \texttt{-n 1}, but you'd also have to modify the script to call \texttt{startMPIcluster} with \texttt{count} set to \texttt{5}. In general, I'd suggest not using the \texttt{startMPIcluster} \texttt{count} argument unless you're doing something fancy, like creating multiple clusters. I think that's the best method, since it let's you control the number of workers strictly from the \texttt{mpirun} command, which is what we've been doing in all of these examples. But if you do specify a value for \texttt{count} while in ``non-spawn'' mode, it's got to be equal to \texttt{mpi.comm.size(0) - 1}, or \texttt{startMPIcluster} will issue an error. That's one reason that none of the examples or benchmarks that come with \texttt{doMPI} use the \texttt{count} argument. \section{Host setup} Of course, in order for any of these examples to work, you've got to set up a lot of things correctly on all of the hosts specified via \texttt{-H}. Generally speaking, I would strongly suggest that you setup your hosts so they have an environment that is almost identical to your local machine. That is, they should all have the same user account, with the same home directory, with passwordless ssh enabled, and all of the same software installed using the same file paths. For example, \texttt{R} should be installed in the same place on all machines, and the \texttt{doMPI} package should be installed in the same directory on all machines, as well. That's a pretty big requirement, but that's the way most people do things with \texttt{MPI}, and so that's the way that many computer clusters are configured. There are various tricks that you can use to make differently configured machines work together. One trick is to create symbolic links on the worker machines to make them look more like the master machine. And if your user account has different names on different machines, you can edit the ssh configuration file on the master machine to use the appropriate user name for each worker machine. But if you're going to do much parallel execution, it's worth the time and effort to set up your computers so that you don't have to depend on tricks. \section{Conclusion} I'd like to end by saying how easy parallel computing on a computer cluster can be if you use \texttt{foreach} and \texttt{doMPI}. And it is a lot simpler than many of the alternatives. But running on networks of computers always seems to present challenges. Hopefully, by using \texttt{foreach} and \texttt{doMPI}, most of those will be system administration challenges, and not programming challenges. Since \texttt{foreach} allows you to separate your parallel program from your parallel environment, you can develop your program on a single computer, without using any parallel backend at all, let alone a tricky one. Once your program is debugged and working, you can run it on a multicore workstation using a ``low maintenance'' parallel backend such as \texttt{doMC}\footnote{The \texttt{doMC} package is also available on CRAN.}. And once you've got a computer cluster set up that has all the software installed by some friendly sysadmin, with an NFS-mounted home directory, you can use the \texttt{doMPI} package to run that same parallel program very efficiently using \texttt{MPI} without having to learn anything about message passing. \end{document}