\documentclass{ubmanual} \usepackage{amsmath,amssymb} \usepackage{hyperref} \hypersetup{colorlinks=false, pdfborder={0 0 0}, pdftitle={APCluster - An R Package for Affinity Propagation Clustering}, pdfauthor={Ulrich Bodenhofer}} \title{{\Huge APCluster}\\[5mm] An R Package for Affinity Propagation Clustering} \author{Ulrich Bodenhofer\affilmark{1,2}, Johannes Palme\affilmark{2}, Chrats Melkonian\affilmark{2}, and Andreas Kothmeier\affilmark{2}} \affiliation{\affilmark{1} School of Informatics, Communication and Media\\ University of Applied Sciences Upper Austria\\ Softwarepark 11, 4232 Hagenberg, Austria\\[2mm] {\grey\affilmark{2} previously with: Institute of Bioinformatics, Johannes Kepler University\\Altenberger Str.\ 69, 4040 Linz, Austria}} \newcommand{\APCluster}{\texttt{apcluster}} \newcommand{\KeBABS}{\texttt{kebabs}} \newcommand{\R}{R} \newcommand{\Real}{\mathbb{R}} \renewcommand{\vec}[1]{\mathbf{#1}} %\VignetteIndexEntry{An R Package for Affinity Propagation Clustering} %\VignetteDepends{methods, stats, graphics, utils} %\VignetteEngine{knitr::knitr} \begin{document} <>= options(width=72) knitr::opts_knit$set(width=72) set.seed(0) library(apcluster, quietly=TRUE) apclusterVersion <- packageDescription("apcluster")$Version apclusterDateRaw <- packageDescription("apcluster")$Date apclusterDateYear <- as.numeric(substr(apclusterDateRaw, 1, 4)) apclusterDateMonth <- as.numeric(substr(apclusterDateRaw, 6, 7)) apclusterDateDay <- as.numeric(substr(apclusterDateRaw, 9, 10)) apclusterDate <- paste(month.name[apclusterDateMonth], " ", apclusterDateDay, ", ", apclusterDateYear, sep="") @ \newcommand{\APClusterVer}{\Sexpr{apclusterVersion}} \newcommand{\APClusterDate}{\Sexpr{apclusterDate}} \manualtitlepage{Version \APClusterVer, \APClusterDate}{https://github.com/UBod/apcluster} \section*{Scope and Purpose of this Document} This document is a user manual for the \R\ package \APCluster\ \cite{BodenhoferKothmeierHochreiter11}. It is only meant as a gentle introduction to how to use the basic functions implemented in this package. Not all features of the \R\ package are described in full detail. Such details can be obtained from the documentation enclosed in the \R\ package. Further note the following: (1) this is neither an introduction to affinity propagation nor to clustering in general; (2) this is not an introduction to \R. If you lack the background for understanding this manual, you first have to read introductory literature on these subjects. \newpage \vspace{1cm} \newlength{\auxparskip} \setlength{\auxparskip}{\parskip} \setlength{\parskip}{0pt} \tableofcontents \clearpage \setlength{\parskip}{\auxparskip} \newlength{\Nboxwidth} \setlength{\Nboxwidth}{\textwidth} \addtolength{\Nboxwidth}{-2\fboxrule} \addtolength{\Nboxwidth}{-2\fboxsep} \newcommand{\notebox}[1]{% \begin{center} \fbox{\begin{minipage}{\Nboxwidth} \noindent{\sffamily\bfseries Note:} #1 \end{minipage}} \end{center}} \section{Introduction} Affinity propagation (AP) is a relatively new clustering algorithm that was introduced by Brendan J.\ Frey and Delbert Dueck \cite{FreyDueck07}.\footnotemark[1]\footnotetext[1]{% \url{https://psi.toronto.edu/research/affinity-propagation-clustering-by-message-passing/}}\stepcounter{footnote} The authors themselves describe affinity propagation as follows: \begin{quote} ``{\em An algorithm that identifies exemplars among data points and forms clusters of data points around these exemplars.
It operates by simultaneously considering all data points as potential exemplars and exchanging messages between data points until a good set of exemplars and clusters emerges.}'' \end{quote} AP has recently been applied in various fields, among which bioinformatics is becoming increasingly important. Frey and Dueck have made their algorithm available as Matlab code.\footnotemark[1] Matlab, however, is relatively uncommon in bioinformatics. Instead, the statistical computing platform \R\ has become a widely accepted standard in this field. In order to leverage affinity propagation for bioinformatics applications, we have implemented affinity propagation as an \R\ package. Note, however, that the given package is in no way restricted to bioinformatics applications. It is as generally applicable as Frey's and Dueck's original Matlab code.\footnotemark[1] Starting with Version 1.1.0, the \APCluster\ package also features {\em exemplar-based agglomerative clustering}, which can be used as a clustering method on its own or for creating a hierarchy of clusters that have been computed previously by affinity propagation. {\em Leveraged Affinity Propagation}, a variant of AP especially geared to applications involving large data sets, was first included in Version 1.3.0. \section{Installation} \subsection{Installation via CRAN} The \R\ package \APCluster\ (current version: \APClusterVer) is part of the {\em Comprehensive R Archive Network (CRAN)}% \footnote{\url{http://cran.r-project.org/}}. The simplest way to install the package, therefore, is to enter the following command into your \R\ session: <>= install.packages("apcluster") @ If you use R on Windows or Mac OS, you can also conveniently use the package installation menu of your R GUI. \subsection{Manual installation from source} Under special circumstances, e.g., if you want to compile the C++ code included in the package with some custom options, you may prefer to install the package manually from source. To this end, open the package's page at CRAN% \footnote{\url{https://CRAN.R-project.org/package=apcluster}} and then proceed as follows: \begin{enumerate} \item Download \texttt{apcluster\_\APClusterVer.tar.gz} and save it to your hard disk. \item Open a shell/terminal/command prompt window and change to the directory where you put {\ttfamily apcluster\_\APClusterVer.tar.gz}. Enter \begin{quote} \ttfamily R CMD INSTALL apcluster\_\APClusterVer.tar.gz \end{quote} to install the package. \end{enumerate} Note that this might require additional software on some platforms. Windows requires Rtools\footnote{\url{http://cran.r-project.org/bin/windows/Rtools/}} to be installed and to be available in the default search path (environment variable \verb+PATH+). Mac OS X requires Xcode developer tools% \footnote{\url{https://developer.apple.com/technologies/tools/}} (make sure that you have the command line tools installed with Xcode). \subsection{Compatibility issues} All versions downloadable from CRAN have been built using the latest \R\ version, \R\ \Sexpr{R.version$major}.\Sexpr{R.version$minor}. However, the package should work without severe problems on \R\ versions $\geq$3.0.0. \section{Getting Started} To load the package, enter the following in your \R\ session: <>= library(apcluster) @ If this command terminates without any error message or warning, you can be sure that the package has been installed successfully. If so, the package is ready for use now and you can start clustering your data with affinity propagation.
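If you want to double-check which version of the package has been installed, you can query it with the standard utility function \verb+packageVersion()+ (a quick optional check):
<>=
packageVersion("apcluster")
@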
The package includes both a user manual (this document) and a reference manual (help pages for each function). To view the user manual, enter <>= vignette("apcluster") @ Help pages can be viewed using the \verb+help+ command. It is recommended to start with <>= help(apcluster) @ Affinity propagation does not require the data samples to be of any specific kind or structure. AP only requires a {\em similarity matrix}, i.e., given $l$ data samples, this is an $l\times l$ real-valued matrix $\mathbf{S}$, in which an entry $S_{ij}$ corresponds to a value measuring how similar sample $i$ is to sample $j$. AP does not require these values to be in a specific range. Values can be positive or negative. AP does not even require the similarity matrix to be symmetric (although, in most applications, it will be symmetric anyway). A value of $-\infty$ is interpreted as ``absolute dissimilarity''. The higher a value, the more similar two samples are considered. To get a first impression, let us create a random data set in $\Real^2$ as the union of two ``Gaussian clouds'': \begin{center} <>= cl1 <- cbind(rnorm(30, 0.3, 0.05), rnorm(30, 0.7, 0.04)) cl2 <- cbind(rnorm(30, 0.7, 0.04), rnorm(30, 0.4, 0.05)) x1 <- rbind(cl1, cl2) plot(x1, xlab="", ylab="", pch=19, cex=0.8) @ \end{center} The package \APCluster\ offers several ways of clustering data. The simplest way is the following: <>= apres1a <- apcluster(negDistMat(r=2), x1) @ In this example, the function \verb+apcluster()+ first computes a similarity matrix for the input data \verb+x1+ using the {\em similarity function} passed as first argument. The choice \verb+negDistMat(r=2)+ is the standard similarity measure used in the papers of Frey and Dueck --- negative squared distances. Alternatively, one can compute the similarity matrix beforehand and call \verb+apcluster()+ for the similarity matrix (for a more detailed description of the differences, see Subsection \ref{ssec:memeff}): <>= s1 <- negDistMat(x1, r=2) apres1b <- apcluster(s1) @ The function \verb+apcluster()+ creates an object belonging to the S4 class \verb+APResult+ which is defined by the present package. To get detailed information on which data are stored in such objects, enter <>= help(APResult) @ The simplest thing we can do is to enter the name of the object (which implicitly calls \verb+show()+) to get a summary of the clustering result: <>= apres1a @ The \APCluster\ package allows for plotting the original data set along with a clustering result: \begin{center} <>= plot(apres1a, x1) @ \end{center} In this plot, each color corresponds to one cluster. The exemplar of each cluster is marked by a box and all cluster members are connected to their exemplars with lines. A heatmap is plotted with \verb+heatmap()+: \begin{center} <>= heatmap(apres1a) @ \end{center} In the heatmap, the samples are grouped according to clusters. The above heatmap confirms again that there are two main clusters in the data. A heatmap can be plotted for the object \verb+apres1a+ because \verb+apcluster()+, if called for data and a similarity function, by default includes the similarity matrix in the output object (unless it was called with the switch \verb+includeSim=FALSE+).
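As a quick check, we can verify that the similarity matrix has indeed been stored in the result object; the slot \verb+sim+ used here also appears in several later examples of this manual:
<>=
dim(apres1a@sim)
@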
If the similarity matrix is not included (which is the default if \verb+apcluster()+ has been called on a similarity matrix directly), \verb+heatmap()+ must be called with the similarity matrix as second argument: \begin{center} <>= heatmap(apres1b, s1) @ \end{center} Suppose we want to have better insight into what the algorithm did in each iteration. For this purpose, we can supply the option \verb+details=TRUE+ to \verb+apcluster()+: <>= apres1c <- apcluster(s1, details=TRUE) @ This option tells the algorithm to keep a detailed log about its progress. For example, this allows for plotting the three performance measures that AP uses internally for each iteration: \begin{center} <>= plot(apres1c) @ \end{center} These performance measures are: \begin{enumerate} \item Sum of exemplar preferences \item Sum of similarities of exemplars to their cluster members \item Net fitness: sum of the former two \end{enumerate} For details, the user is referred to the original affinity propagation paper \cite{FreyDueck07} and the supplementary material published on the affinity propagation Web page.\footnotemark[1] We see from the above plot that the algorithm did not make any change during the last 100 iterations. AP, through its parameter \verb+convits+, allows controlling how long AP waits for a change before it terminates (the default is \verb+convits=100+). If the user expects that AP will converge more quickly on his/her data set, a lower value can be used: <>= apres1c <- apcluster(s1, convits=15, details=TRUE) apres1c @ \section{Adjusting Input Preferences}\label{sec:ipref} Apart from the similarity matrix itself, the most important input parameter of AP is the so-called {\em input preference}, which can be interpreted as the tendency of a data sample to become an exemplar (see \cite{FreyDueck07} and the supplementary material on the AP homepage\footnotemark[1] for a more detailed explanation). This input preference can either be chosen individually for each data sample or it can be a single value shared among all data samples. Input preferences largely determine the number of clusters and, thereby, how fine- or coarse-grained the clustering result will be. The input preferences one can specify for AP are roughly in the same range as the similarity values, but they do not have a straightforward interpretation. Frey and Dueck have introduced the following rule of thumb: ``{\it The shared value could be the median of the input similarities (resulting in a moderate number of clusters) or their minimum (resulting in a small number of clusters).}'' \cite{FreyDueck07} Our AP implementation uses the median rule by default if the user does not supply a custom value for the input preferences. In order to provide the user with a knob that is --- at least to some extent --- interpretable, the function \verb+apcluster()+ provides an argument \verb+q+ that allows setting the input preference to a certain quantile of the input similarities, resulting in the median for \verb+q=0.5+ and in the minimum for \verb+q=0+.
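For instance, the following call sets the input preference to the 30\% quantile of the input similarities; the preference value that was actually used can be read back from the slot \verb+p+ afterwards (the object name \verb+apres1d+ is introduced only for this quick illustration):
<>=
apres1d <- apcluster(s1, q=0.3)
apres1d@p
@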
As an example, let us add two more ``clouds'' to the data set from above: \begin{center} <>= cl3 <- cbind(rnorm(20, 0.50, 0.03), rnorm(20, 0.72, 0.03)) cl4 <- cbind(rnorm(25, 0.50, 0.03), rnorm(25, 0.42, 0.04)) x2 <- rbind(x1, cl3, cl4) plot(x2, xlab="", ylab="", pch=19, cex=0.8) @ \end{center} For the default setting, we obtain the following result: \begin{center} <>= apres2a <- apcluster(negDistMat(r=2), x2) plot(apres2a, x2) @ \end{center} For the minimum of input similarities, we obtain the following result: \begin{center} <>= apres2b <- apcluster(negDistMat(r=2), x2, q=0) plot(apres2b, x2) @ \end{center} So we see that AP is quite robust against a reduction of input preferences in this example, which may be due to the clear separation of the four clusters. If we increase input preferences, however, we can force AP to split the four clusters into smaller sub-clusters: \begin{center} <>= apres2c <- apcluster(negDistMat(r=2), x2, q=0.8) plot(apres2c, x2) @ \end{center} Note that the input preference used by AP can be recovered from the output object (no matter which method of adjusting input preferences has been used). On the one hand, the value is printed if the object is displayed (by \verb+show+ or by entering the output object's name). On the other hand, the value can be accessed directly via the slot \verb+p+: <>= apres2c@p @ As noted above already, we can produce a heatmap by calling \verb+heatmap()+ for an \verb+APResult+ object: \begin{center} <>= heatmap(apres2c) @ \end{center} The order in which the clusters are arranged in the heatmap is determined by means of joining the clusters agglomeratively (see Section \ref{sec:agglo} below). Although the affinity propagation result contains \Sexpr{length(apres2c@exemplars)} clusters, the heatmap indicates that there are actually four clusters, which appear as very brightly colored squares along the diagonal. There also seem to be two pairs of adjacent clusters, as indicated by the two relatively light-colored blocks along the diagonal, each of which encompasses two of the four clusters. If we look back at how the data have been created (see also plots above), this is exactly what is to be expected. The above example with \verb+q=0+ demonstrates that setting input preferences to the minimum of input similarities does not necessarily result in a very small number of clusters (like one or two). This is due to the fact that input preferences need not necessarily be exactly in the range of the similarities. To determine a meaningful range, an auxiliary function is available which, in line with Frey's and Dueck's Matlab code,\footnotemark[1] allows computing a minimum value (for which one or at most two clusters would be obtained) and a maximum value (for which as many clusters as data samples would be obtained): <>= preferenceRange(apres2b@sim) @ The function returns a two-element vector with the minimum value as first and the maximum value as second entry. The computations are done approximately by default. If one is interested in exact bounds, supply \verb+exact=TRUE+ (resulting in longer computation times). Many clustering algorithms need to know a pre-defined number of clusters. This is often a major nuisance, since the exact number of clusters is hard to know for non-trivial (in particular, high-dimensional) data sets. AP avoids this problem.
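Using the bounds computed by \verb+preferenceRange()+ above, the input preference can also be set explicitly via the argument \verb+p+. A small sketch (the object name \verb+apres2e+ is ours; by construction, the lower bound should yield very few clusters):
<>=
pr <- preferenceRange(apres2b@sim)
apres2e <- apcluster(apres2b@sim, p=pr[1])
length(apres2e)
@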
If, however, one still wants to require a fixed number of clusters, this has to be accomplished by a search algorithm that adjusts input preferences in order to produce the desired number of clusters in the end. For convenience, this search algorithm is available as a function \verb+apclusterK()+ (analogous to Frey's and Dueck's Matlab implementation\footnotemark[1]). We can use this function to force AP to produce only two clusters (merging the two pairs of adjacent clouds into one cluster each). Analogously to \verb+apcluster()+, \verb+apclusterK()+ supports two variants --- it can either be called for a similarity measure and data or on a similarity matrix directly. \begin{center} <>= apres2d <- apclusterK(negDistMat(r=2), x2, K=2, verbose=TRUE) plot(apres2d, x2) @ \end{center} Now let us quickly consider a simple data set with more than two features. The notorious example is Fisher's iris data set: <>= data(iris) apIris1 <- apcluster(negDistMat(r=2), iris) apIris1 @ AP has identified \Sexpr{length(apIris1)} clusters. Since Version 1.3.2, the package also allows for superimposing clustering results in scatter plot matrices: \begin{center} <>= plot(apIris1, iris) @ \end{center} The heatmap looks as follows: \begin{center} <>= heatmap(apIris1) @ \end{center} Now let us try to obtain fewer clusters by using the minimum of off-diagonal similarities: <>= data(iris) apIris2 <- apcluster(negDistMat(r=2), iris, q=0) apIris2 @ AP has identified \Sexpr{length(apIris2)} clusters. If we again superimpose them in the scatter plot matrix, we obtain the following: \begin{center} <>= plot(apIris2, iris) @ \end{center} Finally, the heatmap looks as follows: \begin{center} <>= heatmap(apIris2) @ \end{center} So, looking at the heatmap, the \Sexpr{length(apIris2)} clusters seem quite reasonable, at least in the light of the fact that there are three species in the data set, {\em Iris setosa}, {\em Iris versicolor}, and {\em Iris virginica}, where {\em Iris setosa} is very clearly separated from the other two species (first cluster in the heatmap) and the two others partly overlap. \section{Exemplar-based Agglomerative Clustering}\label{sec:agglo} The function \verb+aggExCluster()+ realizes what can best be described as ``exemplar-based agglomerative clustering'', i.e.\ agglomerative clustering whose merging objective is geared towards the identification of meaningful exemplars. Analogously to \verb+apcluster()+, \verb+aggExCluster()+ supports two variants --- it can either be called for a similarity measure and data or on a matrix of pairwise similarities. \subsection{Getting started} Let us start with a simple example: <>= aggres1a <- aggExCluster(negDistMat(r=2), x1) aggres1a @ The output object \verb+aggres1a+ contains the complete cluster hierarchy. As is obvious from the above example, the \verb+show()+ method only displays the most basic information. When \verb+plot()+ is called on the result of \verb+aggExCluster()+ (an object of class \verb+AggExResult+), a dendrogram is plotted: \begin{center} <>= plot(aggres1a) @ \end{center} The heights of the merges in the dendrogram correspond to the merging objective: the higher the vertical bar of a merge, the less similar the two clusters have been. The dendrogram, therefore, clearly indicates two clusters.
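If needed, the cluster hierarchy can also be coerced to standard R objects for further processing. The following sketch assumes a package version that provides the coercion methods \verb+as.hclust()+ and \verb+as.dendrogram()+ for \verb+AggExResult+ objects (the object names are arbitrary):
<>=
dend1a <- as.dendrogram(aggres1a)
hcl1a <- as.hclust(aggres1a)
@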
Heatmaps can be produced analogously as for \verb+APResult+ objects with the additional property that dendrograms are displayed on the top and on the left: \begin{center} <>= heatmap(aggres1a, s1) @ \end{center} Once we have confirmed the number of clusters, which is clearly 2 according to the dendrogram and the heatmap above, we can extract the level with two clusters from the cluster hierarchy. In concordance with standard \R\ terminology, the function for doing this is called \verb+cutree()+: \begin{center} <>= cl1a <- cutree(aggres1a, k=2) cl1a plot(cl1a, x1) @ \end{center} \subsection{Merging clusters obtained from affinity propagation} The most important application of \verb+aggExCluster()+ (and the reason why it is part of the \APCluster\ package) is that it can be used for creating a hierarchy of clusters starting from a set of clusters previously computed by affinity propagation. The examples in Section \ref{sec:ipref} indicate that it may sometimes be tricky to define the right input preference. Exemplar-based agglomerative clustering on affinity propagation results provides an additional tool for finding the right number of clusters. Let us revisit the four-cluster example from Section \ref{sec:ipref}. We can apply \verb+aggExCluster()+ to an affinity propagation result simply by supplying this result as the second argument \verb+x+: <>= aggres2a <- aggExCluster(x=apres2c) aggres2a @ The result \verb+apres2c+ had \Sexpr{length(apres2c)} clusters. \verb+aggExCluster()+ successively joins these clusters until only one cluster is left. The dendrogram of this cluster hierarchy is given as follows: \begin{center} <>= plot(aggres2a) @ \end{center} If one wants to see the original samples in the dendrogram of the cluster hierarchy, the \verb+showSamples=TRUE+ option can be used. In this case, it is recommended to reduce the font size of the labels via the \verb+nodePar+ parameter (see \verb+?plot.dendrogram+ and the examples therein): \begin{center} <>= plot(aggres2a, showSamples=TRUE, nodePar=list(pch=NA, lab.cex=0.4)) @ \end{center} The following heatmap coincides with the one shown in Section \ref{sec:ipref} above. This is not surprising, since the heatmap plot for an affinity propagation result uses \verb+aggExCluster()+ internally to arrange the clusters: \begin{center} <>= heatmap(aggres2a) @ \end{center} Once we are more or less sure about the number of clusters, we extract the right clustering level from the hierarchy. For demonstration purposes, we do this for $k=5,\dots,2$ in the following plots: \begin{center} <>= par(mfrow=c(2,2)) for (k in 5:2) plot(aggres2a, x2, k=k, main=paste(k, "clusters")) @ \end{center} There is one obvious, but important, condition: applying \verb+aggExCluster()+ to an affinity propagation result only makes sense if the number of clusters to start from is at least as large as the number of true clusters in the data set. Clearly, if the number of clusters is already too small, then merging will make the situation only worse. \subsection{Details on the merging objective} Like any other agglomerative clustering method (see, e.g., \cite{JainMurtyFlynn99,MichalskiStepp92,Ward63}), \verb+aggExCluster()+ merges clusters until only one cluster containing all samples is obtained. In each step, two clusters are merged into one, i.e.\ the number of clusters is reduced by one. The only aspect in which \verb+aggExCluster()+ differs from other methods is the merging objective.
Suppose we consider two clusters for possible merging, each of which is given by an index set: \[ I = \{i_1,\dots,i_{n_I}\} \text{ and } J = \{j_1,\dots,j_{n_J}\} \] Then we first determine the potential {\em joint exemplar} $\mathop{\mathrm{ex}}(I,J)$ as the sample that maximizes the average similarity to all samples in the joint cluster $I\cup J$: \[ \mathop{\mathrm{ex}}(I,J) =\mathop{\mathrm{argmax}}\limits_{i\in I\cup J} \frac{1}{n_I + n_J}\cdot \sum\limits_{j\in I\cup J} S_{ij} \] Recall that $\mathbf{S}$ denotes the similarity matrix and $S_{ij}$ corresponds to the similarity of the $i$-th and the $j$-th sample. Then the merging objective is computed as \[ \mathop{\mathrm{obj}}(I,J)=\frac{1}{2}\cdot\Big(\frac{1}{n_I}\cdot \sum\limits_{j\in I} S_{\mathop{\mathrm{ex}}(I,J)j}+\frac{1}{n_J}\cdot \sum\limits_{k\in J} S_{\mathop{\mathrm{ex}}(I,J)k}\Big), \] which can best be described as ``{\em balanced average similarity to the joint exemplar}''. In each step, \verb+aggExCluster()+ considers all pairs of clusters in the current cluster set and joins that pair of clusters whose merging objective is maximal. The rationale behind the merging objective is that the two clusters that are best described by a joint exemplar should be joined. \section{Leveraged Affinity Propagation}\label{sec:lever} Leveraged affinity propagation is based on the idea that, for large data sets with many samples, the cluster structure is already visible on a subset of the samples. Instead of evaluating the similarity matrix for all sample pairs, the similarities of all samples to a subset of samples are computed --- resulting in a non-square similarity matrix. Clustering is performed on this reduced similarity matrix, which allows for clustering large data sets more efficiently. In this form of clustering, several rounds of affinity propagation are executed with different sample subsets --- iteratively improving the clustering result. The implementation is based on the Matlab code of Frey and Dueck provided on the AP Web page\footnotemark[1]. Apart from the speedup gained from the reduced number of distance calculations and the faster clustering, memory consumption is also reduced, not only in terms of the memory used for storing the similarity matrix, but also in terms of the memory used by the clustering algorithm internally. The two main parameters controlling leveraged AP clustering are the fraction of data points that should be selected for clustering (parameter \verb+frac+) and the number of sweeps or repetitions of individual clustering runs (parameter \verb+sweeps+). Initially, a sample subset is selected randomly. For the subsequent repetitions, the exemplars of the previous run are kept in the sample subset and the other samples in the subset are chosen randomly again. The result with the highest net similarity across all sweeps is kept as the final clustering result. When called with a similarity measure and a data set, the function \verb+apclusterL()+ performs both the calculation of similarities and leveraged affinity propagation. In the example below, we use 10\% of the samples and run 5 repetitions. The function implementing the similarity measure can either be passed as a function or as a function name (which must of course be resolvable in the current environment). Additional parameters for the distance calculation can be passed to \verb+apclusterL()+, which passes them on to the function implementing the similarity measure via the \verb+...+ argument list.
In any case, this function must be implemented such that it expects the data in its first argument \verb+x+ (a subsettable data structure, such as a vector, matrix, data frame, or list) and that it takes the selection of ``column objects'' as its second argument \verb+sel+, which must be a set of column indices. The functions \verb+negDistMat()+, \verb+expSimMat()+, \verb+linSimMat()+, \verb+corSimMat()+, and \verb+linKernel()+ provided by the \APCluster\ package also support the easy creation of parameter-free similarity measures (``closures'' in \R\ terminology). We recommend this variant, as it is safer in terms of possible name conflicts between arguments of \verb+apclusterL()+ and arguments of the similarity function. Here is an example that makes use of a closure for defining the similarity measure: \begin{center} <>= cl5 <- cbind(rnorm(100, 0.3, 0.05), rnorm(100, 0.7, 0.04)) cl6 <- cbind(rnorm(100, 0.70, 0.04), rnorm(100, 0.4, 0.05)) x3 <- rbind(cl5, cl6) apres3 <- apclusterL(s=negDistMat(r=2), x=x3, frac=0.1, sweeps=5, p=-0.2) apres3 plot(apres3, x3) @ \end{center} The function \verb+apclusterL()+ creates a result object of S4 class \verb+APResult+ that contains the same information as for standard AP. Additionally, the selected sample subset, the associated rectangular similarity matrix for the best sweep (provided that \verb+includeSim=TRUE+), and the net similarities of all sweeps are returned in this object. <>= dim(apres3@sim) apres3@sel apres3@netsimLev @ The result returned by leveraged affinity propagation can be used for further processing in the same way as a result object returned from \verb+apcluster()+, e.g., merging of clusters with agglomerative clustering can be performed. For heatmap plotting, either \verb+includeSim=TRUE+ must be set in \verb+apcluster()+ or \verb+apclusterL()+ so that the similarity matrix is available in the result object, or the similarity matrix must be passed explicitly to \verb+heatmap()+ as second argument. The heatmap for leveraged AP looks slightly different from the heatmap for standard affinity propagation because the numbers of samples in the two dimensions differ. \begin{center} <>= heatmap(apres3) @ \end{center} Exemplars will usually be chosen from the selected samples because, only for them, the full similarity information is available. This means that the fraction of samples should be chosen such that a considerable number of selected samples is available for each expected cluster. Please also note that a data set of the size used in this example can easily be clustered with regular affinity propagation. The data set was kept small to keep the package build time short and the amount of data output in the manual reasonable. For users requiring a higher degree of flexibility, e.g., a customized selection of the sample subset, \verb+apclusterL()+ can also be called on a rectangular similarity matrix directly. See the source code of \verb+apclusterL()+ with signature \verb+s=function+ and \verb+x=ANY+ for an example of how to embed \verb+apclusterL()+ into a complete loop performing leveraged AP. The package-provided functions for distance calculation support the generation of rectangular similarity matrices (see Section \ref{sec:DistMat}).
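To make the above conventions concrete, here is a minimal sketch (not part of the package) of a custom similarity measure that obeys the \verb+x+/\verb+sel+ contract. For brevity, it computes the full distance matrix first and then selects the required columns, whereas an efficient implementation would compute only those columns; the names \verb+negEuclSim+ and \verb+apres3b+ are, of course, arbitrary:
<>=
negEuclSim <- function(x, sel=NA)
{
    d <- as.matrix(dist(x))       ## Euclidean distances of all sample pairs
    if (any(is.na(sel)))
        sel <- seq_len(nrow(x))   ## no selection => quadratic matrix
    -d[, sel, drop=FALSE]         ## negative distances, possibly rectangular
}
apres3b <- apclusterL(s=negEuclSim, x=x3, frac=0.1, sweeps=5, p=-0.2)
@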
\section{Sparse Affinity Propagation}\label{sec:sparse} Starting with Version 1.4.0 of the \APCluster\ package, the functions \verb+apcluster()+, \verb+apclusterK()+, and \verb+preferenceRange()+ can also handle similarity matrices as defined by the \verb+Matrix+ package. While all dense matrix formats are converted to standard R matrices, sparse matrices are converted internally to \verb+dgTMatrix+ objects. For these sparse matrices, special implementations of \verb+apcluster()+, \verb+apclusterK()+, and \verb+preferenceRange()+ are available that fully exploit the sparseness of the matrices and may require many fewer operations if the matrix is sufficiently sparse. To demonstrate this, consider the following example: <>= dsim <- negDistMat(x2, r=2) ssim <- as.SparseSimilarityMatrix(dsim, lower=-0.2) str(ssim) @ The function \verb+as.SparseSimilarityMatrix()+ converts the dense similarity matrix \verb+dsim+ into a sparse similarity matrix by removing all pairwise similarities that are -0.2 or lower. Note that this is only for demonstration purposes; if the size of the data permits, it is advisable to use the entire dense similarity matrix. In any case, let us run sparse AP on this similarity matrix: \begin{center} <>= sapres <- apcluster(ssim, q=0) plot(sapres, x2) @ \end{center} The functions \verb+preferenceRange()+ and \verb+apclusterK()+ work in the same way as for dense similarity matrices: <>= preferenceRange(ssim) apclusterK(ssim, K=2) @ The functions \verb+aggExCluster()+ and \verb+heatmap()+ have been extended to be able to handle sparse matrices. Note, however, that these functions do not yet exploit sparsity properly. Instead, they convert all inputs to dense matrices before processing them, which may lead to memory and/or performance issues for large data sets. \begin{center} <>= heatmap(sapres, ssim) @ \end{center} The above heatmap illustrates that values that are not stored in the sparse similarity matrix are filled up with low values (see the red areas between some pairs of samples that belong to different clusters). Actually, each missing value is replaced with \[ \min(s)-(\max(s)-\min(s))=2\cdot\min(s)-\max(s), \] where $\min(s)$ and $\max(s)$ denote the smallest and the largest similarity value specified in the sparse similarity matrix $s$, respectively. The same replacement takes place when \verb+aggExCluster()+ converts sparse similarity matrices to dense ones. \section{Processing Biological Sequences}\label{sec:bioseq} As noted in the introduction above, one of the goals of this package is to leverage affinity propagation in bioinformatics applications. Previous versions of this document showed a toy example that applied affinity propagation to a set of biological sequences, using a similarity matrix computed with the simple {\em spectrum kernel} \cite{LeslieEskinNoble02} as implemented in the \KeBABS\ package \cite{PalmeHochreiterBodenhofer15}. This example has been removed in Version 1.4.9 in order to avoid a dependency on a non-CRAN package. Instead, readers are now referred to the vignette of the \KeBABS\ package \cite{PalmeHochreiterBodenhofer15}, which also includes an example of how to use affinity propagation clustering on a set of biological sequences. \section{Similarity Matrices}\label{sec:DistMat} Apart from the obvious monotonicity ``the higher the value, the more similar two samples are'', affinity propagation does not make any specific assumption about the similarity measure.
Negative squared distances must be used if one wants to minimize squared errors \cite{FreyDueck07}. Apart from that, the choice and implementation of the similarity measure is left to the user. Our package offers a few more methods to obtain similarity matrices. The choice of the right one (and, consequently, the objective function the algorithm optimizes) still has to be made by the user. All functions described in this section assume the input data matrix to be organized such that each row corresponds to one sample and each column corresponds to one feature (in line with the standard function \verb+dist+). If a vector is supplied instead of a matrix, each single entry is interpreted as a (one-dimensional) sample. \subsection{The function \texttt{negDistMat()}} The function \verb+negDistMat()+, in line with Frey and Dueck, allows for computing negative distances for a given set of real-valued data samples. If called with the first argument \verb+x+, a similarity matrix with pairwise negative distances is returned: <>= s <- negDistMat(x2) @ The function \texttt{negDistMat()} provides the same set of distance measures and parameters as the standard function \verb+dist()+ (except for \verb+method="binary"+, which makes little sense for real-valued data). Presently, \verb+negDistMat()+ provides the following variants of computing the distance $d(\vec{x},\vec{y})$ of two data samples $\vec{x}=(x_1,\dots,x_n)$ and $\vec{y}=(y_1,\dots,y_n)$: \begin{description} \item[Euclidean:] \[ d(\vec{x},\vec{y})=\sqrt{\sum\limits_{i=1}^n (x_i-y_i)^2} \] use \verb+method="euclidean"+ or do not specify argument \verb+method+ (since this is the default); \item[Maximum:] \[ d(\vec{x},\vec{y})=\max\limits_{i=1}^n |x_i-y_i| \] use \verb+method="maximum"+; \item[Sum of absolute distances / Manhattan:] \[ d(\vec{x},\vec{y})=\sum\limits_{i=1}^n |x_i-y_i| \] use \verb+method="manhattan"+; \item[Canberra:] \[ d(\vec{x},\vec{y})=\sum\limits_{i=1}^n \frac{|x_i-y_i|}{|x_i+y_i|} \] summands with zero denominators are not taken into account; use \verb+method="canberra"+; \item[Minkowski:] \[ d(\vec{x},\vec{y})=\left(\sum\limits_{i=1}^n |x_i-y_i|^p\right)^{\frac{1}{p}} \] use \verb+method="minkowski"+ and specify $p$ using the additional argument \verb+p+ (default is \verb+p=2+, resulting in the standard Euclidean distance); \item[Discrepancy:] \[ d(\vec{x},\vec{y})=\max\limits_{1\leq\alpha\leq\beta\leq n}\left|\sum\limits_{i=\alpha}^{\beta} (y_i-x_i)\right| \] use \verb+method="discrepancy"+ \cite{Weyl16}. \end{description} The function \verb+negDistMat()+ then takes a distance $d$ computed with one of the variants listed above and returns $-1$ times its $r$-th power, i.e., \begin{equation}\label{eq:negDistMat} s(\vec{x},\vec{y})=-d(\vec{x},\vec{y})^r. \end{equation} The exponent $r$ can be adjusted with the argument \verb+r+. The default is \verb+r=1+, hence, one has to supply \verb+r=2+ to obtain negative squared distances as in the examples in previous sections.
Here are some examples: <>= ex <- matrix(c(0, 0.5, 0.8, 1, 0, 0.2, 0.5, 0.7, 0.1, 0, 1, 0.3, 1, 0.8, 0.2), 5, 3, byrow=TRUE) ex @ Standard Euclidean distance: <>= negDistMat(ex) @ Squared Euclidean distance: <>= negDistMat(ex, r=2) @ Maximum norm-based distance: <>= negDistMat(ex, method="maximum") @ Sum of absolute distances (aka Manhattan distance): <>= negDistMat(ex, method="manhattan") @ Canberra distance: <>= negDistMat(ex, method="canberra") @ Minkowski distance for $p=3$ ($3$-norm): <>= negDistMat(ex, method="minkowski", p=3) @ If called without the data argument \verb+x+, a function object is returned that can be supplied to clustering functions --- as in the majority of the above examples: <>= sim <- negDistMat(r=2) is.function(sim) apcluster(sim, x1) @ Depending on the application, it might be advisable to center and/or scale the data in order to equalize the influence of all features/columns. This makes sense for standard vector space distances like the Euclidean distance and can easily be accomplished with the \verb+scale()+ function. The discrepancy distance, in contrast, is strongly dependent on the order of the features/columns and is rather aimed at comparing signals. For this measure, therefore, row-wise centering can be advisable \cite{BauerBodenhoferKlement96c}. This is easily done with the \verb+sweep()+ function: <>= ex2 <- matrix(c(0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1), 3, 8, byrow=TRUE) matplot(t(ex2), ylab="") matlines(t(ex2), type="s") negDistMat(ex2, method="discrepancy") ex2Scaled <- sweep(ex2, 1, rowMeans(ex2)) ex2Scaled matplot(t(ex2Scaled), ylab="") matlines(t(ex2Scaled), type="s") negDistMat(ex2Scaled, method="discrepancy") @ \subsection{Other similarity measures} The package \APCluster\ offers four more functions for creating similarity matrices for real-valued data: \begin{description} \item[Exponential transformation of distances:] the function \verb+expSimMat()+ works in the same way as the function \verb+negDistMat()+. The difference is that, instead of the transformation \eqref{eq:negDistMat}, it uses the following transformation: \[ s(\vec{x},\vec{y})=\exp\left(-\left(\frac{d(\vec{x},\vec{y})}{w}\right)^r\right) \] Here the default is \verb+r=2+. It is clear that \verb+r=2+ in conjunction with \verb+method="euclidean"+ results in the well-known {\em Gaussian kernel / RBF kernel} \cite{FitzGeraldMicchelliPinkus95,Micchelli86,SchoelkopfSmola02}, whereas \verb+r=1+ in conjunction with \verb+method="euclidean"+ results in the similarity measure that is sometimes called {\em Laplace kernel} \cite{FitzGeraldMicchelliPinkus95,Micchelli86}. Both variants (for non-Euclidean distances as well) can also be interpreted as {\em fuzzy equality/similarity relations} \cite{DeBaetsMesiar02}. \item[Linear scaling of distances with truncation:] the function \verb+linSimMat()+ uses the transformation \[ s(\vec{x},\vec{y})=\max\left(1-\frac{d(\vec{x},\vec{y})}{w},0\right) \] which is also often interpreted as a {\em fuzzy equality/similarity relation} \cite{DeBaetsMesiar02}. \item[Correlation:] the function \verb+corSimMat()+ interprets the rows of its argument \verb+x+ (matrix or data frame) as multivariate observations and computes similarities as pairwise correlations. The function \verb+corSimMat()+ is actually a wrapper around the standard function \verb+cor()+. Consequently, the \verb+method+ argument allows for selecting the type of correlation to compute (Pearson, Spearman, or Kendall).
\item[Linear kernel:] scalar products can also be interpreted as similarity measures, a view that is often adopted by kernel methods in machine learning. In order to provide the user with this option as well, the function \verb+linKernel()+ is available. For two data samples $\vec{x}=(x_1,\dots,x_n)$ and $\vec{y}=(y_1,\dots,y_n)$, it computes the similarity as \[ s(\vec{x},\vec{y})=\sum\limits_{i=1}^n x_i\cdot y_i. \] The function has one additional argument, \verb+normalize+ (default: \verb+FALSE+). If \verb+normalize+ is set to \verb+TRUE+, values are normalized to the range $[-1,+1]$ in the following way: \[ s(\vec{x},\vec{y})=\frac{\sum_{i=1}^n x_i\cdot y_i}% {\sqrt{\big(\sum_{i=1}^n x_i^2\big)\cdot% \big(\sum_{i=1}^n y_i^2\big)}} \] Entries for which at least one of the two factors in the denominator is zero are set to zero (however, such cases should be avoided anyway). \end{description} For the same example data as above, we obtain the following for the RBF kernel: <>= expSimMat(ex) @ Laplace kernel: <>= expSimMat(ex, r=1) @ Pearson correlation coefficient: <>= corSimMat(ex, method="pearson") @ Spearman rank correlation coefficient: <>= corSimMat(ex, method="spearman") @ Linear scaling of distances with truncation: <>= linSimMat(ex, w=1.2) @ Linear kernel: <>= linKernel(ex[2:5,]) @ Normalized linear kernel: <>= linKernel(ex[2:5,], normalize=TRUE) @ All of these functions work in the same way as \verb+negDistMat()+: if called with argument \verb+x+, a similarity matrix is returned, otherwise a function is returned. \subsection{Rectangular similarity matrices} With the introduction of leveraged affinity propagation, distance calculations are entirely performed within the \APCluster\ package. The code is based on a customized version of the \verb+dist()+ function from the \verb+stats+ package. In the following example, a rectangular similarity matrix of all samples against a subset of the samples is computed: <>= sel <- sort(sample(1:nrow(x1), ceiling(0.08 * nrow(x1)))) sel s1r <- negDistMat(x1, sel, r=2) dim(s1r) s1r[1:7,] @ The rows correspond to all samples, the columns to the sample subset. The \verb+sel+ parameter specifies the sample indices of the selected samples in increasing order. Rectangular similarity calculation is provided in all distance functions of the package. If the parameter \verb+sel+ is not specified, the quadratic similarity matrix of all sample pairs is computed. \subsection{Defining a custom similarity measure for leveraged affinity propagation}\label{ssec:leverSim} As mentioned in Section \ref{sec:lever} above, leveraged affinity propagation requires the definition of a similarity measure that is supplied as a function or function name to \verb+apclusterL()+. For vectorial data, the similarity measures supplied with the package (see above) may be sufficient. If other similarity measures are necessary or if the data are not vectorial, the user must supply his/her own similarity measure. The user can supply any function as argument \verb+s+ to \verb+apcluster()+, \verb+apclusterK()+, or \verb+apclusterL()+, but the following rules must be obeyed in order to avoid errors and to ensure meaningful results: \begin{enumerate} \item The data must be supplied as first argument, which must be named \verb+x+. \item The second argument must be named \verb+sel+ and must be interpreted as a vector of indices that select a subset of data items in \verb+x+. \item The function must return a numeric matrix with similarities.
If \verb+sel=NA+, the format of the matrix must be \verb+length(x)+$\times$\verb+length(x)+. If \verb+sel+ is not \verb+NA+, but contains indices selecting a subset, the format of the returned similarity matrix must be \verb+length(x)+$\times$\verb+length(sel)+. \item Although this is not mandatory, it is recommended to set row and column names properly in the returned similarity matrix. \end{enumerate} \subsection{Defining a custom similarity measure that creates a sparse similarity matrix} Since Version 1.4.0, similarity matrices may also be sparse (cf.~Section~\ref{sec:sparse}). Correspondingly, the similarity measures passed to \verb+apcluster()+ and \verb+apclusterK()+ may also return sparse similarity matrices: <>= sparseSim <- function(x) { as.SparseSimilarityMatrix(negDistMat(x, r=2), lower=-0.2) } sapres2 <- apcluster(sparseSim, x2, q=0) sapres2 str(similarity(sapres2)) @ Note that similarity measures passed to \verb+apclusterL()+ must not return sparse matrices. Instead, they must accept a \verb+sel+ argument and return a rectangular dense matrix (see Subsection~\ref{ssec:leverSim} above). \section{Miscellaneous} \subsection{Convenience vs.\ efficiency}\label{ssec:memeff} In most of the above examples, we called a clustering method by supplying it with a similarity function and the data to be clustered. This is undoubtedly a convenient approach. Since the resulting output object (unless the option \verb+includeSim=FALSE+ is supplied) even includes the similarity matrix, we can plot heatmaps and produce a cluster hierarchy on the basis of the clustering result without the need to supply the similarity matrix explicitly. For large data sets, however, this convenient approach has some disadvantages: \begin{itemize} \item If the clustering algorithm is run several times on the same data set (e.g., for different parameters), the similarity matrix is recomputed every time. \item Every clustering result (depending on the option \verb+includeSim+) usually includes a copy of the similarity matrix. \end{itemize} For these reasons, depending on the actual application scenario, users should consider computing the similarity matrix beforehand. This strategy, however, requires some extra effort, i.e.\ the similarity matrix must be supplied as an extra argument in subsequent processing steps. \subsection{Clustering named objects}\label{ssec:names} The function \verb+apcluster()+ and all functions for computing distance matrices are implemented to recognize names of data objects and to correctly pass them through computations. The mechanism is best described with a simple example: <>= x3 <- c(1, 2, 3, 7, 8, 9) names(x3) <- c("a", "b", "c", "d", "e", "f") s3 <- negDistMat(x3, r=2) @ So we see that the \verb+names+ attribute must be used if a vector of named one-dimensional samples is to be clustered. If the data are not one-dimensional (a matrix or data frame), object names must be stored in the row names of the data matrix. All functions for computing similarity matrices recognize the object names. The resulting similarity matrix has the list of names both as row and column names. <>= s3 colnames(s3) @ The function \verb+apcluster()+ and all related functions use column names of similarity matrices as object names. If object names are available, clustering results are by default shown by names.
<>= apres3a <- apcluster(s3) apres3a apres3a@exemplars apres3a@clusters @ \subsection{Computing a label vector from a clustering result} \label{ssec:labels} For later classification or comparisons with other clustering methods, it may be useful to compute a label vector from a clustering result. Our package provides an instance of the generic function \verb+labels()+ for this task. As is obvious from the following example, the argument \verb+type+ can be used to determine how the label vector is computed. <>= apres3a@exemplars labels(apres3a, type="names") labels(apres3a, type="exemplars") labels(apres3a, type="enum") @ The first choice, \verb+"names"+ (default), uses names of exemplars as labels (if names are available; otherwise an error message is displayed). The second choice, \verb+"exemplars"+, uses indices of exemplars (enumerated as in the original data set). The third choice, \verb+"enum"+, uses indices of clusters (consecutively numbered as stored in the slot \verb+clusters+ --- analogous to the standard implementation of \verb+cutree()+ or the \verb+clusters+ field of the list returned by the standard function \verb+kmeans()+). \subsection{Customizing heatmaps} With Version 1.3.1, the implementation of heatmap plotting has changed significantly. The method now allows for many more customizations than before. Apart from changes in the argument list (see \verb+?heatmap+), the behavior of the method has changed as follows: \begin{itemize} \item Dendrograms are always plotted if possible. To switch off plotting of dendrograms, set \verb+Rowv+ and \verb+Colv+ to \verb+FALSE+ or \verb+NA+. If a dendrogram should only appear to the left of the heatmap, set \verb+Colv+ to \verb+FALSE+ or \verb+NA+. Analogously, set \verb+Rowv+ to \verb+FALSE+ or \verb+NA+ if a dendrogram should only be plotted on top of the plot (not possible if the similarity matrix is non-quadratic). \item Previously, \verb+rainbow()+ was used internally to determine how the bars illustrating the clusters are colored. Now users can determine the coloring of the color bars using the \verb+sideColors+ argument. For \verb+sideColors=NULL+, a meaningful color coding is determined automatically, which still uses \verb+rainbow()+, but ensures that no similar colors are placed next to each other in the bar. \item The default font sizes for displaying row/column labels have been changed to make sure that they do not overlap. This can result in quite small labels if the number of samples is large. In any case, the user can override these sizes by setting the parameters \verb+cexRow+ and \verb+cexCol+. Row and column labels can even be switched off entirely by setting \verb+cexRow+ and \verb+cexCol+ to 0, respectively. \end{itemize} Moreover, with Version~1.4.3, the possibility to add a color legend has been integrated.
Here is an example with the vertical dendrogram switched off, an alternate color scheme, custom margins, and a color legend: \begin{center} <>= heatmap(apres2c, sideColors=c("darkgreen", "yellowgreen"), col=terrain.colors(12), Rowv=FALSE, dendScale=0.5, margins=c(3, 3, 2), legend="col") @ \end{center} The following example reverts to the default behavior prior to Version 1.3.1: consecutive rainbow colors, no dendrograms, and traditional sizing of row/column labels: \begin{center} <>= heatmap(apres2c, sideColors=rainbow(length(apres2c)), Rowv=FALSE, Colv=FALSE, cexRow=(0.2 + 1 / log10(nrow(apres2c@sim))), cexCol=(0.2 + 1 / log10(nrow(apres2c@sim)))) @ \end{center} \subsection{Adding a legend to plots of clustering results} As shown above, \verb+plot()+ called for an \verb+APResult+ object as first and a matrix or data frame as second argument plots the clustering result superimposed on a scatter plot (or a scatter plot matrix if the number of columns in the second argument exceeds 2). The clusters are shown in different colors, but it may not be clear which cluster is shown in which color. Therefore, it may be useful to show a legend along with the plot. The current implementation of \verb+plot()+ does not show a legend, since it is hard to determine where to place the legend such that no important cluster information gets occluded. Therefore, the user has to add legends manually. Actually, colors are always chosen according to a simple rule: \verb+plot()+ uses \verb+rainbow()+ to create a vector of colors that is exactly as long as the number of clusters in the \verb+APResult+ object. The following example shows how to plot a legend manually (with the clusters enumerated in the same way as in the \verb+APResult+ object): \begin{center} <>= plot(apres2a, x2) legend("bottomleft", legend=paste("Cluster", 1:length(apres2a)), col=rainbow(length(apres2a)), pch=19) @ \end{center} Note that this method is only meaningful for plotting clustering results superimposed on a 2D data set. For scatter plot matrices, this does not work in a meaningful way. In such a case, we rather recommend creating the legend separately (in a separate graphics device/file) and displaying it along with the scatter plot matrix. To create only the legend, code like the following could be used: \begin{center} <>= plot.new() par(oma=rep(0, 4), mar=rep(0, 4)) legend("center", legend=paste("Cluster", 1:length(apres2c)), col=rainbow(length(apres2c)), pch=19) @ \end{center} It may still be necessary to strip off white margins for further usage of the legend. \subsection{Implementation and performance issues}\label{ssec:perf} Prior to Version 1.2.0, \verb+apcluster()+ was implemented in R. Starting with Version 1.2.0, the main iteration loop of \verb+apcluster()+ has been implemented in C++ using the Rcpp package \cite{EddelbuettelFrancois11}, which has led to a speedup in the range of a factor of 9--10. Note that \verb+details=TRUE+ requires a considerable amount of additional memory. If possible, avoid this for larger data sets. The asymptotic computational complexity of \verb+aggExCluster()+ is $\mathcal{O}(l^3)$ (where $l$ is the number of samples or clusters from which the clustering starts). This may result in excessively long computation times if \verb+aggExCluster()+ is used for larger data sets without using affinity propagation first.
For real-world data sets, in particular, if they are large, we recommend using affinity propagation first and then, if necessary, applying \verb+aggExCluster()+ to create a cluster hierarchy. \section{Special Notes for Users Upgrading from Previous Versions} \subsection{Upgrading from a version older than 1.3.0} Version 1.3.0 has brought several fundamental changes to the architecture of the package. We tried to ensure backward compatibility with previous versions where possible. However, there are still some caveats that users should take into account: \begin{itemize} \item The functions \verb+apcluster()+, \verb+apclusterK()+, and \verb+aggExCluster()+ have been re-im\-ple\-ment\-ed as S4 generics; therefore, they no longer have a fixed list of arguments. For this reason, users are advised to name all optional parameters. \item Heatmap plotting has been shifted to the function \verb+heatmap()+, which is now defined as an S4 generic method. The previous mechanism of plotting heatmaps via \verb+plot()+ remained partly available in Versions 1.3.0 and 1.3.1; since Version 1.3.2, it is no longer available. \end{itemize} \subsection{Upgrading to Version 1.3.3 or newer} Users who upgrade to Version 1.3.3 (or newer) from an older version should be aware that the package now requires a newer version of Rcpp. This issue can simply be solved by re-installing Rcpp from CRAN using \verb+install.packages("Rcpp")+. \subsection{Upgrading to Version 1.4.0} The function \verb+sparseToFull()+ has been deprecated. A fully compatible function \verb+as.DenseSimilarityMatrix()+ is available that replaces and extends \verb+sparseToFull()+. \subsection{Upgrading to Version 1.4.9} The function \verb+sparseToFull()+, which had been deprecated since Version 1.4.0, has finally been removed completely. From now on, you must use the function \verb+as.DenseSimilarityMatrix()+ that replaces and extends \verb+sparseToFull()+. Since the dependency on the \KeBABS\ package has been removed, the example file \verb+inst/examples/ch22Promoters.fasta+ has been removed, too. \section{How to Cite This Package} If you use this package for research that is published later, you are kindly asked to cite it as follows: \begin{quotation} \noindent U.\ Bodenhofer, A.\ Kothmeier, and S.\ Hochreiter (2011). APCluster: an R package for affinity propagation clustering. {\em Bioinformatics} {\bf 27}(17):2463--2464. DOI: \href{http://dx.doi.org/10.1093/bioinformatics/btr406}{10.1093/bioinformatics/btr406}. \end{quotation} Moreover, we insist that, any time you cite the package, you also cite the original paper in which affinity propagation was introduced \cite{FreyDueck07}. To obtain Bib\TeX\ entries of the two references, you can enter the following into your R session: <>= toBibtex(citation("apcluster")) @ \bibliographystyle{plain} \bibliography{apcluster} \end{document}