% -*- mode: noweb; noweb-default-code-mode: R-mode; -*- \documentclass[nojss]{jss} \usepackage{dsfont} \usepackage{bbm} \usepackage{amsfonts} \usepackage{wasysym} \usepackage{amssymb} %\title{Programmers' Niche: Multivariate polynomials in \R} %\subtitle{The \pkg{multipol} package} %\author{Robin K. S. Hankin} %\maketitle \author{Robin K. S. Hankin\\Auckland University of Technology} \title{Multivariate polynomials in \proglang{R}} %\VignetteIndexEntry{A vignette for the multipol package} %% for pretty printing and a nice hypersummary also set: \Plainauthor{Robin K. S. Hankin} \Plaintitle{Multivariate polynomials in R} \Shorttitle{Multivariate polynomials in \proglang{R}} %% an abstract and keywords \Abstract{ In this short article I introduce the \pkg{multipol} package, which provides some functionality for handling multivariate polynomials; the package is discussed here from a programming perspective. An example from the field of enumerative combinatorics is presented. The package is almost completely superceded by the \pkg{mvp} and \pkg{spray} packages~\citep{hankin2022_mvp,hankin2022_spray} which use a sparse array technique and follow \pkg{disordR} discipline~\citep{hankin2022_disordR}. This vignette is based on~\citet{RNews:Hankin:2008}; the discussion of sparsity is unchanged from 2008. } \Keywords{Multivariate polynomials, \proglang{R}} \Plainkeywords{Multivariate polynomials, R} %% publication information %% NOTE: This needs to filled out ONLY IF THE PAPER WAS ACCEPTED. %% If it was not (yet) accepted, leave them commented. %% \Volume{13} %% \Issue{9} %% \Month{September} %% \Year{2004} %% \Submitdate{2004-09-29} %% \Acceptdate{2004-09-29} %% The address of (at least) one author should be given %% in the following format: \Address{ Robin K. S. Hankin\\ Auckland University of Technology\\ New Zealand } %% It is also possible to add a telephone and fax number %% before the e-mail in the following format: %% Telephone: +43/1/31336-5053 %% Fax: +43/1/31336-734 %% for those who use Sweave please include the following line (with % symbols): %% need no \usepackage{Sweave.sty} %% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% <>= ignore <- require(multipol) ignore <- require(polynom) @ \SweaveOpts{} \begin{document} \section{Univariate polynomials} A {\em polynomial} is an algebraic expression of the form $\sum_{i=0}^na_ix^i$ where the~$a_i$ are real or complex numbers and~$n$ (the {\em degree} of the polynomial) is a nonnegative integer. A polynomial may be viewed in three distinct ways: \begin{itemize} \item Polynomials are interesting and instructive examples of complete functions: they map $\mathbbm{C}$ (the complex numbers) to~$\mathbbm{C}$. \item Polynomials are a map from the positive integers to~$\mathbbm{C}$: this is~$f(n)=a_n$ and one demands that~$\exists n_0$ with~$n\geqslant n_0\longrightarrow f(n)=0$. Relaxation of the final clause results in a {\em generating function} which is useful in combinatorics. \item Polynomials with complex coefficients form an algebraic object known as a {\em ring}: polynomial multiplication is associative and distributive with respect to addition; $(ab)c=a(bc)$ and $a(b+c)=ab+ac$. \end{itemize} A {\em multivariate polynomial} is a generalization of a polynomial to expressions of the form~$\sum a_{i_1i_2\ldots i_d}\prod_{j=1}^d x_j^{i_j}$. The three characterizations of polynomials above generalize to the multivariate case, but note that the algebraic structure is more general. In the context of \proglang{R} programming, the first two points typically dominate. Viewing a polynomial as a function is certainly second nature to the current readership and unlikely to yield new insight. But generating functions are also interesting and useful applications of polynomials~\citep{wilf1994} which may be less familiar and here I discuss an example from the discipline of integer partitions~\citep{andrews1998}. A {\em partition of an integer}~$n$ is a non-increasing sequence of positive integers~$p_1,p_2,\ldots,p_r$ such that~$n=\sum_{i=1}^rp_i$~\citep{hankin2006}. How many distinct partitions does~$n$ have? The answer is the coefficient of~$x^n$ in \[ \prod_{i=1}^n \frac{1}{1-x^i} \] (observe that we may truncate the Taylor expansion of~$1/(1-x^j)$ to terms not exceeding~$x^n$; thus the problem {\em is} within the domain of polynomials as infinite sequences of coefficients are not required). Here, as in many applications of generating functions, one uses the mechanism of polynomial multiplication as a bookkeeping device to keep track of the possibilities. The \proglang{R} idiom used in the \pkg{polynom} package is a spectacularly efficient method for doing so. Multivariate polynomials generalize the concept of generating function, but in this case the functions are from $n$-tuples of nonnegative integers to~$\mathbbm{C}$. An example is given in the appendix below. \subsection[The polynom package]{The \pkg{polynom} package} The \pkg{polynom} package~\citep{venables2007} is a consistent and convenient suite of software for manipulating polynomials. This package was originally written in~1993 and is used by~\citet{venables2001} as an example of \proglang{S3} classes. The following \proglang{R} code shows the \pkg{polynom} package in use; the examples are then generalized to the multivariate case using the \pkg{multipol} package. <>= require(polynom) (p <- polynomial(c(1,0,0,3,4))) str(p) @ See how a polynomial is represented as a vector of coefficients with \code{p[i]} holding the coefficient of $x^{i-1}$; note the off-by-one issue. Observe the natural print method which suppresses the zero entries---but the internal representation requires all coefficients so a length~5 vector is needed to store the object. Polynomials may be multiplied and added: <>= p + polynomial(1:2) p*p @ Note the overloading of `+' and `*': polynomial addition and multiplication are executed using the natural syntax on the command line. Observe that the addition is not entirely straightforward: the shorter polynomial must be padded with zeroes. A polynomial may be viewed either as an object, or a function. Coercing a polynomial to a function is straightforward: <>= f1 <- as.function(p) f1(pi) f1(matrix(1:6,2,3)) @ Note the effortless and transparent vectorization of \code{f1()}. \section{Multivariate polynomials} There exist several methods by which polynomials may be generalized to multipols. To this author, the most natural is to consider an array of coefficients; the dimensionality of the array corresponds to the arity of the multipol. However, other methods suggest themselves and a brief discussion is given at the end. Much of the univariate polynomial functionality presented above is directly applicable to multivariate polynomials. <>= options("showchars" = FALSE) @ <>= require(multipol) (a <- as.multipol(matrix(1:10,nrow=2))) @ See how a multipol is actually an array, with one extent per variable present, in this case 2, although the package is capable of manipulating polynomials of arbitrary arity. Multipol addition is a slight generalization of the univariate case: <>= b <- as.multipol(matrix(1:10,ncol=2)) a+b @ In the multivariate case, the zero padding must be done in each array extent; the natural command-line syntax is achieved by defining an appropriate \code{Ops.multipol()} function to overload the arithmetic operators. \subsection{Multivariate polynomial multiplication} The heart of the package is multipol multiplication: <>= a * b @ Multivariate polynomial multiplication is considerably more involved than in the univariate case. Consider the coefficient of~$x^2y^2$ in the product. This is \[ \begin{array}{l} \begin{array}{l} \hphantom{{}+{}}C_a\left(x^2y^2\right) C_b\left(1\right) + C_a\left(xy^2 \right) C_b\left(x\right) + C_a\left(y^2 \right) C_b\left(x^2\right) \\ {}+C_a\left(x^2y \right) C_b\left(y\right) + C_a\left(xy \right) C_b\left(xy\right) + C_a\left(y\right) C_b\left(x^2y\right) \vphantom{h^{h^h}}\\ {}+C_a\left(x^2\right) C_b\left(y^2\right) + C_a\left(x\right) C_b\left(xy^2\right) + C_a\left(1\right) C_b\left(x^2y^2\right)\vphantom{h^{h^h}} \end{array}\\ {}= \hphantom{{}+{}} 0\cdot 1 + 6\cdot 2 + 5\cdot 3\\ \hphantom{{}={}} + 0\cdot 6 + 4\cdot 7 + 3\cdot 8\\ \hphantom{{}={}} + 0\cdot 0 + 2\cdot 0 + 1\cdot 0\\ {}= 79, \end{array} \] where ``$C_a\left(x^my^n\right)$'' means the coefficient of $x^my^n$ in polynomial \code{a}. It should be clear that large multipols involve more terms and a typical example is given later in the paper. \subsubsection[Multivariate polynomial multiplication in multipol]{Multivariate polynomial multiplication in \pkg{multipol}} The appropriate \proglang{R} idiom is to follow the above prose description in a vectorized manner; the following extract from \code{mprod()} is very slightly edited in the interests of clarity. First we define a matrix, \code{index}, whose rows are the array indices of the product: \begin{Schunk} \begin{Sinput} outDims <- dim(a)+dim(b)-1 \end{Sinput} \begin{quote}\it Here \code{outDims} is the dimensions of the product. Note again the off-by-one issue: the package uses array indices internally, while the user consistently indexes by variable power. \end{quote} \begin{Sinput} index <- expand.grid(lapply(outDims,seq_len)) \end{Sinput} \begin{quote}\it Each row of matrix \code{index} is thus an array index for the product. The next step is to define a convenience function \code{f()}, whose argument \code{u} is a row of \code{index}, that returns the entry in the multipol product: \end{quote} \begin{Sinput} f <- function(u){ jja <- expand.grid(lapply(u,function(i)0:(i-1))) jjb <- -sweep(jja, 2, u)-1 \end{Sinput} \begin{quote}\begin{quote}\it So \code{jja} is the (power) index of \code{a}, and the rows of \code{jjb} added to those of \code{jja} give \code{u}, which is the power index of the returned array. Now not all rows of \code{jja} and \code{jjb} correspond to extant elements of \code{a} and \code{b} respectively; so define a Boolean variable \code{wanted} that selects just the appropriate rows: \end{quote}\end{quote} \begin{Sinput} wanted <- apply(jja,1,function(x)all(x < dim(a))) & apply(jjb,1,function(x)all(x < dim(b))) & apply(jjb,1,function(x)all(x >= 0)) \end{Sinput} \begin{quote}\begin{quote}\it Thus element \code{n} of \code{wanted} is \code{TRUE} only if the \code{n}th row of both \code{jja} and \code{jjb} correspond to a legal element of \code{a} and \code{b} respectively. Now perform the addition by summing the products of the legal elements: \end{quote}\end{quote} \begin{Sinput} sum(a[1+jja[wanted,]] * b[1+jjb[wanted,]]) } \end{Sinput} \begin{quote}\it Thus function \code{f()} returns the coefficient, which is the sum of products of pairs of legal elements of \code{a} and \code{b}. Again observe the off-by-one issue. Now \code{apply()} function \code{f()} to the rows of \code{index} and reshape: \end{quote} \begin{Sinput} out <- apply(index,1,f) dim(out) <- outDims \end{Sinput} \begin{quote}{\it Thus array \code{out} contains the multivariate polynomial product of \code{a} and \code{b}.} \end{quote} \end{Schunk} The preceding code shows how multivariate polynomials may be multiplied. The implementation makes no assumptions about the entries of \code{a} or \code{b} and the coefficients of the product are summed over all possibilities; opportunities to streamline the procedure are discussed below. \subsection{Multipols as functions} Polynomials are implicitly functions of one variable; multivariate polynomials are functions too, but of more than one argument. Coercion of a multipol to a function is straightforward: <>= f2 <- as.function(a*b) @ <>= f2(c(x=1,y=3i)) @ It is worth noting the seamless integration between \pkg{polynom} and \pkg{multipol} in this regard: \code{f1(a)} is a multipol [recall that \code{f1()} is a function coerced from a univariate polynomial]. \subsection{Multipol extraction and replacement} One often needs to extract or replace parts of a multipol. The package includes extraction and replacement methods but, partly because of the off-by-one issue, these are not straightforward. Consider the case where one has a multipol and wishes to extract the terms of order zero ane one: <>= a[0:1,0:1] @ Note how the off-by-one issue is handled: \code{a[i,j]} is the coefficient of $x^iy^j$ (here the constant and first-order terms); the code is due to~\citet{rougier2007}. Replacement is slightly different: <>= a[0,0] <- -99 a @ Observe how replacement operators---unlike extraction operators---return a multipol; this allows expeditious modification of multivariate polynomials. The reason that the extraction operator returns an array rather than a multipol is that the extracted object often does not have unambiguous interpretation as a multipol (consider \code{a[-1,-1]}, for example). It seems to this author that the loss of elegance arising from the asymmetry between extraction and replacement is amply offset by the impossibility of an extracted object's representation as a multipol being undesired---unless the user explicitly coerces. \section{The elephant in the room} Representing a multivariate polynomial by an array is a natural and efficient method, but suffers some disadvantages. Consider Euler's four-square identity \begin{eqnarray*} \left(a_1^2+a_2^2+a_3^2+a_4^2\right)\cdot \left(b_1^2+b_2^2+b_3^2+b_4^2\right)=\\ \left(a_1b_1 - a_2b_2 - a_3b_3 - a_4b_4\right)^2+\\ \left(a_1b_2 + a_2b_1 + a_3b_4 - a_4b_3\right)^2+\\ \left(a_1b_3 - a_2b_4 + a_3b_1 + a_4b_2\right)^2+\\ \left(a_1b_4 + a_2b_3 - a_3b_2 + a_4b_1\right)^2\hphantom{+} \end{eqnarray*} \noindent which was discussed in~\citeyear{euler1749} in a letter from~\citeauthor{euler1749} to Goldbach. The identity is important in number theory, and may be proved straightforwardly by direct expansion\footnote{Or indeed more elegantly by observing that both sides of the identity express the absolute value of the product of two quaternions: $\left|a\right|^2\left|b\right|^2=\left|ab\right|^2$. With the \pkg{onion} package~\citep{Rnews:Hankin:a:2006}, one would define \code{f <- function(a,b){Norm(a)*Norm(b) - Norm(a*b)}} and observe (for example) that \code{f(rquat(rand="norm"),rquat(rand="norm"))} is zero to machine precision.}. It may by verified to machine precision using the \pkg{multipol} package; the left hand side is given by: \begin{Schunk} \begin{Sinput} > options("showchars" = TRUE) > lhs <- polyprod(ones(4,2),ones(4,2)) \end{Sinput} \begin{Soutput} [1] "1*x1^2*x5^2 + 1*x2^2*x5^2 + ... \end{Soutput} \end{Schunk} (the right hand side's idiom is more involved), but this relatively trivial expansion requires about~20 minutes on my $1.5\,\rm{GHz}$~G4; the product comprises~$3^8=6561$ elements, of which only~16 are nonzero. Note the \code{options()} statement controlling the format of the output which causes the result to be printed in a more appropriate form. Clearly the \pkg{multipol} package as currently implemented is inefficient for multivariate problems of this nature in which the arrays possess few nonzero elements. \subsubsection{A challenge} The inefficiency discussed above is ultimately due to the storage and manipulation of many zero coefficients that may be omitted from a calculation. Multivariate polynomials for which this is an issue appear to be common: the package includes many functions---such as \code{uni()}, \code{single()}, and \code{lone()}---that define useful multipols in which the number of nonzero elements is very small. In this section, I discuss some ideas for implementations in which zero operations are implicitly excluded. These ideas are presented in the spirit of a request for comments: although they seem to this author to be reasonable methodologies, readers are invited to discuss the ideas presented here and indeed to suggest alternative strategies. The canonical solution would be to employ some form of sparse array class, along the lines of Mathematica's \code{SparseArray}. Unfortunately, no such functionality exists as of~\citeyear{venables2008}, but \proglang{C++} includes a ``map'' class~\citep{stroustrup} that would be ideally suited to this application. There are other paradigms that may be worth exploring. It is possible to consider a multivariate polynomial of arity~$d$ (call this an object of class~$P^d$) as being a univariate polynomial whose coefficients are of class~$P^{d-1}$---class~$P^0$ would be a real or complex number---but such recursive class definitions appear not to be possible with the current implementation of~\proglang{S3} or \proglang{S4}~\citep{venables2008}. Recent experimental work by~\citet{west2008} exhibits a proof-of-concept in \proglang{C++} which might form the back end of an \proglang{R} implementation. Euler's identity appears to be a particularly favourable example and is proved essentially instantaneously. \section{Conclusions} This short document introduces the \pkg{multipol} package that provides functionality for manipulating multivariate polynomials. The \pkg{multipol} package builds on and generalizes the \pkg{polynom} package of~\citeauthor{venables2007}, which is restricted to the case of univariate polynomials. The generalization is not straightforward and presents a number of programming issues that were discussed. One overriding issue is that of performance: many multivariate polynomials of interest are ``sparse'' in the sense that they have many zero entries that unnecessarily consume storage and processing resources. Several possible solutions are suggested, in the form of a request for comments. The canonical method appears to be some form of sparse array, for which the ``map'' class of the \proglang{C++} language is ideally suited. Implementation of such functionality in \proglang{R} might well find application in fields other than multivariate polynomials. \section{An example} This appendix presents a brief technical example of multivariate polynomials in use in the field of enumerative combinatorics~\citep{good1976}. Suppose one wishes to determine how many contingency tables, with non-negative integer entries, have specified row and column marginal totals. The appropriate generating function is \[ \prod_{1\leqslant i\leqslant\mathit{nr}} \prod_{1\leqslant j\leqslant\mathit{nc}}\frac{1}{1-x_iy_j} \] \noindent where the table has~$nr$ rows and~$nc$ columns (the number of contingency tables is given by the coefficient of~$x_1^{s_1}x_2^{s_2}\cdots x_{r}^{s_r}\cdot y_1^{t_1}y_2^{t_2}\cdots y_{t}^{t_c}$ where the~$s_i$ and~$t_i$ are the row- and column- sums respectively). The \proglang{R} idiom for the generating function \code{gf} in the case of~$nr=nc=n=3$ is: \begin{Schunk} \begin{Sinput} n <- 3 jj <- as.matrix(expand.grid(seq_len(n),n+seq_len(n))) f <- function(i) ooom(n,lone(2*n,jj[i,]),maxorder=n) u <- c(sapply(seq_len(n^2),f,simplify=FALSE)) gf <- do.call("mprod", u) \end{Sinput} \end{Schunk} [here function \code{ooom()} is ``one-over-one-minus''; and \code{mprod()} is the function name for multipol product]. In this case, it is clear that sparse array functionality would not result in better performance, as almost every element of the generating function \code{gf} is nonzero. Observe that the maximum of \code{gf}, \Sexpr{55}, is consistent with~\citet{sloane2008}. \subsubsection*{Acknowledgements} I would like to acknowledge the many stimulating comments made by the \proglang{R}-help list. In particular, the insightful comments from Bill Venables and Kurt Hornik were extremely helpful. \bibliography{multipol} \end{document}