Type: | Package |
Title: | Randomized Singular Value Decomposition |
Version: | 1.0.5 |
Date: | 2021-04-11 |
Author: | N. Benjamin Erichson [aut, cre] |
Maintainer: | N. Benjamin Erichson <erichson@berkeley.edu> |
Description: | Low-rank matrix decompositions are fundamental tools and widely used for data analysis, dimension reduction, and data compression. Classically, highly accurate deterministic matrix algorithms are used for this task. However, the emergence of large-scale data has severely challenged our computational ability to analyze big data. The concept of randomness has been demonstrated as an effective strategy to quickly produce approximate answers to familiar problems such as the singular value decomposition (SVD). The rsvd package provides several randomized matrix algorithms such as the randomized singular value decomposition (rsvd), randomized principal component analysis (rpca), randomized robust principal component analysis (rrpca), randomized interpolative decomposition (rid), and the randomized CUR decomposition (rcur). In addition, several plot functions are provided. |
Depends: | R (≥ 4.0.0) |
Imports: | Matrix |
License: | GPL (≥ 3) |
LazyData: | TRUE |
LazyDataCompression: | xz |
URL: | https://github.com/erichson/rSVD |
BugReports: | https://github.com/erichson/rSVD/issues |
Suggests: | ggplot2, testthat |
RoxygenNote: | 7.1.1 |
NeedsCompilation: | no |
Encoding: | UTF-8 |
Packaged: | 2021-04-14 17:52:09 UTC; ben |
Repository: | CRAN |
Date/Publication: | 2021-04-16 05:40:03 UTC |
Digits
Description
Subsampled MNIST database of handwritten digits. This smaller dataset has 3000 samples for each of the digits corresponding to the class labels 0,1,2,3. Each 28x28 image patch is stored as a flattened row vector.
Usage
data('digits')
Format
An object of class rsvd.
References
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11):2278-2324, November 1998.
Examples
## Not run:
library('rsvd')
data('digits')
#Display first digit
digit <- matrix(digits[1,], nrow = 28, ncol = 28)
image(digit[,28:1], col = gray(255:0 / 255))
## End(Not run)
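A small complementary sketch applying the randomized PCA (rpca, described later in this manual) to the digit images; scaling is turned off here on the assumption that many border pixels are constant and cannot be scaled, and ggplot2 is needed for the screeplot:
## Not run:
library('rsvd')
data('digits')
# Randomized PCA on the flattened digit images
digits.rpca <- rpca(digits, k = 10, center = TRUE, scale = FALSE)
summary(digits.rpca)
ggscreeplot(digits.rpca)
## End(Not run)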
Biplot for rpca using ggplot
Description
Creates a pretty biplot showing the individual factor map overlaid with the variables factor map, i.e., plotting both the principal component scores and directions.
Usage
ggbiplot(
rpcaObj,
pcs = c(1, 2),
loadings = TRUE,
groups = NULL,
alpha = 0.6,
ellipse = TRUE,
alpha.ellipse = 0.2,
var_labels = TRUE,
var_labels.names = NULL,
ind_labels = TRUE,
ind_labels.names = NULL
)
Arguments
rpcaObj | Object returned by the rpca function. |
pcs | Array_like; the two principal components to be plotted; c(1, 2) by default. |
loadings | Bool, optional; if TRUE, the loadings (variable directions) are plotted. |
groups | Factor, optional; grouping variable used to color the individuals. |
alpha | Scalar, optional; transparency of the data points. |
ellipse | Bool, optional; if TRUE, a data ellipse is drawn for each group. |
alpha.ellipse | Scalar, optional; transparency of the ellipses. |
var_labels | Bool, optional; if TRUE, the variable names are plotted. |
var_labels.names | Array_like, optional; user-specified labels for the variables. |
ind_labels | Bool, optional; if TRUE, the row (individual) names are plotted. |
ind_labels.names | Array_like, optional; user-specified labels for the individuals. |
Author(s)
N. Benjamin Erichson, erichson@berkeley.edu
See Also
rpca
Examples
#See ?rpca
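A minimal illustrative sketch of a typical call, assuming ggplot2 is installed and reusing the iris-based fit from the rpca examples later in this manual:
## Not run:
library('rsvd')
data('iris')
iris.rpca <- rpca(log(iris[, 1:4]), k = 2)
# Biplot of the first two PCs, colored by species
ggbiplot(iris.rpca, pcs = c(1, 2), groups = iris[, 5])
## End(Not run)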
Variables factor map for rpca using ggplot
Description
Creates a pretty plot showing the correlation of the original variables with the principal components (PCs).
Usage
ggcorplot(
rpcaObj,
pcs = c(1, 2),
loadings = TRUE,
var_labels = FALSE,
var_labels.names = NULL,
alpha = 1,
top.n = NULL
)
Arguments
rpcaObj | Object returned by the rpca function. |
pcs | Array_like; the two principal components to be plotted; c(1, 2) by default. |
loadings | Bool, optional; if TRUE, the loadings (variable directions) are plotted. |
var_labels | Bool, optional; if TRUE, the variable names are plotted. |
var_labels.names | Array_like, optional; user-specified labels for the variables. |
alpha | Scalar, optional; transparency of the plotted points. |
top.n | Scalar, optional; number of (top) variables to display. |
Author(s)
N. Benjamin Erichson, erichson@berkeley.edu
See Also
rpca
Examples
#See ?rpca
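A minimal illustrative sketch, assuming ggplot2 is installed and reusing the iris-based fit from the rpca examples later in this manual:
## Not run:
library('rsvd')
data('iris')
iris.rpca <- rpca(log(iris[, 1:4]), k = 2)
# Variables factor (correlation) plot of the first two PCs with variable labels
ggcorplot(iris.rpca, pcs = c(1, 2), var_labels = TRUE)
## End(Not run)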
Individual factor map for rpca using ggplot
Description
Creates a pretty plot showing the individual factor map, i.e., plotting the principal component scores.
Usage
ggindplot(
rpcaObj,
pcs = c(1, 2),
groups = NULL,
alpha = 0.6,
ellipse = TRUE,
alpha.ellipse = 0.2,
ind_labels = TRUE,
ind_labels.names = NULL
)
Arguments
rpcaObj | Object returned by the rpca function. |
pcs | Array_like; the two principal components to be plotted; c(1, 2) by default. |
groups | Factor, optional; grouping variable used to color the individuals. |
alpha | Scalar, optional; transparency of the data points. |
ellipse | Bool, optional; if TRUE, a data ellipse is drawn for each group. |
alpha.ellipse | Scalar, optional; transparency of the ellipses. |
ind_labels | Bool, optional; if TRUE, the row (individual) names are plotted. |
ind_labels.names | Array_like, optional; user-specified labels for the individuals. |
Author(s)
N. Benjamin Erichson, erichson@berkeley.edu
See Also
rpca
Examples
#See ?rpca
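A minimal illustrative sketch, assuming ggplot2 is installed and reusing the iris-based fit from the rpca examples later in this manual:
## Not run:
library('rsvd')
data('iris')
iris.rpca <- rpca(log(iris[, 1:4]), k = 2)
# Individual factor map of the first two PCs, colored by species, with ellipses
ggindplot(iris.rpca, pcs = c(1, 2), groups = iris[, 5], ellipse = TRUE)
## End(Not run)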
Pretty Screeplot
Description
Creates a pretty screeplot using ggplot. By default the explained variance is plotted against the principal component number. Alternatively, the explained variance ratio, the cumulative explained variance ratio, or the eigenvalues can be plotted.
Usage
ggscreeplot(rpcaObj, type = c("var", "ratio", "cum", "eigenvals"))
Arguments
rpcaObj | Object returned by the rpca function. |
type | String c('var', 'ratio', 'cum', 'eigenvals'), optional; selects whether the explained variance ('var'), the explained variance ratio ('ratio'), the cumulative explained variance ratio ('cum'), or the eigenvalues ('eigenvals') are plotted. |
Author(s)
N. Benjamin Erichson, erichson@berkeley.edu
See Also
rpca
Examples
#See ?rpca
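A minimal illustrative sketch, assuming ggplot2 is installed and reusing the iris-based fit from the rpca examples later in this manual:
## Not run:
library('rsvd')
data('iris')
iris.rpca <- rpca(log(iris[, 1:4]))
# Plot the cumulative explained variance ratio
ggscreeplot(iris.rpca, type = 'cum')
## End(Not run)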
Screeplot
Description
Creates a screeplot and the variables and individual factor maps to summarize the results of the rpca function.
Usage
## S3 method for class 'rpca'
plot(x, ...)
Arguments
x | Object returned by the rpca function. |
... | Additional arguments passed to the individual plot functions (see below). |
See Also
ggscreeplot, ggcorplot, ggindplot
Examples
#See ?rpca
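A minimal illustrative sketch, assuming ggplot2 is installed and reusing the iris-based fit from the rpca examples later in this manual:
## Not run:
library('rsvd')
data('iris')
iris.rpca <- rpca(log(iris[, 1:4]))
# Produces the screeplot plus the variables and individual factor maps
plot(iris.rpca)
## End(Not run)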
Randomized CUR matrix decomposition.
Description
Randomized CUR matrix decomposition.
Usage
rcur(A, k = NULL, p = 10, q = 0, idx_only = FALSE, rand = TRUE)
Arguments
A | array_like; numeric (m, n) input matrix (or data frame) to be decomposed. |
k | integer; target rank of the low-rank approximation, k << min(m, n). |
p | integer, optional; oversampling parameter (default p = 10). |
q | integer, optional; number of power iterations (default q = 0). |
idx_only | bool, optional; if TRUE, only the index sets C.idx and R.idx are returned, which is more memory efficient for large input matrices. |
rand | bool, optional; if TRUE, a probabilistic strategy is used; otherwise a deterministic algorithm is used. |
Details
Algorithm for computing the CUR matrix decomposition of a rectangular (m, n) matrix A, with target rank k << min(m, n). The input matrix is factored as

A = C * U * R

using the rid decomposition. The factor matrix C is formed using actual columns of A, also called the partial column skeleton. The factor matrix R is formed using actual rows of A, also called the partial row skeleton.
If rand = TRUE, a probabilistic strategy is used to compute the decomposition; otherwise, a deterministic algorithm is used.
Value
rcur returns a list with class id containing the following components:
- C: array_like; column subset C = A[, C.idx]; (m, k) dimensional array.
- R: array_like; row subset R = A[R.idx, ]; (k, n) dimensional array.
- U: array_like; connector matrix; (k, k) dimensional array.
- C.idx: array_like; index set of the k selected columns used to form C.
- R.idx: array_like; index set of the k selected rows used to form R.
- C.scores: array_like; scores of the selected columns.
- R.scores: array_like; scores of the selected rows.
Author(s)
N. Benjamin Erichson, erichson@berkeley.edu
References
[1] N. B. Erichson, S. Voronin, S. L. Brunton and J. N. Kutz. 2019. Randomized Matrix Decompositions Using R. Journal of Statistical Software, 89(11), 1-48. doi: 10.18637/jss.v089.i11.
[2] N. Halko, P. Martinsson, and J. Tropp. "Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions" (2009). (available at arXiv https://arxiv.org/abs/0909.4061).
See Also
rid
Examples
## Not run:
# Load test image
data('tiger')
# Compute the randomized CUR decomposition with target rank k = 150
out <- rcur(tiger, k = 150)
# Reconstruct image
tiger.re <- out$C %*% out$U %*% out$R
# Compute relative error
print(norm(tiger-tiger.re, 'F') / norm(tiger, 'F'))
# Plot approximated image
image(tiger.re, col = gray((0:255)/255))
## End(Not run)
Randomized interpolative decomposition (ID).
Description
Randomized interpolative decomposition.
Usage
rid(A, k = NULL, mode = "column", p = 10, q = 0, idx_only = FALSE, rand = TRUE)
Arguments
A | array_like; numeric (m, n) input matrix (or data frame) to be decomposed. |
k | integer, optional; target rank of the low-rank approximation, k << min(m, n). |
mode | string c('column', 'row'), optional; whether columns or rows are selected to form the skeleton ('column' by default). |
p | integer, optional; oversampling parameter (default p = 10). |
q | integer, optional; number of power iterations (default q = 0). |
idx_only | bool, optional; if TRUE, only the index set idx is returned, which is more memory efficient for large input matrices. |
rand | bool, optional; if TRUE, a probabilistic strategy is used; otherwise a deterministic algorithm is used. |
Details
Algorithm for computing the ID of a rectangular (m, n) matrix A, with target rank k << min(m, n). The input matrix is factored as

A = C * Z

using the column pivoted QR decomposition. The factor matrix C is formed as a subset of columns of A, also called the partial column skeleton.
If mode = 'row', then the input matrix is factored as

A = Z * R

using the row pivoted QR decomposition. The factor matrix R is now formed as a subset of rows of A, also called the partial row skeleton. The factor matrix Z contains a (k, k) identity matrix as a submatrix and is well-conditioned.
If rand = TRUE, a probabilistic strategy is used to compute the decomposition; otherwise, a deterministic algorithm is used.
Value
rid returns a list containing the following components:
- C: array_like; column subset C = A[, idx], if mode = 'column'; array with dimensions (m, k).
- R: array_like; row subset R = A[idx, ], if mode = 'row'; array with dimensions (k, n).
- Z: array_like; well-conditioned matrix; depending on the selected mode, this is an array with dimensions (k, n) or (m, k).
- idx: array_like; index set of the k selected columns or rows used to form C or R.
- pivot: array_like; information on the pivoting strategy used during the decomposition.
- scores: array_like; scores of the columns or rows of the input matrix A.
- scores.idx: array_like; scores of the k selected columns or rows in C or R.
Author(s)
N. Benjamin Erichson, erichson@uw.edu
References
[1] N. Halko, P. Martinsson, and J. Tropp. "Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions" (2009). (available at arXiv https://arxiv.org/abs/0909.4061).
[2] N. B. Erichson, S. Voronin, S. L. Brunton and J. N. Kutz. 2019. Randomized Matrix Decompositions Using R. Journal of Statistical Software, 89(11), 1-48. doi: 10.18637/jss.v089.i11.
See Also
rcur
Examples
## Not run:
# Load test image
data("tiger")
# Compute the (column) randomized interpolative decomposition
# Note that the image needs to be transposed for correct plotting
out <- rid(t(tiger), k = 150)
# Show selected columns
tiger.partial <- matrix(0, 1200, 1600)
tiger.partial[,out$idx] <- t(tiger)[,out$idx]
image(t(tiger.partial), col = gray((0:255)/255), useRaster = TRUE)
# Reconstruct image
tiger.re <- t(out$C %*% out$Z)
# Compute relative error
print(norm(tiger-tiger.re, 'F') / norm(tiger, 'F'))
# Plot approximated image
image(tiger.re, col = gray((0:255)/255))
## End(Not run)
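A brief complementary sketch of the row mode, reusing the tiger data from above:
## Not run:
# Row-wise interpolative decomposition, A = Z * R
out.row <- rid(tiger, k = 150, mode = 'row')
tiger.re.row <- out.row$Z %*% out.row$R
# Compute relative error of the row-skeleton reconstruction
print(norm(tiger - tiger.re.row, 'F') / norm(tiger, 'F'))
## End(Not run)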
Randomized principal component analysis (rpca).
Description
Fast computation of the principal components analysis using the randomized singular value decomposition.
Usage
rpca(
A,
k = NULL,
center = TRUE,
scale = TRUE,
retx = TRUE,
p = 10,
q = 2,
rand = TRUE
)
Arguments
A | array_like; numeric (m, n) input matrix (or data frame) to be analyzed. |
k | integer; number of dominant principal components to be computed; k << min(m, n) is assumed. |
center | bool, optional; if TRUE, the variables are zero centered (TRUE by default). |
scale | bool, optional; if TRUE, the variables are scaled to unit variance (TRUE by default). |
retx | bool, optional; if TRUE, the rotated variables (scores) are returned (TRUE by default). |
p | integer, optional; oversampling parameter for rsvd (default p = 10). |
q | integer, optional; number of power iterations for rsvd (default q = 2). |
rand | bool, optional; if TRUE, the randomized SVD (rsvd) is used; otherwise a deterministic SVD is used. |
Details
Principal component analysis is an important linear dimension reduction technique.
Randomized PCA is computed via the randomized SVD algorithm (rsvd). The computational gain is substantial if the desired number of principal components is relatively small, i.e. k << min(m, n).
The print and summary methods can be used to present the results in a nice format. A scree plot can be produced with ggscreeplot. The individuals factor map can be produced with ggindplot, and a correlation plot with ggcorplot.
The predict function can be used to compute the scores of new observations. The data will automatically be centered (and scaled if requested). This is not fully supported for complex input matrices.
Value
rpca returns a list with class rpca containing the following components:
- rotation: array_like; the rotation (eigenvectors); (n, k) dimensional array.
- eigvals: array_like; the eigenvalues; k dimensional vector.
- sdev: array_like; the standard deviations of the principal components; k dimensional vector.
- x: array_like; the scores / rotated data; (m, k) dimensional array.
- center, scale: array_like; the centering and scaling used.
Note
The principal components are not unique and only defined up to sign (a constant of modulus one in the complex case) and so may differ between different PCA implementations.
Similar to prcomp, the variances are computed with the usual divisor N - 1.
Author(s)
N. Benjamin Erichson, erichson@berkeley.edu
References
[1] N. B. Erichson, S. Voronin, S. L. Brunton and J. N. Kutz. 2019. Randomized Matrix Decompositions Using R. Journal of Statistical Software, 89(11), 1-48. doi: 10.18637/jss.v089.i11.
[2] N. Halko, P. Martinsson, and J. Tropp. "Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions" (2009). (available at arXiv https://arxiv.org/abs/0909.4061).
See Also
ggscreeplot, ggindplot, ggcorplot, plot.rpca, predict, rsvd
Examples
library('rsvd')
#
# Load Edgar Anderson's Iris Data
#
data('iris')
#
# log transform
#
log.iris <- log( iris[ , 1:4] )
iris.species <- iris[ , 5]
#
# Perform rPCA and compute only the first two PCs
#
iris.rpca <- rpca(log.iris, k=2)
summary(iris.rpca) # Summary
print(iris.rpca) # Prints the rotations
#
# Use rPCA to compute all PCs, similar to prcomp
#
iris.rpca <- rpca(log.iris)
summary(iris.rpca) # Summary
print(iris.rpca) # Prints the rotations
plot(iris.rpca) # Produce screeplot, variable and individuals factor maps.
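A short sketch of scoring new observations with the predict method mentioned in the Details, assuming (as for prcomp) that the method accepts a newdata argument:
# Score five "new" observations with the fitted rpca object
new.obs <- log(iris[1:5, 1:4])
predict(iris.rpca, newdata = new.obs)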
Randomized QB Decomposition (rqb).
Description
Compute the near-optimal QB decomposition of a rectangular matrix.
Usage
rqb(A, k = NULL, p = 10, q = 2, sdist = "normal", rand = TRUE)
Arguments
A | array_like; numeric (m, n) input matrix (or data frame) to be decomposed. |
k | integer, optional; target rank of the low-rank approximation, k << min(m, n). |
p | integer, optional; oversampling parameter (default p = 10). |
q | integer, optional; number of power iterations (default q = 2). |
sdist | string, optional; specifies the sampling distribution of the random test matrix ('normal' by default). |
rand | bool, optional; if TRUE, a probabilistic strategy is used; otherwise a deterministic algorithm is used. |
Details
The randomized QB decomposition factors a rectangular (m, n) matrix A as

A = Q * B

where Q is an (m, k) matrix with orthogonal columns and B is a (k, n) matrix. The target rank is assumed to be k << min(m, n).
p is an oversampling parameter to improve the approximation. A value between 5 and 10 is recommended, and p = 10 is set by default.
The parameter q specifies the number of power (subspace) iterations to reduce the approximation error. This is recommended if the singular values decay slowly. In practice 1 or 2 iterations achieve good results; however, computing power iterations increases the computational time. The number of power iterations is set to q = 2 by default.
Value
rqb returns a list containing the following components:
- Q: array_like; matrix with orthogonal columns; (m, k) dimensional array.
- B: array_like; smaller matrix; (k, n) dimensional array.
Author(s)
N. Benjamin Erichson, erichson@berkeley.edu
References
[1] N. B. Erichson, S. Voronin, S. L. Brunton and J. N. Kutz. 2019. Randomized Matrix Decompositions Using R. Journal of Statistical Software, 89(11), 1-48. doi: 10.18637/jss.v089.i11.
[2] N. Halko, P. Martinsson, and J. Tropp. "Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions" (2009). (available at arXiv https://arxiv.org/abs/0909.4061).
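A minimal usage sketch, assuming the tiger matrix shipped with the package (as used in the rid and rcur examples above):
## Not run:
library('rsvd')
data('tiger')
# Compute the randomized QB decomposition with target rank k = 100
out <- rqb(tiger, k = 100)
# Reconstruct the matrix and report the relative error
tiger.re <- out$Q %*% out$B
print(norm(tiger - tiger.re, 'F') / norm(tiger, 'F'))
## End(Not run)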
Randomized robust principal component analysis (rrpca).
Description
Robust principal components analysis separates a matrix into a low-rank plus sparse component.
Usage
rrpca(
A,
lambda = NULL,
maxiter = 50,
tol = 1e-05,
p = 10,
q = 2,
trace = FALSE,
rand = TRUE
)
Arguments
A | array_like; numeric (m, n) input matrix (or data frame) to be decomposed. |
lambda | scalar, optional; regularization parameter balancing the low-rank and sparse components. |
maxiter | integer, optional; maximum number of iterations (default maxiter = 50). |
tol | scalar, optional; tolerance used as the convergence criterion (default tol = 1e-05). |
p | integer, optional; oversampling parameter for the randomized SVD (default p = 10). |
q | integer, optional; number of power iterations for the randomized SVD (default q = 2). |
trace | bool, optional; if TRUE, progress information is printed during the iterations. |
rand | bool, optional; if TRUE, the randomized SVD is used; otherwise a deterministic SVD is used. |
Details
Robust principal component analysis (RPCA) is a method for the robust separation of a rectangular (m, n) matrix A into a low-rank component L and a sparse component S:

A = L + S

To decompose the matrix, we use the inexact augmented Lagrange multiplier method (IALM). The algorithm can be used in combination with either the randomized or deterministic SVD.
Value
rrpca returns a list containing the following components:
- L: array_like; low-rank component; (m, n) dimensional array.
- S: array_like; sparse component; (m, n) dimensional array.
Author(s)
N. Benjamin Erichson, erichson@berkeley.edu
References
[1] N. B. Erichson, S. Voronin, S. L. Brunton and J. N. Kutz. 2019. Randomized Matrix Decompositions Using R. Journal of Statistical Software, 89(11), 1-48. doi: 10.18637/jss.v089.i11.
[2] Lin, Zhouchen, Minming Chen, and Yi Ma. "The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices." (2010). (available at arXiv https://arxiv.org/abs/1009.5055).
Examples
library('rsvd')
# Create toy video
# background frame
xy <- seq(-50, 50, length.out=100)
mgrid <- list( x=outer(xy*0,xy,FUN="+"), y=outer(xy,xy*0,FUN="+") )
bg <- 0.1*exp(sin(-mgrid$x**2-mgrid$y**2))
toyVideo <- matrix(rep(c(bg), 100), 100*100, 100)
# add moving object
for(i in 1:90) {
mobject <- matrix(0, 100, 100)
mobject[i:(10+i), 45:55] <- 0.2
toyVideo[,i] = toyVideo[,i] + c( mobject )
}
# Foreground/Background separation
out <- rrpca(toyVideo, trace=TRUE)
# Display results of the separation for the 10th frame
par(mfrow=c(1,4))
image(matrix(bg, ncol=100, nrow=100)) #true background
image(matrix(toyVideo[,10], ncol=100, nrow=100)) # frame
image(matrix(out$L[,10], ncol=100, nrow=100)) # separated background
image(matrix(out$S[,10], ncol=100, nrow=100)) # separated foreground
Randomized Singular Value Decomposition (rsvd).
Description
The randomized SVD computes the near-optimal low-rank approximation of a rectangular matrix using a fast probabilistic algorithm.
Usage
rsvd(A, k = NULL, nu = NULL, nv = NULL, p = 10, q = 2, sdist = "normal")
Arguments
A | array_like; numeric (m, n) input matrix (or data frame) to be decomposed. |
k | integer; target rank of the low-rank decomposition, k << min(m, n). |
nu | integer, optional; number of left singular vectors to be returned. |
nv | integer, optional; number of right singular vectors to be returned. |
p | integer, optional; oversampling parameter (default p = 10). |
q | integer, optional; number of power iterations (default q = 2). |
sdist | string, optional; specifies the sampling distribution of the random test matrix ('normal' by default). |
Details
The singular value decomposition (SVD) plays an important role in data analysis and scientific computing. Given a rectangular (m, n) matrix A and a target rank k << min(m, n), the SVD factors the input matrix A as

A = U_k diag(d_k) V_k^T

The k left singular vectors are the columns of the real or complex unitary matrix U. The k right singular vectors are the columns of the real or complex unitary matrix V. The k dominant singular values are the entries of d, and are non-negative real numbers.
p is an oversampling parameter to improve the approximation. A value of at least 10 is recommended, and p = 10 is set by default.
The parameter q specifies the number of power (subspace) iterations to reduce the approximation error. The power scheme is recommended if the singular values decay slowly. In practice, 2 or 3 iterations achieve good results; however, computing power iterations increases the computational costs. The power scheme is set to q = 2 by default.
If k > (min(n, m)/4), a deterministic partial or truncated svd algorithm might be faster.
Value
rsvd returns a list containing the following three components:
- d: array_like; singular values; vector of length k.
- u: array_like; left singular vectors; (m, k) or (m, nu) dimensional array.
- v: array_like; right singular vectors; (n, k) or (n, nv) dimensional array.
Note
The singular vectors are not unique and only defined up to sign (a constant of modulus one in the complex case). If a left singular vector has its sign changed, changing the sign of the corresponding right vector gives an equivalent decomposition.
Author(s)
N. Benjamin Erichson, erichson@berkeley.edu
References
[1] N. B. Erichson, S. Voronin, S. L. Brunton and J. N. Kutz. 2019. Randomized Matrix Decompositions Using R. Journal of Statistical Software, 89(11), 1-48. doi: 10.18637/jss.v089.i11.
[2] N. Halko, P. Martinsson, and J. Tropp. "Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions" (2009). (available at arXiv https://arxiv.org/abs/0909.4061).
Examples
library('rsvd')
# Create an n x n Hilbert matrix of order n,
# with entries H[i,j] = 1 / (i + j - 1).
hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") }
H <- hilbert(n=50)
# Low-rank (k=10) matrix approximation using rsvd
k=10
s <- rsvd(H, k=k)
Hre <- s$u %*% diag(s$d) %*% t(s$v) # matrix approximation
print(100 * norm( H - Hre, 'F') / norm( H,'F')) # percentage error
# Compare to truncated base svd
s <- svd(H)
Hre <- s$u[,1:k] %*% diag(s$d[1:k]) %*% t(s$v[,1:k]) # matrix approximation
print(100 * norm( H - Hre, 'F') / norm( H,'F')) # percentage error
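A brief complementary sketch, reusing H and k from the example above, to illustrate the effect of the power iteration parameter q discussed in the Details:
s0 <- rsvd(H, k = k, q = 0) # no power iterations
s2 <- rsvd(H, k = k, q = 2) # default power scheme
# Relative errors: the power scheme typically reduces the approximation error
print(norm(H - s0$u %*% diag(s0$d) %*% t(s0$v), 'F') / norm(H, 'F'))
print(norm(H - s2$u %*% diag(s2$d) %*% t(s2$v), 'F') / norm(H, 'F'))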
Tiger
Description
1600x1200 grayscale image (8 bit, [0-255]/255).
Usage
data('tiger')
Format
An object of class rsvd.
References
S. Taheri (2006). "Panthera tigris altaica", (Online image)
Examples
## Not run:
library('rsvd')
data('tiger')
#Display image
image(tiger, col = gray((0:255)/255))
## End(Not run)