msPCA
An R Package for Sparse PCA with Multiple Principal Components
Installation
This package can be installed from CRAN directly (pending CRAN
registration):
install.packages("msPCA")
Alternatively, it can be installed from this GitHub repository using
the devtools package. First, install devtools:
install.packages("devtools")
and then run the following commands:
library(devtools)
install_github('jeanpauphilet/msPCA')
Getting started
The package consists of one main function, msPCA, which
takes as input:
- a data matrix (either the correlation or covariance matrix of the dataset),
- the number of principal components (PCs) to be computed, r,
- a list of r integers corresponding to the sparsity of each PC.
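The input requirements above can be checked before calling the package. The following is a minimal sketch in base R, not part of msPCA: the helper name `check_mspca_inputs` and the argument names `Sigma`, `r`, and `ks` are hypothetical and chosen only for illustration.

```r
# Hypothetical helper (not part of msPCA): sanity-check the three inputs.
check_mspca_inputs <- function(Sigma, r, ks) {
  p <- ncol(Sigma)
  stopifnot(
    is.matrix(Sigma), is.numeric(Sigma),
    nrow(Sigma) == p,                  # covariance/correlation matrices are square
    isSymmetric(unname(Sigma)),        # ... and symmetric
    length(r) == 1, r >= 1, r <= p,    # number of PCs to compute
    length(ks) == r,                   # one sparsity level per PC
    all(ks >= 1), all(ks <= p)         # a PC cannot use more than p variables
  )
  invisible(TRUE)
}

Sigma <- cor(datasets::mtcars)   # 11 x 11 correlation matrix
r <- 2
ks <- c(4, 4)
check_mspca_inputs(Sigma, r, ks)
```

A check of this kind fails early, for instance when the length of the sparsity vector does not match r.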
It returns an object with 4 fields:
- x_best (p x r array containing the sparse PCs),
- objective_value,
- orthogonality_violation,
- runtime.
Here is a short example demonstrating how to use the package. First,
load the library:
library(msPCA)
Then, define the input variables:
library(datasets)
df <- datasets::mtcars
TestMat <- cor(df)
Finally, call the function:
mspca(TestMat, 2, c(4,4))
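Once a p x r matrix of sparse PCs is available, its sparsity and orthogonality can be inspected with base R. The sketch below does not call the package: the matrix `X` is a hand-built, hypothetical stand-in for the x_best field, and the two summary quantities are illustrative definitions that may differ from how the package computes objective_value and orthogonality_violation.

```r
Sigma <- cor(datasets::mtcars)
p <- ncol(Sigma)

# Hypothetical stand-in for x_best: a p x 2 matrix with 4 nonzero entries
# per column, disjoint supports, and unit-norm columns.
X <- matrix(0, nrow = p, ncol = 2,
            dimnames = list(colnames(Sigma), c("PC1", "PC2")))
X[1:4, 1] <- 0.5
X[5:8, 2] <- 0.5

# Number of nonzero loadings in each PC.
sparsity <- colSums(X != 0)

# Gram matrix t(X) %*% X; it equals the identity when the PCs are orthonormal.
gram <- crossprod(X)
off_diag <- max(abs(gram[upper.tri(gram)]))

# Variance captured by the PCs, measured here as trace(t(X) %*% Sigma %*% X).
explained <- sum(diag(t(X) %*% Sigma %*% X))

print(sparsity)
print(off_diag)
print(explained)
```

The same three checks apply unchanged to a matrix returned by the package, by replacing `X` with the x_best field of the result.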
Development
Here, we provide more information about the code structure and
organization to help developers who would like to improve the method or
build on it.
Files
- R
  - RcppExports.R
    It provides the R interface, which calls the corresponding C++
    functions. Regenerate it or edit it manually if needed (e.g., if the
    interface changes). We recommend generating it automatically with
    Rcpp::compileAttributes().
  - main.R
    It contains all the functions of the package. For the functions coded
    in Rcpp (and exported in the RcppExports.R file), this script provides
    (i) user-friendly names and (ii) documentation. It also defines useful
    supporting functions.
- man/ contains the pages of the manual: one page for the package and
one per function. They are generated automatically from the comments in
R/main.R via the devtools::document() command.
- src/ contains the source files of the algorithm, in C++.
  - ConstantArguments.h
    It contains some parameters of the algorithm that are not directly
    tunable by the end user.
  - msPCA_R_CPP.cpp
    It contains the implementation of the algorithm.
  - RcppExports.cpp
    It contains the generated C++ wrappers that R calls. Regenerate it or
    edit it manually if needed (e.g., if the interface changes). It can be
    generated using Rcpp::compileAttributes().
  - Makevars
    This is not currently used. Use it to set compilation options, such as
    the version of C++.
  - Makevars.win
    This is not currently used. Use it to set compilation options on
    Windows, such as the version of C++.
- test/ contains some template R notebooks.
  - notebook_mtcars.R compares the PCs generated by msPCA on the mtcars
    dataset with the ones obtained using several alternative packages
    (elasticnet, PMA, sparsepca).
  - notebook_plot.R provides code to represent the resulting PCs on any
    2D plane.
  - notebook_synthetic.R compares the performance of msPCA and
    elasticnet on synthetically generated data with 2 true sparse PCs.
    Results are stored in the ‘msPCA_synthetic_results.csv’ file and
    plotted.
- NAMESPACE
It is used to build this package. Change it if
needed (e.g., if the interface changes).
- DESCRIPTION
It contains the description of this package.
- LICENSE
It contains the license information.
- msPCA.Rproj
It contains the settings of this R project. It is
used by RStudio and often does not need to be changed.
Guidance to future developers
- The essence of this algorithm is in the file “msPCA_R_CPP.cpp” and
the file “ConstantArguments.h”, where “msPCA_R_CPP.cpp” handles the
computation and “ConstantArguments.h” lists all internal arguments.
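After changing the C++ interface or the documentation comments, the generated files listed above need to be refreshed. The sketch below wraps the two commands mentioned in this section, Rcpp::compileAttributes() and devtools::document(), in a single helper. The helper name `regenerate_package` is hypothetical and is not part of the package; the sketch assumes that both Rcpp and devtools are installed and that it is run from the package root.

```r
# Hypothetical helper (not part of msPCA): refresh the generated files after
# editing the C++ sources in src/ or the documentation comments in R/main.R.
regenerate_package <- function(pkg_dir = ".") {
  # Refuse to run outside a package root.
  if (!file.exists(file.path(pkg_dir, "DESCRIPTION"))) {
    stop("No DESCRIPTION file found in '", pkg_dir, "'; not a package root.")
  }
  if (!requireNamespace("Rcpp", quietly = TRUE) ||
      !requireNamespace("devtools", quietly = TRUE)) {
    stop("Both the Rcpp and devtools packages are required.")
  }
  # Rewrites R/RcppExports.R and src/RcppExports.cpp.
  Rcpp::compileAttributes(pkg_dir)
  # Rewrites the manual pages in man/ from the comments in the R sources.
  devtools::document(pkg_dir)
  invisible(pkg_dir)
}

# Usage, from the root of the repository:
# regenerate_package(".")
```

Running the two steps in this order means the documentation is built against the freshly generated R interface.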