This vignette introduces features that will be available in version
2.1.0 of the corrselect package. These enhancements aim to
provide more flexibility and alternative strategies for variable subset
selection.
Spectral Method (Prototype)
A new selection strategy based on spectral clustering is currently in
development. This approach applies normalized spectral clustering to a
similarity matrix derived from the correlation matrix, grouping
variables that are only weakly correlated with one another.
Rationale
Unlike local or exhaustive search algorithms, spectral clustering works
from a single eigendecomposition of the whole matrix. It therefore gives
a global approximation that can quickly identify candidate subsets with
low internal correlation.
Overview of Steps
The algorithm follows these steps:
- Similarity matrix from absolute correlations: \(S_{ij} = 1 - |r_{ij}|\)
- Degree vector: \(D_i = \sum_j S_{ij}\)
- Normalized Laplacian: \(L = I - D^{-1/2} S D^{-1/2}\), where \(D\) is the diagonal matrix with the degrees \(D_i\) on its diagonal
- Eigendecomposition of \(L\)
- K-means clustering in the reduced eigenvector space
- Validation of each cluster based on correlation threshold and forced variables
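The first five steps can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the number of clusters `k = 3` is picked arbitrarily, and the choice to cluster on the eigenvectors belonging to the `k` smallest eigenvalues follows common spectral clustering practice rather than anything confirmed for corrselect. The validation step is left out.

```r
# Illustrative sketch only; not the corrselect implementation.
set.seed(1)
mat <- matrix(rnorm(100), ncol = 10)
colnames(mat) <- paste0("V", 1:10)
cmat <- cor(mat)

k <- 3  # assumed number of clusters, chosen only for illustration

# Step 1: similarity from absolute correlations (the diagonal becomes 0)
S <- 1 - abs(cmat)

# Step 2: degree vector, D_i = sum_j S_ij
d <- rowSums(S)

# Step 3: normalized Laplacian, L = I - D^{-1/2} S D^{-1/2}
D_inv_sqrt <- diag(1 / sqrt(d))
L <- diag(nrow(S)) - D_inv_sqrt %*% S %*% D_inv_sqrt

# Step 4: eigendecomposition; for symmetric input, eigen() returns
# eigenvalues in decreasing order
eig <- eigen(L, symmetric = TRUE)

# Step 5: k-means on the eigenvectors of the k smallest eigenvalues
# (assumed choice), which are the last k columns
p <- ncol(eig$vectors)
U <- eig$vectors[, (p - k + 1):p, drop = FALSE]
cl <- kmeans(U, centers = k, nstart = 10)$cluster
names(cl) <- colnames(cmat)

# Variable names grouped by cluster label
clusters <- split(names(cl), cl)
clusters
```

Each element of `clusters` is a candidate subset that would still need to pass the validation step before being reported.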
Basic Example
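# Simulate 10 observations of 10 variables and compute their correlations
set.seed(1)
mat <- matrix(rnorm(100), ncol = 10)
colnames(mat) <- paste0("V", 1:10)
cmat <- cor(mat)

# Select subsets with the prototype spectral method
res <- MatSelect(cmat, threshold = 0.5, method = "spectral")
res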
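To check a candidate subset by hand, a small helper along these lines may be useful. `is_valid_subset()` is a hypothetical function written for this illustration, not part of corrselect. It assumes that a subset is valid when every pairwise absolute correlation is strictly below the threshold, and it does not handle forced variables.

```r
# Hypothetical helper (not a corrselect function): TRUE when every pairwise
# absolute correlation among `vars` is strictly below `threshold`.
is_valid_subset <- function(vars, cmat, threshold) {
  if (length(vars) < 2) return(TRUE)  # a single variable has no pairs
  sub <- abs(cmat[vars, vars, drop = FALSE])
  all(sub[upper.tri(sub)] < threshold)
}

set.seed(1)
mat <- matrix(rnorm(100), ncol = 10)
colnames(mat) <- paste0("V", 1:10)
cmat <- cor(mat)

# Check an arbitrary subset, then the full variable set
is_valid_subset(c("V1", "V2", "V3"), cmat, threshold = 0.5)
is_valid_subset(colnames(cmat), cmat, threshold = 0.5)
```

Only the upper triangle is inspected, so the unit diagonal never triggers a false rejection.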
Customizing the Number of Clusters
You can pass an integer k to override the default number
of clusters:
res <- MatSelect(cmat, threshold = 0.5, method = "spectral", k = 4)
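To get a feel for how the choice of k changes the partition, the sketch below clusters the same correlation matrix for k = 2, 3 and 4 and tabulates the cluster sizes. `spectral_labels()` is a hypothetical base R helper written for this illustration. It assumes clustering on the eigenvectors of the k smallest eigenvalues and is not how `MatSelect()` is known to work internally.

```r
# Hypothetical helper, not part of corrselect: cluster labels for a given k.
spectral_labels <- function(cmat, k) {
  S <- 1 - abs(cmat)
  d_inv_sqrt <- 1 / sqrt(rowSums(S))
  # Scaling rows and columns of S is equivalent to D^{-1/2} S D^{-1/2}
  L <- diag(nrow(S)) - S * outer(d_inv_sqrt, d_inv_sqrt)
  vecs <- eigen(L, symmetric = TRUE)$vectors
  p <- ncol(vecs)
  # Eigenvalues come in decreasing order, so take the last k columns
  U <- vecs[, (p - k + 1):p, drop = FALSE]
  kmeans(U, centers = k, nstart = 10)$cluster
}

set.seed(1)
mat <- matrix(rnorm(100), ncol = 10)
colnames(mat) <- paste0("V", 1:10)
cmat <- cor(mat)

# Cluster sizes for each k
sizes <- lapply(2:4, function(k) table(spectral_labels(cmat, k)))
names(sizes) <- paste0("k=", 2:4)
sizes
```

Larger values of k give smaller clusters, which tend to pass a correlation threshold more easily but contain fewer variables each.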
Note that this method is still being tested, and its behavior or
interface may change before release.
Availability
This feature will be available in version 2.1.0. If
you’re interested in testing it early, you can install the development
version from GitHub:
# install.packages("devtools")
devtools::install_github("gcol33/corrselect")
I welcome feedback and suggestions via GitHub issues or direct
contact.