---
title: "rSDR vignette"
output: rmarkdown::html_vignette
author: "Sheau-Chiann Chen, Shilin Zhao and Hsin-Hsiung Bill Huang"
vignette: >
%\VignetteIndexEntry{rSDR_vignette}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
# Introduction
This R Markdown document demonstrates the usage of the `rSDR` package for Robust Sufficient Dimension Reduction (rSDR) using alpha-distance covariance and Stiefel manifold learning for estimation (Hsin-Hsiung Huang, Feng Yu & Teng Zhang, 2024). We apply the method to `ionosphere` dataset, including cross-validation, bootstrap methods for optimal alpha selection, and visualization of results for both 2 and 3 dimensions.
# Setup
- Load all packages required by rSDR package.
```{r setup, include=FALSE, echo=FALSE}
library(ManifoldOptim)
library(Matrix)
library(rstiefel)
library(RcppNumerical)
library(expm)
library(ggplot2)
library(future) #parallel
library(future.apply) #parallel
library(fdm2id) #ionosphere dataset
```
```{r, echo=TRUE}
#
# # Load rSDR package
# if (!requireNamespace("rSDR", quietly = TRUE)) {
# if (!requireNamespace("devtools", quietly = TRUE)){ install.packages("devtools")
# }
# library(devtools)
# devtools::install_github("scchen1/rSDR")
# # or
#
# #a local zip file (`rSDR-main.zip`) downloaded from GitHub
# #devtools::install_local(path='yourpath/rSDR-main.zip')
# }
library(rSDR)
```
# Data Preparation
We use the `ionosphere` dataset from the `fdm2id` package. The dataset has 33 predictor variables (V1-V33) and one response variable (V35, ‘g’ for good or ‘b’ for bad).
The data include a response variable `Y` with nx1 matrix and predictors `X` with nxp matrix, where n is the number of observation and p is the number of variable.
```{r ionosphere, echo=TRUE}
#Load ionosphere data in fdm2id package
utils::data("ionosphere", package = "fdm2id")
X<-as.matrix(ionosphere[,c(1:33)])
Y<-ifelse(ionosphere[,34]=='b',0,1)
# Y will be used for rSDR
Y<-matrix(Y,length(Y),1)
#Y.factor will be used in plot_rSDR
Y.factor<-factor(ionosphere$V35,levels=c('b','g'),labels=c('Bad','Good'))
```
# Robust Sufficient Dimension Reduction
- Robust dimension reduction with optimal alpha
+ beta: estimated SDR directions, i.e. reduced p x d matrix.
+ result: ManifoldOptim result
+ projected_data: This projects X onto the estimated SDR directions (X %*% beta), the projected matrix (data) n x d.
+ f_value: the optimal cost function
## Example: 3-Dimensional Reduction
```{r, echo=TRUE}
set.seed(2435)
sdr_result<-rSDR(X=X, Y=Y, d=3, alpha=0.3,maxiter=1000,tol=1e-7)
# An optimal of C is obtained by maximizing the target function using
# ManifoldOptim method
head(sdr_result$C_value)
#The value of cost function f is equal to the negative of the target function.
sdr_result$f_value
#beta=sigma_x^(-0.5)%*%C
head(sdr_result$beta)
#projected_data: This projects X onto the estimated SDR directions (X %*% beta),
#the projected matrix (data) n x d
head(sdr_result$projected_data)
```
## Visualizing the 3D Projected Data
```{r, echo=TRUE,fig.height=3,fig.width=4, fig.cap='Plot for 3-dimensional reduction using rSDR'}
#Plot for 3-dimensional reduction using rSDR
plot_rSDR(projected_data=sdr_result$projected_data,Y=Y.factor,Y.name='group',colors=c("#374E55FF", "#DF8F44FF"))
```
## Cross-Validation for Optimal Alpha (d=3)
- opt.alpha: The optimal alpha that gives minimum mean of cost (target) value (f_test.mean)
- f_test.mean: The mean of cost function value by the alpha sequence - a vector of length length(alpha.v).
- f_test.sd: The standard deviation of cost function value by the alpha sequence - a vector of length length(alpha.v).
- f_test: A kfolds x length(alpha.v) matrix. The cost function value for each fold at a given alpha.
- d: The number of reduced dimensions. The value of d as passed to optimal_alpha_cv.
- kfolds: The value of kfolds as passed to optimal_alpha_cv.
```{r, echo=TRUE}
# plan(multisession) will launch parallel workers running in the background
# to save running time. To shut down background workers launched this way, call
# plan(sequential)
# use all local cores except one
# future::plan(future::multisession, workers = future::availableCores() - 1)
# use 2 cores for parallel
future::plan("multisession", workers = 2)
set.seed(2435)
opt_results<-optimal_alpha_cv(alpha.v=c(0.3, 0.5, 0.7),X=X,Y=Y,d=3,kfolds=10)
#opt_results<-optimal_alpha_cv(alpha.v=c(0.3, 0.4, 0.5, 0.6, 0.7),X=X,Y=Y,d=3,kfolds=10)
# Optimal alpha using cross validation
opt_results$opt.alpha
# The cost function values by alpha (column) for each cross validation (row)
head(opt_results$f_test)
# The mean of cost function value by alpha
opt_results$f_test.mean
# The standard deviation of cost function value by alpha
opt_results$f_test.sd
# The reduced dimension
opt_results$d
# Number of folds
opt_results$kfolds
```
```{r, fig.height=3,fig.width=3,echo=TRUE,fig.cap='Mean and standard deviation of cost function (k-fold cross-validation method)'}
plot_alpha(opt_results=opt_results)
```
## Bootstrap for Optimal Alpha (d=3)
- opt.alpha: the optimal alpha that gives minimum mean of cost (target) value (f_test.mean)
- f_test.mean: The mean of cost function value by the alpha sequence - a vector of length length(alpha.v).
- f_test.sd: The standard deviation of cost function value by the alpha sequence - a vector of length length(alpha.v).
- f_test: A R x length(alpha.v) matrix. The cost function value for each replicates at a given alpha.
- d: The number of reduced dimensions. The value of d as passed to optimal_alpha_cv.
- R: The value of R as passed to optimal_alpha_boot. the number of bootstrap replicates. The default is 50.
```{r,echo=TRUE}
# plan(multisession) will launch parallel workers running in the background
# to save running time. To shut down background workers launched this way, call
# plan(sequential)
# use all local cores except one
# future::plan(future::multisession, workers = future::availableCores() - 1)
# use 2 cores for parallel
future::plan("multisession", workers = 2)
set.seed(2435)
opt_results<-optimal_alpha_boot(alpha.v=c(0.3,0.5, 0.7),X=X,Y=Y,d=3,R=10)
#opt_results<-optimal_alpha_boot(alpha.v=c(0.3,0.4,0.5,0.6, 0.7),X=X,Y=Y,d=3,R=50)
# Optimal alpha using bootstrap method
opt_results$opt.alpha
# The cost function values by alpha (column) for R bootstrap replicates (row)
head(opt_results$f_test)
# The mean of cost function value by alpha
opt_results$f_test.mean
# The standard deviation of cost function value by alpha
opt_results$f_test.sd
# The reduced dimension
opt_results$d
# Number of bootstrap replicates
opt_results$R
```
```{r,fig.height=3,fig.width=3,fig.cap='Mean and standard deviation of cost function (bootstrap method)'}
plot_alpha(opt_results=opt_results)
```
## Example: 2-Dimensional Reduction and Plotting
```{r}
set.seed(2435)
sdr_result<-rSDR(X=X, Y=Y, d=2, alpha=0.3,maxiter=1000,tol=1e-7)
# An optimal of C is obtained by maximizing the target function using ManifoldOptim method
head(sdr_result$C_value)
#The value of cost function f is equal to the negative of the target function.
sdr_result$f_value
#beta=sigma_x^(-0.5)%*%C
head(sdr_result$beta)
#projected_data: This projects X onto the estimated SDR directions (X %*% beta), the projected matrix (data) n x d
head(sdr_result$projected_data)
```
```{r}
#Plot for 2-dimensional reduction using rSDR
plot_rSDR(projected_data=sdr_result$projected_data,Y=Y.factor,Y.name='group',colors=c("#374E55FF", "#DF8F44FF"))
```
# Conclusion
This document tested the core functionality of the `rSDR` package, including:
- Performing Robust Sufficient Dimension Reduction with `rSDR`.
- Selecting the optimal alpha using cross-validation (`optimal_alpha_cv`) and bootstrap (`optimal_alpha_boot`).
- Visualizing the results with `plot_alpha`, and `plot_rSDR`.
The package successfully processes the ionosphere dataset, optimizes the alpha parameter, and produces meaningful visualizations of the reduced dimensions.
# Reference
Hsin-Hsiung Huang, Feng Yu & Teng Zhang (19 Feb 2024): Robust sufficient dimension reduction via α-distance covariance, Journal of Nonparametric Statistics, DOI:10.1080/10485252.2024.2313137