Type: Package
Title: Generalized Ridge Regression for Linear Models
Version: 1.0
Description: Ridge regression due to Hoerl and Kennard (1970) <doi:10.1080/00401706.1970.10488634> and generalized ridge regression due to Yang and Emura (2017) <doi:10.1080/03610918.2016.1193195> with optimized tuning parameters. These ridge regression estimators (the HK estimator and the YE estimator) are computed by minimizing the cross-validated mean squared errors. Both the ridge and generalized ridge estimators are applicable for high-dimensional regressors (p>n), where p is the number of regressors, and n is the sample size.
License: GPL-2
Encoding: UTF-8
RoxygenNote: 7.2.3
NeedsCompilation: no
Packaged: 2023-12-07 05:29:27 UTC; takes
Author: Takeshi Emura [aut, cre], Szu-Peng Yang [ctb]
Maintainer: Takeshi Emura <takeshiemura@gmail.com>
Repository: CRAN
Date/Publication: 2023-12-07 11:50:05 UTC
GCV (generalized cross-validation)
Description
The GCV function gives the sum of cross-validated squared errors, which can be used to optimize the tuning parameters in ridge regression and generalized ridge regression. See Golub et al. (1979) and Sections 2.3 and 3.3 of Yang and Emura (2017) for details.
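For orientation, the criterion of Golub et al. (1979) for ordinary ridge regression (the case where W is the identity matrix) can be sketched in base R as follows. The helper name gcv_sketch is hypothetical, and its scaling may differ from what the package's GCV() returns, so read it as an illustration of the idea rather than the package's implementation.

```r
# Hypothetical helper: GCV criterion of Golub et al. (1979) for ordinary ridge.
# Assumes W is the identity matrix; this is not the package's GCV().
gcv_sketch <- function(X, Y, k) {
  X <- as.matrix(X)
  Y <- as.numeric(Y)
  n <- nrow(X)
  p <- ncol(X)
  # Hat matrix A(k) = X (X'X + k I)^{-1} X'
  A <- X %*% solve(crossprod(X) + k * diag(p), t(X))
  resid <- Y - as.numeric(A %*% Y)
  # V(k) = (RSS / n) / (tr(I - A) / n)^2
  (sum(resid^2) / n) / ((n - sum(diag(A))) / n)^2
}

set.seed(1)
X <- matrix(rnorm(50 * 80), nrow = 50)   # p > n is allowed because k > 0
Y <- X[, 1] - X[, 2] + rnorm(50)
gcv_value <- gcv_sketch(X, Y, k = 1)
```

As k grows, the hat matrix shrinks toward zero and the criterion approaches mean(Y^2), which is a quick sanity check on the sketch.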
Usage
GCV(X, Y, k, W = diag(ncol(X)))
Arguments
X: matrix of explanatory variables (design matrix)
Y: vector of response variables
k: shrinkage parameter (>0); it is the "lambda" parameter
W: matrix of weights (default is the identity matrix)
Value
The value of GCV
References
Yang SP, Emura T (2017) A Bayesian approach with generalized ridge estimation for high-dimensional regression and testing. Commun Stat-Simul 46(8):6083–6105.
Golub GH, Heath M, Wahba G (1979) Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21:215–223.
Examples
n=100 # no. of observations
p=100 # no. of dimensions
q=r=10 # no. of nonzero coefficients in each block (q+r=20 in total)
beta=c(rep(0.5,q),rep(0.5,r),rep(0,p-q-r))
X=X.mat(n,p,q,r)
Y=X%*%beta+rnorm(n,0,1)
GCV(X,Y,k=1)
X.mat (generating a design matrix)
Description
A design matrix (X; nrow(X)=n, ncol(X)=p) is generated from random numbers, following the design used in our simulation studies (Section 5 of Yang and Emura (2017); p.6093). The design matrix has two blocks of correlated regressors (Pearson correlation=0.5): the first q regressors and the next r regressors. The remaining p-q-r regressors are independent. If the regressors are gene expressions, the correlated blocks may be regarded as "gene pathways" (Emura et al. 2012).
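One way to obtain such a block structure is a shared random factor per block, sketched below in base R. The function name x_mat_sketch, the rho argument, and the use of standard normal entries are assumptions made for illustration; the distribution actually used by X.mat() may differ.

```r
# Hypothetical generator: two equicorrelated blocks (correlation rho) plus
# independent columns. Standard normal entries are an assumption of this sketch.
x_mat_sketch <- function(n, p, q, r, rho = 0.5) {
  stopifnot(q >= 1, r >= 1, q + r < p)
  # A factor shared by all columns of a block induces pairwise correlation rho
  block <- function(m) {
    shared <- rnorm(n)
    sqrt(rho) * shared + sqrt(1 - rho) * matrix(rnorm(n * m), nrow = n)
  }
  cbind(block(q), block(r), matrix(rnorm(n * (p - q - r)), nrow = n))
}

set.seed(1)
X <- x_mat_sketch(n = 20000, p = 6, q = 2, r = 2)
within_block <- cor(X[, 1], X[, 2])    # same block: close to 0.5
between_block <- cor(X[, 1], X[, 3])   # different blocks: close to 0
```

A large n is used here only so that the sample correlations sit close to their population values.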
Usage
X.mat(n, p, q, r)
Arguments
n: the number of rows (samples)
p: the number of columns (regressors)
q: the number of correlated regressors in the first block (1<=q<p, q+r<p)
r: the number of correlated regressors in the second block (1<=r<p, q+r<p)
Value
a matrix X (nrow(X)=n, ncol(X)=p)
References
Yang SP, Emura T (2017) A Bayesian approach with generalized ridge estimation for high-dimensional regression and testing. Commun Stat-Simul 46(8):6083–6105.
Emura T, Chen YH, Chen HY (2012) Survival prediction based on compound covariate method under Cox proportional hazard models. PLoS ONE 7(10). doi:10.1371/journal.pone.0047627
Examples
X.mat(n=10,p=5,q=2,r=2)
X.mat(n=100,p=50,q=10,r=10) # Case I in Section 5 of Yang and Emura (2017)
g.ridge (generalized ridge regression)
Description
Generalized ridge regression with the optimal shrinkage parameter. Ridge regression (Hoerl and Kennard 1970) and generalized ridge regression (Yang and Emura 2017) are implemented. Tuning parameters are optimized by minimizing the GCV function (computed by the function GCV(.)). See Golub et al. (1979) and Sections 2.3 and 3.3 of Yang and Emura (2017).
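The tuning step for the ordinary ridge case can be sketched in base R as a one-dimensional search over (0, kmax). The name ridge_hk_sketch is hypothetical, the criterion is the Golub et al. (1979) form with identity weights, and the search uses stats::optimize; the package may search differently, and the "YE" method with its thresholding parameter is not covered here.

```r
# Hypothetical sketch of the "HK" path: ordinary ridge with lambda chosen by
# minimizing a GCV criterion (Golub et al. 1979) over (0, kmax).
ridge_hk_sketch <- function(X, Y, kmax = 500) {
  X <- as.matrix(X)
  Y <- as.numeric(Y)
  n <- nrow(X)
  p <- ncol(X)
  XtX <- crossprod(X)
  XtY <- crossprod(X, Y)
  gcv <- function(k) {
    fitted <- as.numeric(X %*% solve(XtX + k * diag(p), XtY))
    # Trace of the hat matrix, using tr(X M X') = tr(M X'X)
    df <- sum(diag(solve(XtX + k * diag(p), XtX)))
    (sum((Y - fitted)^2) / n) / ((n - df) / n)^2
  }
  lambda <- optimize(gcv, interval = c(0, kmax))$minimum
  list(lambda = lambda,
       estimate = as.numeric(solve(XtX + lambda * diag(p), XtY)))
}

set.seed(1)
n <- 60
p <- 10
X <- matrix(rnorm(n * p), nrow = n)
beta <- c(rep(0.5, 3), rep(0, p - 3))
Y <- as.numeric(X %*% beta + rnorm(n))
fit <- ridge_hk_sketch(X, Y - mean(Y), kmax = 200)
```

The usage keeps n > p so that the linear systems stay well conditioned even when the search approaches zero; with p > n, a small positive lower bound on the interval would be a safer choice for this sketch.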
Usage
g.ridge(X, Y, method = "HK", kmax = 500)
Arguments
X: design matrix of explanatory variables (regressors)
Y: vector of response variables
method: "HK" or "YE" for Hoerl and Kennard (1970) or Yang and Emura (2017)
kmax: maximum possible value for the shrinkage parameter (the "lambda" parameter), where the parameter is optimized in the interval (0, kmax).
Value
lambda: optimized shrinkage parameter
delta: the optimized thresholding parameter
estimate: regression coefficients (beta)
SE: Standard Error
Z: Z-value for testing beta=0
P: P-value for testing beta=0
Sigma: variance estimate of the error distribution (the square of the standard deviation)
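The relation between the estimate, its standard error, the Z-value and the P-value can be illustrated as below. This sketch assumes that Z is the estimate divided by its standard error and that the P-value is two-sided under a standard normal reference distribution; the input numbers are made up for illustration and are not output from g.ridge().

```r
# Assumption of this sketch: Z = estimate / SE with a two-sided normal P-value.
estimate <- c(0.50, -0.10, 0.02)   # made-up coefficients for illustration
SE <- c(0.10, 0.10, 0.10)          # made-up standard errors
Z <- estimate / SE
P <- 2 * pnorm(-abs(Z))            # two-sided P-value for testing beta=0
significant <- P < 0.05            # which coefficients are flagged at the 5% level
```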
References
Yang SP, Emura T (2017) A Bayesian approach with generalized ridge estimation for high-dimensional regression and testing. Commun Stat-Simul 46(8):6083–6105.
Hoerl AE, Kennard RW (1970) Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12:55–67.
Examples
n=100 # no. of observations
p=100 # no. of dimensions
q=r=10 # no. of nonzero coefficients in each block (q+r=20 in total)
beta=c(rep(0.5,q),rep(0.5,r),rep(0,p-q-r))
X=X.mat(n,p,q,r)
Y=X%*%beta+rnorm(n,0,1)
g.ridge(X,Y-mean(Y),method="HK",kmax=200)
g.ridge(X,Y-mean(Y),method="YE",kmax=200)