Type: Package
Title: Penalized Cox Model for High-Dimensional Data with Grouped Predictors
Version: 1.0.2
Date: 2023-12-10
Author: Xuan Dang
Maintainer: Xuan Dang <xuandang11289@gmail.com>
Description: Fit penalized Cox models with both non-overlapping and overlapping grouped penalties, including the group lasso, group smoothly clipped absolute deviation (SCAD), and group minimax concave penalty (MCP). The algorithms combine the MM approach and group-wise descent with computational tricks including screening, active sets, and warm starts. Several methods for tuning the regularization parameter are provided.
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Encoding: UTF-8
Imports: Rcpp (≥ 1.0.11)
LinkingTo: Rcpp, RcppEigen
Depends: Matrix (≥ 1.6-1.1), MASS, colorspace
RoxygenNote: 7.2.3
NeedsCompilation: yes
Packaged: 2023-12-11 15:42:43 UTC; xuandang
Repository: CRAN
Date/Publication: 2023-12-11 16:00:11 UTC

Penalized Cox Model for High-Dimensional Data with Grouped Predictors

Description

Fit penalized Cox models with both non-overlapping and overlapping grouped penalties, including the group lasso, group smoothly clipped absolute deviation (SCAD), and group minimax concave penalty (MCP). The algorithms combine the MM approach and group-wise descent with computational tricks including screening, active sets, and warm starts. Several methods for tuning the regularization parameter are provided.

Package Content

Index of help topics:

cv.grpCox               Cross-validation for grpCox
cv.grpCoxOverlap        Cross-validation for grpCoxOverlap
grpCox                  Fit a penalized Cox model.
grpCox-package          Penalized Cox Model for High-Dimensional Data
                        with Grouped Predictors
grpCoxOverlap           Fit a penalized regression path with
                        overlapping grouped covariates.
plot.Coef               Plots the coefficient paths
plot.gCoef              Plots the coefficient paths with the same color
                        for the covariates in the same group.
plot.llCV               Plot the cross-validation curve produced by
                        cv.grpCox or cv.grpCoxOverlap

Maintainer

Xuan Dang <xuandang11289@gmail.com>

Author(s)

Xuan Dang


Cross-validation for grpCox

Description

Performs k-fold cross-validation for grpCox.

Usage

cv.grpCox(X, y, g, m, penalty=c("glasso", "gSCAD", "gMCP"), lambda=NULL, 
nlambda=100, rlambda=NULL, gamma=switch(penalty, SCAD = 3.7, 3), 
standardize=TRUE, thresh=1e-3, maxit=1e+4, nfolds=10, foldid=NULL)

Arguments

X

The design matrix.

y

The response, containing time (the failure or censoring times) and status (1 for failure, 0 for censoring).

g

A vector indicating the group structure of the covariates. The groups need not be ordered.

m

Group multipliers. Default is the square root of group size.

penalty

The penalty to be applied to the model. It is one of glasso, gSCAD, or gMCP.

lambda

A user-supplied sequence of lambda values. If left unspecified, the function automatically computes a grid of lambda values.

nlambda

The number of lambda values to use in the regularization path. Default is 100.

rlambda

Smallest value for lambda, as a fraction of the maximum lambda. The maximum lambda is the data-derived entry value, i.e. the smallest value for which all coefficients are zero. The default depends on the sample size relative to the number of covariates. If sample size > #covariates, the default is 0.001, close to zero. If sample size < #covariates, the default is 0.05.

gamma

Tuning parameter of the group SCAD/MCP penalty. Default is 3.7 for SCAD and 3 for MCP.

standardize

Logical flag for variable standardization prior to fitting the model.

thresh

Convergence threshold for one-step coordinate descent. Default is 1e-3.

maxit

Maximum number of passes over the data for all lambda values. Default is 1e+4.

nfolds

The number of cross-validation folds. Default is 10.

foldid

An optional vector of values between 1 and nfolds identifying what fold each observation is in.
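
To make folds reproducible, foldid can be built by hand. The following is a minimal base R sketch, assuming balanced random assignment; the variable names are illustrative and not part of the package.

```r
# Balanced random fold assignment for N observations (illustrative names).
set.seed(1)
N <- 50
nfolds <- 10
# rep() cycles through 1..nfolds so that fold sizes differ by at most one;
# sample() then shuffles the assignment across observations.
foldid <- sample(rep(seq_len(nfolds), length.out = N))
table(foldid)
```

The resulting vector can then be supplied through the foldid argument, for example cv.grpCox(x, y, g, m, penalty = "glasso", foldid = foldid).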

Value

aBetaSTD

A standardized coefficient matrix whose columns correspond to nlambda values of lambda.

aBetaO

A coefficient matrix (without standardization) whose columns correspond to nlambda values of lambda.

mBetaSTD

The standardized coefficients that give the maximum log-likelihood value under the first cross-validation method.

mBetaO

The coefficients in original form that give the maximum log-likelihood value under the first cross-validation method.

pBetaSTD

The standardized coefficients that give the maximum log-likelihood value under the penalized cross-validation method.

pBetaO

The coefficients in original form that give the maximum log-likelihood value under the penalized cross-validation method.

fit

A matrix containing the lambda values and the mean cross-validation error.

lambda

The lambda values used.

g

A vector indicating the group structure of the covariates.

cvmax

The maximum value of log likelihood.

lambda.max

The value of lambda that gives the maximum log-likelihood value under the first cross-validation method.

lambda.pcvl

The value of lambda that gives the maximum log-likelihood value under the penalized cross-validation method.
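
The criterion behind these values can be illustrated with a small base R sketch. It assumes that the first cross-validation method is the cross-validated partial log-likelihood of Verweij and van Houwelingen, that there are no tied event times, and that beta was fitted without the held-out fold. The helper names cox_loglik and cvl_fold are hypothetical and are not functions of this package.

```r
# Cox partial log-likelihood for a linear predictor eta.
# Assumes no tied event times (risk sets are built by sorting on time).
cox_loglik <- function(eta, time, status) {
  ord <- order(time, decreasing = TRUE)
  eta <- eta[ord]
  status <- status[ord]
  # After sorting, cumsum() gives the sum of exp(eta) over each risk set.
  log_risk <- log(cumsum(exp(eta)))
  sum((eta - log_risk)[status == 1])
}

# Contribution of one held-out fold: l(beta) - l_{-k}(beta), where beta is
# assumed to have been estimated without the observations in test_idx.
cvl_fold <- function(beta, X, time, status, test_idx) {
  eta <- drop(X %*% beta)
  full <- cox_loglik(eta, time, status)
  train <- cox_loglik(eta[-test_idx], time[-test_idx], status[-test_idx])
  full - train
}
```

Summing cvl_fold over all folds for each lambda gives a curve whose maximizer plays the role of lambda.max under these assumptions.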

Author(s)

Xuan Dang <xuandang11289@gmail.com>

References

Verweij PJ, Houwelingen HC. Cross-validation in survival analysis. Statistics in Medicine 1993; 12(24): 2305-2314.

Ternes N, Rotolo F, Michiels S. Empirical extensions of the lasso penalty to reduce the false discovery rate in high-dimensional Cox regression models. Statistics in Medicine 2016; 35(15): 2561-73.

Examples

set.seed(200)
N <- 50
p <- 9
x <- matrix(rnorm(N * p), nrow = N)
beta <- c(.65,.65,0,0,.65,.65,0,.65,0)
hx <- exp(x %*% beta) 
ty <- rexp(N,hx) 
tcens <- 1 - rbinom(n=N, prob = 0.2, size = 1)
y <- data.frame(illt=ty, ills=tcens)
names(y) <- c("time", "status")

g <- c(1,1,2,2,3,3,2,3,2)
m <- c(sqrt(2),sqrt(4),sqrt(3))

cvfit <- cv.grpCox(x,y,g,m,penalty="glasso")
plot.llCV(cvfit)
plot.gCoef(cvfit$aBetaO, cvfit$g, cvfit$lambda)
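
The multipliers m used in the example above are the square roots of the group sizes. A short base R sketch of how they can be derived from g, rather than typed by hand, is:

```r
g <- c(1, 1, 2, 2, 3, 3, 2, 3, 2)
# table() counts the covariates per group, ordered by group label;
# the square root of each count gives one multiplier per group.
m <- sqrt(as.numeric(table(g)))
m
```

This assumes the multipliers are expected in increasing order of the group labels, as in the example.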

Cross-validation for grpCoxOverlap

Description

Performs k-fold cross-validation for grpCoxOverlap.

Usage

cv.grpCoxOverlap(X0, y, group, penalty=c("glasso", "gSCAD", "gMCP"), 
lambda=NULL, nlambda=100, rlambda=NULL,gamma=switch(penalty, SCAD = 3.7, 3), 
standardize=TRUE, thresh=1e-3, maxit=1e+4, nfolds=10, foldid=NULL, 
returnLatent=TRUE)

Arguments

X0

The design matrix.

y

The response, containing time (the failure or censoring times) and status (1 for failure, 0 for censoring).

group

A list of groups, each containing the indices of the covariates in that group.

penalty

The penalty to be applied to the model. It is one of glasso, gSCAD, or gMCP.

lambda

A user-supplied sequence of lambda values. If left unspecified, the function automatically computes a grid of lambda values.

nlambda

The number of lambda values to use in the regularization path. Default is 100.

rlambda

Smallest value for lambda, as a fraction of the maximum lambda. The maximum lambda is the data-derived entry value, i.e. the smallest value for which all coefficients are zero. The default depends on the sample size relative to the number of covariates. If sample size > #covariates, the default is 0.001, close to zero. If sample size < #covariates, the default is 0.05.

gamma

Tuning parameter of the group SCAD/MCP penalty. Default is 3.7 for SCAD and 3 for MCP.

standardize

Logical flag for variable standardization prior to fitting the model.

thresh

Convergence threshold for one-step coordinate descent. Default is 1e-3.

maxit

Maximum number of passes over the data for all lambda values. Default is 1e+4.

nfolds

The number of cross-validation folds. Default is 10.

foldid

An optional vector of values between 1 and nfolds identifying what fold each observation is in.

returnLatent

Logical flag; if TRUE, the coefficient matrix in latent space is returned. Default is TRUE.

Value

aBetaLatent

A coefficient matrix whose columns correspond to nlambda values of lambda in latent space.

aBetaOri

A coefficient matrix whose columns correspond to nlambda values of lambda in original space.

mBetaLatent

The coefficients in latent space that give the maximum log-likelihood value under the first cross-validation method.

mBetaOri

The coefficients in original space that give the maximum log-likelihood value under the first cross-validation method.

pBetaLatent

The coefficients in latent space that give the maximum log-likelihood value under the penalized cross-validation method.

pBetaOri

The coefficients in original space that give the maximum log-likelihood value under the penalized cross-validation method.

fit

A matrix containing the lambda values and the mean cross-validation error.

lambda

The lambda values used.

group

A list of groups, each containing the indices of the covariates in that group.

glatent

A vector indicating the group structure of the covariates in latent space.

cvmax

The maximum value of log likelihood.

lambda.max

The value of lambda that gives the maximum log-likelihood value under the first cross-validation method.

lambda.pcvl

The value of lambda that gives the maximum log-likelihood value under the penalized cross-validation method.

Author(s)

Xuan Dang <xuandang11289@gmail.com>

References

Verweij PJ, Houwelingen HC. Cross-validation in survival analysis. Statistics in Medicine 1993; 12(24): 2305-2314.

Ternes N, Rotolo F, Michiels S. Empirical extensions of the lasso penalty to reduce the false discovery rate in high-dimensional Cox regression models. Statistics in Medicine 2016; 35(15): 2561-73.

Examples

set.seed(100001)
N <- 50
p <- 6
times <- 1:p
rho <- 0.5
H <- abs(outer(times, times, "-"))
C <- 1 * rho^H
C[cbind(1:p, 1:p)] <- C[cbind(1:p, 1:p)] 
sigma <- matrix(C,p,p)
mu <- rep(0,p)
x <- mvrnorm(n=N, mu, sigma)

beta <- c(0, .8, 1, 2, 1, 0)
hx <- exp(x %*% beta) 
ty <- rexp(N,hx) 
tcens <- 1 - rbinom(n=N, prob = 0.2, size = 1)
y <- data.frame(illt=ty, ills=tcens)
names(y) <- c("time", "status")

group <- list(g1 = c(1,2,3,4), g2 = c(1,2,6), g3 = c(2,3), 
              g4 = c(4,5), g5 = c(5))
cvfit <- cv.grpCoxOverlap(x, y, group, penalty="glasso", nlambda=50)
plot.llCV(cvfit)

Fit a penalized Cox model.

Description

Fit the regularization paths for Cox models with grouped covariates.

Usage

grpCox(X, y, g, m, penalty=c("glasso", "gSCAD", "gMCP"), lambda=NULL, 
nlambda=100, rlambda=NULL, gamma=switch(penalty, gSCAD = 3.7, 3), 
standardize=TRUE, thresh=1e-3, maxit=1e+4)

Arguments

X

The design matrix.

y

The response, containing time (the failure or censoring times) and status (1 for failure, 0 for censoring).

g

A vector indicating the group structure of the covariates. The groups need not be ordered.

m

Group multipliers. Default is the square root of group size.

penalty

The penalty to be applied to the model. It is one of glasso, gSCAD, or gMCP.

lambda

A user-supplied sequence of lambda values. If left unspecified, the function automatically computes a grid of lambda values.

nlambda

The number of lambda values to use in the regularization path. Default is 100.

rlambda

Smallest value for lambda, as a fraction of the maximum lambda. The maximum lambda is the data-derived entry value, i.e. the smallest value for which all coefficients are zero. The default depends on the sample size relative to the number of covariates. If sample size > #covariates, the default is 0.001, close to zero. If sample size < #covariates, the default is 0.05.

gamma

Tuning parameter of the group SCAD/MCP penalty. Default is 3.7 for SCAD and 3 for MCP.

standardize

Logical flag for variable standardization prior to fitting the model.

thresh

Convergence threshold for one-step coordinate descent. Default is 1e-3.

maxit

Maximum number of passes over the data for all lambda values. Default is 1e+4.
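
The roles of nlambda and rlambda can be illustrated with a small base R sketch. It assumes a decreasing, log-spaced grid between the maximum lambda and rlambda times the maximum lambda, which is a common convention but is not confirmed here as the package's internal rule. The value of lambda_max is a hypothetical placeholder; in practice it is derived from the data.

```r
# Hypothetical inputs: lambda_max would normally be computed from the data.
lambda_max <- 2.5
nlambda <- 100
n <- 50
p <- 9
# Default ratio as described for rlambda: 0.001 when n > p, otherwise 0.05.
rlambda <- if (n > p) 0.001 else 0.05
# Decreasing, log-spaced grid from lambda_max down to rlambda * lambda_max.
lambda <- exp(seq(log(lambda_max), log(rlambda * lambda_max),
                  length.out = nlambda))
range(lambda)
```

A grid built this way can also be passed directly through the lambda argument.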

Details

The group SCAD (gSCAD) and group MCP (gMCP) formulations are presented in Wang et al. (2007) and Huang et al. (2012).
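
For orientation, the three penalties can be written as functions of the Euclidean norm t of a group's coefficients. The sketch below uses the standard SCAD and MCP forms; the function names are hypothetical, and the assumption that the group multiplier enters as lam = lambda * m_g is ours rather than taken from the package source.

```r
# Group lasso penalty on the group norm t = ||beta_g||.
pen_glasso <- function(t, lam) lam * t

# SCAD penalty (gamma > 2), applied to the group norm.
pen_gscad <- function(t, lam, gamma = 3.7) {
  ifelse(t <= lam, lam * t,
    ifelse(t <= gamma * lam,
      (2 * gamma * lam * t - t^2 - lam^2) / (2 * (gamma - 1)),
      lam^2 * (gamma + 1) / 2))
}

# MCP penalty (gamma > 1), applied to the group norm.
pen_gmcp <- function(t, lam, gamma = 3) {
  ifelse(t <= gamma * lam, lam * t - t^2 / (2 * gamma), gamma * lam^2 / 2)
}

# Unlike the group lasso, SCAD and MCP level off for large norms.
t_grid <- c(0.5, 1, 5, 10)
cbind(t = t_grid,
      glasso = pen_glasso(t_grid, 1),
      gSCAD = pen_gscad(t_grid, 1),
      gMCP = pen_gmcp(t_grid, 1))
```

Both non-convex penalties match the group lasso near zero and become constant once the group norm exceeds gamma * lam, which is what reduces the shrinkage of large coefficients.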

Value

aBetaSTD

A standardized coefficient matrix whose columns correspond to nlambda values of lambda.

aBetaO

A coefficient matrix (without standardization) whose columns correspond to nlambda values of lambda.

lambda

The lambda values used.

ll

The log likelihood values.

g

A vector indicating the group structure of the covariates.

Author(s)

Xuan Dang <xuandang11289@gmail.com>

References

Wang, L., Chen, G., and Li, H. Group SCAD regression analysis for microarray time course gene expression data. Bioinformatics 23.12 (2007), pp. 1486-1494.

Huang, J., Breheny, P., and Ma, S. A selective review of group selection in high-dimensional models. Statistical Science 27.4 (2012), pp. 481-499.

Examples

set.seed(200)
N <- 50
p <- 9
x <- matrix(rnorm(N * p), nrow = N)
beta <- c(.65,.65,0,0,.65,.65,0,.65,0)
hx <- exp(x %*% beta) 
ty <- rexp(N,hx) 
tcens <- 1 - rbinom(n=N, prob = 0.2, size = 1)
y <- data.frame(illt=ty, ills=tcens)
names(y) <- c("time", "status")

g <- c(1,1,2,2,3,3,2,3,2)
m <- c(sqrt(2),sqrt(4),sqrt(3))

fit <- grpCox(x,y,g,m,penalty="glasso")
plot.gCoef(fit$aBetaO, fit$g, fit$lambda)

Fit a penalized regression path with overlapping grouped covariates.

Description

Fit the regularization paths for Cox models with overlapping grouped covariates.

Usage

grpCoxOverlap(X0, y, group, penalty=c("glasso", "gSCAD", "gMCP"), 
lambda=NULL, nlambda=100, rlambda=NULL, gamma=switch(penalty, gSCAD = 3.7, 3),
standardize = TRUE, thresh=1e-3, maxit=1e+4, returnLatent=TRUE)

Arguments

X0

The design matrix.

y

The response, containing time (the failure or censoring times) and status (1 for failure, 0 for censoring).

group

A list of groups, each containing the indices of the covariates in that group.

penalty

The penalty to be applied to the model. It is one of glasso, gSCAD, or gMCP.

lambda

A user-supplied sequence of lambda values. If left unspecified, the function automatically computes a grid of lambda values.

nlambda

The number of lambda values to use in the regularization path. Default is 100.

rlambda

Smallest value for lambda, as a fraction of the maximum lambda. The maximum lambda is the data-derived entry value, i.e. the smallest value for which all coefficients are zero. The default depends on the sample size relative to the number of covariates. If sample size > #covariates, the default is 0.001, close to zero. If sample size < #covariates, the default is 0.05.

gamma

Tuning parameter of the group SCAD/MCP penalty. Default is 3.7 for SCAD and 3 for MCP.

standardize

Logical flag for variable standardization prior to fitting the model.

thresh

Convergence threshold for one-step coordinate descent. Default is 1e-3.

maxit

Maximum number of passes over the data for all lambda values. Default is 1e+4.

returnLatent

Logical flag; if TRUE, the coefficient matrix in latent space is returned. Default is TRUE.

Details

The group SCAD (gSCAD) and group MCP (gMCP) formulations are presented in Wang et al. (2007) and Huang et al. (2012).

The method is based on the latent group approach (Jacob et al. 2009; Obozinski et al. 2011).
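
The latent group idea can be sketched in base R as follows. This shows the standard construction, in which each covariate is duplicated once per group it belongs to so that the latent groups no longer overlap; it is not necessarily how the package implements it internally. The names idx, X_latent, beta_latent, and beta_ori are illustrative, and beta_latent holds placeholder values rather than fitted coefficients.

```r
set.seed(1)
X0 <- matrix(rnorm(5 * 6), nrow = 5)
group <- list(g1 = c(1, 2, 3, 4), g2 = c(1, 2, 6), g3 = c(2, 3),
              g4 = c(4, 5), g5 = c(5))

# Original column index of every latent column.
idx <- unlist(group, use.names = FALSE)
# Expanded design: one copy of a column for each group membership.
X_latent <- X0[, idx, drop = FALSE]
# Non-overlapping group labels in latent space.
glatent <- rep(seq_along(group), lengths(group))

# Map latent coefficients back to the original space by summing the copies.
beta_latent <- seq_along(idx) / 10
beta_ori <- vapply(seq_len(ncol(X0)),
                   function(j) sum(beta_latent[idx == j]), numeric(1))

# Both parameterizations give the same linear predictor.
all.equal(drop(X_latent %*% beta_latent), drop(X0 %*% beta_ori))
```

Under this construction, a vector such as glatent is what a non-overlapping group fit would use as its group structure, which is why the latent-space coefficients can be plotted with plot.gCoef.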

Value

aBetaLatent

A coefficient matrix whose columns correspond to nlambda values of lambda in latent space.

aBetaOri

A coefficient matrix whose columns correspond to nlambda values of lambda in original space.

lambda

The lambda values used.

ll

The log likelihood values.

group

A list of groups, each containing the indices of the covariates in that group.

glatent

A vector indicating the group structure of the covariates in latent space.

Author(s)

Xuan Dang <xuandang11289@gmail.com>

References

Wang, L., Chen, G., and Li, H. Group SCAD regression analysis for microarray time course gene expression data. Bioinformatics 23.12 (2007), pp. 1486-1494.

Huang, J., Breheny, P., and Ma, S. A selective review of group selection in high-dimensional models. Statistical Science 27.4 (2012), pp. 481-499.

Jacob, L., Obozinski, G., and Vert, J. P. (2009, June). Group lasso with overlap and graph lasso. In Proceedings of the 26th annual international conference on machine learning, ACM: 433-440.

Obozinski, G., Jacob, L., and Vert, J. P. (2011). Group lasso with overlaps: the latent group lasso approach.

Examples

set.seed(100001)
N <- 50
p <- 6
times <- 1:p
rho <- 0.5
H <- abs(outer(times, times, "-"))
C <- 1 * rho^H
C[cbind(1:p, 1:p)] <- C[cbind(1:p, 1:p)] 
sigma <- matrix(C,p,p)
mu <- rep(0,p)
x <- mvrnorm(n=N, mu, sigma)

beta <- c(0, .8, 1, 2, 1, 0)
hx <- exp(x %*% beta) 
ty <- rexp(N,hx) 
tcens <- 1 - rbinom(n=N, prob = 0.2, size = 1)
y <- data.frame(illt=ty, ills=tcens)
names(y) <- c("time", "status")

group <- list(g1 = c(1,2,3,4), g2 = c(1,2,6), g3 = c(2,3), g4 = c(4,5), g5 = c(5))
fit <- grpCoxOverlap(x, y, group, penalty="glasso", nlambda=50)
# plot the coefficient values in latent space
plot.gCoef(fit$aBetaLatent, fit$glatent, fit$lambda)
# plot the coefficient values in original space
plot.Coef(fit$aBetaOri, fit$lambda)

Plots the coefficient paths

Description

Plots the coefficient values as a function of the lambda values used.

Usage

## S3 method for class 'Coef'
plot(x, lambda, label=TRUE,xlab="log(Lambda)", 
ylab="Coefficients",title=NULL,...)

Arguments

x

A matrix of coefficients.

lambda

The lambda values used.

label

Logical flag; if TRUE, the paths are labelled with the indices of the covariates. Default is TRUE.

xlab

The name of the x-axis.

ylab

The name of the y-axis.

title

The title of the plot.

...

Further arguments passed to plot.

Details

A plot is produced, and nothing is returned.

Value

No return value.

Author(s)

Xuan Dang <xuandang11289@gmail.com>

Examples

set.seed(100001)
N <- 50
p <- 6
times <- 1:p
rho <- 0.5
H <- abs(outer(times, times, "-"))
C <- 1 * rho^H
C[cbind(1:p, 1:p)] <- C[cbind(1:p, 1:p)] 
sigma <- matrix(C,p,p)
mu <- rep(0,p)
x <- mvrnorm(n=N, mu, sigma)

beta <- c(0, .8, 1, 2, 1, 0)
hx <- exp(x %*% beta) 
ty <- rexp(N,hx) 
tcens <- 1 - rbinom(n=N, prob = 0.2, size = 1)
y <- data.frame(illt=ty, ills=tcens)
names(y) <- c("time", "status")

group <- list(g1 = c(1,2,3,4), g2 = c(1,2,6), g3 = c(2,3), g4 = c(4,5), g5 = c(5))
fit <- grpCoxOverlap(x, y, group, penalty="glasso", nlambda=50)
# plot the coefficient values in latent space
plot.gCoef(fit$aBetaLatent, fit$glatent, fit$lambda)
# plot the coefficient values in original space
plot.Coef(fit$aBetaOri, fit$lambda)

Plots the coefficient paths with the same color for the covariates in the same group.

Description

Plots the coefficient values as a function of the lambda values used. The covariates in the same group have the same color.

Usage

## S3 method for class 'gCoef'
plot(x,g,lambda,label=TRUE,xlab="log(Lambda)",
ylab="Coefficients", title=NULL,...)

Arguments

x

A matrix of coefficients.

g

A vector indicating the group structure of the covariates.

lambda

The lambda values used.

label

Logical flag; if TRUE, the paths are labelled with the indices of the covariates. Default is TRUE.

xlab

The name of the x-axis.

ylab

The name of the y-axis.

title

The title of the plot.

...

Further arguments passed to plot.

Details

A plot is produced, and nothing is returned.
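
The kind of figure described here can be approximated in base R as shown below. This is only a sketch of the idea, with one line per covariate, the x-axis on the log(lambda) scale, and one color per group. It uses placeholder coefficients and the base hcl.colors palette, and it does not reproduce the package's own plotting code.

```r
# Placeholder path: 4 coefficients (rows) over 5 lambda values (columns).
lambda <- c(1, 0.5, 0.25, 0.1, 0.05)
beta <- rbind(c(0,  0.1,  0.2,  0.3,  0.4),
              c(0,  0.0,  0.1,  0.2,  0.3),
              c(0, -0.1, -0.2, -0.2, -0.3),
              c(0,  0.0,  0.0, -0.1, -0.2))
g <- c(1, 1, 2, 2)

# One color per group, repeated for every covariate in that group.
cols <- hcl.colors(length(unique(g)), "Dark 3")[match(g, unique(g))]

# matplot() draws one line per column, so the matrix is transposed.
matplot(log(lambda), t(beta), type = "l", lty = 1, col = cols,
        xlab = "log(Lambda)", ylab = "Coefficients")
```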

Value

No return value.

Author(s)

Xuan Dang <xuandang11289@gmail.com>

Examples

set.seed(100001)
N <- 50
p <- 6
times <- 1:p
rho <- 0.5
H <- abs(outer(times, times, "-"))
C <- 1 * rho^H
C[cbind(1:p, 1:p)] <- C[cbind(1:p, 1:p)] 
sigma <- matrix(C,p,p)
mu <- rep(0,p)
x <- mvrnorm(n=N, mu, sigma)

beta <- c(0, .8, 1, 2, 1, 0)
hx <- exp(x %*% beta) 
ty <- rexp(N,hx) 
tcens <- 1 - rbinom(n=N, prob = 0.2, size = 1)
y <- data.frame(illt=ty, ills=tcens)
names(y) <- c("time", "status")

group <- list(g1 = c(1,2,3,4), g2 = c(1,2,6), g3 = c(2,3), g4 = c(4,5), g5 = c(5))
fit <- grpCoxOverlap(x, y, group, penalty="glasso", nlambda=50)
# plot the coefficient values in latent space
plot.gCoef(fit$aBetaLatent, fit$glatent, fit$lambda)
# plot the coefficient values in original space
plot.Coef(fit$aBetaOri, fit$lambda)

Plot the cross-validation curve produced by cv.grpCox or cv.grpCoxOverlap

Description

Plots the cross-validation curve, and upper and lower standard deviation curves, as a function of the lambda values used.

Usage

## S3 method for class 'llCV'
plot(x,...)

Arguments

x

A fitted cv.grpCox or cv.grpCoxOverlap object.

...

Further arguments passed to plot.

Details

A plot is produced, and nothing is returned.

Value

No return value.

Author(s)

Xuan Dang <xuandang11289@gmail.com>

Examples

set.seed(200)
N <- 50
p <- 9
x <- matrix(rnorm(N * p), nrow = N)
beta <- c(.65,.65,0,0,.65,.65,0,.65,0)
hx <- exp(x %*% beta) 
ty <- rexp(N,hx) 
tcens <- 1 - rbinom(n=N, prob = 0.2, size = 1)
y <- data.frame(illt=ty, ills=tcens)
names(y) <- c("time", "status")

g <- c(1,1,2,2,3,3,2,3,2)
m <- c(sqrt(2),sqrt(4),sqrt(3))

cvfit <- cv.grpCox(x,y,g,m,penalty="glasso")
plot.llCV(cvfit)