Type: | Package |
Title: | Cooperative Learning for Multi-View Analysis |
Version: | 0.8 |
Date: | 2023-03-30 |
VignetteBuilder: | knitr |
Depends: | R (≥ 3.5.0) |
Description: | Cooperative learning combines the usual squared error loss of predictions with an agreement penalty to encourage the predictions from different data views to agree. By varying the weight of the agreement penalty, we get a continuum of solutions that include the well-known early and late fusion approaches. Cooperative learning chooses the degree of agreement (or fusion) in an adaptive manner, using a validation set or cross-validation to estimate test set prediction error. In the setting of cooperative regularized linear regression, the method combines the lasso penalty with the agreement penalty (Ding, D., Li, S., Narasimhan, B., Tibshirani, R. (2021) <doi:10.1073/pnas.2202113119>). |
License: | GPL-2 |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
SystemRequirements: | C++17 |
Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0), xfun |
Imports: | glmnet, Matrix, methods, RColorBrewer, Rcpp, stats, survival, utils |
Config/testthat/edition: | 3 |
LinkingTo: | Rcpp, RcppEigen |
NeedsCompilation: | yes |
Packaged: | 2023-03-31 19:06:09 UTC; naras |
Author: | Daisy Yi Ding [aut], Robert J. Tibshirani [aut], Balasubramanian Narasimhan [aut, cre], Trevor Hastie [aut], Kenneth Tay [aut], James Yang [aut] |
Maintainer: | Balasubramanian Narasimhan <naras@stanford.edu> |
Repository: | CRAN |
Date/Publication: | 2023-03-31 20:10:02 UTC |
Cooperative learning for multiple views using generalized linear models
Description
This package performs a version of early and late fusion of multiple views using penalized generalized regression.
Extract coefficients from a cv.multiview object
Description
Extract coefficients from a cv.multiview object
Usage
## S3 method for class 'cv.multiview'
coef(object, s = c("lambda.1se", "lambda.min"), ...)
Arguments
object |
Fitted "cv.multiview" object. |
s |
Value(s) of the penalty parameter lambda at which coefficients are required. Default is "lambda.1se". |
... |
This is the mechanism for passing arguments like exact = TRUE on to the underlying predict method. |
Value
The matrix of coefficients for the specified value(s) of lambda.
Examples
set.seed(1)
x = matrix(rnorm(100*20), 100, 20)
z = matrix(rnorm(100*20), 100, 20)
U = matrix(rnorm(100*5), 100, 5)
for (m in seq(5)){
u = rnorm(100)
x[, m] = x[, m] + u
z[, m] = z[, m] + u
U[, m] = U[, m] + u}
x = scale(x, center = TRUE, scale = FALSE)
z = scale(z, center = TRUE, scale = FALSE)
beta_U = c(rep(0.1, 5))
y = U %*% beta_U + 0.1 * rnorm(100)
fit1 = cv.multiview(list(x=x,z=z), y, rho = 0.3)
coef(fit1, s="lambda.min")
# Binomial
by = 1 * (y > median(y))
fit2 = cv.multiview(list(x=x,z=z), by, family = binomial(), rho = 0.9)
coef(fit2, s="lambda.min")
# Poisson
py = matrix(rpois(100, exp(y)))
fit3 = cv.multiview(list(x=x,z=z), py, family = poisson(), rho = 0.6)
coef(fit3, s="lambda.min")
Extract coefficients from a multiview object
Description
Extract coefficients from a multiview object
Usage
## S3 method for class 'multiview'
coef(object, s = NULL, ...)
Arguments
object |
Fitted "multiview" object. |
s |
Value(s) of the penalty parameter lambda at which predictions are required. Default is the entire sequence used to create the model. |
... |
This is the mechanism for passing arguments like exact = TRUE on to the underlying predict method. |
Value
A matrix of coefficients for the specified value(s) of lambda.
Examples
# Gaussian
x = matrix(rnorm(100 * 20), 100, 20)
z = matrix(rnorm(100 * 10), 100, 10)
y = rnorm(100)
fit1 = multiview(list(x=x,z=z), y, rho = 0)
coef(fit1, s=0.1)
# Binomial
by = sample(c(0,1), 100, replace = TRUE)
fit2 = multiview(list(x=x,z=z), by, family = binomial(), rho=0.5)
coef(fit2, s=0.1)
# Poisson
py = matrix(rpois(100, exp(y)))
fit3 = multiview(list(x=x,z=z), py, family = poisson(), rho=0.5)
coef(fit3, s=0.1)
Extract an ordered list of standardized coefficients from a multiview or cv.multiview object
Description
This function extracts a ranked list of coefficients after the coefficients are standardized by the standard deviation of the corresponding features. The ranking is based on the magnitude of the standardized coefficients. It also outputs the data view to which each coefficient belongs.
Usage
coef_ordered(object, ...)
Arguments
object |
Fitted "multiview" or "cv.multiview" object. |
... |
This is the mechanism for passing arguments like exact = TRUE on to the underlying predict method. |
Details
The output table shows from left to right the data view each coefficient comes from, the column index of the feature in the corresponding data view, the coefficient after being standardized by the standard deviation of the corresponding feature, and the original fitted coefficient.
Value
A data frame consisting of the view name, view column, coefficient, and standardized coefficient, ordered by the rank of the standardized coefficient.
Examples
# Gaussian
x = matrix(rnorm(100 * 20), 100, 20)
z = matrix(rnorm(100 * 10), 100, 10)
y = rnorm(100)
fit1 = multiview(list(x=x,z=z), y, rho = 0)
coef_ordered(fit1, s=0.1)
# Binomial
by = sample(c(0,1), 100, replace = TRUE)
fit2 = multiview(list(x=x,z=z), by, family = binomial(), rho=0.5)
coef_ordered(fit2, s=0.1)
# Poisson
py = matrix(rpois(100, exp(y)))
fit3 = multiview(list(x=x,z=z), py, family = poisson(), rho=0.5)
coef_ordered(fit3, s=0.1)
Extract an ordered list of standardized coefficients from a cv.multiview object
Description
This function extracts a ranked list of coefficients after the coefficients are standardized by the standard deviation of the corresponding features. The ranking is based on the magnitude of the standardized coefficients. It also outputs the data view to which each coefficient belongs.
Usage
## S3 method for class 'cv.multiview'
coef_ordered(object, s = c("lambda.1se", "lambda.min"), ...)
Arguments
object |
Fitted "cv.multiview" object. |
s |
Value(s) of the penalty parameter lambda at which coefficients are required. Default is "lambda.1se". |
... |
This is the mechanism for passing arguments like exact = TRUE on to the underlying predict method. |
Details
The output table shows from left to right the data view each coefficient comes from, the column index of the feature in the corresponding data view, the coefficient after being standardized by the standard deviation of the corresponding feature, and the original fitted coefficient.
Value
A data frame consisting of the view name, view column, coefficient, and standardized coefficient, ordered by the rank of the standardized coefficient.
Examples
set.seed(1)
x = matrix(rnorm(100*20), 100, 20)
z = matrix(rnorm(100*20), 100, 20)
U = matrix(rnorm(100*5), 100, 5)
for (m in seq(5)){
u = rnorm(100)
x[, m] = x[, m] + u
z[, m] = z[, m] + u
U[, m] = U[, m] + u}
x = scale(x, center = TRUE, scale = FALSE)
z = scale(z, center = TRUE, scale = FALSE)
beta_U = c(rep(0.1, 5))
y = U %*% beta_U + 0.1 * rnorm(100)
fit1 = cv.multiview(list(x=x,z=z), y, rho = 0.3)
coef_ordered(fit1, s="lambda.min")
# Binomial
by = 1 * (y > median(y))
fit2 = cv.multiview(list(x=x,z=z), by, family = binomial(), rho = 0.9)
coef_ordered(fit2, s="lambda.min")
# Poisson
py = matrix(rpois(100, exp(y)))
fit3 = cv.multiview(list(x=x,z=z), py, family = poisson(), rho = 0.6)
coef_ordered(fit3, s="lambda.min")
Extract an ordered list of standardized coefficients from a multiview object
Description
This function extracts a ranked list of coefficients after the coefficients are standardized by the standard deviation of the corresponding features. The ranking is based on the magnitude of the standardized coefficients. It also outputs the data view to which each coefficient belongs.
Usage
## S3 method for class 'multiview'
coef_ordered(object, s = NULL, ...)
Arguments
object |
Fitted "multiview" object. |
s |
Value(s) of the penalty parameter lambda at which coefficients are required. Default is the entire sequence used to create the model. |
... |
This is the mechanism for passing arguments like exact = TRUE on to the underlying predict method. |
Details
The output table shows from left to right the data view each coefficient comes from, the column index of the feature in the corresponding data view, the coefficient after being standardized by the standard deviation of the corresponding feature, and the original fitted coefficient.
Value
A data frame consisting of the view name, view column, coefficient, and standardized coefficient, ordered by the rank of the standardized coefficient.
Examples
# Gaussian
x = matrix(rnorm(100 * 20), 100, 20)
z = matrix(rnorm(100 * 10), 100, 10)
y = rnorm(100)
fit1 = multiview(list(x=x,z=z), y, rho = 0)
coef_ordered(fit1, s=0.1)
# Binomial
by = sample(c(0,1), 100, replace = TRUE)
fit2 = multiview(list(x=x,z=z), by, family = binomial(), rho=0.5)
coef_ordered(fit2, s=0.1)
# Poisson
py = matrix(rpois(100, exp(y)))
fit3 = multiview(list(x=x,z=z), py, family = poisson(), rho=0.5)
coef_ordered(fit3, s=0.1)
Collapse a list of named lists into one list with the same name
Description
Collapse a list of named lists into one list with the same name
Usage
collapse_named_lists(in_list)
Arguments
in_list |
a list of named lists, all with the same names (not checked, for efficiency) |
Value
a single list with named components all concatenated
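A minimal sketch of the intended behavior, assuming "concatenated" means that each named slot is combined across the input lists (this is an illustration, not the package's implementation):
## Illustrative only: combine each named slot across a list of named lists.
in_list <- list(list(a = 1, b = "p"), list(a = 2, b = "q"))
nms <- names(in_list[[1]])
sapply(nms, function(nm) unlist(lapply(in_list, `[[`, nm)), simplify = FALSE)
## $a gives c(1, 2); $b gives c("p", "q")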
Elastic net objective function value for Cox regression model
Description
Returns the elastic net objective function value for Cox regression model.
Usage
cox_obj_function(
y,
pred,
weights,
lambda,
alpha,
coefficients,
vp,
view_components,
rho
)
Arguments
y |
Survival response variable, must be a |
pred |
Model's predictions for |
weights |
Observation weights. |
lambda |
A single value for the |
alpha |
The elasticnet mixing parameter, with |
coefficients |
The model's coefficients. |
vp |
Penalty factors for each of the coefficients. |
view_components |
a list of lists containing indices of coefficients and associated covariate (view) pairs |
rho |
the fusion parameter |
Perform k-fold cross-validation for cooperative learning
Description
Does k-fold cross-validation (CV) for multiview and produces a CV curve.
Usage
cv.multiview(
x_list,
y,
family = gaussian(),
rho = 0,
weights = NULL,
offset = NULL,
lambda = NULL,
type.measure = c("default", "mse", "deviance", "class", "auc", "mae", "C"),
nfolds = 10,
foldid = NULL,
alignment = c("lambda", "fraction"),
grouped = TRUE,
keep = FALSE,
trace.it = 0,
...
)
Arguments
x_list |
a list of |
y |
the quantitative response with length equal to |
family |
A description of the error distribution and link function to be used in the model. This is the result of a call to a family function. Default is stats::gaussian. (See stats::family for details on family functions.) |
rho |
the weight on the agreement penalty, default 0. |
weights |
Observation weights; defaults to 1 per observation |
offset |
Offset vector (matrix) as in |
lambda |
A user supplied |
type.measure |
loss to use for cross-validation. Currently
five options, not all available for all models. The default is
|
nfolds |
number of folds - default is 10. Although |
foldid |
an optional vector of values between 1 and |
alignment |
This is an experimental argument, designed to fix
the problems users were having with CV, with possible values
|
grouped |
This is an experimental argument, with default
|
keep |
If |
trace.it |
If |
... |
Other arguments that can be passed to |
Details
The current code can be slow for "large" data sets, e.g. when the number of features is larger than 1000. It can be helpful to see the progress of multiview as it runs; to do this, set trace.it = 1 in the call to multiview or cv.multiview. With this, multiview prints out its progress along the way. One can also pre-filter the features to a smaller set, using the exclude option, with a filter function.
If there are missing values in the feature matrices: we recommend that you center the columns of each feature matrix, and then fill in the missing values with 0.
For example,
x <- scale(x,TRUE,FALSE)
x[is.na(x)] <- 0
z <- scale(z,TRUE,FALSE)
z[is.na(z)] <- 0
Then run multiview in the usual way. It will exploit the assumed shared latent factors to make efficient use of the available data.
The function runs multiview nfolds+1 times; the first to get the lambda sequence, and then the remainder to compute the fit with each of the folds omitted. The error is accumulated, and the average error and standard deviation over the folds are computed. Note that cv.multiview does NOT search for values of rho. A specific value should be supplied, else rho=0 is assumed by default. If users would like to cross-validate rho as well, they should call cv.multiview with a pre-computed vector foldid, and then use this same fold vector in separate calls to cv.multiview with different values of rho, as sketched below.
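For example, a minimal sketch of this workflow (the simulated data, rho grid, and use of cvm below are illustrative, not prescribed by the package):
## Sketch: cross-validate rho by reusing one fold assignment (illustrative).
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
z <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
foldid <- sample(rep_len(1:10, length(y)))
rho_grid <- c(0, 0.25, 0.5, 1)   # candidate agreement weights (illustrative)
cv_err <- sapply(rho_grid, function(r) {
  fit <- cv.multiview(list(x = x, z = z), y, rho = r, foldid = foldid)
  min(fit$cvm)                   # CV error at the best lambda for this rho
})
rho_grid[which.min(cv_err)]      # rho with the smallest estimated CV error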
Value
An object of class "cv.multiview" is returned, which is a list with the ingredients of the cross-validation fit.
lambda |
the values of |
cvm |
The mean cross-validated error - a vector of length length(lambda). |
cvsd |
estimate of standard error of cvm. |
cvup |
upper curve = cvm+cvsd. |
cvlo |
lower curve = cvm-cvsd. |
nzero |
number of non-zero coefficients at each lambda. |
name |
a text string indicating type of measure (for plotting purposes). |
multiview.fit |
a fitted multiview object for the full data. |
lambda.min |
value of lambda that gives minimum cvm. |
lambda.1se |
largest value of lambda such that the error is within 1 standard error of the minimum. |
fit.preval |
if |
foldid |
if |
index |
a one column matrix with the indices of |
Examples
# Gaussian
# Generate data based on a factor model
set.seed(1)
x = matrix(rnorm(100*20), 100, 20)
z = matrix(rnorm(100*20), 100, 20)
U = matrix(rnorm(100*5), 100, 5)
for (m in seq(5)){
u = rnorm(100)
x[, m] = x[, m] + u
z[, m] = z[, m] + u
U[, m] = U[, m] + u}
x = scale(x, center = TRUE, scale = FALSE)
z = scale(z, center = TRUE, scale = FALSE)
beta_U = c(rep(0.1, 5))
y = U %*% beta_U + 0.1 * rnorm(100)
fit1 = cv.multiview(list(x=x,z=z), y, rho = 0.3)
# plot the cross-validation curve
plot(fit1)
# extract coefficients
coef(fit1, s="lambda.min")
# extract ordered coefficients
coef_ordered(fit1, s="lambda.min")
# make predictions
predict(fit1, newx = list(x[1:5, ],z[1:5,]), s = "lambda.min")
# Binomial
by = 1 * (y > median(y))
fit2 = cv.multiview(list(x=x,z=z), by, family = binomial(), rho = 0.9)
predict(fit2, newx = list(x[1:5, ],z[1:5,]), s = "lambda.min", type = "response")
plot(fit2)
coef(fit2, s="lambda.min")
coef_ordered(fit2, s="lambda.min")
# Poisson
py = matrix(rpois(100, exp(y)))
fit3 = cv.multiview(list(x=x,z=z), py, family = poisson(), rho = 0.6)
predict(fit3, newx = list(x[1:5, ],z[1:5,]), s = "lambda.min", type = "response")
plot(fit3)
coef(fit3, s="lambda.min")
coef_ordered(fit3, s="lambda.min")
Elastic net deviance value
Description
Returns the elastic net deviance value.
Usage
dev_function(y, mu, weights, family)
Arguments
y |
Quantitative response variable. |
mu |
Model's predictions for |
weights |
Observation weights. |
family |
A description of the error distribution and link function to be used in the model. This is the result of a call to a family function. |
Solve weighted least squares (WLS) problem for a single lambda value
Description
Solves the weighted least squares (WLS) problem for a single lambda value. Internal function that users should not call directly.
Usage
elnet.fit(
x,
y,
weights,
lambda,
alpha = 1,
intercept = TRUE,
thresh = 1e-07,
maxit = 1e+05,
penalty.factor = rep(1, nvars),
exclude = c(),
lower.limits = -Inf,
upper.limits = Inf,
warm = NULL,
from.glmnet.fit = FALSE,
save.fit = FALSE
)
Arguments
x |
Input matrix, of dimension |
y |
Quantitative response variable. |
weights |
Observation weights. |
lambda |
A single value for the |
alpha |
The elasticnet mixing parameter, with
|
intercept |
Should intercept be fitted (default=TRUE) or set to zero (FALSE)? |
thresh |
Convergence threshold for coordinate descent. Each inner
coordinate-descent loop continues until the maximum change in the objective
after any coefficient update is less than thresh times the null deviance.
Default value is |
maxit |
Maximum number of passes over the data; default is |
penalty.factor |
Separate penalty factors can be applied to each
coefficient. This is a number that multiplies |
exclude |
Indices of variables to be excluded from the model. Default is none. Equivalent to an infinite penalty factor. |
lower.limits |
Vector of lower limits for each coefficient; default
|
upper.limits |
Vector of upper limits for each coefficient; default
|
warm |
Either a |
from.glmnet.fit |
Was |
save.fit |
Return the warm start object? Default is FALSE. |
Details
WARNING: Users should not call elnet.fit directly. Higher-level functions in this package call elnet.fit as a subroutine. If a warm start object is provided, some of the other arguments in the function may be overridden.
elnet.fit is essentially a wrapper around a C++ subroutine which minimizes
1/2 \sum_i w_i (y_i - X_i^T \beta)^2 + \sum_j \lambda \gamma_j [(1-\alpha)/2 \beta_j^2 + \alpha |\beta_j|]
over \beta, where \gamma_j is the relative penalty factor on the jth variable. If intercept = TRUE, then the term in the first sum is w_i (y_i - \beta_0 - X_i^T \beta)^2, and we are minimizing over both \beta_0 and \beta.
None of the inputs are standardized except for penalty.factor, which is standardized so that its entries sum to nvars.
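As a plain-R illustration of the objective above (a sketch only; elnet_objective is a hypothetical helper, not a function exported by this package):
## Sketch: weighted elastic net objective for intercept a0 and coefficient
## vector beta, with relative penalty factors gamma (hypothetical helper).
elnet_objective <- function(x, y, beta, a0, weights, lambda, alpha,
                            gamma = rep(1, length(beta))) {
  r <- y - a0 - drop(x %*% beta)                        # working residuals
  loss <- 0.5 * sum(weights * r^2)                      # weighted squared error
  pen <- sum(lambda * gamma * ((1 - alpha) / 2 * beta^2 + alpha * abs(beta)))
  loss + pen
}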
Value
An object with class "glmnetfit" and "glmnet". The list returned has the same keys as that of a glmnet object, except that it might have an additional warm_fit key.
a0 |
Intercept value. |
beta |
A |
df |
The number of nonzero coefficients. |
dim |
Dimension of coefficient matrix. |
lambda |
Lambda value used. |
dev.ratio |
The fraction of (null) deviance explained. The deviance calculations incorporate weights if present in the model. The deviance is defined to be 2*(loglike_sat - loglike), where loglike_sat is the log-likelihood for the saturated model (a model with a free parameter per observation). Hence dev.ratio=1-dev/nulldev. |
nulldev |
Null deviance (per observation). This is defined to be 2*(loglike_sat -loglike(Null)). The null model refers to the intercept model. |
npasses |
Total passes over the data. |
jerr |
Error flag, for warnings and errors (largely for internal debugging). |
offset |
Always FALSE, since offsets do not appear in the WLS problem. Included for compatibility with glmnet output. |
call |
The call that produced this object. |
nobs |
Number of observations. |
warm_fit |
If |
Get lambda max for Cox regression model
Description
Return the lambda max value for Cox regression model, used for computing initial lambda values. For internal use only.
Usage
get_cox_lambda_max(
x,
y,
alpha,
weights = rep(1, nrow(x)),
offset = rep(0, nrow(x)),
exclude = c(),
vp = rep(1, ncol(x))
)
Arguments
x |
Input matrix, of dimension |
y |
Survival response variable, must be a |
alpha |
The elasticnet mixing parameter, with |
weights |
Observation weights. |
offset |
Offset for the model. Default is a zero vector of length
|
exclude |
Indices of variables to be excluded from the model. |
vp |
Separate penalty factors can be applied to each coefficient. |
Details
This function is called by cox.path for the value of lambda max.
When x is not sparse, it is expected to already be centered and scaled. When x is sparse, the function will get its attributes xm and xs for its centering and scaling factors. The value of lambda_max changes depending on whether x is centered and scaled or not, so we need xm and xs to get the correct value.
Helper function to get etas (linear predictions)
Description
Given x, coefficients and intercept, return linear predictions. Wrapper that works with both regular and sparse x. Only works for a single set of coefficients and intercept.
Usage
get_eta(x, beta, a0)
Arguments
x |
Input matrix, of dimension |
beta |
Feature coefficients. |
a0 |
Intercept. |
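For dense x, the computation amounts to the following one-liner (a sketch; the package's sparse-matrix handling is not shown):
## Sketch: linear predictor eta = x %*% beta + a0 for dense x (illustrative).
eta_dense <- function(x, beta, a0) drop(x %*% beta) + a0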
Get null deviance, starting mu and lambda max
Description
Return the null deviance, starting mu and lambda max values for initialization. For internal use only.
Usage
get_start(
x,
y,
weights,
family,
intercept,
is.offset,
offset,
exclude,
vp,
alpha
)
Arguments
x |
Input matrix, of dimension |
y |
Quantitative response variable. |
weights |
Observation weights. |
family |
A description of the error distribution and link function to be
used in the model. This is the result of a call to a family function.
(See |
intercept |
Does the model we are fitting have an intercept term or not? |
is.offset |
Is the model being fit with an offset or not? |
offset |
Offset for the model. If |
exclude |
Indices of variables to be excluded from the model. |
vp |
Separate penalty factors can be applied to each coefficient. |
alpha |
The elasticnet mixing parameter, with |
Details
This function is called by glmnet.path for null deviance, starting mu and lambda max values. It is also called by glmnet.fit when used without warmstart, but in that case only the null deviance and starting mu values are used.
When x is not sparse, it is expected to already be centered and scaled. When x is sparse, the function will get its attributes xm and xs for its centering and scaling factors.
Note that whether x is centered & scaled or not, the values of mu and nulldev don't change. However, the value of lambda_max does change, and we need xm and xs to get the correct value.
Build a block row matrix for multiview
Description
Build a block row matrix for multiview
Usage
make_row(x_list, p_x, pair, rho)
Arguments
x_list |
list of x matrices |
p_x |
a list of the number of columns (ncol) of each element of x_list |
pair |
an integer vector of two indices |
rho |
the rho value |
Value
a block row matrix for multiview
Perform cooperative learning using the direct algorithm for two or more views.
Description
multiview uses glmnet::glmnet() to do most of its work and therefore takes many of the same parameters, but an intercept is always included and several other parameters do not apply. Such inapplicable arguments are overridden and warnings issued.
Usage
multiview(
x_list,
y,
rho = 0,
family = gaussian(),
weights = NULL,
offset = NULL,
alpha = 1,
nlambda = 100,
lambda.min.ratio = ifelse(nobs < nvars, 0.01, 1e-04),
lambda = NULL,
standardize = TRUE,
intercept = TRUE,
thresh = 1e-07,
maxit = 1e+05,
penalty.factor = rep(1, nvars),
exclude = list(),
lower.limits = -Inf,
upper.limits = Inf,
trace.it = 0
)
Arguments
x_list |
a list of |
y |
the quantitative response with length equal to |
rho |
the weight on the agreement penalty, default 0. |
family |
A description of the error distribution and link function to be used in the model. This is the result of a call to a family function. Default is stats::gaussian. (See stats::family for details on family functions.) |
weights |
observation weights. Can be total counts if responses are proportion matrices. Default is 1 for each observation |
offset |
A vector of length |
alpha |
The elasticnet mixing parameter, with
|
nlambda |
The number of |
lambda.min.ratio |
Smallest value for |
lambda |
A user supplied |
standardize |
Logical flag for x variable standardization,
prior to fitting the model sequence. The coefficients are always
returned on the original scale. Default is
|
intercept |
Should intercept(s) be fitted (default |
thresh |
Convergence threshold for coordinate descent. Each
inner coordinate-descent loop continues until the maximum change
in the objective after any coefficient update is less than
|
maxit |
Maximum number of passes over the data for all lambda values; default is 10^5. |
penalty.factor |
Separate penalty factors can be applied to
each coefficient. This is a number that multiplies |
exclude |
Indices of variables to be excluded from the
model. Default is none. Equivalent to an infinite penalty factor
for the variables excluded (next item). Users can supply instead
an |
lower.limits |
Vector of lower limits for each coefficient;
default |
upper.limits |
Vector of upper limits for each coefficient;
default |
trace.it |
If |
Details
The current code can be slow for "large" data sets, e.g. when the number of features is larger than 1000. It can be helpful to see the progress of multiview as it runs; to do this, set trace.it = 1 in the call to multiview or cv.multiview. With this, multiview prints out its progress along the way. One can also pre-filter the features to a smaller set, using the exclude option, with a filter function.
If there are missing values in the feature matrices: we recommend that you center the columns of each feature matrix, and then fill in the missing values with 0.
For example,
x <- scale(x,TRUE,FALSE)
x[is.na(x)] <- 0
z <- scale(z,TRUE,FALSE)
z[is.na(z)] <- 0
Then run multiview in the usual way. It will exploit the assumed shared latent factors to make efficient use of the available data.
Value
An object with S3 class "multiview","*", where "*" is "elnet", "lognet", "multnet", "fishnet" (poisson), "coxnet" or "mrelnet" for the various types of models.
call |
the call that produced this object |
a0 |
Intercept sequence of length |
beta |
For |
lambda |
The actual sequence of |
lambda |
The sequence of lambda values |
mvlambda |
The corresponding sequence of multiview lambda values |
dev.ratio |
The fraction of (null) deviance explained (for
|
nulldev |
Null deviance (per observation). This is defined to be 2*(loglike_sat -loglike(Null)); The NULL model refers to the intercept model, except for the Cox, where it is the 0 model. |
df |
The number of nonzero coefficients for each
value of |
dfmat |
For |
dim |
dimension of coefficient matrix (ices) |
nobs |
number of observations |
npasses |
total passes over the data summed over all lambda values |
offset |
a logical variable indicating whether an offset was included in the model |
jerr |
error flag, for warnings and errors (largely for internal debugging). |
See Also
print, coef, coef_ordered, predict, and plot methods for "multiview", and the "cv.multiview" function.
Examples
# Gaussian
x = matrix(rnorm(100 * 20), 100, 20)
z = matrix(rnorm(100 * 10), 100, 10)
y = rnorm(100)
fit1 = multiview(list(x=x,z=z), y, rho = 0)
print(fit1)
# extract coefficients at a single value of lambda
coef(fit1, s = 0.01)
# extract ordered (standardized) coefficients at a single value of lambda
coef_ordered(fit1, s = 0.01)
# make predictions
predict(fit1, newx = list(x[1:10, ],z[1:10, ]), s = c(0.01, 0.005))
# make a path plot of features for the fit
plot(fit1, label=TRUE)
# Binomial
by = sample(c(0,1), 100, replace = TRUE)
fit2 = multiview(list(x=x,z=z), by, family = binomial(), rho=0.5)
predict(fit2, newx = list(x[1:10, ],z[1:10, ]), s = c(0.01, 0.005), type="response")
coef_ordered(fit2, s = 0.01)
plot(fit2, label=TRUE)
# Poisson
py = matrix(rpois(100, exp(y)))
fit3 = multiview(list(x=x,z=z), py, family = poisson(), rho=0.5)
predict(fit3, newx = list(x[1:10, ],z[1:10, ]), s = c(0.01, 0.005), type="response")
coef_ordered(fit3, s = 0.01)
plot(fit3, label=TRUE)
Internal multiview parameters
Description
View and/or change the factory default parameters in multiview
Usage
multiview.control(
fdev = 1e-05,
devmax = 0.999,
eps = 1e-06,
big = 9.9e+35,
mnlam = 5,
pmin = 1e-09,
exmx = 250,
prec = 1e-10,
mxit = 100,
itrace = 0,
epsnr = 1e-06,
mxitnr = 25,
factory = FALSE
)
Arguments
fdev |
minimum fractional change in deviance for stopping path; factory default = 1.0e-5 |
devmax |
maximum fraction of explained deviance for stopping path; factory default = 0.999 |
eps |
minimum value of lambda.min.ratio (see multiview); factory default= 1.0e-6 |
big |
large floating point number; factory default = 9.9e35. Inf in definition of upper.limit is set to big |
mnlam |
minimum number of path points (lambda values) allowed; factory default = 5 |
pmin |
minimum probability for any class. factory default = 1.0e-9. Note that this implies a pmax of 1-pmin. |
exmx |
maximum allowed exponent. factory default = 250.0 |
prec |
convergence threshold for multi response bounds adjustment solution. factory default = 1.0e-10 |
mxit |
maximum iterations for multiresponse bounds adjustment solution. factory default = 100 |
itrace |
If 1 then progress bar is displayed when running |
epsnr |
convergence threshold for |
mxitnr |
maximum iterations for the IRLS loop in |
factory |
If |
Details
If called with no arguments, multiview.control() returns a list with the current settings of these parameters. Any arguments included in the call set those parameters to the new values, and the function then silently returns. The values set are persistent for the duration of the R session.
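For example, a sketch of temporarily changing a setting and then restoring the previous values (this assumes the list returned by multiview.control() can be passed back via do.call(); it is an illustration, not documented package usage):
## Sketch: snapshot settings, loosen the path-stopping rule, then restore.
old <- multiview.control()        # current settings (assumed restorable)
multiview.control(fdev = 0)       # follow the full lambda path
## ... fit models here ...
do.call(multiview.control, old)   # restore the snapshot (assumption)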
Value
A list with named elements as in the argument list
See Also
multiview
Examples
multiview.control(fdev = 0) #continue along path even though not much changes
multiview.control() # view current settings
multiview.control(factory = TRUE) # reset all the parameters to their default
Fit a Cox regression model with elastic net regularization for a single value of lambda
Description
Fit a Cox regression model via penalized maximum likelihood for a single value of lambda. Can deal with (start, stop] data and strata, as well as sparse design matrices.
Usage
multiview.cox.fit(
x_list,
x,
y,
rho,
weights,
lambda,
alpha = 1,
offset = rep(0, nobs),
thresh = 1e-10,
maxit = 1e+05,
penalty.factor = rep(1, nvars),
exclude = c(),
lower.limits = -Inf,
upper.limits = Inf,
warm = NULL,
from.cox.path = FALSE,
save.fit = FALSE,
trace.it = 0
)
Arguments
x_list |
a list of |
x |
the |
y |
the quantitative response with length equal to |
rho |
the weight on the agreement penalty, default 0. |
weights |
observation weights. Can be total counts if responses are proportion matrices. Default is 1 for each observation |
lambda |
A single value for the |
alpha |
The elasticnet mixing parameter, with
|
offset |
A vector of length |
thresh |
Convergence threshold for coordinate descent. Each
inner coordinate-descent loop continues until the maximum change
in the objective after any coefficient update is less than
|
maxit |
Maximum number of passes over the data for all lambda values; default is 10^5. |
penalty.factor |
Separate penalty factors can be applied to
each coefficient. This is a number that multiplies |
exclude |
Indices of variables to be excluded from the
model. Default is none. Equivalent to an infinite penalty factor
for the variables excluded (next item). Users can supply instead
an |
lower.limits |
Vector of lower limits for each coefficient;
default |
upper.limits |
Vector of upper limits for each coefficient;
default |
warm |
Either a |
from.cox.path |
Was |
save.fit |
Return the warm start object? Default is FALSE. |
trace.it |
If |
Details
WARNING: Users should not call multiview.cox.fit directly. Higher-level functions in this package call multiview.cox.fit as a subroutine. If a warm start object is provided, some of the other arguments in the function may be overridden.
multiview.cox.fit solves the elastic net problem for a single, user-specified value of lambda. multiview.cox.fit works for Cox regression models, including (start, stop] data and strata. It solves the problem using iteratively reweighted least squares (IRLS). For each IRLS iteration, multiview.cox.fit makes a quadratic (Newton) approximation of the log-likelihood, then calls elnet.fit to minimize the resulting approximation.
In terms of standardization: multiview.cox.fit does not standardize x and weights. penalty.factor is standardized so that its entries sum to nvars.
Value
An object with class "coxnet", "glmnetfit" and "glmnet". The list returned contains more keys than that of a "glmnet" object.
a0 |
Intercept value, |
beta |
A |
df |
The number of nonzero coefficients. |
dim |
Dimension of coefficient matrix. |
lambda |
Lambda value used. |
dev.ratio |
The fraction of (null) deviance explained. The deviance calculations incorporate weights if present in the model. The deviance is defined to be 2*(loglike_sat - loglike), where loglike_sat is the log-likelihood for the saturated model (a model with a free parameter per observation). Hence dev.ratio=1-dev/nulldev. |
nulldev |
Null deviance (per observation). This is defined to be 2*(loglike_sat -loglike(Null)). The null model refers to the 0 model. |
npasses |
Total passes over the data. |
jerr |
Error flag, for warnings and errors (largely for internal debugging). |
offset |
A logical variable indicating whether an offset was included in the model. |
call |
The call that produced this object. |
nobs |
Number of observations. |
warm_fit |
If |
family |
Family used for the model, always "cox". |
converged |
A logical variable: was the algorithm judged to have converged? |
boundary |
A logical variable: is the fitted value on the boundary of the attainable values? |
obj_function |
Objective function value at the solution. |
Fit a Cox regression model with elastic net regularization for a path of lambda values
Description
Fit a Cox regression model via penalized maximum likelihood for a path of lambda values. Can deal with (start, stop] data and strata, as well as sparse design matrices.
Usage
multiview.cox.path(
x_list,
x,
y,
rho = 0,
weights = NULL,
lambda = NULL,
offset = NULL,
alpha = 1,
nlambda = 100,
lambda.min.ratio = ifelse(nobs < nvars, 0.01, 1e-04),
standardize = TRUE,
intercept = TRUE,
thresh = 1e-07,
exclude = integer(0),
penalty.factor = rep(1, nvars),
lower.limits = -Inf,
upper.limits = Inf,
maxit = 1e+05,
trace.it = 0,
nvars,
nobs,
xm,
xs,
control,
vp,
vnames,
is.offset
)
Arguments
x_list |
a list of |
x |
the |
y |
the quantitative response with length equal to |
rho |
the weight on the agreement penalty, default 0. |
weights |
observation weights. Can be total counts if responses are proportion matrices. Default is 1 for each observation |
lambda |
A user supplied |
offset |
A vector of length |
alpha |
The elasticnet mixing parameter, with
|
nlambda |
The number of |
lambda.min.ratio |
Smallest value for |
standardize |
Logical flag for x variable standardization,
prior to fitting the model sequence. The coefficients are always
returned on the original scale. Default is
|
intercept |
Should intercept(s) be fitted (default |
thresh |
Convergence threshold for coordinate descent. Each
inner coordinate-descent loop continues until the maximum change
in the objective after any coefficient update is less than
|
exclude |
Indices of variables to be excluded from the
model. Default is none. Equivalent to an infinite penalty factor
for the variables excluded (next item). Users can supply instead
an |
penalty.factor |
Separate penalty factors can be applied to
each coefficient. This is a number that multiplies |
lower.limits |
Vector of lower limits for each coefficient;
default |
upper.limits |
Vector of upper limits for each coefficient;
default |
maxit |
Maximum number of passes over the data for all lambda values; default is 10^5. |
trace.it |
If |
nvars |
the number of variables (total) |
nobs |
the number of observations |
xm |
the column means vector (could be zeros if |
xs |
the column std dev vector (could be 1s if |
control |
the multiview control object |
vp |
the variable penalities (processed) |
vnames |
the variable names |
is.offset |
a flag indicating if offset is supplied or not |
Details
Sometimes the sequence is truncated before nlambda values of lambda have been used. This happens when cox.path detects that the decrease in deviance is marginal (i.e. we are near a saturated fit).
Value
An object of class "coxnet" and "glmnet".
a0 |
Intercept value, |
beta |
A |
df |
The number of nonzero coefficients for each value of lambda. |
dim |
Dimension of coefficient matrix. |
lambda |
The actual sequence of lambda values used. When alpha=0, the largest lambda reported does not quite give the zero coefficients reported (lambda=inf would in principle). Instead, the largest lambda for alpha=0.001 is used, and the sequence of lambda values is derived from this. |
dev.ratio |
The fraction of (null) deviance explained. The deviance calculations incorporate weights if present in the model. The deviance is defined to be 2*(loglike_sat - loglike), where loglike_sat is the log-likelihood for the saturated model (a model with a free parameter per observation). Hence dev.ratio=1-dev/nulldev. |
nulldev |
Null deviance (per observation). This is defined to be 2*(loglike_sat -loglike(Null)). The null model refers to the 0 model. |
npasses |
Total passes over the data summed over all lambda values. |
jerr |
Error flag, for warnings and errors (largely for internal debugging). |
offset |
A logical variable indicating whether an offset was included in the model. |
call |
The call that produced this object. |
nobs |
Number of observations. |
Examples
set.seed(2)
nobs <- 100; nvars <- 15
xvec <- rnorm(nobs * nvars)
xvec[sample.int(nobs * nvars, size = 0.4 * nobs * nvars)] <- 0
x <- matrix(xvec, nrow = nobs)
beta <- rnorm(nvars / 3)
fx <- x[, seq(nvars / 3)] %*% beta / 3
ty <- rexp(nobs, exp(fx))
tcens <- rbinom(n = nobs, prob = 0.3, size = 1)
jsurv <- survival::Surv(ty, tcens)
fit1 <- glmnet:::cox.path(x, jsurv)
# works with sparse x matrix
x_sparse <- Matrix::Matrix(x, sparse = TRUE)
fit2 <- glmnet:::cox.path(x_sparse, jsurv)
# example with (start, stop] data
set.seed(2)
start_time <- runif(100, min = 0, max = 5)
stop_time <- start_time + runif(100, min = 0.1, max = 3)
status <- rbinom(n = nobs, prob = 0.3, size = 1)
jsurv_ss <- survival::Surv(start_time, stop_time, status)
fit3 <- glmnet:::cox.path(x, jsurv_ss)
# example with strata
jsurv_ss2 <- glmnet::stratifySurv(jsurv_ss, rep(1:2, each = 50))
fit4 <- glmnet:::cox.path(x, jsurv_ss2)
Fit a GLM with elastic net regularization for a single value of lambda
Description
Fit a generalized linear model via penalized maximum likelihood for a single value of lambda. Can deal with any GLM family.
Usage
multiview.fit(
x_list,
x,
y,
rho,
weights,
lambda,
alpha = 1,
offset = rep(0, nobs),
family = gaussian(),
intercept = TRUE,
thresh = 1e-07,
maxit = 1e+05,
penalty.factor = rep(1, nvars),
exclude = c(),
lower.limits = -Inf,
upper.limits = Inf,
warm = NULL,
from.multiview.path = FALSE,
save.fit = FALSE,
trace.it = 0,
user_lambda = FALSE
)
Arguments
x_list |
a list of |
x |
the column-binded entries of |
y |
the quantitative response with length equal to |
rho |
the weight on the agreement penalty, default 0. |
weights |
observation weights. Can be total counts if responses are proportion matrices. Default is 1 for each observation |
lambda |
A single value for the |
alpha |
The elasticnet mixing parameter, with
|
offset |
A vector of length |
family |
A description of the error distribution and link function to be used in the model. This is the result of a call to a family function. Default is stats::gaussian. (See stats::family for details on family functions.) |
intercept |
Should intercept(s) be fitted (default |
thresh |
Convergence threshold for coordinate descent. Each
inner coordinate-descent loop continues until the maximum change
in the objective after any coefficient update is less than
|
maxit |
Maximum number of passes over the data; default is
|
penalty.factor |
Separate penalty factors can be applied to
each coefficient. This is a number that multiplies |
exclude |
Indices of variables to be excluded from the
model. Default is none. Equivalent to an infinite penalty factor
for the variables excluded (next item). Users can supply instead
an |
lower.limits |
Vector of lower limits for each coefficient;
default |
upper.limits |
Vector of upper limits for each coefficient;
default |
warm |
Either a |
from.multiview.path |
Was |
save.fit |
Return the warm start object? Default is |
trace.it |
Controls how much information is printed to
screen. If |
user_lambda |
a flag indicating if user supplied the lambda sequence |
Details
WARNING: Users should not call multiview.fit directly. Higher-level functions in this package call multiview.fit as a subroutine. If a warm start object is provided, some of the other arguments in the function may be overridden.
multiview.fit solves the elastic net problem for a single, user-specified value of lambda. multiview.fit works for any GLM family. It solves the problem using iteratively reweighted least squares (IRLS). For each IRLS iteration, multiview.fit makes a quadratic (Newton) approximation of the log-likelihood, then calls elnet.fit to minimize the resulting approximation.
In terms of standardization: multiview.fit does not standardize x and weights. penalty.factor is standardized so that its entries sum to nvars.
Value
An object with class "multiview". The list returned contains more keys than that of a "multiview" object.
a0 |
Intercept value. |
beta |
A |
df |
The number of nonzero coefficients. |
dim |
Dimension of coefficient matrix. |
lambda |
Lambda value used. |
lambda_scale |
The multiview lambda scale factor |
dev.ratio |
The fraction of (null) deviance explained. The deviance calculations incorporate weights if present in the model. The deviance is defined to be 2*(loglike_sat - loglike), where loglike_sat is the log-likelihood for the saturated model (a model with a free parameter per observation). Hence dev.ratio=1-dev/nulldev. |
nulldev |
Null deviance (per observation). This is defined to be 2*(loglike_sat -loglike(Null)). The null model refers to the intercept model. |
npasses |
Total passes over the data. |
jerr |
Error flag, for warnings and errors (largely for internal debugging). |
offset |
A logical variable indicating whether an offset was included in the model. |
call |
The call that produced this object. |
nobs |
Number of observations. |
warm_fit |
If |
family |
Family used for the model. |
converged |
A logical variable: was the algorithm judged to have converged? |
boundary |
A logical variable: is the fitted value on the boundary of the attainable values? |
obj_function |
Objective function value at the solution. |
Fit a GLM with elastic net regularization for a path of lambda values
Description
Fit a generalized linear model via penalized maximum likelihood for a path of lambda values. Can deal with any GLM family.
Usage
multiview.path(
x_list,
y,
rho = 0,
weights = NULL,
lambda,
nlambda,
user_lambda = FALSE,
alpha = 1,
offset = NULL,
family = gaussian(),
standardize = TRUE,
intercept = TRUE,
thresh = 1e-07,
maxit = 1e+05,
penalty.factor = rep(1, nvars),
exclude = integer(0),
lower.limits = -Inf,
upper.limits = Inf,
trace.it = 0,
x,
nvars,
nobs,
xm,
xs,
control,
vp,
vnames,
start_val,
is.offset
)
Arguments
x_list |
a list of |
y |
the quantitative response with length equal to |
rho |
the weight on the agreement penalty, default 0. |
weights |
observation weights. Can be total counts if responses are proportion matrices. Default is 1 for each observation |
lambda |
A user supplied |
nlambda |
The number of |
user_lambda |
a flag indicating if user supplied the lambda sequence |
alpha |
The elasticnet mixing parameter, with
|
offset |
A vector of length |
family |
A description of the error distribution and link function to be used in the model. This is the result of a call to a family function. Default is stats::gaussian. (See stats::family for details on family functions.) |
standardize |
Logical flag for x variable standardization,
prior to fitting the model sequence. The coefficients are always
returned on the original scale. Default is
|
intercept |
Should intercept(s) be fitted (default |
thresh |
Convergence threshold for coordinate descent. Each
inner coordinate-descent loop continues until the maximum change
in the objective after any coefficient update is less than
|
maxit |
Maximum number of passes over the data for all lambda values; default is 10^5. |
penalty.factor |
Separate penalty factors can be applied to
each coefficient. This is a number that multiplies |
exclude |
Indices of variables to be excluded from the
model. Default is none. Equivalent to an infinite penalty factor
for the variables excluded (next item). Users can supply instead
an |
lower.limits |
Vector of lower limits for each coefficient;
default |
upper.limits |
Vector of upper limits for each coefficient;
default |
trace.it |
If |
x |
the |
nvars |
the number of variables (total) |
nobs |
the number of observations |
xm |
the column means vector (could be zeros if |
xs |
the column std dev vector (could be 1s if |
control |
the multiview control object |
vp |
the variable penalities (processed) |
vnames |
the variable names |
start_val |
the result of first call to |
is.offset |
a flag indicating if offset is supplied or not |
Details
multiview.path solves the elastic net problem for a path of lambda values. It generalizes multiview::multiview in that it works for any GLM family.
Sometimes the sequence is truncated before nlambda values of lambda have been used. This happens when multiview.path detects that the decrease in deviance is marginal (i.e. we are near a saturated fit).
Value
An object with class "multiview", "glmnetfit" and "glmnet".
a0 |
Intercept sequence of length |
beta |
A |
df |
The number of nonzero coefficients for each value of lambda. |
dim |
Dimension of coefficient matrix. |
lambda |
The actual sequence of lambda values used. When alpha=0, the largest lambda reported does not quite give the zero coefficients reported (lambda=inf would in principle). Instead, the largest lambda for alpha=0.001 is used, and the sequence of lambda values is derived from this. |
lambda |
The sequence of lambda values |
mvlambda |
The corresponding sequence of multiview lambda values |
dev.ratio |
The fraction of (null) deviance explained. The deviance calculations incorporate weights if present in the model. The deviance is defined to be 2*(loglike_sat - loglike), where loglike_sat is the log-likelihood for the saturated model (a model with a free parameter per observation). Hence dev.ratio=1-dev/nulldev. |
nulldev |
Null deviance (per observation). This is defined to be 2*(loglike_sat -loglike(Null)). The null model refers to the intercept model. |
npasses |
Total passes over the data summed over all lambda values. |
jerr |
Error flag, for warnings and errors (largely for internal debugging). |
offset |
A logical variable indicating whether an offset was included in the model. |
call |
The call that produced this object. |
family |
Family used for the model. |
nobs |
Number of observations. |
Elastic net objective function value
Description
Returns the elastic net objective function value.
Usage
obj_function(
y,
mu,
weights,
family,
lambda,
alpha,
coefficients,
vp,
view_components,
rho
)
Arguments
y |
Quantitative response variable. |
mu |
Model's predictions for |
weights |
Observation weights. |
family |
A description of the error distribution and link function to be used in the model. This is the result of a call to a family function. |
lambda |
A single value for the |
alpha |
The elasticnet mixing parameter, with |
coefficients |
The model's coefficients (excluding intercept). |
vp |
Penalty factors for each of the coefficients. |
view_components |
a list of lists containing indices of coefficients and associated covariate (view) pairs |
rho |
the fusion parameter |
Elastic net penalty value
Description
Returns the elastic net penalty value without the lambda factor.
Usage
pen_function(coefficients, alpha = 1, vp = 1)
Arguments
coefficients |
The model's coefficients (excluding intercept). |
alpha |
The elasticnet mixing parameter, with |
vp |
Penalty factors for each of the coefficients. |
Details
The penalty is defined as
(1-\alpha)/2 \sum_j vp_j \beta_j^2 + \alpha \sum_j vp_j |\beta_j|.
Note the omission of the multiplicative lambda factor.
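A direct R transcription of this formula (a sketch that mirrors the documented penalty; it is not necessarily the package's internal code):
## Sketch: elastic net penalty without the lambda factor, as defined above.
pen_sketch <- function(coefficients, alpha = 1, vp = 1) {
  sum((1 - alpha) / 2 * vp * coefficients^2 + alpha * vp * abs(coefficients))
}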
Plot coefficients from a "multiview" object
Description
Produces a coefficient profile plot of the coefficient paths for a fitted "multiview" object. The paths are colored by the data views from which the features come.
Usage
## S3 method for class 'multiview'
plot(x, col_palette = NULL, label = FALSE, ...)
Arguments
x |
A fitted "multiview" object. |
col_palette |
A set of colors to use for indicating different views. If |
label |
If |
... |
Other graphical parameters to plot. |
Value
a NULL value as this function is really meant for its side-effect of generating a plot.
Examples
# Gaussian
x = matrix(rnorm(100 * 20), 100, 20)
z = matrix(rnorm(100 * 10), 100, 10)
y = rnorm(100)
fit1 = multiview(list(x=x,z=z), y, rho = 0)
plot(fit1, label = TRUE)
# Binomial
by = sample(c(0,1), 100, replace = TRUE)
fit2 = multiview(list(x=x,z=z), by, family = binomial(), rho=0.5)
plot(fit2, label=FALSE)
# Poisson
py = matrix(rpois(100, exp(y)))
fit3 = multiview(list(x=x,z=z), py, family = poisson(), rho=0.5)
plot(fit3, label=TRUE)
Make predictions from a "cv.multiview" object.
Description
This function makes predictions from a cross-validated multiview model, using the stored "multiview" object, and the optimal value chosen for lambda.
Usage
## S3 method for class 'cv.multiview'
predict(object, newx, s = c("lambda.1se", "lambda.min"), ...)
Arguments
object |
Fitted "cv.multiview" object. |
newx |
List of new view matrices at which predictions are to be made. |
s |
Value(s) of the penalty parameter lambda at which predictions are required. Default is "lambda.1se". |
... |
Not used. Other arguments to predict. |
Details
This function makes it easier to use the results of cross-validation to make a prediction.
Value
The object returned depends on the ... argument which is passed on to the predict method for multiview objects.
Examples
# Gaussian
# Generate data based on a factor model
set.seed(1)
x = matrix(rnorm(100*10), 100, 10)
z = matrix(rnorm(100*10), 100, 10)
U = matrix(rnorm(100*5), 100, 5)
for (m in seq(5)){
u = rnorm(100)
x[, m] = x[, m] + u
z[, m] = z[, m] + u
U[, m] = U[, m] + u}
x = scale(x, center = TRUE, scale = FALSE)
z = scale(z, center = TRUE, scale = FALSE)
beta_U = c(rep(0.1, 5))
y = U %*% beta_U + 0.1 * rnorm(100)
fit1 = cv.multiview(list(x=x,z=z), y, rho = 0.3)
predict(fit1, newx = list(x[1:5, ],z[1:5,]), s = "lambda.min")
# Binomial
by = 1 * (y > median(y))
fit2 = cv.multiview(list(x=x,z=z), by, family = binomial(), rho = 0.9)
predict(fit2, newx = list(x[1:5, ],z[1:5,]), s = "lambda.min", type = "response")
# Poisson
py = matrix(rpois(100, exp(y)))
fit3 = cv.multiview(list(x=x,z=z), py, family = poisson(), rho = 0.6)
predict(fit3, newx = list(x[1:5, ],z[1:5,]), s = "lambda.min", type = "response")
Get predictions from a multiview fit object
Description
Gives fitted values, linear predictors, coefficients and number of non-zero coefficients from a fitted multiview object.
Usage
## S3 method for class 'multiview'
predict(
object,
newx,
s = NULL,
type = c("link", "response", "coefficients", "class", "nonzero"),
exact = FALSE,
newoffset,
...
)
Arguments
object |
Fitted "multiview" object. |
newx |
list of new matrices for |
s |
Value(s) of the penalty parameter lambda at which predictions are required. Default is the entire sequence used to create the model. |
type |
Type of prediction required. Type "link" gives the linear predictors (eta scale); Type "response" gives the fitted values (mu scale). Type "coefficients" computes the coefficients at the requested values for s. Type "nonzero" returns a list of the indices of the nonzero coefficients for each value of s. Type "class" returns class labels for binomial family only. |
exact |
This argument is relevant only when predictions are made at values
of |
newoffset |
If an offset is used in the fit, then one must be supplied for making predictions (except for type="coefficients" or type="nonzero"). |
... |
This is the mechanism for passing arguments like |
Value
The object returned depends on type.
Examples
# Gaussian
x = matrix(rnorm(100 * 20), 100, 20)
z = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit1 = multiview(list(x=x,z=z), y, rho = 0)
predict(fit1, newx = list(x[1:10, ],z[1:10, ]), s = c(0.01, 0.005))
# Binomial
by = sample(c(0,1), 100, replace = TRUE)
fit2 = multiview(list(x=x,z=z), by, family = binomial(), rho=0.5)
predict(fit2, newx = list(x[1:10, ],z[1:10, ]), s = c(0.01, 0.005), type = "response")
# Poisson
py = matrix(rpois(100, exp(y)))
fit3 = multiview(list(x=x,z=z), py, family = poisson(), rho=0.5)
predict(fit3, newx = list(x[1:10, ],z[1:10, ]), s = c(0.01, 0.005), type = "response")
Return a new list of x matrices of same shapes as those in x_list
Description
Return a new list of x matrices of same shapes as those in x_list
Usage
reshape_x_to_xlist(x, x_list)
Arguments
x |
the column-binded entries of |
x_list |
a list of |
Make response for coxnet
Description
Internal function to make the response y passed to glmnet suitable for coxnet (i.e. glmnet with family = "cox"). Sanity checks are performed here too.
Usage
response.coxnet(y)
Arguments
y |
Response variable. Either a class "Surv" object or a two-column matrix with columns named 'time' and 'status'. |
Details
If y is a class "Surv" object, this function returns y with no changes. If y is a two-column matrix with columns named 'time' and 'status', it is converted into a "Surv" object.
Value
A class "Surv" object.
Select x_list columns specified by (conformable) list of indices
Description
Select x_list columns specified by (conformable) list of indices
Usage
select_matrix_list_columns(x_list, indices)
Arguments
x_list |
a list of |
indices |
a vector of indices in |
Value
a list of x matrices
Translate from column indices in list of x matrices to indices in 1:nvars. No sanity checks for efficiency
Description
Translate from column indices in list of x matrices to indices in 1:nvars. No sanity checks for efficiency
Usage
to_nvar_index(x_list, index_list)
Arguments
x_list |
a list of |
index_list |
a list of column indices for each matrix, including possibly column indices of length 0 |
Value
a vector of indices between 1 and nvars = sum of ncol(x) for x in x_list
Translate indices in 1:nvars to column indices in list of x matrices. No sanity checks
Description
Translate indices in 1:nvars to column indices in list of x matrices. No sanity checks
Usage
to_xlist_index(x_list, index)
Arguments
x_list |
a list of |
index |
vector of indices between 1 and nvars = sum of ncol(x) for x in x_list |
Value
a conformed list of column indices for each matrix, including possibly column indices of length 0
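A small sketch of the bookkeeping these two helpers perform, assuming the views are column-bound in the order they appear in x_list (the code below is illustrative, not the package's implementation):
## Sketch: map per-view column indices to global 1:nvars indices and back,
## assuming x = cbind(x_list[[1]], x_list[[2]], ...) in list order.
x_list <- list(x = matrix(0, 5, 3), z = matrix(0, 5, 4))
p <- sapply(x_list, ncol)                 # columns per view: 3 and 4
offsets <- cumsum(c(0, p[-length(p)]))    # starting offset of each view: 0 and 3
index_list <- list(x = c(1, 3), z = 2)    # per-view column indices
unlist(Map(`+`, index_list, offsets))     # global indices: 1, 3, 5
global <- c(1, 3, 5)
view_of <- findInterval(global, offsets + 1)   # which view each index belongs to
global - offsets[view_of]                      # back to per-view indices: 1, 3, 2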
Evaluate the contribution of data views in making prediction
Description
Evaluate the contribution of each data view in making predictions. The function has two options. If force is set to NULL, the data view contribution is benchmarked by the null model. If force is set to a list of data views, the contribution is benchmarked by the model fit on this list of data views, and the function evaluates the marginal contribution of each additional data view on top of this benchmarking list of views. The function returns a table showing the percentage improvement in reducing error, as compared to the benchmarking model, made by each data view.
Usage
view.contribution(
x_list,
y,
family = gaussian(),
rho,
s = c("lambda.min", "lambda.1se"),
eval_data = c("train", "test"),
weights = NULL,
type.measure = c("default", "mse", "deviance", "class", "auc", "mae", "C"),
x_list_test = NULL,
test_y = NULL,
nfolds = 10,
foldid = NULL,
force = NULL,
...
)
Arguments
x_list |
a list of |
y |
the quantitative response with length equal to |
family |
A description of the error distribution and link function to be used in the model. This is the result of a call to a family function. Default is stats::gaussian. (See stats::family for details on family functions.) |
rho |
the weight on the agreement penalty, default 0. |
s |
Value(s) of the penalty parameter lambda at which predictions are required. Default is "lambda.min". |
eval_data |
If |
weights |
Observation weights; defaults to 1 per observation |
type.measure |
loss to use for cross-validation. Currently
five options, not all available for all models. The default is
|
x_list_test |
A list of |
test_y |
The quantitative response in the test data with length equal to the
number of rows in each |
nfolds |
number of folds - default is 10. Although |
foldid |
an optional vector of values between 1 and |
force |
If |
... |
Other arguments that can be passed to |
Value
a data frame consisting of the view, error metric, and percentage improvement.
Examples
set.seed(3)
# Simulate data based on the factor model
x = matrix(rnorm(200*20), 200, 20)
z = matrix(rnorm(200*20), 200, 20)
w = matrix(rnorm(200*20), 200, 20)
U = matrix(rep(0, 200*10), 200, 10) # latent factors
for (m in seq(10)){
u = rnorm(200)
x[, m] = x[, m] + u
z[, m] = z[, m] + u
w[, m] = w[, m] + u
U[, m] = U[, m] + u}
beta_U = c(rep(2, 5),rep(-2, 5))
y = U %*% beta_U + 3 * rnorm(200)
# Split training and test sets
smp_size_train = floor(0.9 * nrow(x))
train_ind = sort(sample(seq_len(nrow(x)), size = smp_size_train))
test_ind = setdiff(seq_len(nrow(x)), train_ind)
train_X = scale(x[train_ind, ])
test_X = scale(x[test_ind, ])
train_Z <- scale(z[train_ind, ])
test_Z <- scale(z[test_ind, ])
train_W <- scale(w[train_ind, ])
test_W <- scale(w[test_ind, ])
train_y <- y[train_ind, ]
test_y <- y[test_ind, ]
foldid = sample(rep_len(1:10, dim(train_X)[1]))
# Benchmarked by the null model:
rho = 0.3
view.contribution(x_list=list(x=train_X,z=train_Z), train_y, rho = rho,
eval_data = 'train', family = gaussian())
view.contribution(x_list=list(x=train_X,z=train_Z), train_y, rho = rho,
eval_data = 'test', family = gaussian(),
x_list_test=list(x=test_X,z=test_Z), test_y=test_y)
# Force option -- benchmarked by the model train on a specified list of data views:
view.contribution(x_list=list(x=train_X,z=train_Z,w=train_W), train_y, rho = rho,
eval_data = 'train', family = gaussian(), force=list(x=train_X))
Helper function to compute weighted mean and standard deviation
Description
Helper function to compute weighted mean and standard deviation. Deals gracefully with x whether it is a sparse matrix or not.
Usage
weighted_mean_sd(x, weights = rep(1, nrow(x)))
Arguments
x |
Observation matrix. |
weights |
Optional weight vector. |
Value
A list with components.
mean |
vector of weighted means of columns of x |
sd |
vector of weighted standard deviations of columns of x |
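For dense x, the quantities being computed correspond to the following sketch (the package's sparse-aware implementation and its exact variance convention are not reproduced here):
## Sketch: weighted column means and standard deviations for a dense matrix.
weighted_mean_sd_dense <- function(x, weights = rep(1, nrow(x))) {
  w <- weights / sum(weights)
  m <- drop(crossprod(w, x))                  # weighted column means
  s <- sqrt(drop(crossprod(w, x^2)) - m^2)    # weighted column sds
  list(mean = m, sd = s)
}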