Type: | Package |
Title: | The Generalized Semi-Supervised Elastic-Net |
Version: | 1.0.7 |
Date: | 2024-03-04 |
Description: | Implements the generalized semi-supervised elastic-net. This method extends the supervised elastic-net problem, and thus it is a practical solution to the problem of feature selection in semi-supervised contexts. Its mathematical formulation is presented from a general perspective, covering a wide range of models. We focus on linear and logistic responses, but the implementation could be easily extended to other losses in generalized linear models. We develop a flexible and fast implementation, written in 'C++' using 'RcppArmadillo' and integrated into R via 'Rcpp' modules. See Culp, M. 2013 <doi:10.1080/10618600.2012.657139> for references on the Joint Trained Elastic-Net. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Imports: | Rcpp, methods, MASS |
Depends: | stats |
LinkingTo: | Rcpp, RcppArmadillo |
Suggests: | knitr, rmarkdown, glmnet, Metrics, testthat |
VignetteBuilder: | knitr |
URL: | https://github.com/jlaria/s2net |
BugReports: | https://github.com/jlaria/s2net/issues |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.0 |
NeedsCompilation: | yes |
Packaged: | 2024-03-31 08:37:38 UTC; root |
Author: | Juan C. Laria |
Maintainer: | Juan C. Laria <juank.laria@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-03-31 10:30:02 UTC |
Class s2net
Description
This is the main class of this library, implemented in C++ and exposed to R using Rcpp modules.
It can be used in R directly, although some generic S4 methods have been implemented to make it easier to work with from R.
Methods
- predict
signature(object = "Rcpp_s2net"): See predict_Rcpp_s2net
Fields
- beta
Object of class matrix. The fitted model coefficients.
- intercept
The model intercept.
Class-Based Methods
- initialize(data, loss)
data: s2Data object
loss: Loss function: 0 = linear, 1 = logit
- setupFista(s2Fista)
Configures the FISTA internal algorithm.
- predict(newX, type)
newX: New data matrix to make predictions.
type: 0 = default, 1 = response, 2 = probs, 3 = class
- fit(params, frame, proj)
params: s2Params object
frame: 0 = "JT", 1 = "ExtJT"
proj: 0 = no, 1 = yes, 2 = auto
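The integer codes listed for the class-based methods can be kept readable with a small lookup. The following is a minimal sketch in base R; the names `s2_codes` and `s2_code` are hypothetical helpers introduced here for illustration and are not part of the package.

```r
# Hypothetical lookup tables (not part of s2net) mirroring the integer
# codes documented for the C++ class methods.
s2_codes <- list(
  loss  = c(linear = 0L, logit = 1L),
  type  = c(default = 0L, response = 1L, probs = 2L, class = 3L),
  frame = c(JT = 0L, ExtJT = 1L),
  proj  = c(no = 0L, yes = 1L, auto = 2L)
)

# Translate a label such as "ExtJT" into its integer code,
# failing early on an unknown option kind or label.
s2_code <- function(kind, label) {
  codes <- s2_codes[[kind]]
  if (is.null(codes)) stop("unknown option kind: ", kind)
  if (!label %in% names(codes)) stop("unknown ", kind, " label: ", label)
  unname(codes[label])
}

s2_code("frame", "ExtJT")
s2_code("proj", "auto")
```

With such a helper, a call like obj$fit(params, 1, 2) could be written as obj$fit(params, s2_code("frame", "ExtJT"), s2_code("proj", "auto")).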
Author(s)
Juan C. Laria
Examples
data("auto_mpg")
train = s2Data(xL = auto_mpg$P1$xL, yL = auto_mpg$P1$yL, xU = auto_mpg$P1$xU)
# We create the C++ object calling the new method (constructor)
obj = new(s2net, train, 0) # 0 = regression
obj
# We call directly the $fit method of obj,
obj$fit(s2Params(lambda1 = 0.01,
lambda2 = 0.01,
gamma1 = 0.05,
gamma2 = 100,
gamma3 = 0.05), 1, 2)
# fitted model
obj$beta
# We can test the results using the unlabeled data
test = s2Data(xL = auto_mpg$P1$xU, yL = auto_mpg$P1$yU, preprocess = train)
ypred = obj$predict(test$xL, 0)
## Not run:
if(require(ggplot2)){
ggplot() +
aes(x = ypred, y = test$yL) + geom_point() +
geom_abline(intercept = 0, slope = 1, linetype = 2)
}
## End(Not run)
Auto MPG Data Set
Description
This dataset was taken from the UCI Machine Learning Repository https://archive.ics.uci.edu/ml/datasets/Auto+MPG, and processed for the semi-supervised setting (Ryan and Culp, 2015).
Usage
data("auto_mpg")
Format
There are two lists that contain partitions from a data frame with 398 observations on the following 9 variables.
mpg
a numeric vector
cylinders
an ordered factor with levels 3 < 4 < 5 < 6 < 8
displacement
a numeric vector
horsepower
a numeric vector
weight
a numeric vector
acceleration
a numeric vector
year
a numeric vector
origin
a factor
Details
This dataset is a slightly modified version of the dataset provided in the StatLib library. In line with the use by Ross Quinlan (1993) in predicting the attribute "mpg", 8 of the original instances were removed because they had unknown values for the "mpg" attribute. "The data concerns city-cycle fuel consumption in miles per gallon, to be predicted in terms of 3 multivalued discrete and 5 continuous attributes." (Quinlan, 1993)
Source
Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings of the Tenth International Conference on Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/]. Irvine, CA: University of California, School of Information and Computer Science.
References
Ryan, K. J., & Culp, M. V. (2015). On semi-supervised linear regression in covariate shift problems. The Journal of Machine Learning Research, 16(1), 3183-3217.
Examples
data(auto_mpg)
head(auto_mpg$P1$xL)
S3 Methods for s2netR objects.
Description
Generic predict method. Wrapper for the C++ class method s2net$predict.
Usage
## S3 method for class 's2netR'
predict(object, newX, type = "default", ...)
Arguments
object |
An object of class s2netR. |
newX |
A matrix with the data to make predictions. It should be in the same scale as the original data. See |
type |
Type of predictions. One of "default", "response", "probs", "class". |
... |
other parameters passed to predict |
Value
A column matrix with predictions.
Examples
data("auto_mpg")
train = s2Data(xL = auto_mpg$P1$xL, yL = auto_mpg$P1$yL, xU = auto_mpg$P1$xU)
model = s2netR(train,
s2Params(lambda1 = 0.1,
lambda2 = 0,
gamma1 = 0.1,
gamma2 = 100,
gamma3 = 0.1),
loss = "linear",
frame = "ExtJT",
proj = "auto",
fista = s2Fista(5000, 1e-7, 1, 0.8))
valid = s2Data(auto_mpg$P1$xU, auto_mpg$P1$yU, preprocess = train)
ypred = predict(model, valid$xL)
## Not run:
if(require(ggplot2)){
ggplot() +
aes(x = ypred, y = valid$yL) + geom_point() +
geom_abline(intercept = 0, slope = 1, linetype = 2)
}
## End(Not run)
Predict method for s2net C++ class.
Description
This function provides an interface in R for the method predict in C++ class s2net.
Usage
predict_Rcpp_s2net(object, newX, type = "default")
Arguments
object |
An object of class Rcpp_s2net. |
newX |
Data to make predictions. Could be a |
type |
Type of predictions. One of "default", "response", "probs", "class". |
Details
This method is included as a high-level wrapper of object$predict().
Value
Returns a column matrix with the same number of rows/observations as newX.
Author(s)
Juan C. Laria
Print methods for S3 objects
Description
Very simple print methods to show basic information about these simple S3 objects.
Usage
## S3 method for class 's2Data'
print(x, ...)
## S3 method for class 's2Fista'
print(x, ...)
Arguments
x |
S3 object of class s2Data or s2Fista. |
... |
other parameters passed to print |
Data wrapper for s2net.
Description
This function preprocesses the data to fit a semi-supervised linear joint trained model.
Usage
s2Data(xL, yL, xU = NULL, preprocess = T)
Arguments
xL |
The labeled data. Could be a |
yL |
The labels associated with xL. |
xU |
The unlabeled data (optional). Could be a |
preprocess |
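Should the input data be pre-processed? Possible values are TRUE, FALSE, or another object of class s2Data, in which case the new data are transformed in the same way as that object (see the examples). |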
Value
Returns an object of S3 class s2Data with fields
xL |
Transformed labeled data |
yL |
Transformed labels. If |
xU |
Transformed unlabeled data |
type |
Type of task. This one is inferred from the response labels. |
base |
Base category for classification |
In addition the following attributes are stored.
pr:rm_cols |
logical vector of removed columns |
pr:center |
column center |
pr:scale |
column scale |
pr:ycenter |
yL center. Regression |
pr:yscale |
yL scale. Regression |
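The stored pr:center and pr:scale attributes suggest that a transformation learned on the training data is reused when another s2Data object is passed as preprocess. The sketch below illustrates that general idea with base R only; it is an assumption about the workflow, not the s2Data implementation, and all variable names are made up for the example.

```r
# Illustrative only: column centering/scaling learned on training data and
# reused on new data. This is NOT the s2Data implementation.
set.seed(1)
x_train <- matrix(rnorm(40, mean = 5, sd = 2), nrow = 10)
x_new   <- matrix(rnorm(20, mean = 5, sd = 2), nrow = 5)

# Learn the transformation on the training columns.
center <- colMeans(x_train)
scales <- apply(x_train, 2, sd)

# Apply the same stored transformation to both data sets.
x_train_std <- scale(x_train, center = center, scale = scales)
x_new_std   <- scale(x_new,   center = center, scale = scales)

colMeans(x_train_std)        # numerically zero for the training data
apply(x_train_std, 2, sd)    # equal to one for the training data
colMeans(x_new_std)          # generally not zero: new data keep the training scale
```

Reusing the training center and scale keeps validation and test data on the scale the fitted coefficients expect.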
Author(s)
Juan C. Laria
Examples
data("auto_mpg")
train = s2Data( xL = auto_mpg$P1$xL,
yL = auto_mpg$P1$yL,
xU = auto_mpg$P1$xU,
preprocess = TRUE )
show(train)
# Notice how ordered factor variable $cylinders is handled
# .L (linear) .Q (quadratic) .C (cubic) and .^4
head(train$xL)
#if you want to do validation with the unlabeled data
idx = sample(length(auto_mpg$P1$yU), 200)
train = s2Data(xL = auto_mpg$P1$xL, yL = auto_mpg$P1$yL, xU = auto_mpg$P1$xU[idx, ])
valid = s2Data(xL = auto_mpg$P1$xU[-idx, ], yL = auto_mpg$P1$yU[-idx], preprocess = train)
test = s2Data(xL = auto_mpg$P1$xU[idx, ], yL = auto_mpg$P1$yU[idx], preprocess = train)
train
valid
test
Hyper-parameter wrapper for FISTA.
Description
This is a very simple function that supplies the hyper-parameters for the Fast Iterative Soft-Threshold Algorithm (FISTA) that solves the s2net minimization problem.
Usage
s2Fista(MAX_ITER_INNER = 5000, TOL = 1e-07, t0 = 2, step = 0.1, use_warmstart = FALSE)
Arguments
MAX_ITER_INNER |
Number of iterations of FISTA |
TOL |
The relative tolerance. The algorithm stops when the objective does not improve more than |
t0 |
The initial stepsize for backtracking. |
step |
The scale factor in the stepsize to backtrack until a valid step is found. |
use_warmstart |
Should we use a warm start? |
Value
Returns an object of S3 class s2Fista with the input arguments as fields.
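To make the roles of these hyper-parameters concrete, here is a minimal sketch of FISTA with backtracking (Beck and Teboulle, 2009) applied to a plain lasso problem in base R. It is not the package's C++ solver: the s2net objective contains additional semi-supervised terms, use_warmstart is not modelled, and the function names fista_lasso and soft_threshold are hypothetical. The arguments max_iter, tol, t0 and step are meant to play the roles of MAX_ITER_INNER, TOL, t0 and step.

```r
# Soft-thresholding operator: proximal map of thr * ||.||_1.
soft_threshold <- function(z, thr) sign(z) * pmax(abs(z) - thr, 0)

# Sketch of FISTA with backtracking for
#   minimize (1 / (2n)) * ||y - X b||^2 + lambda * ||b||_1
fista_lasso <- function(X, y, lambda, max_iter = 5000, tol = 1e-7,
                        t0 = 2, step = 0.1) {
  n <- nrow(X)
  f    <- function(b) sum((y - X %*% b)^2) / (2 * n)          # smooth part
  grad <- function(b) as.vector(-crossprod(X, y - X %*% b)) / n
  obj  <- function(b) f(b) + lambda * sum(abs(b))              # full objective

  beta <- rep(0, ncol(X))   # current iterate
  w <- beta                 # extrapolated (momentum) point
  theta <- 1                # momentum sequence
  t <- t0                   # current stepsize
  for (k in seq_len(max_iter)) {
    g <- grad(w)
    repeat {
      # Proximal gradient step from w with stepsize t.
      beta_new <- soft_threshold(w - t * g, t * lambda)
      d <- beta_new - w
      ok <- f(beta_new) <= f(w) + sum(g * d) + sum(d^2) / (2 * t)
      # Accept the step, or give up shrinking once t is tiny (safety guard).
      if (ok || t < 1e-10) break
      t <- t * step         # backtrack: shrink the stepsize
    }
    theta_new <- (1 + sqrt(1 + 4 * theta^2)) / 2
    w <- beta_new + ((theta - 1) / theta_new) * (beta_new - beta)
    improvement <- abs(obj(beta) - obj(beta_new))
    beta <- beta_new
    theta <- theta_new
    # Stop when the objective no longer changes by more than a relative tol.
    if (improvement <= tol * max(1, abs(obj(beta)))) break
  }
  beta
}

set.seed(0)
X <- matrix(rnorm(100 * 5), nrow = 100)
y <- as.vector(X %*% c(2, -1, 0, 0, 0) + rnorm(100, sd = 0.1))
beta_hat <- fista_lasso(X, y, lambda = 0.1)
beta_hat
```

A larger step (closer to 1) backtracks more gently and usually ends with a longer accepted stepsize, at the cost of more trial evaluations per iteration.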
References
Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences, 2(1), 183-202. doi:10.1137/080716542
Hyper-parameter wrapper for s2net
Description
This is a very simple function that collapses the input parameters into a named vector to supply to C++ methods.
Usage
s2Params(lambda1, lambda2 = 0, gamma1 = 0, gamma2 = 0, gamma3 = 0)
Arguments
lambda1 |
elastic-net regularization parameter - |
lambda2 |
elastic-net regularization parameter - |
gamma1 |
s2net weight hyper-parameter. |
gamma2 |
s2net covariance hyper-parameter (between 1 and |
gamma3 |
s2net shift hyper-parameter (between 0 and 1). |
Value
Returns a named vector of S3 class s2Params.
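Collapsing hyper-parameters into a classed, named vector can be sketched in a few lines of base R. The function name make_params and the class name params_sketch are hypothetical and chosen so they cannot be confused with the package's own s2Params; this only illustrates the idea described above.

```r
# Hypothetical re-implementation for illustration only (not s2Params itself).
make_params <- function(lambda1, lambda2 = 0, gamma1 = 0, gamma2 = 0, gamma3 = 0) {
  values <- c(lambda1 = lambda1, lambda2 = lambda2,
              gamma1 = gamma1, gamma2 = gamma2, gamma3 = gamma3)
  # Each argument must be a single number, so the vector has five entries.
  stopifnot(is.numeric(values), length(values) == 5)
  structure(values, class = "params_sketch")
}

p <- make_params(lambda1 = 0.1, gamma2 = 100)
unclass(p)   # the underlying named numeric vector
```

A named vector keeps the values in a fixed order while still letting R code refer to them by name.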
The Generalized Semi-Supervised Elastic-Net
Description
Implements the generalized semi-supervised elastic-net. This method extends the supervised elastic-net problem, and thus it is a practical solution to the problem of feature selection in semi-supervised contexts. Its mathematical formulation is presented from a general perspective, covering a wide range of models. We focus on linear and logistic responses, but the implementation could be easily extended to other losses in generalized linear models. We develop a flexible and fast implementation, written in 'C++' using 'RcppArmadillo' and integrated into R via 'Rcpp' modules. See Culp, M. 2013 <doi:10.1080/10618600.2012.657139> for references on the Joint Trained Elastic-Net.
Details
The DESCRIPTION file:
Package: | s2net |
Type: | Package |
Title: | The Generalized Semi-Supervised Elastic-Net |
Version: | 1.0.7 |
Date: | 2024-03-04 |
Authors@R: | c(person("Juan C.", "Laria",, role = c("aut", "cre"), email = "juank.laria@gmail.com", comment = c(ORCID = "0000-0001-7734-9647")), person("Line H.", "Clemmensen",, role = c("aut"), email = "lkhc@dtu.dk")) |
Description: | Implements the generalized semi-supervised elastic-net. This method extends the supervised elastic-net problem, and thus it is a practical solution to the problem of feature selection in semi-supervised contexts. Its mathematical formulation is presented from a general perspective, covering a wide range of models. We focus on linear and logistic responses, but the implementation could be easily extended to other losses in generalized linear models. We develop a flexible and fast implementation, written in 'C++' using 'RcppArmadillo' and integrated into R via 'Rcpp' modules. See Culp, M. 2013 <doi:10.1080/10618600.2012.657139> for references on the Joint Trained Elastic-Net. |
License: | GPL (>= 2) |
Imports: | Rcpp, methods, MASS |
Depends: | stats |
LinkingTo: | Rcpp, RcppArmadillo |
Suggests: | knitr, rmarkdown, glmnet, Metrics, testthat |
VignetteBuilder: | knitr |
URL: | https://github.com/jlaria/s2net |
BugReports: | https://github.com/jlaria/s2net/issues |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.0 |
Author: | Juan C. Laria [aut, cre] (<https://orcid.org/0000-0001-7734-9647>), Line H. Clemmensen [aut] |
Maintainer: | Juan C. Laria <juank.laria@gmail.com> |
Index of help topics:
Rcpp_s2net-class      Class 's2net'
auto_mpg              Auto MPG Data Set
predict.s2netR        S3 Methods for 's2netR' objects.
predict_Rcpp_s2net    Predict method for 's2net' C++ class.
print.s2Data          Print methods for S3 objects
s2Data                Data wrapper for 's2net'.
s2Fista               Hyper-parameter wrapper for FISTA.
s2Params              Hyper-parameter wrapper for 's2net'
s2net                 The Generalized Semi-Supervised Elastic-Net
s2netR                Trains a generalized extended linear joint trained model using semi-supervised data.
simulate_extra        Simulate extrapolated data
simulate_groups       Simulate data (two groups design)
This package includes a very easy-to-use interface for handling data, with the s2Data function. The main function of the package is the s2netR function, which is a wrapper for the Rcpp_s2net (s2net) class.
Author(s)
Juan C. Laria [aut, cre] (<https://orcid.org/0000-0001-7734-9647>), Line H. Clemmensen [aut]
References
Laria, J.C., L. Clemmensen (2019). A generalized elastic-net for semi-supervised learning of sparse features.
Sogaard Larsen, J. et al. (2019). Semi-supervised covariate shift modelling of spectroscopic data.
Ryan, K. J., & Culp, M. V. (2015). On semi-supervised linear regression in covariate shift problems. The Journal of Machine Learning Research, 16(1), 3183-3217.
Examples
data("auto_mpg")
train = s2Data(xL = auto_mpg$P1$xL, yL = auto_mpg$P1$yL, xU = auto_mpg$P1$xU)
model = s2netR(train,
s2Params(lambda1 = 0.1,
lambda2 = 0,
gamma1 = 0.1,
gamma2 = 100,
gamma3 = 0.1))
# here we tell it to transform the valid data as we did with train.
valid = s2Data(auto_mpg$P1$xU, auto_mpg$P1$yU, preprocess = train)
ypred = predict(model, valid$xL)
## Not run:
if(require(ggplot2)){
ggplot() +
aes(x = ypred, y = valid$yL) + geom_point() +
geom_abline(intercept = 0, slope = 1, linetype = 2)
}
## End(Not run)
Trains a generalized extended linear joint trained model using semi-supervised data.
Description
This function is a wrapper for the class s2net. It creates the C++ object and fits the model using input data.
Usage
s2netR(data, params, loss = "default", frame = "ExtJT", proj = "auto",
fista = NULL, S3 = TRUE)
Arguments
data |
An s2Data object. |
params |
An s2Params object. |
loss |
Loss function. One of |
frame |
The semi-supervised frame: "JT" or "ExtJT". |
proj |
Should the unlabeled data be shifted to remove the model's effect? One of "no", "yes", "auto". |
fista |
FISTA setup parameters. An object of class s2Fista. |
S3 |
Boolean: should the method return an S3 object (default) or a C++ object? |
Value
Returns an object of S3 class s2netR or a C++ object of class s2net.
Author(s)
Juan C. Laria
References
Ryan, K. J., & Culp, M. V. (2015). On semi-supervised linear regression in covariate shift problems. The Journal of Machine Learning Research, 16(1), 3183-3217.
Examples
data("auto_mpg")
train = s2Data(xL = auto_mpg$P1$xL, yL = auto_mpg$P1$yL, xU = auto_mpg$P1$xU)
model = s2netR(train,
s2Params(lambda1 = 0.1,
lambda2 = 0,
gamma1 = 0.1,
gamma2 = 100,
gamma3 = 0.1),
loss = "linear",
frame = "ExtJT",
proj = "auto",
fista = s2Fista(5000, 1e-7, 1, 0.8))
valid = s2Data(auto_mpg$P1$xU, auto_mpg$P1$yU, preprocess = train)
ypred = predict(model, valid$xL)
## Not run:
if(require(ggplot2)){
ggplot() +
aes(x = ypred, y = valid$yL) + geom_point() +
geom_abline(intercept = 0, slope = 1, linetype = 2)
}
## End(Not run)
Simulate extrapolated data
Description
Simulated data scenarios described in the paper from Ryan and Culp (2015).
Usage
simulate_extra(n_source = 100, n_target = 100, p = 1000, shift = 10,
scenario = "same", response = "linear", sigma2 = 2.5)
Arguments
n_source |
Number of source samples (labeled) |
n_target |
Number of target samples (unlabeled) |
p |
Number of variables ( |
shift |
The shift applied to the first 10 columns of xU. |
scenario |
Simulation scenario. One of |
response |
Type of response: |
sigma2 |
The variance of the error term, linear response case. |
Value
A list, with
- xL
data frame with the labeled (source) data
- yL
labels associated with xL
- xU
data frame with the unlabeled (target) data
- yU
labels associated with xU (for validation/testing)
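The returned structure, and the effect of shifting the first 10 columns of xU, can be illustrated with a toy generator in base R. This is a sketch under stated assumptions, not the generator used by simulate_extra: the function name simulate_shift_sketch, the standard normal covariates and the sparse coefficient vector are all made up for the example.

```r
# Illustrative only: NOT the generator used by simulate_extra().
simulate_shift_sketch <- function(n_source = 100, n_target = 100, p = 20,
                                  shift = 10, sigma2 = 2.5) {
  stopifnot(p >= 10)
  beta <- c(rep(1, 10), rep(0, p - 10))          # assumed sparse signal
  xL <- matrix(rnorm(n_source * p), n_source, p) # labeled (source) covariates
  xU <- matrix(rnorm(n_target * p), n_target, p) # unlabeled (target) covariates
  xU[, 1:10] <- xU[, 1:10] + shift               # shift the first 10 columns of xU
  noise <- function(n) rnorm(n, sd = sqrt(sigma2))
  list(xL = as.data.frame(xL),
       yL = as.vector(xL %*% beta) + noise(n_source),
       xU = as.data.frame(xU),
       yU = as.vector(xU %*% beta) + noise(n_target))
}

set.seed(0)
sim <- simulate_shift_sketch()
str(sim, max.level = 1)
```

Because the target covariates are shifted away from the source distribution, predictions on xU extrapolate beyond the labeled data, which is the setting the semi-supervised model is meant to handle.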
References
Ryan, K. J., & Culp, M. V. (2015). On semi-supervised linear regression in covariate shift problems. The Journal of Machine Learning Research, 16(1), 3183-3217.
Examples
set.seed(0)
data = simulate_extra()
train = s2Data(data$xL, data$yL, data$xU)
valid = s2Data(data$xU, data$yU, preprocess = train)
model = s2netR(train, s2Params(0.1))
ypred = predict(model, valid$xL)
plot(ypred, valid$yL)
Simulate data (two groups design)
Description
Simulated data scenario described in paper [citation here].
Usage
simulate_groups(n_source = 100, n_target = 100, p = 200, response = "linear")
Arguments
n_source |
Number of labeled observations |
n_target |
Number of unlabeled (target) observations |
p |
Number of variables |
response |
Type of response: |
Value
A list, with
- xL
data frame with the labeled (source) data
- yL
labels associated with xL
- xU
data frame with the unlabeled (target) data
- yU
labels associated with xU (for validation/testing)
Author(s)
Juan C. Laria