| Type: | Package | 
| Title: | The Generalized Semi-Supervised Elastic-Net | 
| Version: | 1.0.7 | 
| Date: | 2024-03-04 | 
| Description: | Implements the generalized semi-supervised elastic-net. This method extends the supervised elastic-net problem, and thus it is a practical solution to the problem of feature selection in semi-supervised contexts. Its mathematical formulation is presented from a general perspective, covering a wide range of models. We focus on linear and logistic responses, but the implementation could be easily extended to other losses in generalized linear models. We develop a flexible and fast implementation, written in 'C++' using 'RcppArmadillo' and integrated into R via 'Rcpp' modules. See Culp, M. 2013 <doi:10.1080/10618600.2012.657139> for references on the Joint Trained Elastic-Net. | 
| License: | GPL-2 or GPL-3 [expanded from: GPL (≥ 2)] | 
| Imports: | Rcpp, methods, MASS | 
| Depends: | stats | 
| LinkingTo: | Rcpp, RcppArmadillo | 
| Suggests: | knitr, rmarkdown, glmnet, Metrics, testthat | 
| VignetteBuilder: | knitr | 
| URL: | https://github.com/jlaria/s2net | 
| BugReports: | https://github.com/jlaria/s2net/issues | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.2.0 | 
| NeedsCompilation: | yes | 
| Packaged: | 2024-03-31 08:37:38 UTC; root | 
| Author: | Juan C. Laria | 
| Maintainer: | Juan C. Laria <juank.laria@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2024-03-31 10:30:02 UTC | 
Class s2net
Description
This is the main class of the package, implemented in C++ and exposed to R through Rcpp modules.
It can be used from R directly, and some generic S4 methods are provided to make it easier to work with.
Methods
- predict: signature(object = "Rcpp_s2net"). See predict_Rcpp_s2net.
Fields
- beta: Object of class matrix. The fitted model coefficients.
- intercept: The model intercept.
Class-Based Methods
- initialize(data, loss):
  - data: s2Data object
  - loss: Loss function: 0 = linear, 1 = logit
- setupFista(s2Fista): Configures the FISTA internal algorithm.
- predict(newX, type):
  - newX: New data matrix to make predictions.
  - type: 0 = default, 1 = response, 2 = probs, 3 = class
- fit(params, frame, proj):
  - params: s2Params object
  - frame: 0 = "JT", 1 = "ExtJT"
  - proj: 0 = no, 1 = yes, 2 = auto
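The class-based methods take integer codes rather than strings. The following is a minimal sketch of one way to keep those codes readable when calling the C++ object directly. The lookup vectors and the `code_of` helper are hypothetical names used only for illustration and are not part of the package; only the code values themselves come from the list above.

```r
# Hypothetical lookup tables mirroring the integer codes documented above.
loss_codes  <- c(linear = 0L, logit = 1L)
type_codes  <- c(default = 0L, response = 1L, probs = 2L, class = 3L)
frame_codes <- c(JT = 0L, ExtJT = 1L)
proj_codes  <- c(no = 0L, yes = 1L, auto = 2L)

# Translate a readable option into its integer code; unknown names raise an error.
code_of <- function(option, codes) {
  option <- match.arg(option, names(codes))
  unname(codes[[option]])
}

code_of("ExtJT", frame_codes)
code_of("auto", proj_codes)
```

With such a helper, a call like `obj$fit(params, code_of("ExtJT", frame_codes), code_of("auto", proj_codes))` would read more clearly than passing bare integers.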
 
Author(s)
Juan C. Laria
Examples
data("auto_mpg")
train = s2Data(xL = auto_mpg$P1$xL, yL = auto_mpg$P1$yL,  xU = auto_mpg$P1$xU)
# We create the C++ object calling the new method (constructor)
obj = new(s2net, train, 0) # 0 = regression 
obj
# We call directly the $fit method of obj, 
obj$fit(s2Params(lambda1 = 0.01, 
                   lambda2 = 0.01, 
                   gamma1 = 0.05, 
                   gamma2 = 100, 
                   gamma3 = 0.05), 1, 2)
# fitted model
obj$beta
# We can test the results using the unlabeled data
test = s2Data(xL = auto_mpg$P1$xU, yL = auto_mpg$P1$yU,  preprocess = train)
ypred = obj$predict(test$xL, 0)
## Not run: 
if(require(ggplot2)){
  ggplot() + 
    aes(x = ypred, y = test$yL) + geom_point() + 
    geom_abline(intercept = 0, slope = 1, linetype = 2)
}
## End(Not run)
Auto MPG Data Set
Description
This dataset was taken from the UCI Machine Learning Repository https://archive.ics.uci.edu/ml/datasets/Auto+MPG, and processed for the semi-supervised setting (Ryan and Culp, 2015).
Usage
data("auto_mpg")
Format
There are two lists that contain partitions from a data frame with 398 observations on the following 9 variables.
- mpg: a numeric vector
- cylinders: an ordered factor with levels 3 < 4 < 5 < 6 < 8
- displacement: a numeric vector
- horsepower: a numeric vector
- weight: a numeric vector
- acceleration: a numeric vector
- year: a numeric vector
- origin: a factor
Details
This dataset is a slightly modified version of the dataset provided in the StatLib library. In line with the use by Ross Quinlan (1993) in predicting the attribute "mpg", 8 of the original instances were removed because they had unknown values for the "mpg" attribute. "The data concerns city-cycle fuel consumption in miles per gallon, to be predicted in terms of 3 multivalued discrete and 5 continuous attributes." (Quinlan, 1993)
Source
Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings of the Tenth International Conference on Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/]. Irvine, CA: University of California, School of Information and Computer Science.
References
Ryan, K. J., & Culp, M. V. (2015). On semi-supervised linear regression in covariate shift problems. The Journal of Machine Learning Research, 16(1), 3183-3217.
Examples
data(auto_mpg)
head(auto_mpg$P1$xL)
S3 Methods for s2netR objects.
Description
Generic predict method. Wrapper for the C++ class method s2net$predict.
Usage
## S3 method for class 's2netR'
predict(object, newX, type = "default", ...)
Arguments
| object | An object of class s2netR. | 
| newX | A matrix with the data to make predictions. It should be in the same scale as the original data. See s2Data. | 
| type | Type of predictions. One of "default", "response", "probs", "class". | 
| ... | other parameters passed to predict | 
Value
A column matrix with predictions.
Examples
data("auto_mpg")
train = s2Data(xL = auto_mpg$P1$xL, yL = auto_mpg$P1$yL,  xU = auto_mpg$P1$xU)
model = s2netR(train, 
                s2Params(lambda1 = 0.1, 
                           lambda2 = 0,
                           gamma1 = 0.1,
                           gamma2 = 100,
                           gamma3 = 0.1),
                loss = "linear",
                frame = "ExtJT",
                proj = "auto",
                fista = s2Fista(5000, 1e-7, 1, 0.8))
valid = s2Data(auto_mpg$P1$xU, auto_mpg$P1$yU, preprocess = train)
ypred = predict(model, valid$xL)
## Not run: 
if(require(ggplot2)){
  ggplot() + 
    aes(x = ypred, y = valid$yL) + geom_point() + 
    geom_abline(intercept = 0, slope = 1, linetype = 2)
}
## End(Not run)
Predict method for s2net C++ class.
Description
This function provides an interface in R for the method predict in C++ class s2net.
Usage
predict_Rcpp_s2net(object, newX, type = "default")
Arguments
| object | An object of class Rcpp_s2net. | 
| newX | Data to make predictions. Could be a  | 
| type | Type of predictions. One of "default", "response", "probs", "class". | 
Details
This method is included as a high-level wrapper of object$predict().
Value
Returns a column matrix with the same number of rows/observations as newX.
Author(s)
Juan C. Laria
Print methods for S3 objects
Description
Very simple print methods to show basic information about these simple S3 objects.
Usage
## S3 method for class 's2Data'
print(x, ...)
## S3 method for class 's2Fista'
print(x, ...)
Arguments
| x | S3 object of class s2Data or s2Fista. | 
| ... | other parameters passed to print | 
Data wrapper for s2net.
Description
This function preprocesses the data to fit a semi-supervised linear joint trained model.
Usage
s2Data(xL, yL, xU = NULL, preprocess = T)
Arguments
| xL | The labeled data. Could be a  | 
| yL | The labels associated with xL. | 
| xU | The unlabeled data (optional). Could be a  | 
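| preprocess | Should the input data be pre-processed? Possible values are TRUE (the default), FALSE, or another object of class s2Data, in which case the new data is transformed in the same way as that object. | 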
Value
Returns an object of S3 class s2Data with fields
| xL | Transformed labeled data | 
| yL | Transformed labels. If  | 
| xU | Transformed unlabeled data | 
| type | Type of task. This one is inferred from the response labels. | 
| base | Base category for classification  | 
In addition the following attributes are stored.
| pr:rm_cols | logical vector of removed columns | 
| pr:center | column center | 
| pr:scale | column scale | 
| pr:ycenter | yL center. Regression | 
| pr:yscale | yL scale. Regression | 
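The stored attributes are what make it possible to pass a previous s2Data object as `preprocess` and transform new data consistently. The base-R sketch below illustrates the general idea only: it is not the package's implementation, the function names `fit_preprocess` and `apply_preprocess` are hypothetical, and dropping constant columns is an assumption about what "removed columns" means.

```r
# Fit the preprocessing on a training matrix and store it as attributes.
fit_preprocess <- function(x) {
  x <- as.matrix(x)
  rm_cols <- apply(x, 2, sd) == 0            # assumption: drop constant columns
  x <- x[, !rm_cols, drop = FALSE]
  center <- colMeans(x)
  scale_ <- apply(x, 2, sd)
  out <- sweep(sweep(x, 2, center, "-"), 2, scale_, "/")
  attr(out, "pr:rm_cols") <- rm_cols
  attr(out, "pr:center") <- center
  attr(out, "pr:scale") <- scale_
  out
}

# Apply a previously fitted preprocessing to new data.
apply_preprocess <- function(x_new, fitted) {
  x_new <- as.matrix(x_new)[, !attr(fitted, "pr:rm_cols"), drop = FALSE]
  sweep(sweep(x_new, 2, attr(fitted, "pr:center"), "-"),
        2, attr(fitted, "pr:scale"), "/")
}

set.seed(1)
x_train <- cbind(a = rnorm(20, 5, 2), b = 1, c = runif(20))
x_valid <- cbind(a = rnorm(5, 5, 2), b = 1, c = runif(5))
z_train <- fit_preprocess(x_train)
z_valid <- apply_preprocess(x_valid, z_train)
dim(z_valid)
```

The validation data is centered and scaled with the training statistics, not its own, so a model fitted on `z_train` sees `z_valid` in the same space.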
Author(s)
Juan C. Laria
Examples
data("auto_mpg")
train = s2Data( xL = auto_mpg$P1$xL,
                  yL = auto_mpg$P1$yL,
                  xU = auto_mpg$P1$xU,
                  preprocess = TRUE )
show(train)
# Notice how ordered factor variable $cylinders is handled 
# .L (linear) .Q (quadratic) .C (cubic) and .^4
head(train$xL) 
#if you want to do validation with the unlabeled data
idx = sample(length(auto_mpg$P1$yU), 200)
train = s2Data(xL = auto_mpg$P1$xL, yL = auto_mpg$P1$yL, xU = auto_mpg$P1$xU[idx, ])
valid = s2Data(xL = auto_mpg$P1$xU[-idx, ], yL = auto_mpg$P1$yU[-idx], preprocess = train)
test = s2Data(xL = auto_mpg$P1$xU[idx, ], yL = auto_mpg$P1$yU[idx], preprocess = train)
train
valid
test
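The comment in the example about `.L`, `.Q`, `.C` and `.^4` refers to the way R expands ordered factors. The short base-R sketch below reproduces that naming with `model.matrix()`. Whether s2Data calls `model.matrix()` internally is an assumption; the sketch only shows where such column suffixes come from under R's default contrast settings.

```r
# An ordered factor with the five levels listed for `cylinders`.
cyl <- factor(c(4, 8, 6, 3, 5, 4), levels = c(3, 4, 5, 6, 8), ordered = TRUE)

# model.matrix() expands ordered factors with orthogonal polynomial contrasts
# (linear, quadratic, cubic and fourth-degree terms for five levels).
mm <- model.matrix(~ cyl)
colnames(mm)
```

A five-level ordered factor therefore contributes four numeric columns, one per polynomial degree, instead of one dummy column per level.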
Hyper-parameter wrapper for FISTA.
Description
This is a very simple function that supplies the hyper-parameters for the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA), which solves the s2net minimization problem.
Usage
s2Fista(MAX_ITER_INNER = 5000, TOL = 1e-07, t0 = 2, step = 0.1, use_warmstart = FALSE)
Arguments
| MAX_ITER_INNER | Number of iterations of FISTA | 
| TOL | The relative tolerance. The algorithm stops when the relative improvement of the objective falls below this value. | 
| t0 | The initial stepsize for backtracking. | 
| step | The scale factor in the stepsize to backtrack until a valid step is found. | 
| use_warmstart | Should a warm start be used? | 
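To make the role of these hyper-parameters concrete, here is a minimal base-R sketch of FISTA with backtracking (Beck and Teboulle, 2009) applied to a plain supervised elastic-net with squared loss. It is not the package's C++ solver and it ignores the semi-supervised terms; the function name `fista_enet`, the objective scaling, and the placement of the ridge term in the smooth part are all assumptions made for illustration. The arguments `MAX_ITER_INNER`, `TOL`, `t0` and `step` play the roles described in the table above.

```r
soft_threshold <- function(v, k) sign(v) * pmax(abs(v) - k, 0)

fista_enet <- function(X, y, lambda1, lambda2 = 0,
                       MAX_ITER_INNER = 5000, TOL = 1e-7, t0 = 2, step = 0.1) {
  n <- nrow(X)
  # Smooth part: scaled squared loss plus ridge penalty, and its gradient.
  smooth <- function(b) sum((y - X %*% b)^2) / (2 * n) + lambda2 * sum(b^2)
  grad <- function(b) drop(-crossprod(X, y - X %*% b)) / n + 2 * lambda2 * b
  objective <- function(b) smooth(b) + lambda1 * sum(abs(b))

  beta <- z <- numeric(ncol(X))
  momentum <- 1
  stepsize <- t0
  obj_old <- obj_new <- objective(beta)
  for (iter in seq_len(MAX_ITER_INNER)) {
    g <- grad(z)
    fz <- smooth(z)
    repeat {
      # Proximal gradient step: soft-thresholding handles the l1 penalty.
      beta_new <- soft_threshold(z - stepsize * g, stepsize * lambda1)
      d <- beta_new - z
      # Backtracking: shrink the stepsize until the quadratic bound holds.
      ok <- smooth(beta_new) <= fz + sum(g * d) + sum(d^2) / (2 * stepsize)
      if (ok || stepsize < 1e-12) break
      stepsize <- step * stepsize
    }
    # Momentum update that distinguishes FISTA from plain proximal gradient.
    momentum_new <- (1 + sqrt(1 + 4 * momentum^2)) / 2
    z <- beta_new + ((momentum - 1) / momentum_new) * (beta_new - beta)
    beta <- beta_new
    momentum <- momentum_new
    obj_new <- objective(beta)
    if (abs(obj_old - obj_new) <= TOL * max(abs(obj_old), 1e-12)) break
    obj_old <- obj_new
  }
  list(beta = beta, iterations = iter, objective = obj_new)
}

set.seed(42)
X <- scale(matrix(rnorm(100 * 5), 100, 5))
beta_true <- c(2, 0, 0, -1, 0)
y <- drop(X %*% beta_true) + rnorm(100, sd = 0.1)
fit <- fista_enet(X, y, lambda1 = 0.05)
round(fit$beta, 2)
```

A larger `step` (closer to 1) backtracks more gently and usually ends with a larger accepted stepsize, at the cost of more evaluations of the loss per iteration.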
Value
Returns an object of S3 class s2Fista with the input arguments as fields.
References
Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences, 2(1), 183-202. doi:10.1137/080716542
Hyper-parameter wrapper for s2net
Description
This is a very simple function that collapses the input parameters into a named vector to supply to C++ methods.
Usage
s2Params(lambda1, lambda2 = 0, gamma1 = 0, gamma2 = 0, gamma3 = 0)
Arguments
| lambda1 | elastic-net regularization parameter - $\ell_1$ penalty | 
| lambda2 | elastic-net regularization parameter - $\ell_2$ penalty | 
| gamma1 | s2net weight hyper-parameter. | 
| gamma2 | s2net covariance hyper-parameter (between 1 and $\infty$). | 
| gamma3 | s2net shift hyper-parameter (between 0 and 1). | 
Value
Returns a named vector of S3 class s2Params.
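As an illustration of what "collapsing the input parameters into a named vector" can look like, here is a small base-R sketch. The constructor `make_params` and the class name `params_sketch` are hypothetical and are not the package's code; only the argument names and defaults are taken from the Usage section above.

```r
# Hypothetical constructor: gather the hyper-parameters in a named numeric
# vector and tag it with an S3 class so it can be recognised downstream.
make_params <- function(lambda1, lambda2 = 0, gamma1 = 0, gamma2 = 0, gamma3 = 0) {
  p <- c(lambda1 = lambda1, lambda2 = lambda2,
         gamma1 = gamma1, gamma2 = gamma2, gamma3 = gamma3)
  stopifnot(is.numeric(p), length(p) == 5)
  structure(p, class = "params_sketch")
}

p <- make_params(lambda1 = 0.1, gamma2 = 100)
names(p)
p[["gamma2"]]
```

A named vector keeps the parameters in a fixed order, which is convenient when the values are handed to compiled code that reads them by position.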
The Generalized Semi-Supervised Elastic-Net
Description
 Implements the generalized semi-supervised elastic-net. This method extends the supervised elastic-net problem, and thus it is a practical solution to the problem of feature selection in semi-supervised contexts. Its mathematical formulation is presented from a general perspective, covering a wide range of models.  We focus on linear and logistic responses, but the implementation could be easily extended to other losses in generalized linear models. We develop a flexible and fast implementation, written in 'C++' using 'RcppArmadillo' and integrated into R via 'Rcpp' modules. See Culp, M. 2013 <doi:10.1080/10618600.2012.657139> for references on the Joint Trained Elastic-Net.
Details
The DESCRIPTION file:
| Package: | s2net | 
| Type: | Package | 
| Title: | The Generalized Semi-Supervised Elastic-Net | 
| Version: | 1.0.7 | 
| Date: | 2024-03-04 | 
| Authors@R: | c(person("Juan C.", "Laria",, role = c("aut", "cre"), email = "juank.laria@gmail.com", comment = c(ORCID = "0000-0001-7734-9647")), person("Line H.", "Clemmensen",, role = c("aut"), email = "lkhc@dtu.dk")) | 
| Description: | Implements the generalized semi-supervised elastic-net. This method extends the supervised elastic-net problem, and thus it is a practical solution to the problem of feature selection in semi-supervised contexts. Its mathematical formulation is presented from a general perspective, covering a wide range of models. We focus on linear and logistic responses, but the implementation could be easily extended to other losses in generalized linear models. We develop a flexible and fast implementation, written in 'C++' using 'RcppArmadillo' and integrated into R via 'Rcpp' modules. See Culp, M. 2013 <doi:10.1080/10618600.2012.657139> for references on the Joint Trained Elastic-Net. | 
| License: | GPL (>= 2) | 
| Imports: | Rcpp, methods, MASS | 
| Depends: | stats | 
| LinkingTo: | Rcpp, RcppArmadillo | 
| Suggests: | knitr, rmarkdown, glmnet, Metrics, testthat | 
| VignetteBuilder: | knitr | 
| URL: | https://github.com/jlaria/s2net | 
| BugReports: | https://github.com/jlaria/s2net/issues | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.2.0 | 
| Author: | Juan C. Laria [aut, cre] (<https://orcid.org/0000-0001-7734-9647>), Line H. Clemmensen [aut] | 
| Maintainer: | Juan C. Laria <juank.laria@gmail.com> | 
Index of help topics:
Rcpp_s2net-class        Class 's2net'
auto_mpg                Auto MPG Data Set
predict.s2netR          S3 Methods for 's2netR' objects.
predict_Rcpp_s2net      Predict method for 's2net' C++ class.
print.s2Data            Print methods for S3 objects
s2Data                  Data wrapper for 's2net'.
s2Fista                 Hyper-parameter wrapper for FISTA.
s2Params                Hyper-parameter wrapper for 's2net'
s2net                   The Generalized Semi-Supervised Elastic-Net
s2netR                  Trains a generalized extended linear joint
                        trained model using semi-supervised data.
simulate_extra          Simulate extrapolated data
simulate_groups         Simulate data (two groups design)
This package includes a very easy-to-use interface for handling data, with the s2Data function. The main function of the package is the s2netR function, which is a wrapper for the Rcpp_s2net (s2net) class. 
Author(s)
Juan C. Laria [aut, cre] (<https://orcid.org/0000-0001-7734-9647>), Line H. Clemmensen [aut]
References
Laria, J.C., L. Clemmensen (2019). A generalized elastic-net for semi-supervised learning of sparse features.
Sogaard Larsen, J. et al. (2019). Semi-supervised covariate shift modelling of spectroscopic data.
Ryan, K. J., & Culp, M. V. (2015). On semi-supervised linear regression in covariate shift problems. The Journal of Machine Learning Research, 16(1), 3183-3217.
Examples
data("auto_mpg")
train = s2Data(xL = auto_mpg$P1$xL, yL = auto_mpg$P1$yL,  xU = auto_mpg$P1$xU)
model = s2netR(train, 
                s2Params(lambda1 = 0.1, 
                           lambda2 = 0,
                           gamma1 = 0.1,
                           gamma2 = 100,
                           gamma3 = 0.1))
# here we tell it to transform the valid data as we did with train.
valid = s2Data(auto_mpg$P1$xU, auto_mpg$P1$yU, preprocess = train) 
ypred = predict(model, valid$xL)
## Not run: 
if(require(ggplot2)){
  ggplot() + 
    aes(x = ypred, y = valid$yL) + geom_point() + 
    geom_abline(intercept = 0, slope = 1, linetype = 2)
}
## End(Not run)
Trains a generalized extended linear joint trained model using semi-supervised data.
Description
This function is a wrapper for the class s2net. It creates the C++ object and fits the model using input data.
Usage
s2netR(data, params, loss = "default", frame = "ExtJT", proj = "auto", 
        fista = NULL, S3 = TRUE)
Arguments
| data | An s2Data object. | 
| params | An s2Params object. | 
| loss | Loss function. One of "default", "linear", "logit". | 
| frame | The semi-supervised frame: "JT" or "ExtJT". | 
| proj | Should the unlabeled data be shifted to remove the model's effect? One of "no", "yes", "auto". | 
| fista | Fista setup parameters. An object of class s2Fista. | 
| S3 | Boolean: should the method return an S3 object (default) or a C++ object? | 
Value
Returns an object of S3 class s2netR or a C++ object of class s2net
Author(s)
Juan C. Laria
References
Ryan, K. J., & Culp, M. V. (2015). On semi-supervised linear regression in covariate shift problems. The Journal of Machine Learning Research, 16(1), 3183-3217.
Examples
data("auto_mpg")
train = s2Data(xL = auto_mpg$P1$xL, yL = auto_mpg$P1$yL,  xU = auto_mpg$P1$xU)
model = s2netR(train, 
                s2Params(lambda1 = 0.1, 
                           lambda2 = 0,
                           gamma1 = 0.1,
                           gamma2 = 100,
                           gamma3 = 0.1),
                loss = "linear",
                frame = "ExtJT",
                proj = "auto",
                fista = s2Fista(5000, 1e-7, 1, 0.8))
valid = s2Data(auto_mpg$P1$xU, auto_mpg$P1$yU, preprocess = train)
ypred = predict(model, valid$xL)
## Not run: 
if(require(ggplot2)){
  ggplot() + 
    aes(x = ypred, y = valid$yL) + geom_point() + 
    geom_abline(intercept = 0, slope = 1, linetype = 2)
}
## End(Not run)
Simulate extrapolated data
Description
Simulated data scenarios described in the paper from Ryan and Culp (2015).
 
Usage
simulate_extra(n_source = 100, n_target = 100, p = 1000, shift = 10, 
               scenario = "same", response = "linear", sigma2 = 2.5)
Arguments
| n_source | Number of source samples (labeled) | 
| n_target | Number of target samples (unlabeled) | 
| p | Number of variables (  | 
| shift | The shift applied to the first 10 columns of xU. | 
| scenario | Simulation scenario. One of  | 
| response | Type of response:  | 
| sigma2 | The variance of the error term, linear response case. | 
Value
A list, with
- xL: data frame with the labeled (source) data
- yL: labels associated with xL
- xU: data frame with the unlabeled (target) data
- yU: labels associated with xU (for validation/testing)
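The structure of the returned list, and the effect of `shift`, can be illustrated with a short base-R sketch. This is not the package's simulation code: the function name `simulate_shift_sketch`, the standard normal covariates and the coefficient vector are assumptions chosen for illustration. Only the shift applied to the first 10 columns of xU, the error variance `sigma2` and the shape of the output follow the description above.

```r
simulate_shift_sketch <- function(n_source = 100, n_target = 100, p = 50,
                                  shift = 10, sigma2 = 2.5) {
  stopifnot(p >= 10)
  beta <- c(rep(1, 10), rep(0, p - 10))        # hypothetical coefficients
  xL <- matrix(rnorm(n_source * p), n_source, p)
  xU <- matrix(rnorm(n_target * p), n_target, p)
  xU[, 1:10] <- xU[, 1:10] + shift             # shift the first 10 columns of xU
  # Linear response with error variance sigma2 in both source and target.
  yL <- drop(xL %*% beta) + rnorm(n_source, sd = sqrt(sigma2))
  yU <- drop(xU %*% beta) + rnorm(n_target, sd = sqrt(sigma2))
  list(xL = as.data.frame(xL), yL = yL, xU = as.data.frame(xU), yU = yU)
}

set.seed(0)
sim <- simulate_shift_sketch()
colMeans(sim$xU)[c(1, 11)]  # a shifted column next to an unshifted one
```

Because the labeled data never covers the shifted region, a model fitted on xL alone has to extrapolate when predicting yU, which is the situation the semi-supervised penalty is meant to address.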
References
Ryan, K. J., & Culp, M. V. (2015). On semi-supervised linear regression in covariate shift problems. The Journal of Machine Learning Research, 16(1), 3183-3217.
Examples
set.seed(0)
data = simulate_extra()
train = s2Data(data$xL, data$yL, data$xU)
valid = s2Data(data$xU, data$yU, preprocess = train)
model = s2netR(train, s2Params(0.1))
ypred = predict(model, valid$xL)
plot(ypred, valid$yL)
Simulate data (two groups design)
Description
Simulated data scenario described in paper [citation here].
 
Usage
simulate_groups(n_source = 100, n_target = 100, p = 200, response = "linear")
Arguments
| n_source | Number of labeled observations | 
| n_target | Number of unlabeled (target) observations | 
| p | Number of variables | 
| response | Type of response:  | 
Value
A list, with
- xL: data frame with the labeled (source) data
- yL: labels associated with xL
- xU: data frame with the unlabeled (target) data
- yU: labels associated with xU (for validation/testing)
Author(s)
Juan C. Laria