Version: | 0.5-2 |
Date: | 2021-02-03 |
Title: | Exact Variable-Subset Selection in Linear Regression |
Description: | Exact and approximation algorithms for variable-subset selection in ordinary linear regression models. Either compute all submodels with the lowest residual sum of squares, or determine the single-best submodel according to a pre-determined statistical criterion. Hofmann et al. (2020) <doi:10.18637/jss.v093.i03>. |
Depends: | R (≥ 3.5.0) |
SystemRequirements: | C++11 |
Imports: | stats, graphics, utils |
License: | GPL (≥ 3) |
URL: | https://github.com/marc-hofmann/lmSubsets.R |
NeedsCompilation: | yes |
Packaged: | 2021-02-07 18:25:40 UTC; marc |
Author: | Marc Hofmann [aut, cre],
Cristian Gatu [aut],
Erricos J. Kontoghiorghes [aut],
Ana Colubi [aut],
Achim Zeileis |
Maintainer: | Marc Hofmann <marc.hofmann@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2021-02-07 20:10:13 UTC |
Package lmSubsets
Description
Variable-subset selection in ordinary linear regression.
Author(s)
Marc Hofmann (marc.hofmann@gmail.com)
Cristian Gatu (cgatu@info.uaic.ro)
Erricos J. Kontoghiorghes (erricos@dcs.bbk.ac.uk)
Ana Colubi (ana.colubi@gmail.com)
Achim Zeileis (Achim.Zeileis@R-project.org)
References
Hofmann M, Gatu C, Kontoghiorghes EJ, Colubi A, Zeileis A (2020). lmSubsets: Exact variable-subset selection in linear regression for R. Journal of Statistical Software, 93, 1–21. doi: 10.18637/jss.v093.i03.
Hofmann M, Gatu C, Kontoghiorghes EJ (2007). Efficient algorithms for computing the best subset regression models for large-scale problems. Computational Statistics \& Data Analysis, 52, 16–29. doi: 10.1016/j.csda.2007.03.017.
Gatu C, Kontoghiorghes EJ (2006). Branch-and-bound algorithms for computing the best subset regression models. Journal of Computational and Graphical Statistics, 15, 139–156. doi: 10.1198/106186006x100290.
See Also
Home page: https://github.com/marc-hofmann/lmSubsets.R
Extract AIC values from a subset regression
Description
Evaluate Akaike's information criterion (AIC) for the specified submodels.
Usage
## S3 method for class 'lmSubsets'
AIC(object, size, best = 1, ..., k = 2, na.rm = TRUE, drop = TRUE)
## S3 method for class 'lmSelect'
AIC(object, best = 1, ..., k = 2, na.rm = TRUE, drop = TRUE)
Arguments
object |
|
size |
|
best |
|
... |
ignored |
k |
|
na.rm |
|
drop |
|
Value
double[]
—the AIC values
See Also
lmSubsets()
for all-subsets regressionlmSelect()
for best-subset regressionAIC()
for the S3 generic
Air pollution and mortality
Description
Data relating air pollution and mortality, frequently used for illustrations in ridge regression and related tasks.
Usage
data(AirPollution)
Format
A data frame containing 60 observations on 16 variables.
- precipitation
average annual precipitation in inches
- temperature1
average January temperature in degrees Fahrenheit
- temperature7
average July temperature in degrees Fahrenheit
- age
percentage of 1960 SMSA population aged 65 or older
- household
average household size
- education
median school years completed by those over 22
- housing
percentage of housing units which are sound and with all facilities
- population
population per square mile in urbanized areas, 1960
- noncauc
percentage of non-Caucasian population in urbanized areas, 1960
- whitecollar
percentage employed in white collar occupations
- income
percentage of families with income < USD 3000
- hydrocarbon
relative hydrocarbon pollution potential
- nox
relative nitric oxides potential
- so2
relative sulphur dioxide potential
- humidity
annual average percentage of relative humidity at 13:00
- mortality
total age-adjusted mortality rate per 100,000
Source
http://lib.stat.cmu.edu/datasets/pollution
References
McDonald GC, Schwing RC (1973). Instabilities of regression estimates relating air pollution to mortality. Technometrics, 15, 463–482.
Miller AJ (2002). Subset selection in regression. New York: Chapman and Hall.
Examples
## load data (with logs for relative potentials)
data("AirPollution", package = "lmSubsets")
for (i in 12:14) AirPollution[[i]] <- log(AirPollution[[i]])
## fit subsets
lm_all <- lmSubsets(mortality ~ ., data = AirPollution)
plot(lm_all)
## refit best model
lm6 <- refit(lm_all, size = 6)
summary(lm6)
Extract BIC values from a subset regression
Description
Evaluate the Bayesian information criterion (BIC) for the specified submodels.
Usage
## S3 method for class 'lmSubsets'
BIC(object, size, best = 1, ..., na.rm = TRUE, drop = TRUE)
## S3 method for class 'lmSelect'
BIC(object, best = 1, ..., na.rm = TRUE, drop = TRUE)
Arguments
object |
|
size |
|
best |
|
... |
ignored |
na.rm |
|
drop |
|
Value
double[]
—the BIC values
See Also
lmSubsets()
for all-subsets regressionlmSelect()
for best-subset regressionBIC()
for the S3 generic
Temperature observations and numerical weather predictions for Innsbruck
Description
00UTC temperature observations and corresponding 24-hour reforecast ensemble means from the Global Ensemble Forecast System (GEFS, Hamill et al. 2013) for SYNOP station Innsbruck Airport (11120; 47.260, 11.357) from 2011-01-01 to 2015-12-31.
Usage
data(IbkTemperature)
Format
A data frame containing 1824 daily observations/forecasts for 42
variables. The first column (temp
) contains temperature
observations at 00UTC (coordinated universal time), columns 2–37 are
24-hour lead time GEFS reforecast ensemble means for different
variables (see below). Columns 38–42 are deterministic time
trend/season patterns.
- temp
observed temperature at Innsbruck Airport (deg
C
)- tp
total accumulated precipitation (
kg~m^{-2}
)- t2m
temperature at 2 meters (
K
)- u10m
U-component of wind at 10 meters (
m~s^{-1}
)- v10m
V-component of wind at 10 meters (
m~s^{-1}
)- u80m
U-component of wind at 80 meters (
m~s^{-1}
)- v80m
U-component of wind at 80 meters (
m~s^{-1}
)- cape
convective available potential energy (
J~kg^{-1}
)- ci
convective inhibition (
J~kg^{-1}
)- sdlwrf
surface downward long-wave radiation flux (
W~m^{-2}
)- sdswrf
surface downward short-wave radiation flux (
W~m^{-2}
)- sulwrf
surface upward long-wave radiation flux (
W~m^{-2}
)- suswrf
surface upward short-wave radiation flux (
W~m^{-2}
)- ghf
ground heat flux (
W~m^{-2}
)- slhnf
surface latent heat net flux (
W~m^{-2}
)- sshnf
surface sensible heat net flux (
W~m^{-2}
)- mslp
mean sea level pressure (
Pa
)- psfc
surface pressure (
Pa
)- pw
precipitable water (
kg~m^{-2}
)- vsmc
volumetric soil moisture content (fraction)
- sh2m
specific humidity at 2 meters (
kg~kg^{-1}
)- tcc
total cloud cover (percent)
- tcic
total column-integrated condensate (
kg~m^{-2}
)- tsfc
skin temperature (
K
)- tmax2m
maximum temperature (
K
)- tmin2m
minimum temperature (
K
)- st
soil temperature (0–10 cm below surface) (
K
)- ulwrf
upward long-wave radiation flux (
W~m^{-2}
)- wr
water runoff (
kg~m^{-2}
)- we
water equivalent of accumulated snow depth (
kg~m^{-2}
)- wp
wind mixing energy (
J
)- w850
vertical velocity at 850 hPa surface (
Pa~s^{-1}
)- t2pvu
temperature on 2 PVU surface (
K
)- p2pvu
pressure on 2 PVU surface (
Pa
)- u2pvu
U-component of wind on 2 PVU surface (
m~s^{-1}
)- v2pvu
U-component of wind on 2 PVU surface (
m~s^{-1}
)- pv
Potential vorticity on 320 K isentrope (
K~m^{2}~kg^{-1}~s^{-1}
)- time
time in years
- sin, cos
sine and cosine component of annual harmonic pattern
- sin2, cos2
sine and cosine component of bi-annual harmonic pattern
Source
Observations: https://www.ogimet.com/synops.phtml.en. Reforecasts: https://psl.noaa.gov/forecasts/reforecast2/.
References
Hamill TM, Bates GT, Whitaker JS, Murray DR, Fiorino M, Galarneau Jr. TJ, Zhu Y, Lapenta W (2013). NOAA's second-generation global medium-range ensemble reforecast data set. Bulletin of the American Meteorological Society, 94(10), 1553–1565. doi: 10.1175/BAMS-D-12-00014.1.
Examples
## load data and omit missing values
data("IbkTemperature", package = "lmSubsets")
IbkTemperature <- na.omit(IbkTemperature)
## fit a simple climatological model for the temperature
## with a linear trend and annual/bi-annual harmonic seasonal pattern
CLIM <- lm(temp ~ time + sin + cos + sin2 + cos2,
data = IbkTemperature)
## fit a simple MOS with 2-meter temperature forecast in addition
## to the climatological model
MOS0 <- lm(temp ~ t2m + time + sin + cos + sin2 + cos2,
data = IbkTemperature)
## graphical comparison and MOS summary
plot(temp ~ time, data = IbkTemperature, type = "l", col = "darkgray")
lines(fitted(MOS0) ~ time, data = IbkTemperature, col = "darkred")
lines(fitted(CLIM) ~ time, data = IbkTemperature, lwd = 2)
MOS0
## best subset selection of remaining variables for the MOS
## (i.e., forcing the regressors of m1 into the model)
MOS1_all <- lmSubsets(temp ~ ., data = IbkTemperature,
include = c("t2m", "time", "sin", "cos", "sin2", "cos2"))
plot(MOS1_all)
image(MOS1_all, size = 8:20)
## -> Note that soil temperature and maximum temperature are selected
## in addition to the 2-meter temperature
## best subset selection of all variables
MOS2_all <- lmSubsets(temp ~ ., data = IbkTemperature)
plot(MOS2_all)
image(MOS2_all, size = 2:20)
## -> Note that 2-meter temperature is not selected into the best
## BIC model but soil-temperature (and maximum temperature) are used instead
## refit the best BIC subset selections
MOS1 <- refit(lmSelect(MOS1_all))
MOS2 <- refit(lmSelect(MOS2_all))
## compare BIC
BIC(CLIM, MOS0, MOS1, MOS2)
## compare RMSE
sqrt(sapply(list(CLIM, MOS0, MOS1, MOS2), deviance)/
nrow(IbkTemperature))
## compare coefficients
cf0 <- coef(CLIM)
cf1 <- coef(MOS0)
cf2 <- coef(MOS1)
cf3 <- coef(MOS2)
names(cf2) <- gsub("^x", "", names(coef(MOS1)))
names(cf3) <- gsub("^x", "", names(coef(MOS2)))
nam <- unique(c(names(cf0), names(cf1), names(cf2), names(cf3)))
cf <- matrix(NA, nrow = length(nam), ncol = 4,
dimnames = list(nam, c("CLIM", "MOS0", "MOS1", "MOS2")))
cf[names(cf0), 1] <- cf0
cf[names(cf1), 2] <- cf1
cf[names(cf2), 3] <- cf2
cf[names(cf3), 4] <- cf3
print(round(cf, digits = 3), na.print = "")
Extract the ceofficients from a subset regression
Description
Return the coefficients for the specified submodels.
Usage
## S3 method for class 'lmSubsets'
coef(object, size, best = 1, ..., na.rm = TRUE, drop = TRUE)
## S3 method for class 'lmSelect'
coef(object, best = 1, ..., na.rm = TRUE, drop = TRUE)
Arguments
object |
|
size |
|
best |
|
... |
ignored |
na.rm |
|
drop |
|
Value
double[,]
, "data.frame"
—the submodel coefficients
See Also
lmSubsets()
for all-subsets regressionlmSelect()
for best-subset regressioncoef()
for the S3 generic
Extract the deviance from a subset regression
Description
Return the deviance for the specified submodels.
Usage
## S3 method for class 'lmSubsets'
deviance(object, size, best = 1, ..., na.rm = TRUE, drop = TRUE)
## S3 method for class 'lmSelect'
deviance(object, best = 1, ..., na.rm = TRUE, drop = TRUE)
Arguments
object |
|
size |
|
best |
|
... |
ignored |
na.rm |
|
drop |
|
Value
double[]
, "data.frame"
—the submodel deviances
See Also
lmSubsets()
for all-subsets regressionlmSelect()
for best-subset regressiondeviance()
for the S3 generic
Extract the fitted values from a subset regression
Description
Return the fitted values for the specified submodel.
Usage
## S3 method for class 'lmSubsets'
fitted(object, size, best = 1, ...)
## S3 method for class 'lmSelect'
fitted(object, best = 1, ...)
Arguments
object |
|
size |
|
best |
|
... |
ignored |
Value
double[]
—the fitted values
See Also
lmSubsets()
for all-subsets regressionlmSelect()
for best-subset regressionfitted()
for the S3 generic
Extract a formula from a subset regression
Description
Return the formula for the specified submodel.
Usage
## S3 method for class 'lmSubsets'
formula(x, size, best = 1, ...)
## S3 method for class 'lmSelect'
formula(x, best, ...)
Arguments
x |
|
size |
|
best |
|
... |
ignored |
Value
"formula"
—the submodel formula
See Also
lmSubsets()
for all-subsets regressionlmSelect()
for best-subset regressionformula()
for the S3 generic
Heatmap of a subset regression
Description
Plot a heatmap of the specified submodels.
Usage
## S3 method for class 'lmSubsets'
image(x, size = NULL, best = 1, which = NULL, hilite, hilite_penalty,
main, sub, xlab = NULL, ylab, ann = par("ann"), axes = TRUE,
col = c("gray40", "gray90"), lab = "lab",
col_hilite = cbind("red", "pink"), lab_hilite = "lab",
pad_size = 3, pad_best = 1, pad_which = 3, axis_pos = -4,
axis_tck = -4, axis_lab = -10, ...)
## S3 method for class 'lmSelect'
image(x, best = NULL, which = NULL, hilite, hilite_penalty,
main, sub = NULL, xlab = NULL, ylab, ann = par("ann"),
axes = TRUE, col = c("gray40", "gray90"), lab = "lab",
col_hilite = cbind("red", "pink"), lab_hilite = "lab",
pad_best = 2, pad_which = 2, axis_pos = -4, axis_tck = -4,
axis_lab = -10, ...)
Arguments
x |
|
size , best |
submodels to be plotted |
which |
regressors to be plotted |
hilite , hilite_penalty |
submodels to be highlighted |
main , sub , xlab , ylab |
main, sub-, and axis titles |
ann |
annotate plot |
axes |
plot axes |
col , lab |
color and label style |
col_hilite , lab_hilite |
highlighting style |
pad_size , pad_best , pad_which |
padding |
axis_pos , axis_tck , axis_lab |
position of axes, tick length, and position of labels |
... |
ignored |
Value
invisible(x)
See Also
lmSubsets()
for all-subsets regressionlmSelect()
for best-subset regression
Examples
## data
data("AirPollution", package = "lmSubsets")
#################
## lmSubsets ##
#################
lm_all <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 20)
## heatmap
image(lm_all, best = 1:3)
## highlight 5 best (BIC)
image(lm_all, best = 1:3, hilite = 1:5, hilite_penalty = "BIC")
################
## lmSelect ##
################
## default criterion: BIC
lm_best <- lmSelect(lm_all)
## highlight 5 best (AIC)
image(lm_best, hilite = 1:5, hilite_penalty = "AIC")
## axis labels
image(lm_best, lab = c("bold(lab)", "lab"), hilite = 1,
lab_hilite = "underline(lab)")
Best-subset regression
Description
Best-variable-subset selection in ordinary linear regression.
Usage
lmSelect(formula, ...)
## Default S3 method:
lmSelect(formula, data, subset, weights, na.action,
model = TRUE, x = FALSE, y = FALSE, contrasts = NULL,
offset, ...)
Arguments
formula , data , subset , weights , na.action , model , x , y , contrasts , offset |
standard formula interface |
... |
forwarded to |
Details
The lmSelect()
generic provides various methods to conveniently
specify the regressor and response variables. The standard formula
interface (see lm()
) can be used, or the model
information can be extracted from an already fitted "lm"
object. The model matrix and response can also be passed in directly.
After processing the arguments, the call is forwarded to
lmSelect_fit()
.
Value
"lmSelect"
—a list
containing the components returned
by lmSelect_fit()
Further components include call
, na.action
,
weights
, offset
, contrasts
, xlevels
,
terms
, mf
, x
, and y
. See
lm()
for more information.
See Also
lmSelect.matrix()
for the matrix interfacelmSelect.lmSubsets()
for coercing an all-subsets regressionlmSelect_fit()
for the low-level interfacelmSubsets()
for all-subsets regression
Examples
## load data
data("AirPollution", package = "lmSubsets")
###################
## basic usage ##
###################
## fit 20 best subsets (BIC)
lm_best <- lmSelect(mortality ~ ., data = AirPollution, nbest = 20)
lm_best
## summary statistics
summary(lm_best)
## visualize
plot(lm_best)
########################
## custom criterion ##
########################
## the same as above, but with a custom criterion:
M <- nrow(AirPollution)
ll <- function (rss) {
-M/2 * (log(2 * pi) - log(M) + log(rss) + 1)
}
aic <- function (size, rss, k = 2) {
-2 * ll(rss) + k * (size + 1)
}
bic <- function (size, rss) {
aic(size, rss, k = log(M))
}
lm_cust <- lmSelect(mortality ~ ., data = AirPollution,
penalty = bic, nbest = 20)
lm_cust
Best-subset regression
Description
Coerce an all-subsets regression.
Usage
## S3 method for class 'lmSubsets'
lmSelect(formula, penalty = "BIC", ...)
Arguments
formula |
|
penalty |
|
... |
ignored |
Details
Computes a best-subset regression from an all-subsets regression.
Value
"lmSelect"
—a best-subset regression
See Also
lmSelect()
for the S3 genericlmSubsets()
for all-subsets regression
Examples
data("AirPollution", package = "lmSubsets")
lm_all <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 20)
lm_best <- lmSelect(lm_all)
lm_best
Best-subset regression
Description
Matrix interface to best-variable-subset selection in ordinary linear regression.
Usage
## S3 method for class 'matrix'
lmSelect(formula, y, intercept = TRUE, ...)
Arguments
formula |
|
y |
|
intercept |
|
... |
forwarded to
|
Details
This is a utility interface. Use the standard formula interface wherever possible.
Value
"lmSelect"
—a best-subset regression
See Also
lmSelect()
for the S3 genericlmSelect.default()
for the standard formula interface
Best-subset regression
Description
Low-level interface to best-variable-subset selection in ordinary linear regression.
Usage
lmSelect_fit(x, y, weights = NULL, offset = NULL, include = NULL,
exclude = NULL, penalty = "BIC", tolerance = 0,
nbest = 1, ..., pradius = NULL)
Arguments
x |
|
y |
|
weights |
|
offset |
|
include |
|
exclude |
|
penalty |
|
tolerance |
|
nbest |
|
... |
ignored |
pradius |
|
Details
The best variable-subset model is determined, where the "best" model is the one with the lowest information criterion value. The information criterion belongs to the AIC family.
The regression data is specified with the x
, y
,
weights
, and offset
parameters. See
lm.fit()
for further details.
To force regressors into or out of the regression, a list of
regressors can be passed as an argument to the include
or
exclude
parameters, respectively.
The information criterion is specified with the penalty
parameter. Accepted values are "AIC"
, "BIC"
, or a
"numeric"
value representing the penalty-per-model-parameter.
A custom selection criterion may be specified by passing an R
function as an argument. The expected signature is function
(size, rss)
, where size
is the number of predictors (including
the intercept, if any), and rss
is the residual sum of squares.
The function must be non-decreasing in both parameters.
An approximation tolerance
can be specified to speed up the
search.
The number of returned submodels is determined by the nbest
parameter.
The preordering radius is given with the pradius
parameter.
Value
A list
with the following components:
NOBS |
|
nobs |
|
nvar |
|
weights |
|
intercept |
|
include |
|
exclude |
|
size |
|
ic |
information criterion |
tolerance |
|
nbest |
|
submodel |
|
subset |
|
References
Hofmann M, Gatu C, Kontoghiorghes EJ, Colubi A, Zeileis A (2020). lmSubsets: Exact variable-subset selection in linear regression for R. Journal of Statistical Software, 93, 1–21. doi: 10.18637/jss.v093.i03.
See Also
lmSelect()
for the high-level interfacelmSubsets_fit()
for all-subsets regression
Examples
data("AirPollution", package = "lmSubsets")
x <- as.matrix(AirPollution[, names(AirPollution) != "mortality"])
y <- AirPollution[, names(AirPollution) == "mortality"]
f <- lmSelect_fit(x, y)
f
All-subsets regression
Description
All-variable-subsets selection in ordinary linear regression.
Usage
lmSubsets(formula, ...)
## Default S3 method:
lmSubsets(formula, data, subset, weights, na.action,
model = TRUE, x = FALSE, y = FALSE, contrasts = NULL,
offset, ...)
Arguments
formula , data , subset , weights , na.action , model , x , y , contrasts , offset |
standard formula interface |
... |
fowarded to |
Details
The lmSubsets()
generic provides various methods to
conveniently specify the regressor and response variables. The
standard formula interface (see lm()
) can be used,
or the model information can be extracted from an already fitted
"lm"
object. The model matrix and response can also be passed
in directly.
After processing of the arguments, the call is forwarded to
lmSubsets_fit()
.
Value
"lmSubsets"
—a list
containing the components returned
by lmSubsets_fit()
Further components include call
, na.action
,
weights
, offset
, contrasts
, xlevels
,
terms
, mf
, x
, and y
. See
lm()
for more information.
See Also
lmSubsets.matrix()
for the"matrix"
interfacelmSubsets_fit()
for the low-level interfacelmSelect()
for best-subset regression
Examples
## load data
data("AirPollution", package = "lmSubsets")
###################
## basic usage ##
###################
## canonical example: fit all subsets
lm_all <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 5)
lm_all
## plot RSS and BIC
plot(lm_all)
## summary statistics
summary(lm_all)
############################
## forced in-/exclusion ##
############################
lm_force <- lmSubsets(lm_all, include = c("nox", "so2"),
exclude = "whitecollar")
lm_force
All-subsets regression
Description
Matrix interface to all-variable-subsets selection in ordinary linear regression.
Usage
## S3 method for class 'matrix'
lmSubsets(formula, y, intercept = TRUE, ...)
Arguments
formula |
|
y |
|
intercept |
|
... |
forwarded to
|
Details
This is a utility interface. Use the standard formula interface wherever possible.
Value
"lmSubsets"
—an all-subsets regression
See Also
lmSubsets()
for the S3 genericlmSubsets.default()
for the standard formula interface
Examples
data("AirPollution", package = "lmSubsets")
x <- as.matrix(AirPollution)
lm_mat <- lmSubsets(x, y = "mortality")
lm_mat
All-subsets regression
Description
Low-level interface to all-variable-subsets selection in ordinary linear regression.
Usage
lmSubsets_fit(x, y, weights = NULL, offset = NULL, include = NULL,
exclude = NULL, nmin = NULL, nmax = NULL,
tolerance = 0, nbest = 1, ..., pradius = NULL)
Arguments
x |
|
y |
|
weights |
|
offset |
|
include |
|
exclude |
|
nmin |
|
nmax |
|
tolerance |
|
nbest |
|
... |
ignored |
pradius |
|
Details
The best variable-subset model for every subset size is determined, where the "best" model is the one with the lowest residual sum of squares (RSS).
The regression data is specified with the x
, y
,
weights
, and offset
parameters. See
lm.fit()
for further details.
To force regressors into or out of the regression, a list of
regressors can be passed as an argument to the include
or
exclude
parameters, respectively.
The scope of the search can be limited to a range of subset sizes by
setting nmin
and nmax
, the minimum and maximum number of
regressors allowed in the regression, respectively.
A tolerance
vector can be specified to speed up the search,
where tolerance[j]
is the approximation tolerance applied to
subset models of size j
.
The number of submodels returned for each subset size is determined by
the nbest
parameter.
The preordering radius is given with the pradius
parameter.
Value
A list
with the following components:
NOBS |
|
nobs |
|
nvar |
|
weights |
|
intercept |
|
include |
|
exclude |
|
size |
|
tolerance |
|
nbest |
|
submodel |
|
subset |
|
References
Hofmann M, Gatu C, Kontoghiorghes EJ, Colubi A, Zeileis A (2020). lmSubsets: Exact variable-subset selection in linear regression for R. Journal of Statistical Software, 93, 1–21. doi: 10.18637/jss.v093.i03.
See Also
lmSubsets()
for the high-level interfacelmSelect_fit()
for best-subset regression
Examples
data("AirPollution", package = "lmSubsets")
x <- as.matrix(AirPollution[, names(AirPollution) != "mortality"])
y <- AirPollution[, names(AirPollution) == "mortality"]
f <- lmSubsets_fit(x, y)
f
Extract the log-likelihood from a subset regression
Description
Return the log-likelihood of the the specified submodels.
Usage
## S3 method for class 'lmSubsets'
logLik(object, size, best = 1, ..., na.rm = TRUE, drop = TRUE)
## S3 method for class 'lmSelect'
logLik(object, best = 1, ..., na.rm = TRUE, drop = TRUE)
Arguments
object |
|
size |
|
best |
|
... |
ignored |
na.rm |
|
drop |
|
Value
double[]
—the log-likelihoods
See Also
lmSubsets()
for all-subsets regressionlmSelect()
for best-subset regressionlogLik()
for the S3 generic
Extract the model frame from a subset regression
Description
Return the model frame.
Usage
## S3 method for class 'lmSubsets'
model.frame(formula, ...)
## S3 method for class 'lmSelect'
model.frame(formula, ...)
Arguments
formula |
|
... |
forwarded to |
Value
"data.frame"
—the model frame
See Also
lmSubsets()
for all-subsets regressionlmSelect()
for best-subset regressionmodel.frame()
for the S3 generic
Extract a model matrix from a subset regression
Description
Returns the model matrix for the specified submodel.
Usage
## S3 method for class 'lmSubsets'
model.matrix(object, size, best = 1, ...)
## S3 method for class 'lmSelect'
model.matrix(object, best, ...)
Arguments
object |
|
size |
|
best |
|
... |
forwarded to |
Value
double[,]
—the model matrix
See Also
lmSubsets()
for all-subsets regressionlmSelect()
for best-subset regressionmodel.matrix()
for the S3 generic
Model response
Description
Extract the model response.
Usage
model_response(data, ...)
## Default S3 method:
model_response(data, type = "any", ...)
Arguments
data |
an object |
type |
|
... |
further arguments |
Details
The default method simply forwards the call to
model.response()
.
Value
double[]
—the model response
See Also
model.response()
for the default implementation
Extract the model response from a subset regression
Description
Return the model response.
Usage
## S3 method for class 'lmSubsets'
model_response(data, ...)
## S3 method for class 'lmSelect'
model_response(data, ...)
Arguments
data |
|
... |
ignored |
Value
double[]
—the model response
See Also
lmSubsets()
for all-subsets regressionlmSelect()
for best-subset regressionmodel_response()
for the S3 generic
Plot a subset regression
Description
Plot the deviance of the selected submodels, as well as a specified information criterion.
Usage
## S3 method for class 'lmSubsets'
plot(x, penalty = "BIC", xlim, ylim_rss, ylim_ic, type_rss = "o",
type_ic = "o", main, sub, xlab, ylab_rss, ylab_ic, legend_rss,
legend_ic, ann = par("ann"), axes = TRUE, lty_rss = c(1, 3),
pch_rss = c(16, 21), col_rss = "black", bg_rss = "white",
lty_ic = c(1, 3), pch_ic = c(16, 21), col_ic = "red",
bg_ic = "white", ...)
## S3 method for class 'lmSelect'
plot(x, xlim, ylim, type = "o", main, sub, xlab, ylab, legend,
ann = par("ann"), axes = TRUE, lty = 1, pch = 16, col = "red",
bg = "white", ...)
Arguments
x |
|
penalty |
the information criterion |
xlim , ylim , ylim_rss , ylim_ic |
x and y limits |
type , type_rss , type_ic |
type of plot |
main , sub |
main and sub-title |
xlab , ylab , ylab_rss , ylab_ic |
axis titles |
legend , legend_rss , legend_ic |
plot legend |
ann |
annotate plot |
axes |
plot axes |
lty , lty_rss , lty_ic |
line type |
pch , pch_rss , pch_ic |
plotting character |
col , col_rss , col_ic |
color |
bg , bg_rss , bg_ic |
background color |
... |
further graphical parameters |
Value
invisible(x)
See Also
lmSubsets()
for all-subsets regressionlmSelect()
for best-subset regressionplot()
for the S3 generic
Examples
## load data
data("AirPollution", package = "lmSubsets")
#################
## lmSubsets ##
#################
lm_all <- lmSubsets(mortality ~ ., data = AirPollution, nbest = 5)
plot(lm_all)
################
## lmSelect ##
################
lm_best <- lmSelect(mortality ~ ., data = AirPollution, nbest = 20)
plot(lm_best)
Refitting models
Description
Generic function for refitting a model on a subset or reweighted data set.
Usage
refit(object, ...)
Arguments
object |
an object to be refitted |
... |
forwarded arguments |
Details
The refit
generic is a new function for refitting a certain
model object on multiple versions of a data set (and is hence
different from update
). Applications refit models after some
kind of model selection, e.g., variable subset selection,
partitioning, reweighting, etc.
The generic is similar to the one provided in modeltools and fxregime (and should fulfill the same purpose). To avoid dependencies, it is also provided here.
Value
"lm"
—the refitted model
Refit a subset regression
Description
Fit the specified submodel and return the obtained "lm"
object.
Usage
## S3 method for class 'lmSubsets'
refit(object, size, best = 1, ...)
## S3 method for class 'lmSelect'
refit(object, best = 1, ...)
Arguments
object |
|
size |
|
best |
|
... |
ignored |
Value
"lm"
—the fitted model
See Also
lmSubsets()
for all-subsets regressionlmSelect()
for best-subset regressionrefit()
for the S3 generic
Examples
## load data
data("AirPollution", package = "lmSubsets")
## fit subsets
lm_all <- lmSubsets(mortality ~ ., data = AirPollution)
## refit best model
lm5 <- refit(lm_all, size = 5)
summary(lm5)
Extract the residuals from all-subsets regression
Description
Return the residuals for the specified submodel.
Usage
## S3 method for class 'lmSubsets'
residuals(object, size, best = 1, ...)
## S3 method for class 'lmSelect'
residuals(object, best = 1, ...)
Arguments
object |
|
size |
|
best |
|
... |
ignored |
Value
double[]
—the residuals
See Also
lmSubsets()
for all-subsets regressionlmSelect()
for best-subset regressionresiduals()
for the S3 generic
Extract the residual standard deviation from a subset regression
Description
Return the residual standard deviation for the specified submodels.
Usage
## S3 method for class 'lmSubsets'
sigma(object, size, best = 1, ..., na.rm = TRUE, drop = TRUE)
## S3 method for class 'lmSelect'
sigma(object, best = 1, ..., na.rm = TRUE, drop = TRUE)
Arguments
object |
|
size |
|
best |
|
... |
ignored |
na.rm |
|
drop |
|
Value
double[]
—the residual standard deviations
See Also
lmSubsets()
for all-subsets regressionlmSelect()
for best-subset regressionsigma()
for the S3 generic
Summarize a subset regression
Description
Evaluate summary statistics for the selected submodels.
Usage
## S3 method for class 'lmSubsets'
summary(object, ..., na.rm = TRUE)
## S3 method for class 'lmSelect'
summary(object, ..., na.rm = TRUE)
Arguments
object |
|
... |
ignored |
na.rm |
if |
Value
"summary.lmSubsets"
, "summary.lmSelect"
—a subset
regression summary
See Also
lmSubsets()
for all-subsets regressionlmSelect()
for best-subset regression
Extract variable names from a subset regression
Description
Return the variable names for the specified submodels.
Usage
## S3 method for class 'lmSubsets'
variable.names(object, size, best = 1, ..., na.rm = TRUE, drop = TRUE)
## S3 method for class 'lmSelect'
variable.names(object, best = 1, ..., na.rm = TRUE, drop = TRUE)
Arguments
object |
|
size |
|
best |
|
... |
ignored |
na.rm |
|
drop |
|
Value
logical[,]
, "data.frame"
—the variable names
See Also
lmSubsets()
for all-subsets regressionlmSelect()
for best-subset regressionvariable.names()
for the S3 generic
Extract the variance-covariance matrix from a subset regression
Description
Return the variance-covariance matrix for the specified submodel.
Usage
## S3 method for class 'lmSubsets'
vcov(object, size, best = 1, ...)
## S3 method for class 'lmSelect'
vcov(object, best = 1, ...)
Arguments
object |
|
size |
|
best |
|
... |
ignored |
Value
double[,]
—the variance-covariance matrix
See Also
lmSubsets()
for all-subsets regressionlmSelect()
for best-subset regressionvcov()
for the S3 generic