Title: | Addons for the 'mclust' Package |
Version: | 0.9.2 |
Date: | 2025-04-03 |
Description: | Extend the functionality of the 'mclust' package for Gaussian finite mixture modeling by including: density estimation for data with bounded support (Scrucca, 2019 <doi:10.1002/bimj.201800174>); modal clustering using MEM (Modal EM) algorithm for Gaussian mixtures (Scrucca, 2021 <doi:10.1002/sam.11527>); entropy estimation via Gaussian mixture modeling (Robin & Scrucca, 2023 <doi:10.1016/j.csda.2022.107582>); Gaussian mixtures modeling of financial log-returns (Scrucca, 2024 <doi:10.3390/e26110907>). |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
URL: | https://mclust-org.github.io/mclustAddons/ |
Depends: | mclust (≥ 6.1.1), R (≥ 4.3) |
Imports: | cli, doParallel, doRNG (≥ 1.6), foreach, graphics, grDevices, iterators, knitr (≥ 1.12), parallel, Rcpp (≥ 1.0), rmarkdown (≥ 0.9), stats, utils |
LinkingTo: | Rcpp (≥ 1.0), RcppArmadillo (≥ 14.4) |
VignetteBuilder: | knitr |
ByteCompile: | true |
Encoding: | UTF-8 |
NeedsCompilation: | yes |
Repository: | CRAN |
RoxygenNote: | 7.3.2 |
Packaged: | 2025-04-03 06:23:12 UTC; luca |
Author: | Luca Scrucca |
Maintainer: | Luca Scrucca <luca.scrucca@unibo.it> |
Date/Publication: | 2025-04-03 06:40:02 UTC |
Addons for the mclust package
Description
An R package extending the functionality of the mclust package (Scrucca et al. 2916, 2023) for Gaussian finite mixture modeling by including:
density estimation for data with bounded support (Scrucca, 2019)
modal clustering using MEM algorithm for Gaussian mixtures (Scrucca, 2021)
entropy estimation via Gaussian mixture modeling (Robin & Scrucca, 2023)
For a quick introduction to mclustAddons see the vignette A quick tour of mclustAddons.
Author(s)
Maintainer: Luca Scrucca luca.scrucca@unibo.it (ORCID) [copyright holder]
References
Scrucca L., Fop M., Murphy T. B. and Raftery A. E. (2016) mclust 5: clustering, classification and density estimation using Gaussian finite mixture models, The R Journal, 8/1, 205-233. doi:10.32614/RJ-2016-021
Scrucca L., Fraley C., Murphy T.B., Raftery A.E. (2023) Model-Based Clustering, Classification, and Density Estimation Using mclust in R. Chapman and Hall/CRC. doi:10.1201/9781003277965
Scrucca L. (2019) A transformation-based approach to Gaussian mixture density estimation for bounded data. Biometrical Journal, 61:4, 873–888. doi:10.1002/bimj.201800174
Scrucca L. (2021) A fast and efficient Modal EM algorithm for Gaussian mixtures. Statistical Analysis and Data Mining, 14:4, 305–314. doi:10.1002/sam.11527
Robin S. and Scrucca L. (2023) Mixture-based estimation of entropy. Computational Statistics & Data Analysis, 177, 107582. doi:10.1016/j.csda.2022.107582
See Also
densityMclustBounded()
for density estimation of bounded data;
MclustMEM()
for modal clustering;
EntropyGMM()
for entropy estimation.
Gaussian mixture-based estimation of entropy
Description
Compute an estimate of the (differential) entropy from a Gaussian Mixture Model (GMM) fitted using the mclust package.
Usage
EntropyGMM(object, ...)
## S3 method for class 'densityMclust'
EntropyGMM(object, ...)
## S3 method for class 'densityMclustBounded'
EntropyGMM(object, ...)
## S3 method for class 'Mclust'
EntropyGMM(object, ...)
## S3 method for class 'data.frame'
EntropyGMM(object, ...)
## S3 method for class 'matrix'
EntropyGMM(object, ...)
EntropyGauss(sigma)
nats2bits(x)
bits2nats(x)
Arguments
object |
An object of class If a |
... |
Further arguments passed to or from other methods. |
sigma |
A symmetric covariance matrix. |
x |
A vector of values. |
Details
For more details see
vignette("mclustAddons")
Value
-
EntropyGMM()
returns an estimate of the entropy based on a estimated Gaussian mixture model (GMM) fitted using the mclust package. If a matrix of data values is provided, a GMM is preliminary fitted to the data and then the entropy computed. -
EntropyGauss()
returns the entropy for a multivariate Gaussian distribution with covariance matrixsigma
. -
nats2bits()
andbits2nats()
convert input values in nats to bits, and viceversa. Information-theoretic quantities have different units depending on the base of the logarithm used: nats are expressed in base-2 logarithms, whereas bits in natural logarithms.
Author(s)
Luca Scrucca
References
Robin S. and Scrucca L. (2023) Mixture-based estimation of entropy. Computational Statistics & Data Analysis, 177, 107582. doi:10.1016/j.csda.2022.107582
See Also
mclust::Mclust()
, mclust::densityMclust()
.
Examples
X = iris[,1:4]
mod = densityMclust(X, plot = FALSE)
h = EntropyGMM(mod)
h
bits2nats(h)
EntropyGMM(X)
Modeling log-returns distribution via Gaussian Mixture Models
Description
Gaussian mixtures for modeling the distribution of financial log-returns.
Usage
GMMlogreturn(y, ...)
## S3 method for class 'GMMlogreturn'
summary(object, ...)
Arguments
y |
A numeric vector providing the log-returns of a financial stock. |
... |
Further arguments passed to |
object |
An object of class |
Details
Let P_t
be the price of a financial stock for the current time frame
(day for instance), and P_{t-1}
the price of the previous time frame.
The log-return at time t
is defined as:
y_t = \log( \frac{P_t}{P_{t-1}} )
A univariate heteroscedastic GMM using Bayesian regularization
(as described in mclust::priorControl()
) is fitted to the observed
log-returns. The number of mixture components is automatically selected
by BIC, unless specified with the optional G
argument.
Value
Returns an object of class 'GMMlogreturn'
.
Author(s)
Luca Scrucca
References
Scrucca L. (2024) Entropy-based volatility analysis of financial log-returns using Gaussian mixture models. Entropy, 26(11), 907. doi:10.3390/e26110907
See Also
VaR.GMMlogreturn()
, ES.GMMlogreturn()
.
Examples
data(gold)
head(gold)
mod = GMMlogreturn(gold$log.returns)
summary(mod)
plot(mod, what = "density", data = gold$log.returns,
xlab = "log-returns", col = 4, lwd = 2)
Modal EM algorithm for Gaussian Mixtures
Description
A function implementing a fast and efficient Modal EM algorithm for Gaussian mixtures.
Usage
GaussianMixtureMEM(
data,
pro,
mu,
sigma,
control = list(eps = 1e-05, maxiter = 1000, stepsize = function(t) 1 - exp(-0.1 * t),
denoise = TRUE, alpha = 0.01, keep.path = FALSE),
...
)
Arguments
data |
A numeric vector, matrix, or data frame of observations.
Categorical variables are not allowed. If a matrix or data frame, rows
correspond to observations ( |
pro |
A |
mu |
A |
sigma |
A |
control |
A list of control parameters:
|
... |
Further arguments passed to or from other methods. |
Value
Returns a list containing the following elements:
-
n
The number of input data points. -
d
The number of variables/features. -
parameters
The Gaussian mixture parameters. -
iter
The number of iterations of MEM algorithm. -
nmodes
The number of modes estimated by the MEM algorithm. -
modes
The coordinates of modes estimated by MEM algorithm. -
path
If requested, the coordinates of full paths to modes for each data point. -
logdens
The log-density at the estimated modes. -
logvol
The log-volume used for denoising (if requested). -
classification
The modal clustering classification of input data points.
Author(s)
Luca Scrucca
References
Scrucca L. (2021) A fast and efficient Modal EM algorithm for Gaussian mixtures. Statistical Analysis and Data Mining, 14:4, 305–314. doi: 10.1002/sam.11527
See Also
Model-based clustering for bounded data
Description
Clustering of bounded data via transformation-based approach for Gaussian mixtures.
Usage
MclustBounded(data, ...)
## S3 method for class 'MclustBounded'
summary(object, classification = TRUE, parameters = FALSE, ...)
Arguments
data |
A numeric vector, matrix, or data frame of observations. If a matrix or data frame, rows correspond to observations and columns correspond to variables. |
... |
Further arguments passed to |
object |
An object of class |
classification |
A logical, if |
parameters |
A logical, if |
Details
For more details see
vignette("mclustAddons")
Value
Returns an object of class 'MclustBounded'
.
Author(s)
Luca Scrucca
References
Scrucca L. (2024) A model-based clustering approach for bounded data using transformation-based Gaussian mixture models. Under review. arXiv pre-print available at http://arxiv.org/abs/2412.13572
See Also
densityMclustBounded()
, predict.MclustBounded()
, plot.MclustBounded()
.
Recover parameters in the original scale
Description
Given a GMM for bounded data, computes the means and variances in the original scale from the estimated mixture components parameters dataset using simulations.
Usage
MclustBoundedParameters(object, nsim = 1e+06, ...)
Arguments
object |
An object of class |
nsim |
An integer specifying the number of simulations to employ. |
... |
Further arguments passed to or from other methods. |
Examples
x = rlnorm(1000, 0, 1)
mod = densityMclustBounded(x, lbound = 0, lambda = 0)
summary(mod, parameters = TRUE)
plot(mod, what = "density")
# transformed parameters (from log-normal distribution)
# mean
with(mod$parameters,
exp(mean + 0.5*variance$sigmasq))
# var
with(mod$parameters,
(exp(variance$sigmasq) - 1)*exp(2*mean + variance$sigmasq))
# using simulations
MclustBoundedParameters(mod)
Modal EM algorithm for Gaussian Mixtures fitted via mclust package
Description
Modal-clustering estimation by applying the Modal EM algorithm to Gaussian mixtures fitted using the mclust package.
Usage
MclustMEM(object, data = NULL, ...)
## S3 method for class 'MclustMEM'
summary(object, ...)
Arguments
object |
An object of class |
data |
If provided, a numeric vector, matrix, or data frame of
observations. If a matrix or data frame, rows correspond to observations
( |
... |
Further arguments passed to or from other methods. |
Details
For more details see
vignette("mclustAddons")
Value
Returns an object of class 'MclustMEM'
with elements described in
GaussianMixtureMEM()
.
Author(s)
Luca Scrucca
References
Scrucca L. (2021) A fast and efficient Modal EM algorithm for Gaussian mixtures. Statistical Analysis and Data Mining, 14:4, 305–314. doi:10.1002/sam.11527
See Also
GaussianMixtureMEM()
, plot.MclustMEM()
.
Examples
data(Baudry_etal_2010_JCGS_examples, package = "mclust")
plot(ex4.1)
GMM <- Mclust(ex4.1)
plot(GMM, what = "classification")
MEM <- MclustMEM(GMM)
MEM
summary(MEM)
plot(MEM)
plot(ex4.4.2)
GMM <- Mclust(ex4.4.2)
plot(GMM, what = "classification")
MEM <- MclustMEM(GMM)
MEM
summary(MEM)
plot(MEM, addDensity = FALSE)
Financial risk measures
Description
Generic functions for computing Value-at-Risk (VaR) and Expected Shortfall (ES).
Usage
VaR(object, ...)
ES(object, ...)
Arguments
object |
An object of class specific for the method. |
... |
Further arguments passed to or from other methods. |
Risk measures from Gaussian mixtures modeling
Description
Value-at-Risk (VaR) and Expected Shortfall (ES) from the fit of
Gaussian mixtures provided by GMMlogreturn()
function.
Usage
## S3 method for class 'GMMlogreturn'
VaR(object, alpha, ...)
## S3 method for class 'GMMlogreturn'
ES(object, alpha, ...)
Arguments
object |
An object of class |
alpha |
A vector of values in the interval |
... |
Further arguments passed to or from other methods. |
Details
VaR(\alpha
) is the maximum potential loss over a specified time
horizon with probability equal to the confidence level 1-\alpha
.
ES(\alpha
) is the expected loss given that the loss exceeds the
VaR(\alpha
) level.
Value
Returns a numerical value corresponding to VaR or ES at given level(s).
References:
Ruppert Matteson (2015) Statistics and Data Analysis for Financial Engineering, Springer, Chapter 19.
Cizek Hardle Weron (2011) Statistical Tools for Finance and Insurance, 2nd ed., Springer, Chapter 2.
Examples
z = sample(1:2, size = 250, replace = TRUE, prob = c(0.8, 0.2))
y = double(length(z))
y[z == 1] = rnorm(sum(z == 1), 0, 1)
y[z == 2] = rnorm(sum(z == 2), -0.5, 2)
GMM = GMMlogreturn(y)
alpha = seq(0.01, 0.1, by = 0.001)
matplot(alpha, data.frame(VaR = VaR(GMM, alpha),
ES = ES(GMM, alpha)),
type = "l", col = c(2,4), lty = 1, lwd = 2,
xlab = expression(alpha), ylab = "Loss")
legend("topright", col = c(2,4), lty = 1, lwd = 2,
legend = c("VaR", "ES"), inset = 0.02)
Cumulative distribution and quantiles of univariate model-based mixture density estimation for bounded data
Description
Compute the cumulative density function (cdf) or quantiles of a
one-dimensional density for bounded data estimated via the
transformation-based approach for Gaussian mixtures in
densityMclustBounded()
.
Diagnostic plots for density estimation of bounded data via transformation-based approach of Gaussian mixtures. Only available for the one-dimensional case.
The two diagnostic plots for density estimation in the one-dimensional case are discussed in Loader (1999, pp- 87-90).
Usage
cdfDensityBounded(object, data, ngrid = 100, ...)
quantileDensityBounded(object, p, ...)
densityMclustBounded.diagnostic(
object,
type = c("cdf", "qq"),
col = c("black", "black"),
lwd = c(2, 1),
lty = c(1, 1),
legend = TRUE,
grid = TRUE,
...
)
Arguments
object |
An object of class |
data |
A numeric vector of evaluation points. |
ngrid |
The number of points in a regular grid to be used as evaluation
points if no |
... |
Additional arguments. |
p |
A numeric vector of probabilities corresponding to quantiles. |
type |
The type of graph requested:
|
col |
A pair of values for the color to be used for plotting, respectively, the estimated CDF and the empirical cdf. |
lwd |
A pair of values for the line width to be used for plotting, respectively, the estimated CDF and the empirical cdf. |
lty |
A pair of values for the line type to be used for plotting, respectively, the estimated CDF and the empirical cdf. |
legend |
A logical indicating if a legend must be added to the plot of fitted CDF vs the empirical CDF. |
grid |
A logical indicating if a |
Details
The cdf is evaluated at points given by the optional argument data
.
If not provided, a regular grid of length ngrid
for the evaluation
points is used.
The quantiles are computed using bisection linear search algorithm.
Value
cdfDensityBounded()
returns a list of x
and y
values providing,
respectively, the evaluation points and the estimated cdf.
quantileDensityBounded()
returns a vector of quantiles.
No return value, called for side effects.
Author(s)
Luca Scrucca
References
Loader C. (1999), Local Regression and Likelihood. New York, Springer.
See Also
densityMclustBounded()
, plot.densityMclustBounded()
.
densityMclustBounded()
, plot.densityMclustBounded()
.
Examples
# univariate case with lower bound
x <- rchisq(200, 3)
dens <- densityMclustBounded(x, lbound = 0)
xgrid <- seq(-2, max(x), length=1000)
cdf <- cdfDensityBounded(dens, xgrid)
str(cdf)
plot(xgrid, pchisq(xgrid, df = 3), type = "l", xlab = "x", ylab = "CDF")
lines(cdf, col = 4, lwd = 2)
q <- quantileDensityBounded(dens, p = c(0.01, 0.1, 0.5, 0.9, 0.99))
cbind(quantile = q, cdf = cdfDensityBounded(dens, q)$y)
plot(cdf, type = "l", col = 4, xlab = "x", ylab = "CDF")
points(q, cdfDensityBounded(dens, q)$y, pch = 19, col = 4)
# univariate case with lower & upper bounds
x <- rbeta(200, 5, 1.5)
dens <- densityMclustBounded(x, lbound = 0, ubound = 1)
xgrid <- seq(-0.1, 1.1, length=1000)
cdf <- cdfDensityBounded(dens, xgrid)
str(cdf)
plot(xgrid, pbeta(xgrid, 5, 1.5), type = "l", xlab = "x", ylab = "CDF")
lines(cdf, col = 4, lwd = 2)
q <- quantileDensityBounded(dens, p = c(0.01, 0.1, 0.5, 0.9, 0.99))
cbind(quantile = q, cdf = cdfDensityBounded(dens, q)$y)
plot(cdf, type = "l", col = 4, xlab = "x", ylab = "CDF")
points(q, cdfDensityBounded(dens, q)$y, pch = 19, col = 4)
# univariate case with lower bound
x <- rchisq(200, 3)
dens <- densityMclustBounded(x, lbound = 0)
plot(dens, x, what = "diagnostic")
# or
densityMclustBounded.diagnostic(dens, type = "cdf")
densityMclustBounded.diagnostic(dens, type = "qq")
# univariate case with lower & upper bounds
x <- rbeta(200, 5, 1.5)
dens <- densityMclustBounded(x, lbound = 0, ubound = 1)
plot(dens, x, what = "diagnostic")
# or
densityMclustBounded.diagnostic(dens, type = "cdf")
densityMclustBounded.diagnostic(dens, type = "qq")
Model-based mixture density estimation for bounded data
Description
Density estimation for bounded data via transformation-based approach for Gaussian mixtures.
Usage
densityMclustBounded(
data,
G = NULL,
modelNames = NULL,
criterion = c("BIC", "ICL"),
lbound = NULL,
ubound = NULL,
lambda = c(-3, 3),
prior = NULL,
initialization = NULL,
nstart = 25,
parallel = FALSE,
seed = NULL,
...
)
## S3 method for class 'densityMclustBounded'
summary(object, parameters = FALSE, ...)
Arguments
data |
A numeric vector, matrix, or data frame of observations. If a matrix or data frame, rows correspond to observations and columns correspond to variables. |
G |
An integer vector specifying the numbers of mixture components. By
default |
modelNames |
A vector of character strings indicating the Gaussian
mixture models to be fitted on the transformed-data space. See
|
criterion |
A character string specifying the information criterion for
model selection. Possible values are |
lbound |
Numeric vector proving lower bounds for variables. |
ubound |
Numeric vector proving upper bounds for variables. |
lambda |
A numeric vector providing the range (min and max) of searched values for the transformation parameter(s). If a matrix is provided, then for each variable a row should be provided containing the range of lambda values for the transformation parameter. If a variable must have a fixed lambda value, the provided min and max values should be equal. See examples below. |
prior |
A function specifying a prior for Bayesian regularization of
Gaussian mixtures. See |
initialization |
A list containing one or more of the following components:
|
nstart |
An integer value specifying the number of replications of
k-means clustering to be used for initializing the EM algorithm. See
|
parallel |
An optional argument which allows to specify if the search over all possible models should be run sequentially (default) or in parallel. For a single machine with multiple cores, possible values are:
In all the cases described above, at the end of the search the cluster is automatically stopped by shutting down the workers. If a cluster of multiple machines is available, evaluation of the fitness
function can be executed in parallel using all, or a subset of, the cores
available to the machines belonging to the cluster. However, this option
requires more work from the user, who needs to set up and register a
parallel back end. In this case the cluster must be explicitely stopped
with |
seed |
An integer value containing the random number generator state. This argument can be used to replicate the result of k-means initialisation strategy. Note that if parallel computing is required, the doRNG package must be installed. |
... |
Further arguments passed to or from other methods. |
object |
An object of class |
parameters |
A logical, if |
Details
For more details see
vignette("mclustAddons")
Value
Returns an object of class 'densityMclustBounded'
.
Author(s)
Luca Scrucca
References
Scrucca L. (2019) A transformation-based approach to Gaussian mixture density estimation for bounded data. Biometrical Journal, 61:4, 873–888. doi:10.1002/bimj.201800174
See Also
predict.densityMclustBounded()
, plot.densityMclustBounded()
.
Examples
# univariate case with lower bound
x <- rchisq(200, 3)
xgrid <- seq(-2, max(x), length=1000)
f <- dchisq(xgrid, 3) # true density
dens <- densityMclustBounded(x, lbound = 0)
summary(dens)
summary(dens, parameters = TRUE)
plot(dens, what = "BIC")
plot(dens, what = "density")
lines(xgrid, f, lty = 2)
plot(dens, what = "density", data = x, breaks = 15)
# univariate case with lower & upper bounds
x <- rbeta(200, 5, 1.5)
xgrid <- seq(-0.1, 1.1, length=1000)
f <- dbeta(xgrid, 5, 1.5) # true density
dens <- densityMclustBounded(x, lbound = 0, ubound = 1)
summary(dens)
plot(dens, what = "BIC")
plot(dens, what = "density")
plot(dens, what = "density", data = x, breaks = 9)
# bivariate case with lower bounds
x1 <- rchisq(200, 3)
x2 <- 0.5*x1 + sqrt(1-0.5^2)*rchisq(200, 5)
x <- cbind(x1, x2)
plot(x)
dens <- densityMclustBounded(x, lbound = c(0,0))
summary(dens, parameters = TRUE)
plot(dens, what = "BIC")
plot(dens, what = "density")
plot(dens, what = "density", type = "hdr")
plot(dens, what = "density", type = "persp")
# specify different ranges for the lambda values of each variable
dens1 <- densityMclustBounded(x, lbound = c(0,0),
lambda = matrix(c(-2,2,0,1), 2, 2, byrow=TRUE))
# set lambda = 0 fixed for the second variable
dens2 <- densityMclustBounded(x, lbound = c(0,0),
lambda = matrix(c(0,1,0,0), 2, 2, byrow=TRUE))
dens[c("lambdaRange", "lambda", "loglik", "df")]
dens1[c("lambdaRange", "lambda", "loglik", "df")]
dens2[c("lambdaRange", "lambda", "loglik", "df")]
Gold price log-returns
Description
Gold price log-returns for the year 2023 obtained from Yahoo Finance using
the quantmod
R package.
Code used to download, format, and save the data:
gold = quantmod::getSymbols("GC=F", src = "yahoo", auto.assign = FALSE) gold = quantmod::dailyReturn(gold, type = "log") gold = data.frame("date" = as.Date(zoo::index(gold)), "log.returns" = as.vector(gold$daily.returns), row.names = NULL)
Format
A data frame with the following variables:
- date
Date (format: yyyy-mmm-dd).
- log.returns
Daily log-return.
Internal mclustAddons functions
Description
Internal functions not intended to be called directly by users.
Usage
as.MclustBounded(x, ...)
## Default S3 method:
as.MclustBounded(x, ...)
## S3 method for class 'densityMclustBounded'
as.MclustBounded(x, ...)
as.densityMclustBounded(x, ...)
## Default S3 method:
as.densityMclustBounded(x, ...)
## S3 method for class 'MclustBounded'
as.densityMclustBounded(x, ...)
Arguments
x |
An object of class specific for the method. |
... |
Further arguments passed to or from other methods. |
Marginal parameters from fitted GMMs via mclust
Description
Function to compute the marginal parameters from a fitted Gaussian mixture models.
Usage
mclustMarginalParams(object, ...)
gmm2margParams(pro, mu, sigma, ...)
Arguments
object |
An object of class |
... |
Further arguments passed to or from other methods. |
pro |
A vector of mixing proportions for each mixture component. |
mu |
A matrix of mean vectors for each mixture component. For
a |
sigma |
An array of covariance matrices for each mixture component.
For a |
Details
Given a G
-component GMM with estimated mixture weight \pi_k
,
mean vector \mu_{k}
, and covariance matrix \Sigma_{k}
, for
mixture component k = 1, \dots, G
, then the marginal distribution has:
mean vector
\mu = \sum_{k=1}^G \pi_k \mu_k
covariance matrix
\Sigma = \sum_{k=1}^G \pi_k \Sigma_k + \pi_k (\mu_k - \mu)'(\mu_k - \mu)
Value
Returns a list of two components for the mean and covariance of the marginal distribution.
Author(s)
Luca Scrucca
References
Frühwirth-Schnatter S. (2006) Finite Mixture and Markov Switching Models, Springer, Sec. 6.1.1
See Also
mclust::Mclust()
, mclust::densityMclust()
.
Examples
x = iris[,1:4]
mod = Mclust(x, G = 3)
mod$parameters$pro
mod$parameters$mean
mod$parameters$variance$sigma
mclustMarginalParams(mod)
Plotting method for model-based clustering of bounded data
Description
Plotting method for model-based clustering of bounded data
Usage
## S3 method for class 'MclustBounded'
plot(x, what = c("BIC", "classification", "uncertainty"), dimens = NULL, ...)
Arguments
x |
An object of class |
what |
A string specifying the type of graph requested. Available choices are:
|
dimens |
A vector of integers specifying the dimensions of the coordinate projections. |
... |
Further arguments passed to or from other methods. |
Value
No return value, called for side effects.
Plotting method for modal-clustering based on Gaussian Mixtures
Description
Plots for MclustMEM
objects.
Usage
## S3 method for class 'MclustMEM'
plot(
x,
dimens = NULL,
addDensity = TRUE,
addPoints = TRUE,
symbols = NULL,
colors = NULL,
cex = NULL,
labels = NULL,
cex.labels = NULL,
gap = 0.2,
...
)
Arguments
x |
An object of class |
dimens |
A vector of integers specifying the dimensions of the coordinate projections. |
addDensity |
A logical indicating whether or not to add density estimates to the plot. |
addPoints |
A logical indicating whether or not to add data points to the plot. |
symbols |
Either an integer or character vector assigning a plotting
symbol to each unique class in |
colors |
Either an integer or character vector assigning a color to
each unique class in |
cex |
A vector of numerical values specifying the size of the plotting
symbol for each unique class in |
labels |
A vector of character strings for labelling the variables. The
default is to use the column dimension names of |
cex.labels |
A numerical value specifying the size of the text labels. |
gap |
A numerical argument specifying the distance between subplots
(see |
... |
Further arguments passed to or from other methods. |
Value
No return value, called for side effects.
Author(s)
Luca Scrucca
References
Scrucca L. (2021) A fast and efficient Modal EM algorithm for Gaussian mixtures. Statistical Analysis and Data Mining, 14:4, 305–314. doi: 10.1002/sam.11527
See Also
Examples
# 1-d example
GMM <- Mclust(iris$Petal.Length)
MEM <- MclustMEM(GMM)
plot(MEM)
# 2-d example
data(Baudry_etal_2010_JCGS_examples)
GMM <- Mclust(ex4.1)
MEM <- MclustMEM(GMM)
plot(MEM)
plot(MEM, addPoints = FALSE)
plot(MEM, addDensity = FALSE)
# 3-d example
GMM <- Mclust(ex4.4.2)
MEM <- MclustMEM(GMM)
plot(MEM)
plot(MEM, addPoints = FALSE)
plot(MEM, addDensity = FALSE)
Plotting method for model-based mixture density estimation for bounded data
Description
Plots for mclustDensityBounded
objects.
Usage
## S3 method for class 'densityMclustBounded'
plot(x, what = c("BIC", "density", "diagnostic"), data = NULL, ...)
Arguments
x |
An object of class |
what |
The type of graph requested:
|
data |
Optional data points. |
... |
Further available arguments:
|
Value
No return value, called for side effects.
Author(s)
Luca Scrucca
References
Scrucca L. (2019) A transformation-based approach to Gaussian mixture density estimation for bounded data. Biometrical Journal, 61:4, 873–888. doi:10.1002/bimj.201800174
See Also
densityMclustBounded()
, predict.densityMclustBounded()
.
Examples
# univariate case with lower bound
x <- rchisq(200, 3)
dens <- densityMclustBounded(x, lbound = 0)
plot(dens, what = "BIC")
plot(dens, what = "density", data = x, breaks = 15)
# univariate case with lower & upper bound
x <- rbeta(200, 5, 1.5)
dens <- densityMclustBounded(x, lbound = 0, ubound = 1)
plot(dens, what = "BIC")
plot(dens, what = "density", data = x, breaks = 9)
# bivariate case with lower bounds
x1 <- rchisq(200, 3)
x2 <- 0.5*x1 + sqrt(1-0.5^2)*rchisq(200, 5)
x <- cbind(x1, x2)
dens <- densityMclustBounded(x, lbound = c(0,0))
plot(dens, what = "density")
plot(dens, what = "density", data = x)
plot(dens, what = "density", type = "hdr")
plot(dens, what = "density", type = "persp")
Model-based clustering estimation for bounded data
Description
Predict clustering for univariate and multivariate bounded data based on
Gaussian finite mixture models estimated by MclustBounded()
.
Usage
## S3 method for class 'MclustBounded'
predict(object, newdata, ...)
Arguments
object |
An object of class |
newdata |
A numeric vector, matrix, or data frame of observations. If
missing the density is computed for the input data obtained from the call to
|
... |
Further arguments passed to or from other methods. |
Value
Returns a list of with the following components:
-
classification
A factor of predicted cluster labels for newdata. -
z
A matrix whose[i,k]
th entry is the probability thati
th observation innewdata
belongs to thek
th cluster.
Author(s)
Luca Scrucca
References
Scrucca L. (2019) A transformation-based approach to Gaussian mixture density estimation for bounded data. Biometrical Journal, 61:4, 873–888. doi:10.1002/bimj.201800174
See Also
MclustBounded()
, plot.MclustBounded()
.
Model-based mixture density estimation for bounded data
Description
Predict density estimates for univariate and multivariate bounded data
based on Gaussian finite mixture models estimated by
densityMclustBounded()
.
Usage
## S3 method for class 'densityMclustBounded'
predict(
object,
newdata,
what = c("dens", "cdens", "z"),
logarithm = FALSE,
...
)
Arguments
object |
An object of class |
newdata |
A numeric vector, matrix, or data frame of observations. If
missing the density is computed for the input data obtained from the call to
|
what |
A character string specifying what to retrieve: |
logarithm |
A logical value indicating whether or not the logarithm of the densities/probabilities should be returned. |
... |
Further arguments passed to or from other methods. |
Value
Returns a vector or a matrix of values evaluated at newdata
depending
on the argument what
(see above).
Author(s)
Luca Scrucca
References
Scrucca L. (2019) A transformation-based approach to Gaussian mixture density estimation for bounded data. Biometrical Journal, 61:4, 873–888. doi:10.1002/bimj.201800174
See Also
densityMclustBounded()
, plot.densityMclustBounded()
.
Examples
y <- sample(0:1, size = 200, replace = TRUE, prob = c(0.6, 0.4))
x <- y*rchisq(200, 3) + (1-y)*rchisq(200, 10)
dens <- densityMclustBounded(x, lbound = 0)
summary(dens)
plot(dens, what = "density", data = x, breaks = 11)
xgrid <- seq(0, max(x), length = 201)
densx <- predict(dens, newdata = xgrid, what = "dens")
cdensx <- predict(dens, newdata = xgrid, what = "cdens")
cdensx <- sweep(cdensx, MARGIN = 2, FUN = "*", dens$parameters$pro)
plot(xgrid, densx, type = "l", lwd = 2)
matplot(xgrid, cdensx, type = "l", col = 3:4, lty = 2:3, lwd = 2, add = TRUE)
z <- predict(dens, newdata = xgrid, what = "z")
matplot(xgrid, z, col = 3:4, lty = 2:3, lwd = 2, ylab = "Posterior probabilities")
Racial data
Description
Proportion of white student enrollment in 56 school districts in Nassau County (Long Island, New York), for the 1992-1993 school year.
Format
A data frame with the following variables:
- District
School district.
- PropWhite
Proportion of white student enrolled.
Source
Simonoff, S.J. (1996) Smoothing Methods in Statistics, Springer-Verlag, New York, p. 52
Range–power transformation
Description
Functions to compute univariate range–power transformation and its back-transform.
Usage
rangepowerTransform(x, lbound = NULL, ubound = NULL, lambda = 1)
rangepowerBackTransform(y, lbound = NULL, ubound = NULL, lambda = 1)
Arguments
x |
A numeric vector of data values. |
lbound |
A numerical value of variable lower bound. |
ubound |
A numerical value of variable upper bound. |
lambda |
A numerical value for the power transformation. |
y |
A numeric vector of transformed data values. |
Details
The range-power transformation can be applied to variables with bounded support.
Lower bound case
Suppose x
is a univariate random variable with lower bounded support
\mathcal{S}_{\mathcal{X}} \equiv (l,\infty)
, where l > -\infty
.
Consider a preliminary range transformation defined as x \mapsto
(x - l)
, which maps \mathcal{S}_{\mathcal{X}} \to \mathbb{R}^{+}
.
The range-power transformation is a continuous monotonic transformation
defined as
t(x; \lambda) =
\begin{cases} \dfrac{(x-l)^{\lambda} - 1}{\lambda} &
\quad\text{if}\; \lambda \ne 0 \\[1ex] \log(x-l) &
\quad\text{if}\; \lambda = 0
\end{cases}
with back-transformation function
t^{-1}(y; \lambda) =
\begin{cases} (\lambda y + 1)^{1/\lambda} + l &
\quad\text{if}\; \lambda \ne 0 \\[1ex] \exp(y)+l &
\quad\text{if}\; \lambda = 0
\end{cases}
Lower and upper bound case
Suppose x
is a univariate random variable with bounded support
\mathcal{S}_{\mathcal{X}} \equiv (l,u)
, where -\infty < l < u <
+\infty
. Consider a preliminary range transformation defined as
x \mapsto (x - l)/(u - x)
, which maps \mathcal{S}_{\mathcal{X}}
\to \mathbb{R}^{+}
.
In this case, the range-power transformation is a continuous monotonic
transformation defined as
t(x; \lambda) =
\begin{cases}
\dfrac{ \left( \dfrac{x-l}{u-x} \right)^{\lambda} - 1}{\lambda} &
\quad\text{if}\; \lambda \ne 0 \\[2ex]
\log \left( \dfrac{x-l}{u-x} \right) &
\quad\text{if}\; \lambda = 0,
\end{cases}
with back-transformation function
t^{-1}(y; \lambda) =
\begin{cases}
\dfrac{l + u (\lambda y + 1)^{1/\lambda}}{1+(\lambda y + 1)^{1/\lambda}} &
\quad\text{if}\; \lambda \ne 0 \\[1ex] \dfrac{l + u \exp(y)}{1+\exp(y)} &
\quad\text{if}\; \lambda = 0
\end{cases}
Value
Returns a vector of transformed or back-transformed values.
Author(s)
Luca Scrucca
References
Scrucca L. (2019) A transformation-based approach to Gaussian mixture density estimation for bounded data. Biometrical Journal, 61:4, 873–888. doi:10.1002/bimj.201800174
See Also
Examples
# Lower bound case
x = rchisq(1000, 5)
y = rangepowerTransform(x, lbound = 0, lambda = 1/3)
par(mfrow=c(2,2))
hist(x, main = NULL, breaks = 21); rug(x)
hist(y, xlab = "y = t(x)", main = NULL, breaks = 21); rug(y)
xx = rangepowerBackTransform(y, lbound = 0, lambda = 1/3)
hist(xx, xlab = "t^-1(y) = x", main = NULL, breaks = 21); rug(xx)
plot(x, xx, ylab = "t^-1(y)"); abline(0,1)
# Lower and upper bound case
x = rbeta(1000, 2, 1)
y = rangepowerTransform(x, lbound = 0, ubound = 1, lambda = 0)
par(mfrow=c(2,2))
hist(x, main = NULL, breaks = 21); rug(x)
hist(y, xlab = "y = t(x)", main = NULL, breaks = 21); rug(y)
xx = rangepowerBackTransform(y, lbound = 0, ubound = 1, lambda = 0)
hist(xx, xlab = "t^-1(y) = x", main = NULL, breaks = 21); rug(xx)
plot(x, xx, ylab = "t^-1(y)"); abline(0,1)
Suicide data
Description
Lengths of treatment spells (in days) of control patients in suicide study.
Format
A vector of containing the lengths (days) of 86 spells of psychiatric treatment undergone by patients used as controls in a study of suicide risks.
Source
Silverman, B. W. (1986) Density Estimation, Chapman & Hall, Tab 2.1.