Title: | Implements Parsimonious Finite Mixtures of Multivariate Elliptical Leptokurtic-Normals |
Version: | 1.1 |
Description: | A way to fit Parsimonious Finite Mixtures of Multivariate Elliptical Leptokurtic-Normals. Two methods of estimation are implemented. |
Date: | 2023-09-08 |
Encoding: | UTF-8 |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Imports: | stats |
NeedsCompilation: | yes |
RoxygenNote: | 7.2.3 |
Packaged: | 2023-09-09 11:23:52 UTC; ryanbrowne |
Author: | Ryan Browne [aut, cre] (0000-0003-4543-0218), Luca Bagnato [ctb], Antonio Punzo [ctb] |
Maintainer: | Ryan Browne <rpbrowne@uwaterloo.ca> |
Repository: | CRAN |
Date/Publication: | 2023-09-09 12:00:02 UTC |
EM for the finite mixtures of MLN
Description
Runs the EM algorithm for a finite mixture of multivariate elliptical leptokurtic-normal (MLN) distributions until either the lack-of-progress tolerance or the maximum number of iterations is reached. Parsimonious clustering models are obtained via the eigen-decomposition of the scatter matrix, with the concentration parameter allowed to vary across components, be equal across components, or be fixed.
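The two stopping conditions described above can be illustrated with a minimal base-R sketch. It does not reproduce the package's internals: the helper `iterate_until`, the toy update `toy_step`, and the use of the absolute change in log-likelihood as the measure of "lack of progress" are all assumptions made for illustration.

```r
# Hypothetical helper: repeat `step` until the change in log-likelihood
# falls below `epsilon` or `n` iterations have been performed.
iterate_until <- function(step, state, n = 10, epsilon = 0.01) {
  loglik <- numeric(0)
  for (i in seq_len(n)) {
    state <- step(state)
    loglik <- c(loglik, state$loglik)
    # Assumed "lack of progress" rule: absolute change below epsilon
    if (i > 1 && abs(loglik[i] - loglik[i - 1]) < epsilon) break
  }
  list(state = state, loglik = loglik)
}

# Toy update (not an EM step): the log-likelihood halves its distance to 0
toy_step <- function(s) list(loglik = s$loglik / 2)

out <- iterate_until(toy_step, list(loglik = -1), n = 100, epsilon = 0.01)
length(out$loglik)  # number of iterations actually run
```

With a small n, the loop ends on the iteration cap before the tolerance is met, which is why the two arguments are usually tuned together.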
Usage
EM(
data = NULL,
G = 2,
model = NULL,
kml = c(1, 0, 1),
n = 10,
epsilon = 0.01,
gpar0 = NULL,
estimation = 1,
label = NULL
)
Arguments
data |
An n x p matrix of observations. |
G |
An integer giving the number of components of the mixture model. |
model |
A character string of length 4, such as "VVVV", specifying the covariance structure and the beta parameter. The 1st position controls the volume, lambda: "V" varying across components or "E" equal across components. The 2nd position controls the eigenvalues: "V" varying across components, "E" equal across components or "I" the identity matrix. The 3rd position controls the orientation: "V" varying across components, "E" equal across components or "I" the identity matrix. The 4th position controls the concentration, beta: "V" varying across components, "E" equal across components or "F" fixed at the maximum value. |
kml |
A vector of length 3 giving the number of k-means starts, the number of random starts and the number of EM iterations used for each start. |
n |
The maximum number of EM iterations. |
epsilon |
The tolerance for the stopping rule (lack of progress). The default is 0.01, but a suitable value depends on the dataset. |
gpar0 |
A list of model parameters. |
estimation |
If 1 (default), use the fixed point iterations; if 2, use the MM algorithm. |
label |
If |
Value
A list with the following items
loglik - A vector of the log-likelihood values
gpar - A list containing the parameter values
z - An n x G matrix of the posterior probabilities
map - A vector of maximum a posteriori (MAP) classifications derived from z
label - The input provided.
numpar - The number of free parameters in the fitted model.
maxLoglik - The largest value from loglik.
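To make the relationship between z and map concrete, here is a small base-R sketch. It assumes that map holds, for each observation, the index of the component with the largest posterior probability; the matrix below is made up for illustration and is not output from EM.

```r
# Toy posterior probability matrix: 3 observations (rows), G = 2 components (columns)
z <- matrix(c(0.9, 0.1,
              0.2, 0.8,
              0.6, 0.4), ncol = 2, byrow = TRUE)

# MAP classification: column index of the largest entry in each row
map <- apply(z, 1, which.max)
map
```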
Examples
x1 = rmln(n=100, d=4, mu=rep(5,4), diag(4), beta=2)
x2 = rmln(n=100, d=4, mu=rep(-5,4), diag(4), beta=2)
x = rbind( x1,x2)
mlnFit = EM(data=x, G=2, model="VVVF")
Compare the two methods of estimation
Description
Compares the two methods of estimation for fitting a finite mixture of multivariate elliptical leptokurtic-normal distributions: fixed point iterations and the MM algorithm.
Usage
compareEstimation(
mod = NULL,
data = NULL,
G = NULL,
n = 10^4,
tol = 1e-06,
wt = NULL,
n0 = 25,
lab = NULL
)
Arguments
mod |
A character string of length 4, such as "VVVV", specifying the covariance structure and the beta parameter. |
data |
An n x p matrix of observations. |
G |
The number of components to fit. |
n |
The maximum number of EM iterations. |
tol |
The tolerance for the stopping rule; lack of progress. The default is 1e-6 but it depends on the dataset. |
wt |
An (n x d) matrix of weights for initialization; if NULL, a random weight matrix is generated. |
n0 |
Given wt, the number of iterations used to obtain the initial parameters. |
lab |
Using given labels (lab) as starting values. |
Value
A vector of times, number of iterations and log-likelihood values.
Parsimonious model-based clustering with the multivariate elliptical leptokurtic-normal
Description
Performs parsimonious clustering with the multivariate elliptical leptokurtic-normal (MLN). There are 14 possible scale matrix structures and 2 options for the kurtosis parameter, for a total of 28 models.
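The count of 14 x 2 = 28 can be reproduced from the positional letter codes used for the model names. The sketch below is base R only and rests on one assumption not stated in this page: identity eigenvalues ("I" in the 2nd position) are only combined with identity orientation ("I" in the 3rd position). The variable names are illustrative.

```r
# All letter combinations: volume (E/V), eigenvalues (E/V/I), orientation (E/V/I)
grid <- expand.grid(volume = c("E", "V"),
                    shape = c("E", "V", "I"),
                    orientation = c("E", "V", "I"),
                    stringsAsFactors = FALSE)

# Assumption: drop codes with identity eigenvalues but non-identity orientation
valid <- grid[!(grid$shape == "I" & grid$orientation != "I"), ]
covCodes <- paste0(valid$volume, valid$shape, valid$orientation)

# Cross each scale structure with the two kurtosis options "V" and "E"
allModels <- as.vector(outer(covCodes, c("V", "E"), paste0))

length(covCodes)   # number of scale matrix structures
length(allModels)  # number of 4-letter model codes
```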
Usage
pmln(
data = NULL,
G = 1:3,
covModels = NULL,
betaModels = "B",
kml = c(1, 0, 1),
label = NULL,
scale.data = TRUE,
veo = FALSE,
iterMax = 1000,
tol = 1e-08,
pprogress = FALSE,
method = "FP"
)
Arguments
data |
An n x p matrix of observations. |
G |
An integer vector giving the numbers of mixture components to fit. |
covModels |
If NULL, fit all 14 possible scale matrix structures. Otherwise, a character vector in which each element is a string of length 3, e.g. c("VVV", "EEE"), specifying the covariance structure. The 1st position controls the volume, lambda: "V" varying across components or "E" equal across components. The 2nd position controls the eigenvalues: "V" varying across components, "E" equal across components or "I" the identity matrix. The 3rd position controls the orientation: "V" varying across components, "E" equal across components or "I" the identity matrix. |
betaModels |
One of "V", "E", "B" or "F": "V" varying across components, "E" equal across components, "B" consider both "V" and "E", "F" fixed at the maximum value. |
kml |
A vector of length 3 giving the number of k-means starts, the number of random starts and the number of EM iterations used for each start. |
label |
If |
scale.data |
Should the data be scaled before clustering? The default is TRUE. |
veo |
"Variables exceed observations". If TRUE, fit the model even though the number of variables in the model exceeds the number of observations. |
iterMax |
The maximum number of EM iterations for each model fitted. |
tol |
The tolerance for the stopping rule (lack of progress). The default is 1e-8, but a suitable value depends on the data set. |
pprogress |
If TRUE, print the progress of the function. |
method |
If "FP", use the fixed point iteration method; if "MM", use the MM method. |
Value
A list with the following items
startobject - A statement on how the models were initialized
gpar - A list of parameter values for the model chosen by the BIC
loglik - A vector of the log-likelihood values
z - An n x G matrix of the posterior probabilities from the model chosen by the BIC
map - A vector of maximum a posteriori (MAP) classifications derived from z
BIC - An array with dimensions (G, number of fitted models, 3). The last dimension indexes the loglik, number of free parameters and BIC for each fitted model.
bicModel - Information, as a list, on the model chosen by the BIC.
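The layout of the BIC array and the selection of a best model can be sketched in base R. Everything numeric below is made up for illustration, and two points are assumptions rather than documented behaviour: the convention BIC = 2 * loglik - (number of free parameters) * log(n), under which larger values are better, and the dimension labels "loglik", "npar" and "BIC".

```r
n <- 200                      # hypothetical number of observations
G <- 1:3                      # candidate numbers of components
models <- c("VVVV", "EEEE")   # candidate model codes

# Array with dimensions (G, number of fitted models, 3)
bic <- array(NA_real_, dim = c(length(G), length(models), 3),
             dimnames = list(G = as.character(G), model = models,
                             quantity = c("loglik", "npar", "BIC")))

# Made-up log-likelihoods and parameter counts (filled column by column)
bic[, , "loglik"] <- c(-900, -650, -640, -950, -700, -695)
bic[, , "npar"]   <- c(15, 31, 47, 15, 21, 27)

# Assumed convention: larger BIC is better
bic[, , "BIC"] <- 2 * bic[, , "loglik"] - bic[, , "npar"] * log(n)

# Locate the (G, model) pair with the largest BIC
bicMat <- bic[, , "BIC"]
idx <- arrayInd(which.max(bicMat), dim(bicMat))
bestG <- G[idx[1]]
bestModel <- models[idx[2]]
```

If the fitted object follows the opposite sign convention (smaller is better), which.max would be replaced by which.min.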
Examples
x1 = rmln(n=100, d=4, mu=rep(5,4), diag(4), beta=2)
x2 = rmln(n=100, d=4, mu=rep(-5,4), diag(4), beta=2)
x = rbind( x1,x2)
mlnFit = pmln(data=x, G=2, covModels=c("VVV", "EEE"), betaModels="B")
Generate realizations from the multivariate elliptical leptokurtic-normal distribution
Description
Generates n realizations from a d-dimensional multivariate elliptical leptokurtic-normal (MLN) distribution with location parameter mu, scatter matrix Sigma and concentration parameter beta.
Usage
rmln(n = NULL, d = NULL, mu = NULL, Sigma = NULL, beta = NULL)
Arguments
n |
number of observations |
d |
the dimension of the observations |
mu |
location parameter of length d |
Sigma |
(d x d) scatter matrix |
beta |
the concentration parameter |
Value
An (n x d) matrix of realizations
Examples
x = rmln(n=10, d=4, mu=rep(0,4), diag(4), beta=2)