Version: | 0.6.1 |
Date: | 2025-05-20 |
Title: | Fitting of Univariate Mixture Distributions to Data using Various Approaches |
Depends: | R (≥ 2.0.1) |
Description: | Methods for fitting mixture distributions to univariate data using expectation maximization, HWHM and other methods. Supports Gaussian, Cauchy, Student's t and von Mises mixtures. For more details see Merkys (2018) https://www.lvb.lt/permalink/370LABT_NETWORK/1m6ui06/alma9910036312108451. |
License: | GPL-2 |
NeedsCompilation: | yes |
Packaged: | 2025-05-20 08:53:17 UTC; andrius |
Author: | Andrius Merkys [aut, cre] |
Maintainer: | Andrius Merkys <andrius.merkys@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-05-23 12:02:06 UTC |
Absolute Convergence Check.
Description
Compare two values to tell whether an optimization process has converged.
Usage
abs_convergence( p_now, p_prev, epsilon = 1e-6 )
Arguments
p_now |
function value of i-th iteration. |
p_prev |
function value of (i-1)-th iteration. |
epsilon |
convergence criterion |
Value
TRUE if deemed to have converged, FALSE otherwise
Author(s)
Andrius Merkys
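The check reduces to comparing the absolute difference of two successive function values against the tolerance. A minimal Python sketch of the same logic (the package implements this in R and C; the exact boundary handling at equality is an assumption):

```python
def abs_convergence(p_now, p_prev, epsilon=1e-6):
    # Converged when the absolute change between successive
    # function values falls below the tolerance.
    return abs(p_now - p_prev) < epsilon
```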
Bhattacharyya distance for univariate Gaussian distributions.
Description
Measures Bhattacharyya distance between two univariate Gaussian distributions.
Usage
bhattacharyya_dist( mu1, mu2, sigma1, sigma2 )
Arguments
mu1 |
mean of the first Gaussian distribution. |
mu2 |
mean of the second Gaussian distribution. |
sigma1 |
standard deviation of the first Gaussian distribution. |
sigma2 |
standard deviation of the second Gaussian distribution. |
Value
Bhattacharyya distance as double.
Author(s)
Andrius Merkys
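For two univariate Gaussians the distance has a closed form with a variance term and a mean-separation term. A Python sketch of that formula (illustrative, not the package's R code):

```python
import math

def bhattacharyya_dist(mu1, mu2, sigma1, sigma2):
    # Closed form for two univariate Gaussians: a term comparing the
    # variances plus a term penalizing the mean separation.
    v1, v2 = sigma1 ** 2, sigma2 ** 2
    term_var = 0.25 * math.log(0.25 * (v1 / v2 + v2 / v1 + 2.0))
    term_mean = 0.25 * (mu1 - mu2) ** 2 / (v1 + v2)
    return term_var + term_mean
```

Identical distributions give a distance of zero; the distance grows with either mean separation or variance mismatch.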
Bayesian Information Criterion (BIC)
Description
Calculates Bayesian Information Criterion (BIC) for any type of mixture model. Log-likelihood function has to be provided.
Usage
bic( x, p, llf )
Arguments
x |
data vector |
p |
vector of mixture model parameters |
llf |
function calculating log-likelihood, called as llf( x, p ) |
Value
Bayesian Information Criterion value.
Author(s)
Andrius Merkys
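BIC combines the parameter count k, the sample size n and the maximized log-likelihood L as k*ln(n) - 2*ln(L). A Python sketch assuming every element of p counts as a free parameter (how the package counts parameters is an assumption):

```python
import math

def bic(x, p, llf):
    # k = number of free parameters, n = sample size,
    # llf(x, p) = log-likelihood of the data under the model.
    return len(p) * math.log(len(x)) - 2.0 * llf(x, p)
```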
Estimate Cauchy Mixture parameters using Expectation Maximization.
Description
Estimates parameters for Cauchy mixture using Expectation Maximization algorithm.
Usage
cmm_fit_em( x, p, epsilon = c( 0.000001, 0.000001, 0.000001 ),
iter.cauchy = 20, debug = FALSE, implementation = "C" )
Arguments
x |
data vector |
p |
initialization vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, gamma1, gamma2, ..., gamman ), where Ai is the proportion of i-th component, mui is the center of i-th component and gammai is the Cauchy scale of i-th component. |
epsilon |
tolerance threshold for convergence. Structure of epsilon is epsilon = c( epsilon_A, epsilon_mu, epsilon_gamma ), where epsilon_A is threshold for component proportions, epsilon_mu is threshold for component centers and epsilon_gamma is threshold for component Cauchy scales. |
iter.cauchy |
number of iterations to fit a single Cauchy component. |
debug |
flag to turn the debug prints on/off. |
implementation |
flag to switch between C (default) and R implementations. |
Value
Vector of mixture parameters, whose structure is the same as of input parameter's p.
Author(s)
Andrius Merkys
References
Nagy, F. Parameter Estimation of the Cauchy Distribution in Information Theory Approach (2006). Journal of Universal Computer Science.
Estimate Cauchy Mixture Parameters Using Derivatives and Half-Width at Half-Maximum Method.
Description
Estimate Cauchy mixture parameters using derivatives and half-width at half-maximum (HWHM) method. The method smooths the histogram before attempting to locate the modes. Then it describes them using HWHM.
Usage
cmm_fit_hwhm_spline_deriv( x, y )
Arguments
x |
data vector |
y |
response vector for x |
Value
Parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, gamma1, gamma2, ..., gamman ), where Ai is the proportion of i-th component, mui is the center of i-th component and gammai is the Cauchy scale of i-th component.
Author(s)
Andrius Merkys
Estimate an Initialization Vector for Cauchy Mixture Fitting.
Description
Estimate an initialization vector for Cauchy mixture fitting via Expectation Maximization. Proportions are set to equal, centers are equispaced through the whole domain of input sample, and scales are set to 1.
Usage
cmm_init_vector( x, m, implementation = "C" )
Arguments
x |
data vector |
m |
number of mixture components |
implementation |
flag to switch between C (default) and R implementations. |
Value
Parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, gamma1, gamma2, ..., gamman ), where Ai is the proportion of i-th component, mui is the center of i-th component and gammai is the Cauchy scale of i-th component.
Author(s)
Andrius Merkys
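One plausible reading of the documented rule, sketched in Python: equal proportions, centers equispaced from min(x) to max(x) inclusive (whether the endpoints themselves carry centers is an assumption), and unit Cauchy scales:

```python
def cmm_init_vector(x, m):
    # Equal proportions, centers equispaced over [min(x), max(x)],
    # all Cauchy scales set to 1, packed as (A..., mu..., gamma...).
    lo, hi = min(x), max(x)
    if m > 1:
        centers = [lo + i * (hi - lo) / (m - 1) for i in range(m)]
    else:
        centers = [(lo + hi) / 2.0]
    return [1.0 / m] * m + centers + [1.0] * m
```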
Estimate an Initialization Vector for Cauchy Mixture Fitting Using k-means.
Description
Estimate an initialization vector for Cauchy mixture fitting using k-means. The R implementation of k-means in kmeans() is used to find data point assignment to clusters. Then several iterations of Cauchy mixture fitting (per Nagy 2006) are used to derive mixture parameters.
Usage
cmm_init_vector_kmeans( x, m, iter.cauchy = 20 )
Arguments
x |
data vector |
m |
number of mixture components |
iter.cauchy |
number of iterations to fit a single Cauchy component. |
Value
Parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, gamma1, gamma2, ..., gamman ), where Ai is the proportion of i-th component, mui is the center of i-th component and gammai is the Cauchy scale of i-th component.
Author(s)
Andrius Merkys
References
Nagy, F. Parameter Estimation of the Cauchy Distribution in Information Theory Approach (2006). Journal of Universal Computer Science.
Intersections of Two Cauchy Distributions
Description
Finds intersections of two Cauchy distributions by finding roots of a quadratic equation.
Usage
cmm_intersections( p )
Arguments
p |
parameter vector of 6 parameters. Structure of p vector is p = c( A1, A2, mu1, mu2, gamma1, gamma2 ), where Ai is the proportion of i-th component, mui is the location of i-th component, gammai is the Cauchy scale of i-th component. |
Value
A vector of x values of intersections (zero, one or two). Returns NaN if both distributions are identical.
Author(s)
Andrius Merkys
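Equating the two weighted Cauchy densities and cross-multiplying the denominators collects into a quadratic a*x^2 + b*x + c = 0. A Python sketch of that reduction (the coefficient bookkeeping and degenerate-case handling are assumptions, not the package's exact code):

```python
import math

def cmm_intersections(p):
    # p = (A1, A2, mu1, mu2, gamma1, gamma2).  From
    # A1*g1/((x-mu1)^2+g1^2) = A2*g2/((x-mu2)^2+g2^2), cross-multiply
    # and collect powers of x.
    A1, A2, mu1, mu2, g1, g2 = p
    a1, a2 = A1 * g1, A2 * g2
    a = a1 - a2
    b = 2.0 * (a2 * mu1 - a1 * mu2)
    c = a1 * (mu2 ** 2 + g2 ** 2) - a2 * (mu1 ** 2 + g1 ** 2)
    if a == 0.0:                       # equal A*gamma: at most linear
        if b == 0.0:
            return [float('nan')] if c == 0.0 else []
        return [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return []
    r = math.sqrt(disc)
    return sorted([(-b - r) / (2.0 * a), (-b + r) / (2.0 * a)])
```

Two equally weighted components with equal scales intersect once, at the midpoint of their centers.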
Density of The Cauchy-Gaussian Distribution
Description
Density function for the Cauchy-Gaussian distribution, according to Eqn. 2 of Swami (2000).
Usage
dcgmm( x, p )
Arguments
x |
data vector |
p |
parameter vector of 5*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, e1, e2, ..., en, mu1, mu2, ..., mun, gamma1, gamma2, ..., gamman, sigma1, sigma2, ..., sigman ), where Ai is the proportion of i-th component, ei is the proportion of Cauchy subcomponent of i-th component, mui is the center of i-th component, gammai is the Cauchy scale of i-th component and sigmai is the Gaussian standard deviation of i-th component. |
Value
A vector.
Author(s)
Andrius Merkys
References
Swami, A. Non-Gaussian mixture models for detection and estimation in heavy-tailed noise. 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), 2000, 6, 3802-3805.
Density of The Cauchy Mixture Distribution
Description
Density function for the Cauchy mixture distribution.
Usage
dcmm( x, p, implementation = "C" )
Arguments
x |
data vector |
p |
parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, gamma1, gamma2, ..., gamman ), where Ai is the proportion of i-th component, mui is the location of i-th component, gammai is the Cauchy scale of i-th component. |
implementation |
flag to switch between C (default) and R implementations. |
Value
A vector.
Author(s)
Andrius Merkys
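The density is the proportion-weighted sum of component Cauchy densities. A Python sketch of the same formula over the packed parameter vector (illustrative; the package evaluates this in R or C):

```python
import math

def dcmm(x, p):
    # p packs (A1..An, mu1..mun, gamma1..gamman); returns the mixture
    # density sum_i A_i * g_i / (pi * ((x - mu_i)^2 + g_i^2)) at each x.
    n = len(p) // 3
    A, mu, g = p[:n], p[n:2 * n], p[2 * n:]
    return [sum(A[i] * g[i] / (math.pi * ((xi - mu[i]) ** 2 + g[i] ** 2))
                for i in range(n))
            for xi in x]
```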
The Gaussian Mixture Distribution
Description
Density function for the Gaussian mixture distribution.
Usage
dgmm( x, p, normalise_proportions = FALSE, restrict_sigmas = FALSE,
implementation = "C" )
Arguments
x |
data vector |
p |
parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, sigma1, sigma2, ..., sigman ), where Ai is the proportion of i-th component, mui is the location of i-th component, sigmai is the scale of i-th component. |
normalise_proportions |
if TRUE, make component proportions sum up to 1 by dividing each one of them by their sum (R implementation only). |
restrict_sigmas |
if TRUE, skip components with scales less or equal to zero (R implementation only). |
implementation |
flag to switch between C (default) and R implementations. |
Value
A vector.
Author(s)
Andrius Merkys
Calculate Approximate Value of The Digamma Function.
Description
Calculates an approximate value of the digamma function using the first eight non-zero terms of the asymptotic expansion of digamma(x). Implemented according to Wikipedia.
Usage
digamma_approx( x )
Arguments
x |
data vector |
Value
Digamma function value.
Author(s)
Andrius Merkys
References
Users of Wikipedia. Digamma function. https://en.wikipedia.org/w/index.php?title=Digamma_function&oldid=708779689
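One plausible reading of "first eight non-zero terms", sketched in Python: ln x, -1/(2x), and the six Bernoulli-number terms B_{2k}/(2k x^{2k}) for k = 1..6 (exactly which terms the package keeps is an assumption):

```python
import math

def digamma_approx(x):
    # Asymptotic expansion: psi(x) ~ ln x - 1/(2x) - sum B_{2k}/(2k x^{2k}).
    # Coefficients below are B_{2k}/(2k) for k = 1..6.
    coef = [1.0 / 12, -1.0 / 120, 1.0 / 252, -1.0 / 240,
            1.0 / 132, -691.0 / 32760]
    s = math.log(x) - 1.0 / (2.0 * x)
    xpow = x * x
    for c in coef:
        s -= c / xpow
        xpow *= x * x
    return s
```

The expansion is accurate for moderately large x; small arguments would first be shifted up with the recurrence psi(x) = psi(x+1) - 1/x.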
Density of The Student's t Model
Description
Density function for the Student's t Model. Wrapper around R's dt(), supporting center and concentration parameters.
Usage
ds( x, c, s, ni )
Arguments
x |
data vector |
c |
center |
s |
concentration |
ni |
degrees of freedom |
Value
A vector.
Author(s)
Andrius Merkys
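A location-scale wrapper evaluates the standard t density at (x - c)/s and divides by s; here the concentration s is treated as a plain scale parameter (an assumption). A Python sketch with the standard t density written out:

```python
import math

def dt(x, ni):
    # Standard Student's t density with ni degrees of freedom.
    return (math.gamma((ni + 1) / 2.0)
            / (math.sqrt(ni * math.pi) * math.gamma(ni / 2.0))
            * (1.0 + x * x / ni) ** (-(ni + 1) / 2.0))

def ds(x, c, s, ni):
    # Shift by the center c and scale by s; the change of variables
    # divides the density by s.
    return [dt((xi - c) / s, ni) / s for xi in x]
```

With ni = 1 the density reduces to a Cauchy, so at the center it equals 1/(pi*s).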
Density of The Student's t Mixture Model
Description
Density function for the Student's t Mixture Model.
Usage
dsmm( x, p )
Arguments
x |
data vector |
p |
parameter vector of 4*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, k1, k2, ..., kn, ni1, ni2, ..., nin ), where Ai is the proportion of i-th component, mui is the center of i-th component, ki is the concentration of i-th component and nii is the degrees of freedom of i-th component. |
Value
A vector.
Author(s)
Andrius Merkys
Density of The von Mises Mixture Model.
Description
Density function for the von Mises Mixture Model.
Usage
dvmm( x, p, implementation = "C" )
Arguments
x |
data vector |
p |
parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, k1, k2, ..., kn ), where Ai is the proportion of i-th component, mui is the center of i-th component and ki is the concentration of i-th component. |
implementation |
flag to switch between C (default) and R implementations. |
Value
A vector.
Author(s)
Andrius Merkys
Estimate Gaussian Mixture parameters using Expectation Maximization.
Description
Estimates parameters for Gaussian mixture using Expectation Maximization algorithm.
Usage
gmm_fit_em( x, p, w = numeric(), epsilon = c( 0.000001, 0.000001, 0.000001 ),
debug = FALSE, implementation = "C", ... )
Arguments
x |
data vector |
p |
initialization vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, sigma1, sigma2, ..., sigman ), where Ai is the proportion of i-th component, mui is the center of i-th component and sigmai is the scale of i-th component. |
w |
weights of data points, must have the same length as the data vector; if not given or has different length, equal weights are assumed. |
epsilon |
tolerance threshold for convergence. Structure of epsilon is epsilon = c( epsilon_A, epsilon_mu, epsilon_sigma ), where epsilon_A is threshold for component proportions, epsilon_mu is threshold for component centers and epsilon_sigma is threshold for component scales. |
debug |
flag to turn the debug prints on/off. |
implementation |
flag to switch between C (default) and R implementations. |
... |
additional arguments passed to gmm_fit_em_R() when R implementation is used. |
Value
Vector of mixture parameters, whose structure is the same as of input parameter's p.
Author(s)
Andrius Merkys
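One EM iteration alternates responsibilities (E-step) with weighted moment updates (M-step). A Python sketch of a single unweighted iteration over unpacked parameters (the package works on the packed p vector, supports observation weights, and iterates until the epsilon thresholds are met):

```python
import math

def dnorm(x, mu, sigma):
    # Gaussian density.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def gmm_em_step(x, A, mu, sigma):
    # E-step: responsibility of each component for each point.
    n = len(A)
    resp = []
    for xi in x:
        d = [A[k] * dnorm(xi, mu[k], sigma[k]) for k in range(n)]
        tot = sum(d)
        resp.append([dk / tot for dk in d])
    # M-step: responsibility-weighted proportions, means and deviations.
    A2, mu2, sigma2 = [], [], []
    for k in range(n):
        w = [r[k] for r in resp]
        sw = sum(w)
        m = sum(wi * xi for wi, xi in zip(w, x)) / sw
        v = sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x)) / sw
        A2.append(sw / len(x))
        mu2.append(m)
        sigma2.append(math.sqrt(v))
    return A2, mu2, sigma2
```

Repeating the step on well-separated data drives the centers toward the cluster means while the proportions stay normalized.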
Estimate Gaussian Mixture Parameters Using Half-Width at Half-Maximum Method.
Description
Estimate Gaussian mixture parameters using half-width at half-maximum (HWHM) method. Given a histogram, the method attempts to locate most prominent modes and describe them using HWHM.
Usage
gmm_fit_hwhm( x, y, n )
Arguments
x |
data vector |
y |
response vector for x |
n |
number of mixture components |
Value
Parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, sigma1, sigma2, ..., sigman ), where Ai is the proportion of i-th component, mui is the location of i-th component, sigmai is the scale of i-th component.
Author(s)
Andrius Merkys
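The method rests on the Gaussian relation HWHM = sqrt(2 ln 2) * sigma: half the peak height is reached that far from the mode, and the peak density of a weighted component is A/(sigma*sqrt(2*pi)). A Python sketch of recovering (A, mu, sigma) from one located mode (the peak-finding itself, which the package does on the histogram, is omitted):

```python
import math

def gaussian_from_peak(mode_x, mode_height, hwhm):
    # For a Gaussian, half the peak height is reached at
    # |x - mu| = sqrt(2 ln 2) * sigma, so sigma = HWHM / sqrt(2 ln 2).
    mu = mode_x
    sigma = hwhm / math.sqrt(2.0 * math.log(2.0))
    # Peak density of A * N(mu, sigma) is A / (sigma * sqrt(2 pi)).
    A = mode_height * sigma * math.sqrt(2.0 * math.pi)
    return A, mu, sigma
```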
Estimate Gaussian Mixture Parameters Using Derivatives and Half-Width at Half-Maximum Method.
Description
Estimate Gaussian mixture parameters using derivatives and half-width at half-maximum (HWHM) method. The method smooths the histogram before attempting to locate the modes. Then it describes them using HWHM.
Usage
gmm_fit_hwhm_spline_deriv( x, y )
Arguments
x |
data vector |
y |
response vector for x |
Value
Parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, sigma1, sigma2, ..., sigman ), where Ai is the proportion of i-th component, mui is the location of i-th component, sigmai is the scale of i-th component.
Author(s)
Andrius Merkys
Estimate Gaussian Mixture parameters from kmeans.
Description
Estimates parameters for Gaussian mixture using kmeans.
Usage
gmm_fit_kmeans( x, n )
Arguments
x |
data vector |
n |
number of mixture components |
Value
Vector of 3*n mixture parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, sigma1, sigma2, ..., sigman ), where Ai is the proportion of i-th component, mui is the location of i-th component, sigmai is the scale of i-th component.
Author(s)
Andrius Merkys
Estimate an Initialization Vector for Gaussian Mixture Fitting.
Description
Estimate an initialization vector for Gaussian mixture fitting via Expectation Maximization. Proportions and scales are set to equal, centers are equispaced through the whole domain of input sample.
Usage
gmm_init_vector( x, n, implementation = "C" )
Arguments
x |
data vector |
n |
number of mixture components |
implementation |
flag to switch between C (default) and R implementations. |
Value
Parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, sigma1, sigma2, ..., sigman ), where Ai is the proportion of i-th component, mui is the location of i-th component, sigmai is the scale of i-th component.
Author(s)
Andrius Merkys
Estimate an Initialization Vector for Gaussian Mixture Fitting Using k-means.
Description
Estimate an initialization vector for Gaussian mixture fitting using k-means. R implementation of k-means in kmeans() is used to find data point assignment to clusters.
Usage
gmm_init_vector_kmeans( x, m )
Arguments
x |
data vector |
m |
number of mixture components |
Value
Parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, sigma1, sigma2, ..., sigman ), where Ai is the proportion of i-th component, mui is the location of i-th component, sigmai is the scale of i-th component.
Author(s)
Andrius Merkys
Estimate an Initialization Vector for Gaussian Mixture Fitting Using Quantiles.
Description
Estimate an initialization vector for Gaussian mixture fitting using (weighted) quantiles. Proportions and scales are set to equal, centers are placed at equispaced quantiles.
Usage
gmm_init_vector_quantile( x, m, w = numeric() )
Arguments
x |
data vector |
m |
number of mixture components |
w |
weight vector |
Value
Parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, sigma1, sigma2, ..., sigman ), where Ai is the proportion of i-th component, mui is the location of i-th component, sigmai is the scale of i-th component.
Author(s)
Andrius Merkys
Intersections of Two Gaussian Distributions
Description
Finds intersections of two Gaussian distributions by finding roots of a quadratic equation.
Usage
gmm_intersections( p )
Arguments
p |
parameter vector of 6 parameters. Structure of p vector is p = c( A1, A2, mu1, mu2, sigma1, sigma2 ), where Ai is the proportion of i-th component, mui is the location of i-th component, sigmai is the scale of i-th component. |
Value
A vector of x values of intersections (zero, one or two). Returns NaN if both distributions are identical.
Author(s)
Andrius Merkys
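Taking logarithms of the two weighted Gaussian densities and collecting powers of x gives the quadratic whose roots are the intersections. A Python sketch (the degenerate-case handling is an assumption, not the package's exact behavior):

```python
import math

def gmm_intersections(p):
    # p = (A1, A2, mu1, mu2, sigma1, sigma2).  Equating
    # log(A1 * N(mu1, s1)) = log(A2 * N(mu2, s2)) gives
    # a*x^2 + b*x + c = 0 with the coefficients below.
    A1, A2, mu1, mu2, s1, s2 = p
    a = 0.5 / s2 ** 2 - 0.5 / s1 ** 2
    b = mu1 / s1 ** 2 - mu2 / s2 ** 2
    c = (0.5 * mu2 ** 2 / s2 ** 2 - 0.5 * mu1 ** 2 / s1 ** 2
         + math.log((A1 * s2) / (A2 * s1)))
    if a == 0.0:                       # equal variances: linear case
        if b == 0.0:
            return [float('nan')] if c == 0.0 else []
        return [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return []
    r = math.sqrt(disc)
    return sorted([(-b - r) / (2.0 * a), (-b + r) / (2.0 * a)])
```

Equal proportions and equal scales leave only the linear term, so the single intersection sits midway between the means.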
Merge two Gaussian components into one.
Description
Merges the i-th and j-th components of a Gaussian mixture model. Implemented in the same vein as mergeparameters() of the fpc package.
Usage
gmm_merge_components( x, p, i, j )
Arguments
x |
data vector |
p |
vector of Gaussian mixture parameters. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, sigma1, sigma2, ..., sigman ), where n is number of mixture components, Ai is the proportion of i-th component, mui is the center of i-th component, sigmai is the scale of i-th component. |
i |
index of the first component to be merged. Component with this index will be replaced by a merged one in the output. |
j |
index of the second component to be merged. Component with this index will be removed in the output. |
Value
Vector of mixture parameters, whose structure is the same as of input parameter's p.
Author(s)
Andrius Merkys
References
Hennig, C. Methods for merging Gaussian mixture components Advances in Data Analysis and Classification, Springer Nature, 2010, 4, 3-34
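Moment matching keeps the merged component's weight, mean and second moment equal to those of the pooled pair. A Python sketch on a single unpacked pair (the documented function operates on the packed p vector and reindexes the remaining components):

```python
def merge_gaussians(A1, mu1, s1, A2, mu2, s2):
    # Moment-preserving merge of two weighted Gaussian components,
    # in the spirit of fpc's mergeparameters: keep the total weight,
    # the pooled mean, and the pooled second moment.
    A = A1 + A2
    mu = (A1 * mu1 + A2 * mu2) / A
    second = (A1 * (s1 ** 2 + mu1 ** 2) + A2 * (s2 ** 2 + mu2 ** 2)) / A
    sigma = (second - mu ** 2) ** 0.5
    return A, mu, sigma
```

Merging two identical components doubles the weight and leaves the mean and scale unchanged, a quick sanity check on the formulas.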
Posterior Probability of a Gaussian Mixture Size
Description
Calculates the posterior probability of a Gaussian mixture with n components. Internally, it attempts to maximize log-likelihood of data by calling optim() and returns the list as received from optim().
Usage
gmm_size_probability( x, n, method = "SANN" )
Arguments
x |
data vector |
n |
number of mixture components |
method |
optimization method passed to optim() |
Value
List representing the converged optim() run.
Author(s)
Andrius Merkys
Posterior Probability of a Gaussian Mixture Size (Nonlinear Least Squares)
Description
Calculates the posterior probability of a Gaussian mixture with n components. Internally, it bins the data vector and calls nls() to optimize the mixture fit. Returns the list of the same form as received from optim().
Usage
gmm_size_probability_nls( x, n, bins = 100, trace = FALSE )
Arguments
x |
data vector |
n |
number of mixture components |
bins |
number of bins |
trace |
should debug trace be printed? |
Value
List of the same form as received from optim().
Author(s)
Andrius Merkys
Gradient Descent
Description
Simple implementation of gradient descent method. Given a derivative function, it follows its decrease until convergence criterion is met.
Usage
gradient_descent( gradfn, start, gamma = 0.1, ..., epsilon = 0.01 )
Arguments
gradfn |
derivative function |
start |
starting value |
gamma |
learning rate |
... |
additional arguments passed to derivative function |
epsilon |
convergence threshold for absolute squared difference |
Value
Vector of parameter values at the located minimum.
Author(s)
Andrius Merkys
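A Python sketch of the same loop for a vector-valued gradient, stopping when the squared step length falls below epsilon (the documented function's exact stopping rule is an assumption):

```python
def gradient_descent(gradfn, start, gamma=0.1, epsilon=0.01):
    # Follow the negative gradient; stop when the squared distance
    # between successive iterates drops below epsilon.
    x = list(start)
    while True:
        g = gradfn(x)
        x_new = [xi - gamma * gi for xi, gi in zip(x, g)]
        if sum((a - b) ** 2 for a, b in zip(x_new, x)) < epsilon:
            return x_new
        x = x_new
```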
Kullback–Leibler Divergence of ith Student's t Mixture component.
Description
Measures Kullback–Leibler divergence of ith Student's t Mixture component using Dirac's delta function. Implemented according to Chen et al. (2004).
Usage
kldiv( x, p, k )
Arguments
x |
data vector |
p |
vector of Student's t mixture parameters. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, k1, k2, ..., kn, ni1, ni2, ..., nin ), where n is number of mixture components, Ai is the proportion of i-th component, mui is the center of i-th component, ki is the concentration of i-th component and nii is the degrees of freedom of i-th component. |
k |
number of the component. |
Value
Kullback–Leibler divergence as double.
Author(s)
Andrius Merkys
References
Chen, S.; Wang, H. & Luo, B. Greedy EM Algorithm for Robust T-Mixture Modeling Third International Conference on Image and Graphics (ICIG'04), Institute of Electrical & Electronics Engineers (IEEE), 2004, 548–551
K-Means Clustering for Points on Circle
Description
Perform k-means clustering on angular data (in degrees).
Usage
kmeans_circular( x, centers, iter.max = 10 )
Arguments
x |
data vector |
centers |
vector of initial centers (in degrees) |
iter.max |
maximum number of iterations |
Value
Vector of the same length as centers defining cluster centers (in degrees).
Author(s)
Andrius Merkys
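Angular data needs an angular distance for the assignment step and a resultant-vector mean for the update step. A Python sketch of both pieces (degrees throughout; tie-breaking and empty-cluster handling are assumptions):

```python
import math

def circular_mean(angles_deg):
    # Mean direction via the resultant vector (atan2 of summed sines
    # and cosines), reported in [0, 360).
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

def angular_dist(a, b):
    # Shortest arc between two angles, in degrees.
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def kmeans_circular(x, centers, iter_max=10):
    centers = list(centers)
    for _ in range(iter_max):
        clusters = [[] for _ in centers]
        for xi in x:
            k = min(range(len(centers)),
                    key=lambda j: angular_dist(xi, centers[j]))
            clusters[k].append(xi)
        # Empty clusters keep their previous center.
        centers = [circular_mean(cl) if cl else c
                   for cl, c in zip(clusters, centers)]
    return centers
```

The circular mean is what lets a cluster straddling 0/360 degrees (e.g. points at 350 and 10) get a sensible center near 0 rather than near 180.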
Log-likelihood for Cauchy Mixture
Description
Calculates log-likelihood for a given data vector using a Cauchy mixture distribution.
Usage
llcmm( x, p, implementation = "C" )
Arguments
x |
data vector |
p |
parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, gamma1, gamma2, ..., gamman ), where Ai is the proportion of i-th component, mui is the center of i-th component and gammai is the Cauchy scale of i-th component. |
implementation |
flag to switch between C (default) and R implementations. |
Value
log-likelihood
Author(s)
Andrius Merkys
Log-likelihood for Gaussian Mixture
Description
Calculates log-likelihood for a given data vector using a Gaussian mixture distribution.
Usage
llgmm( x, p, implementation = "C" )
Arguments
x |
data vector |
p |
parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, sigma1, sigma2, ..., sigman ), where Ai is the proportion of i-th component, mui is the center of i-th component and sigmai is the scale of i-th component. |
implementation |
flag to switch between C (default) and R implementations. |
Value
log-likelihood
Author(s)
Andrius Merkys
Log-likelihood for Gaussian Mixture
Description
Calculates log-likelihood for a given data vector using a Gaussian mixture distribution. This is a straightforward implementation, different from llgmm() in that it does not detect and shortcut edge cases.
Usage
llgmm_conservative( x, p )
Arguments
x |
data vector |
p |
parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, sigma1, sigma2, ..., sigman ), where Ai is the proportion of i-th component, mui is the center of i-th component and sigmai is the scale of i-th component. |
Value
log-likelihood
Author(s)
Andrius Merkys
Opposite Log-likelihood for Gaussian Mixture
Description
Calculates opposite log-likelihood for a given data vector using a Gaussian mixture distribution.
Usage
llgmm_opposite( x, p )
Arguments
x |
data vector |
p |
parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, sigma1, sigma2, ..., sigman ), where Ai is the proportion of i-th component, mui is the center of i-th component and sigmai is the scale of i-th component. |
Value
opposite log-likelihood (negated log-likelihood value)
Author(s)
Andrius Merkys
Log-likelihood for Student's t Mixture
Description
Calculates log-likelihood for a given data vector using a Student's t mixture distribution.
Usage
llsmm( x, p )
Arguments
x |
data vector |
p |
parameter vector of 4*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, k1, k2, ..., kn, ni1, ni2, ..., nin ), where Ai is the proportion of i-th component, mui is the center of i-th component, ki is the concentration of i-th component and nii is the degrees of freedom of i-th component. |
Value
log-likelihood
Author(s)
Andrius Merkys
Log-likelihood for von Mises Mixture
Description
Calculates log-likelihood for a given data vector using a von Mises mixture distribution.
Usage
llvmm( x, p, implementation = "C" )
Arguments
x |
data vector |
p |
parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, k1, k2, ..., kn ), where Ai is the proportion of i-th component, mui is the center of i-th component and ki is the concentration of i-th component. |
implementation |
flag to switch between C (default) and R implementations. |
Value
log-likelihood
Author(s)
Andrius Merkys
Opposite Log-likelihood for von Mises Mixture
Description
Calculates opposite log-likelihood for a given data vector using a von Mises mixture distribution.
Usage
llvmm_opposite( x, p )
Arguments
x |
data vector |
p |
parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, k1, k2, ..., kn ), where Ai is the proportion of i-th component, mui is the center of i-th component and ki is the concentration of i-th component. |
Value
opposite log-likelihood (negated log-likelihood value)
Author(s)
Andrius Merkys
Draw Fit Images for Mixture Optimization Iterations
Description
Draw a PNG histogram with a mixture density on top of it for each iteration of mixture optimization process.
Usage
mk_fit_images( h, l, prefix = "img_" )
Arguments
h |
histogram object, as returned from hist() |
l |
list containing model vectors |
prefix |
prefix of file name to write |
Author(s)
Andrius Merkys
Plot a Circular Histogram
Description
Plot a circular histogram.
Usage
plot_circular_hist( x, breaks = 72, ball = 0.5, ... )
Arguments
x |
data vector |
breaks |
number of breaks in histogram |
ball |
radius of the drawn circle |
... |
parameters passed to plot() |
Author(s)
Andrius Merkys
Plot a Histogram with Mixture Density
Description
Draw a PNG histogram with a mixture density on top of it.
Usage
plot_density( x, model, density_f, width, height,
cuts = 400, main = "",
filename = NULL,
obs_good = c(), obs_bad = c(),
scale_density = FALSE )
Arguments
x |
data vector |
model |
model passed to density_f() |
density_f |
probability density function |
width |
image width, passed to png() |
height |
image height, passed to png() |
cuts |
number of breaks in histogram |
main |
main title of the plot |
filename |
name of the file to write |
obs_good |
vector of values to mark with rug() in green color |
obs_bad |
vector of values to mark with rug() in red color |
scale_density |
should probability density be scaled? |
Author(s)
Andrius Merkys
Find one real polynomial root using Newton–Raphson method.
Description
Finds one real polynomial root using Newton–Raphson method, implemented according to Wikipedia.
Usage
polyroot_NR( p, init = 0, epsilon = 1e-6, debug = FALSE, implementation = "C" )
Arguments
p |
vector of polynomial coefficients. |
init |
initial value. |
epsilon |
tolerance threshold for convergence. |
debug |
flag to turn the debug prints on/off. |
implementation |
flag to switch between C (default) and R implementations. |
Value
Real polynomial root.
Author(s)
Andrius Merkys
References
Users of Wikipedia. Newton's method. https://en.wikipedia.org/w/index.php?title=Newton%27s_method&oldid=710342140
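Newton-Raphson iterates x - p(x)/p'(x), and both values fall out of one Horner pass over the coefficients. A Python sketch assuming coefficients are ordered from the constant term upward, as in R's polyroot() (the iteration cap is an added safeguard, not part of the documented interface):

```python
def polyval_and_deriv(p, x):
    # Simultaneous Horner evaluation of the polynomial and its
    # derivative; p holds coefficients from the constant term upward.
    v, d = 0.0, 0.0
    for c in reversed(p):
        d = d * x + v
        v = v * x + c
    return v, d

def polyroot_NR(p, init=0.0, epsilon=1e-6):
    x = init
    for _ in range(100):               # cap iterations for safety
        v, d = polyval_and_deriv(p, x)
        if d == 0.0:                   # flat spot: cannot step further
            break
        x_new = x - v / d
        if abs(x_new - x) < epsilon:
            return x_new
        x = x_new
    return x
```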
Penalized Sum of Squared Differences Using Gaussian Mixture Distribution
Description
Given two vectors of same length and a Gaussian mixture, calculate the penalized sum of squared differences (SSD) between the first vector and Gaussian mixture densities measured at points from second vector. Penalties are included for proportions and scales that are less than or equal to 0.
Usage
pssd( x, y, p )
Arguments
x |
data vector |
y |
response vector |
p |
parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, sigma1, sigma2, ..., sigman ), where Ai is the proportion of i-th component, mui is the location of i-th component, sigmai is the scale of i-th component. |
Value
Penalized sum of squared differences.
Author(s)
Andrius Merkys
Penalized Sum of Squared Differences Using Gaussian Mixture Distribution
Description
Gradient (derivative) function of pssd().
Usage
pssd_gradient( x, y, p )
Arguments
x |
data vector |
y |
response vector |
p |
parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, sigma1, sigma2, ..., sigman ), where Ai is the proportion of i-th component, mui is the location of i-th component, sigmai is the scale of i-th component. |
Value
Gradient values measured at x.
Author(s)
Andrius Merkys
Ratio Convergence Check.
Description
Compare two values to tell whether an optimization process has converged. The absolute difference between values of two iterations is divided by the value of previous iteration and compared to the epsilon value.
Usage
ratio_convergence( p_now, p_prev, epsilon = 1e-6 )
Arguments
p_now |
function value of i-th iteration. |
p_prev |
function value of (i-1)-th iteration. |
epsilon |
convergence criterion |
Value
TRUE if deemed to have converged, FALSE otherwise
Author(s)
Andrius Merkys
Random Sample of The Cauchy Mixture Distribution
Description
Generates a random sample of the Cauchy mixture distribution.
Usage
rcmm( n, p )
Arguments
n |
sample size |
p |
parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, gamma1, gamma2, ..., gamman ), where Ai is the proportion of i-th component, mui is the location of i-th component, gammai is the Cauchy scale of i-th component. |
Value
A vector.
Author(s)
Andrius Merkys
Random Sample of the Gaussian Mixture Distribution
Description
Generates a random sample of the Gaussian mixture distribution.
Usage
rgmm( n, p )
Arguments
n |
sample size |
p |
parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, sigma1, sigma2, ..., sigman ), where Ai is the proportion of i-th component, mui is the location of i-th component, sigmai is the scale of i-th component. |
Value
A vector.
Author(s)
Andrius Merkys
Generate Initial Simplices for the Simplex Method.
Description
Generate initial simplices for simplex().
Usage
rsimplex_start( seed, n, lower, upper )
Arguments
seed |
seed for random number generator |
n |
number of simplices |
lower |
vector with lower bounds of each dimension |
upper |
vector with upper bounds of each dimension |
Value
A list with n simplices.
Author(s)
Andrius Merkys
Random Sample of the von Mises Mixture Model.
Description
Generates a random sample of the von Mises Mixture Model.
Usage
rvmm( n, p )
Arguments
n |
sample size |
p |
parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, k1, k2, ..., kn ), where Ai is the proportion of i-th component, mui is the center of i-th component and ki is the concentration of i-th component. |
Value
A vector.
Author(s)
Andrius Merkys
References
Best, D. J. & Fisher, N. I. Efficient Simulation of the von Mises Distribution. Journal of the Royal Statistical Society, Series C, 1979, 28, 152-157.
Estimate Student's t distribution parameters using Batch Approximation Algorithm.
Description
Estimates parameters for univariate Student's t distribution using Batch Approximation Algorithm, according to Fig. 2 of Aeschliman et al. (2010).
Usage
s_fit_primitive( x )
Arguments
x |
data vector |
Value
Vector c( mu, k, ni ), where mu is the center, k is the concentration and ni is the degrees of freedom of the distribution.
Author(s)
Andrius Merkys
References
Aeschliman, C.; Park, J. & Kak, A. C. A Novel Parameter Estimation Algorithm for the Multivariate t-Distribution and Its Application to Computer Vision. European Conference on Computer Vision 2010. https://engineering.purdue.edu/RVL/Publications/Aeschliman2010ANovel.pdf
Nelder-Mead's Simplex Method for Function Minimization.
Description
Nelder-Mead's Simplex Method for Function Minimization.
Usage
simplex( fn, start, ..., epsilon = 0.000001, alpha = 1,
gamma = 2, rho = 0.5, delta = 0.5, trace = FALSE )
Arguments
fn |
function to be minimized; must accept the parameter vector as its first argument |
start |
start vector |
... |
other parameters passed to the minimized function |
epsilon |
convergence criterion |
alpha |
reflection coefficient |
gamma |
expansion coefficient |
rho |
contraction coefficient |
delta |
shrink coefficient |
trace |
should debug trace be printed? |
Value
Vector yielding the minimum value of the minimized function.
Author(s)
Andrius Merkys
References
Nelder, J. A. & Mead, R. A Simplex Method For Function Minimization. The Computer Journal, 1965, 7, 308–313.
Users of Wikipedia. Nelder-Mead method. https://en.wikipedia.org/w/index.php?title=Nelder%E2%80%93Mead_method&oldid=1287347131
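The method can be illustrated with a minimal base-R implementation using the same coefficient names as the arguments above (alpha, gamma, rho, delta). This is a sketch of the classic algorithm, not the package's compiled simplex(), which may differ in details such as simplex construction and the convergence test:

```r
# Minimal Nelder-Mead sketch: reflection (alpha), expansion (gamma),
# contraction (rho) and shrink (delta), with a simple spread-based stop rule.
nelder_mead <- function(fn, start, epsilon = 1e-6, alpha = 1, gamma = 2,
                        rho = 0.5, delta = 0.5, max.iter = 1000) {
  n <- length(start)
  # initial simplex: start point plus a unit step along each axis
  S <- rbind(start, t(sapply(seq_len(n), function(i) {
    v <- start; v[i] <- v[i] + 1; v })))
  f <- apply(S, 1, fn)
  for (iter in seq_len(max.iter)) {
    ord <- order(f); S <- S[ord, , drop = FALSE]; f <- f[ord]
    if (sd(f) < epsilon) break
    centroid <- colMeans(S[1:n, , drop = FALSE])        # centroid of best n
    xr <- centroid + alpha * (centroid - S[n + 1, ])    # reflect worst point
    fr <- fn(xr)
    if (fr < f[1]) {
      xe <- centroid + gamma * (xr - centroid)          # try expansion
      fe <- fn(xe)
      if (fe < fr) { S[n + 1, ] <- xe; f[n + 1] <- fe }
      else         { S[n + 1, ] <- xr; f[n + 1] <- fr }
    } else if (fr < f[n]) {
      S[n + 1, ] <- xr; f[n + 1] <- fr                  # accept reflection
    } else {
      xc <- centroid + rho * (S[n + 1, ] - centroid)    # contract
      fc <- fn(xc)
      if (fc < f[n + 1]) { S[n + 1, ] <- xc; f[n + 1] <- fc }
      else {
        for (i in 2:(n + 1)) {                          # shrink toward best
          S[i, ] <- S[1, ] + delta * (S[i, ] - S[1, ])
          f[i] <- fn(S[i, ])
        }
      }
    }
  }
  S[1, ]
}

fn  <- function(v) sum((v - c(1, 2))^2)   # quadratic with minimum at (1, 2)
res <- nelder_mead(fn, c(0, 0))           # converges near c(1, 2)
```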
Estimate Student's t Mixture parameters using Expectation Maximization.
Description
Estimates parameters of a Student's t mixture using the Expectation Maximization algorithm. Calls smm_fit_em_APK10().
Usage
smm_fit_em( x, p, ... )
Arguments
x |
data vector |
p |
initialization vector of 4*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, k1, k2, ..., kn, ni1, ni2, ..., nin ), where Ai is the proportion of i-th component, mui is the center of i-th component, ki is the concentration of i-th component and nii is the degrees of freedom of i-th component. |
... |
additional arguments passed to smm_fit_em_GNL08(). |
Value
Vector of mixture parameters, whose structure is the same as that of the input parameter p.
Author(s)
Andrius Merkys
Estimate Student's t Mixture parameters using Expectation Maximization.
Description
Estimates parameters of a univariate Student's t mixture using the Expectation Maximization algorithm, according to Fig. 2 of Aeschliman et al. (2010).
Usage
smm_fit_em_APK10( x, p, epsilon = c( 1e-6, 1e-6, 1e-6, 1e-6 ),
collect.history = FALSE, debug = FALSE )
Arguments
x |
data vector |
p |
initialization vector of 4*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, k1, k2, ..., kn, ni1, ni2, ..., nin ), where Ai is the proportion of i-th component, mui is the center of i-th component, ki is the concentration of i-th component and nii is the degrees of freedom of i-th component. |
epsilon |
tolerance threshold for convergence. Structure of epsilon is epsilon = c( epsilon_A, epsilon_mu, epsilon_k, epsilon_ni ), where epsilon_A is threshold for component proportions, epsilon_mu is threshold for component centers, epsilon_k is threshold for component concentrations and epsilon_ni is threshold for component degrees of freedom. |
collect.history |
flag to turn accumulation of estimation history on/off. |
debug |
flag to turn the debug prints on/off. |
Value
A list.
Author(s)
Andrius Merkys
References
Aeschliman, C.; Park, J. & Kak, A. C. A Novel Parameter Estimation Algorithm for the Multivariate t-Distribution and Its Application to Computer Vision. European Conference on Computer Vision 2010, 2010. https://engineering.purdue.edu/RVL/Publications/Aeschliman2010ANovel.pdf
Greedily estimate Student's t Mixture parameters using Expectation Maximization.
Description
Greedily estimates parameters of a univariate Student's t mixture using the Expectation Maximization algorithm, implemented according to Chen et al. (2004). The algorithm relies upon smm_fit_em_GNL08() to estimate mixture parameters iteratively.
Usage
smm_fit_em_CWL04( x, p, collect.history = FALSE, debug = FALSE,
... )
Arguments
x |
data vector |
p |
initialization vector of 4*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, k1, k2, ..., kn, ni1, ni2, ..., nin ), where Ai is the proportion of i-th component, mui is the center of i-th component, ki is the concentration of i-th component and nii is the degrees of freedom of i-th component. |
collect.history |
logical. If set to TRUE, a list of parameter values of all iterations is returned. |
debug |
flag to turn the debug prints on/off. |
... |
parameters passed to smm_fit_em_GNL08(). |
Value
A list.
Author(s)
Andrius Merkys
References
Chen, S.; Wang, H. & Luo, B. Greedy EM Algorithm for Robust T-Mixture Modeling. Third International Conference on Image and Graphics (ICIG'04), Institute of Electrical & Electronics Engineers (IEEE), 2004, 548–551.
Estimate Student's t Mixture parameters using Expectation Maximization.
Description
Estimates parameters of a univariate Student's t mixture using the Expectation Maximization algorithm, according to Eqns. 12–17 of Gerogiannis et al. (2009).
Usage
smm_fit_em_GNL08( x, p, epsilon = c( 1e-6, 1e-6, 1e-6, 1e-6 ),
collect.history = FALSE, debug = FALSE,
min.sigma = 1e-256, min.ni = 1e-256,
max.df = 1000, max.steps = Inf,
polyroot.solution = 'jenkins_taub',
convergence = abs_convergence,
unif.component = FALSE )
Arguments
x |
data vector |
p |
initialization vector of 4*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, k1, k2, ..., kn, ni1, ni2, ..., nin ), where Ai is the proportion of i-th component, mui is the center of i-th component, ki is the concentration of i-th component and nii is the degrees of freedom of i-th component. |
epsilon |
tolerance threshold for convergence. Structure of epsilon is epsilon = c( epsilon_A, epsilon_mu, epsilon_k, epsilon_ni ), where epsilon_A is threshold for component proportions, epsilon_mu is threshold for component centers, epsilon_k is threshold for component concentrations and epsilon_ni is threshold for component degrees of freedom. |
collect.history |
logical. If set to TRUE, a list of parameter values of all iterations is returned. |
debug |
flag to turn the debug prints on/off. |
min.sigma |
minimum value of sigma |
min.ni |
minimum value of degrees of freedom |
max.df |
maximum value of degrees of freedom |
max.steps |
maximum number of steps, may be infinity |
polyroot.solution |
polyroot finding method used to approximate digamma function. Possible values are 'jenkins_taub' and 'newton_raphson'. |
convergence |
function to use for convergence checking. Must accept function values of the last two iterations and return TRUE or FALSE. |
unif.component |
should a uniform component for outliers be added, as suggested by Cousineau & Chartier (2010)? |
Value
A list.
Author(s)
Andrius Merkys
References
Gerogiannis, D.; Nikou, C. & Likas, A. The mixtures of Student's t-distributions as a robust framework for rigid registration. Image and Vision Computing, Elsevier BV, 2009, 27, 1285–1294. https://www.cs.uoi.gr/~arly/papers/imavis09.pdf
Cousineau, D. & Chartier, S. Outliers detection and treatment: a review. International Journal of Psychological Research, 2010, 3, 58–67. https://revistas.usb.edu.co/index.php/IJPR/article/view/844
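The E and M steps can be illustrated with a deliberately simplified base-R sketch for two components with fixed degrees of freedom. Unlike smm_fit_em_GNL08(), this sketch does not update ni (which requires solving a digamma equation), supports only two components, and omits the options listed above; all names in it are assumptions:

```r
# Simplified EM for a univariate two-component Student's t mixture with
# FIXED degrees of freedom ni.  E-step: responsibilities W and latent
# precision weights u; M-step: weighted updates of A, mu, sigma.
t_mix_em_fixed_df <- function(x, p, ni = c(5, 5), steps = 200) {
  A <- p[1:2]; mu <- p[3:4]; sigma <- p[5:6]
  for (s in seq_len(steps)) {
    d1 <- A[1] * dt((x - mu[1]) / sigma[1], ni[1]) / sigma[1]
    d2 <- A[2] * dt((x - mu[2]) / sigma[2], ni[2]) / sigma[2]
    W <- cbind(d1 / (d1 + d2), d2 / (d1 + d2))   # responsibilities
    for (j in 1:2) {
      delta <- ((x - mu[j]) / sigma[j])^2        # squared standardized residual
      u <- (ni[j] + 1) / (ni[j] + delta)         # latent precision weights
      A[j]     <- mean(W[, j])
      mu[j]    <- sum(W[, j] * u * x) / sum(W[, j] * u)
      sigma[j] <- sqrt(sum(W[, j] * u * (x - mu[j])^2) / sum(W[, j]))
    }
  }
  c(A, mu, sigma)
}

set.seed(1)
x <- c(rt(300, 5) - 4, rt(300, 5) + 4)           # two well-separated components
fit <- t_mix_em_fixed_df(x, c(0.5, 0.5, -1, 1, 1, 1))
```

On this synthetic sample the estimated centers should land near -4 and 4.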
Initialization Vector for Student's t Mixture Fitting.
Description
Estimates an initialization vector for Student's t mixture fitting via Expectation Maximization. Proportions are set equal, centers are spaced evenly over the range of the input sample, and concentrations and degrees of freedom are set to 1.
Usage
smm_init_vector( x, n )
Arguments
x |
data vector |
n |
number of mixture components |
Value
Parameter vector of 4*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, k1, k2, ..., kn, ni1, ni2, ..., nin ), where Ai is the proportion of i-th component, mui is the center of i-th component, ki is the concentration of i-th component and nii is the degrees of freedom of i-th component.
Author(s)
Andrius Merkys
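The initialization described above is simple enough to sketch in base R (the name smm_init_vector_sketch is an assumption):

```r
# Equal proportions, centers spread evenly over the sample range,
# concentrations and degrees of freedom set to 1, in the
# c( A, mu, k, ni ) layout documented above.
smm_init_vector_sketch <- function(x, n) {
  A  <- rep(1 / n, n)
  mu <- seq(min(x), max(x), length.out = n)
  c(A, mu, rep(1, n), rep(1, n))
}

p0 <- smm_init_vector_sketch(c(-3, 0, 7), 2)
```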
Initialization Vector for Student's t Mixture Fitting via k-Means.
Description
Estimates an initialization vector for Student's t mixture fitting via Expectation Maximization. The R implementation of k-means, kmeans(), is used to assign data points to clusters; s_fit_primitive() is then used to estimate component parameters for each cluster.
Usage
smm_init_vector_kmeans( x, m )
Arguments
x |
data vector |
m |
number of mixture components |
Value
Parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, sigma1, sigma2, ..., sigman ), where Ai is the proportion of i-th component, mui is the location of i-th component, sigmai is the scale of i-th component.
Author(s)
Andrius Merkys
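A k-means-based initialization can be sketched with stats::kmeans(). Unlike the package, which calls s_fit_primitive() per cluster, this sketch uses each cluster's mean and standard deviation as location and scale; the name and that substitution are assumptions:

```r
# Cluster the sample with k-means, then derive proportions from cluster
# sizes and location/scale from cluster means and standard deviations.
smm_init_kmeans_sketch <- function(x, m) {
  km <- kmeans(x, centers = m)
  A  <- as.numeric(table(km$cluster)) / length(x)
  mu <- as.numeric(km$centers)
  s  <- vapply(seq_len(m), function(j) sd(x[km$cluster == j]), numeric(1))
  c(A, mu, s)
}

set.seed(3)
x  <- c(rnorm(100, -5), rnorm(100, 5))   # two well-separated clusters
p0 <- smm_init_kmeans_sketch(x, 2)
```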
Split a component of Student's t-distribution in two.
Description
Splits a component of a Student's t mixture in two. Implemented according to Eqns. 30–36 of Chen & Luo (2004).
Usage
smm_split_component( p, alpha = 0.5, beta = 0.5, u = 0.5 )
Arguments
p |
vector of Student's t mixture parameters. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, k1, k2, ..., kn, ni1, ni2, ..., nin ), where n is number of mixture components, Ai is the proportion of i-th component, mui is the center of i-th component, ki is the concentration of i-th component and nii is the degrees of freedom of i-th component. |
alpha |
split proportion for component proportions |
beta |
split proportion for component concentrations |
u |
split proportion for component centers |
Value
Vector of parameters of the resulting two-component mixture, whose structure is the same as that of the input parameter p.
Author(s)
Andrius Merkys
References
Chen, S.-B. & Luo, B. Robust t-mixture modelling with SMEM algorithm. Proceedings of 2004 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.04EX826), Institute of Electrical & Electronics Engineers (IEEE), 2004, 6, 3689–3694.
Sum of Squared Differences Using Gaussian Mixture Distribution
Description
Given two vectors of the same length and a Gaussian mixture, calculates the sum of squared differences (SSD) between the response vector y and the Gaussian mixture density evaluated at the points of the data vector x.
Usage
ssd( x, y, p )
Arguments
x |
data vector |
y |
response vector |
p |
parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, sigma1, sigma2, ..., sigman ), where Ai is the proportion of i-th component, mui is the location of i-th component, sigmai is the scale of i-th component. |
Value
Sum of squared differences.
Author(s)
Andrius Merkys
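The computation is a one-liner over the mixture density and can be sketched in base R (the name ssd_sketch is an assumption):

```r
# Mixture density at each x (sum of weighted normal densities), compared
# with the responses y; parameter layout is c( A, mu, sigma ) as above.
ssd_sketch <- function(x, y, p) {
  n <- length(p) / 3
  dens <- rep(0, length(x))
  for (i in seq_len(n))
    dens <- dens + p[i] * dnorm(x, mean = p[n + i], sd = p[2 * n + i])
  sum((y - dens)^2)
}

p <- c(1, 0, 1)                    # single standard normal component
x <- c(0, 1)
ssd_sketch(x, dnorm(x), p)         # responses equal the density: SSD is 0
```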
Gradient of the Sum of Squared Differences Using Gaussian Mixture Distribution
Description
Gradient (derivative) function of ssd().
Usage
ssd_gradient( x, y, p )
Arguments
x |
data vector |
y |
response vector |
p |
parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, sigma1, sigma2, ..., sigman ), where Ai is the proportion of i-th component, mui is the location of i-th component, sigmai is the scale of i-th component. |
Value
Gradient of ssd() with respect to the parameters p, evaluated at x.
Author(s)
Andrius Merkys
Estimate von Mises Mixture parameters using Expectation Maximization.
Description
Estimates parameters of a univariate von Mises mixture using the Expectation Maximization algorithm.
Usage
vmm_fit_em( x, p, epsilon = c( 0.000001, 0.000001, 0.000001 ),
debug = FALSE, implementation = "C" )
Arguments
x |
data vector |
p |
initialization vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, k1, k2, ..., kn ), where Ai is the proportion of i-th component, mui is the center of i-th component and ki is the concentration of i-th component. |
epsilon |
tolerance threshold for convergence. Structure of epsilon is epsilon = c( epsilon_A, epsilon_mu, epsilon_k ), where epsilon_A is threshold for component proportions, epsilon_mu is threshold for component centers and epsilon_k is threshold for component concentrations. |
debug |
flag to turn the debug prints on/off. |
implementation |
flag to switch between C (default) and R implementations. |
Value
Vector of mixture parameters, whose structure is the same as that of the input parameter p.
Author(s)
Andrius Merkys
References
Banerjee et al. Expectation Maximization for Clustering on Hyperspheres (2003), manuscript, available at: https://web.archive.org/web/20130120061240/http://www.lans.ece.utexas.edu/~abanerjee/papers/05/banerjee05a.pdf
Estimate von Mises Mixture parameters using Expectation Maximization.
Description
Estimates parameters of a univariate von Mises mixture using the Expectation Maximization algorithm. In this version, the stopping criterion is the difference between parameter values in subsequent iterations.
Usage
vmm_fit_em_by_diff( x, p, epsilon = c( 0.000001, 0.000001, 0.000001 ),
debug = FALSE, implementation = "C" )
Arguments
x |
data vector |
p |
initialization vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, k1, k2, ..., kn ), where Ai is the proportion of i-th component, mui is the center of i-th component and ki is the concentration of i-th component. |
epsilon |
tolerance threshold for convergence. Structure of epsilon is epsilon = c( epsilon_A, epsilon_mu, epsilon_k ), where epsilon_A is threshold for component proportions, epsilon_mu is threshold for component centers and epsilon_k is threshold for component concentrations. |
debug |
flag to turn the debug prints on/off. |
implementation |
flag to switch between C (default) and R implementations. |
Value
Vector of mixture parameters, whose structure is the same as that of the input parameter p.
Author(s)
Andrius Merkys
References
Banerjee et al. Expectation Maximization for Clustering on Hyperspheres (2003), manuscript, available at: https://web.archive.org/web/20130120061240/http://www.lans.ece.utexas.edu/~abanerjee/papers/05/banerjee05a.pdf
Estimate von Mises Mixture parameters using Expectation Maximization.
Description
Estimates parameters of a univariate von Mises mixture using the Expectation Maximization algorithm. In this version, the stopping criterion is the difference between the log-likelihood values of subsequent iterations.
Usage
vmm_fit_em_by_ll( x, p, epsilon = .Machine$double.eps,
debug = FALSE, implementation = "C" )
Arguments
x |
data vector |
p |
initialization vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, k1, k2, ..., kn ), where Ai is the proportion of i-th component, mui is the center of i-th component and ki is the concentration of i-th component. |
epsilon |
tolerance threshold for convergence |
debug |
flag to turn the debug prints on/off. |
implementation |
flag to switch between C (default) and R implementations. |
Value
Vector of mixture parameters, whose structure is the same as that of the input parameter p.
Author(s)
Andrius Merkys
References
Banerjee et al. Expectation Maximization for Clustering on Hyperspheres (2003), manuscript, available at: https://web.archive.org/web/20130120061240/http://www.lans.ece.utexas.edu/~abanerjee/papers/05/banerjee05a.pdf
Initialization Vector for von Mises Mixture Fitting.
Description
Estimates an initialization vector for von Mises mixture fitting via Expectation Maximization. Proportions are set equal, centers are spaced evenly over the whole domain, and concentrations are set to (m/(12*180))^2.
Usage
vmm_init_vector( m, implementation = "C" )
Arguments
m |
number of mixture components |
implementation |
flag to switch between C (default) and R implementations. |
Value
Parameter vector of 3*n parameters, where n is number of mixture components. Structure of p vector is p = c( A1, A2, ..., An, mu1, mu2, ..., mun, k1, k2, ..., kn ), where Ai is the proportion of i-th component, mui is the center of i-th component and ki is the concentration of i-th component.
Author(s)
Andrius Merkys
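The initialization can be sketched in base R. The name vmm_init_vector_sketch is an assumption, and so is the choice of [0, 2*pi) as the circular domain over which the centers are spread:

```r
# Equal proportions, centers spread evenly over the circle [0, 2*pi),
# concentrations set to (m/(12*180))^2 as documented above.
vmm_init_vector_sketch <- function(m) {
  A  <- rep(1 / m, m)
  mu <- seq(0, 2 * pi, length.out = m + 1)[seq_len(m)]  # drop duplicate 2*pi
  k  <- rep((m / (12 * 180))^2, m)
  c(A, mu, k)
}

p0 <- vmm_init_vector_sketch(4)
```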
Calculate Weighted Median.
Description
Calculates the weighted median.
Usage
wmedian( x, w, start = 1, end = length( x ) )
Arguments
x |
sample vector |
w |
weights vector |
start |
start index (default: 1) |
end |
end index (default: last index in x) |
Value
Weighted median value.
Author(s)
Andrius Merkys
References
Users of Wikipedia. Weighted median. https://en.wikipedia.org/w/index.php?title=Weighted_median&oldid=690896947
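A base-R sketch of the lower weighted median, the variant described in the Wikipedia article cited above (the name wmedian_sketch is an assumption): sort by value, accumulate weights, and return the smallest element whose cumulative weight reaches half the total.

```r
# Lower weighted median: smallest x[i] with cumulative weight >= W / 2.
wmedian_sketch <- function(x, w) {
  o  <- order(x)
  cw <- cumsum(w[o]) / sum(w)        # normalized cumulative weights
  x[o][which(cw >= 0.5)[1]]
}

wmedian_sketch(c(1, 2, 3), c(1, 1, 10))   # heavy weight on 3 pulls the median to 3
```

With equal weights this reduces to the ordinary (lower) median.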