---
title: "Latent Class Discriminant Analysis"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Latent Class Discriminant Analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

## Overview

The `lcda` package provides latent class discriminant analysis methods for categorical predictors. The main functions are:

- `lcda()` for class-specific latent class models.
- `cclcda()` for common-components latent class models.
- `cclcda2()` for common-components models with class-conditional mixing weights.

All manifest variables and class labels must be integer-coded and start at 1.

## Background

The methods in `lcda` implement local discrimination for discrete variables using latent class analysis (LCA). The key idea is to replace a single class-conditional distribution with a finite mixture of locally independent components. This lets each class capture heterogeneity while keeping the model tractable for categorical data.

Let `K` be the number of classes, `M` the number of latent components (written `M_k` when it differs by class), `D` the number of manifest variables, and `R_d` the number of outcomes for variable `d`. The indicator `x_dr` equals 1 if variable `d` takes outcome `r` and 0 otherwise.

## Models

### LCDA (class-specific mixtures)

Each class has its own latent class model:

$$
f_k(x) = \sum_{m=1}^{M_k} w_{mk} \prod_{d=1}^D \prod_{r=1}^{R_d} \theta_{mkdr}^{x_{dr}}
$$

Here `w_mk` is the mixing weight of component `m` in class `k`, and `theta_mkdr` is the probability that variable `d` takes outcome `r` in that component.

Classification follows the Bayes decision rule, where `pi_k` denotes the prior probability of class `k`:

$$
\hat{k}(x) = \arg\max_k \pi_k f_k(x)
$$

### CCLCDA (common components)

Common-components models share the component distributions across classes while allowing class-specific mixing weights, so the response probabilities `theta_mdr` no longer carry a class index:

$$
f_k(x) = \sum_{m=1}^{M} w_{mk} \prod_{d=1}^D \prod_{r=1}^{R_d} \theta_{mdr}^{x_{dr}}
$$

`cclcda()` first estimates the shared LCA on the pooled data and then derives class-conditional weights. `cclcda2()` estimates weights and response probabilities jointly in each EM step.

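
The mixture density and the Bayes decision rule from this section can be sketched in a few lines of base R. This is only an illustration under assumed toy parameters: `mixture_density()`, `bayes_classify()`, and every numeric value below are hypothetical and are not part of the `lcda` API. Because `x_dr` is an indicator, the inner product over `r` reduces to the single response probability of the observed outcome.

```{r}
# Hypothetical helper (not from lcda): mixture density of one observation.
# x:     integer vector of length D with the observed outcome of each variable
# w:     mixing weights of length M
# theta: list of length D; theta[[d]] is an M x R_d matrix of response
#        probabilities (rows = components, columns = outcomes)
mixture_density <- function(x, w, theta) {
  comp <- w
  for (d in seq_along(x)) {
    # Pick the probability of the observed outcome in every component
    comp <- comp * theta[[d]][, x[d]]
  }
  sum(comp)
}

# Hypothetical helper: Bayes rule, arg max over k of prior_k * f_k(x)
bayes_classify <- function(x, priors, params) {
  scores <- vapply(seq_along(params), function(k) {
    priors[k] * mixture_density(x, params[[k]]$w, params[[k]]$theta)
  }, numeric(1))
  which.max(scores)
}

# Assumed toy setup: K = 2 classes, M = 2 components, D = 2 binary variables
params <- list(
  list(
    w = c(0.6, 0.4),
    theta = list(
      matrix(c(0.9, 0.1,
               0.2, 0.8), nrow = 2, byrow = TRUE),
      matrix(c(0.7, 0.3,
               0.4, 0.6), nrow = 2, byrow = TRUE)
    )
  ),
  list(
    w = c(0.5, 0.5),
    theta = list(
      matrix(c(0.3, 0.7,
               0.1, 0.9), nrow = 2, byrow = TRUE),
      matrix(c(0.2, 0.8,
               0.5, 0.5), nrow = 2, byrow = TRUE)
    )
  )
)
priors <- c(0.5, 0.5)

bayes_classify(c(1, 1), priors, params)
bayes_classify(c(2, 2), priors, params)
```

With these toy values, `c(1, 1)` is assigned to class 1 and `c(2, 2)` to class 2. A common-components model could be mimicked in the same sketch by giving every class the same `theta` and letting only `w` differ.
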
## Estimation and model selection

Parameter estimation uses the EM algorithm with random starts (see the `nrep` argument). Model selection can be guided by AIC, BIC, the likelihood ratio statistic (Gsq), and the Pearson chi-square statistic (Chisq).

For common-components models, additional quality measures are provided:

- Weighted entropy, measuring the purity of latent components.
- Weighted Gini, an alternative impurity measure.
- A chi-square test of independence between latent components and classes.

These are reported in the fitted model objects returned by `cclcda()` and `cclcda2()`.

## Example: CCLCDA2 on Iris

```{r}
library(lcda)

data(iris)

# Discretize each measurement into four bins, integer-coded 1 to 4,
# and recode the species as integer class labels 1 to 3.
iris_cat <- within(iris, {
  Sepal.Length <- as.integer(cut(Sepal.Length, breaks = c(-Inf, 5.1, 5.8, 6.4, Inf)))
  Sepal.Width  <- as.integer(cut(Sepal.Width,  breaks = c(-Inf, 2.8, 3.0, 3.3, Inf)))
  Petal.Length <- as.integer(cut(Petal.Length, breaks = c(-Inf, 1.6, 4.35, 5.1, Inf)))
  Petal.Width  <- as.integer(cut(Petal.Width,  breaks = c(-Inf, 0.3, 1.3, 1.8, Inf)))
  Species3     <- as.integer(Species)
})

# Fit the common-components model on the discretized predictors
model <- cclcda2(
  Species3 ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
  data = iris_cat,
  m = 1
)

# BIC of the fitted model
model$bic
```

## References

Bücker, M., Szepannek, G., Weihs, C. (2010). Local Classification of Discrete Variables by Latent Class Models. In: Locarek-Junge, H., Weihs, C. (eds) Classification as a Tool for Research. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10745-0_13

Bücker, M. (2008). Lokale Diskrimination diskreter Daten. Diplomarbeit, Fakultaet Statistik, TU Dortmund.