---
title: "Cellwise Robust Multi-Group Gaussian Mixture Model"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Cellwise Robust Multi-Group Gaussian Mixture Model}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  warning = FALSE,
  fig.dim = c(7, 4.5),
  comment = "#>"
)
```

This vignette reproduces the weather example described in Puchhammer, Wilms and Filzmoser (2025). The original data come from GeoSphere Austria (2022) and are included in this package.

```{r setup}
library(ssMRCD)
library(ggplot2)
library(dplyr)
```

## Data Preparation

The original data from GeoSphere Austria (2022) are pre-cleaned and stored in the data frame `weatherAUT2021`. Additional information can be found on the help page.

```{r metadata, eval = FALSE}
# get meta data for the data set
? weatherAUT2021
```

```{r load data, eval = TRUE}
# load the data
data("weatherAUT2021")

# inspect the data
head(weatherAUT2021)

# select variables, station names and number of observations
data = weatherAUT2021 %>% select(p:rel)
stations = weatherAUT2021$name
n = nrow(data)
```

The predefined groups are based on the underlying geographical landscape of Austria, which consists of Alpine mountains, hills and flatter areas.

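To make the idea of a grid-based grouping concrete, the following base R sketch assigns points to cells defined by longitude and latitude cut points and relabels the non-empty cells consecutively. It is only an illustration under stated assumptions: `grid_groups_sketch` is a hypothetical helper, the coordinates are made up, and neither the code nor the label ordering is claimed to match what `groups_gridbased` does internally.

```r
# Hypothetical helper (not part of ssMRCD): assign each point to a grid cell
# defined by longitude and latitude cut points, then relabel the non-empty
# cells as consecutive integers.
grid_groups_sketch <- function(lon, lat, cut_lon, cut_lat) {
  # index of the longitude and latitude interval each point falls into
  i_lon <- findInterval(lon, cut_lon, rightmost.closed = TRUE)
  i_lat <- findInterval(lat, cut_lat, rightmost.closed = TRUE)

  # combine both interval indices into a single cell id
  cell <- (i_lat - 1) * (length(cut_lon) - 1) + i_lon

  # relabel so that only non-empty cells receive a group number
  as.integer(factor(cell))
}

# toy coordinates, made up for illustration (not taken from weatherAUT2021)
lon <- c(10.1, 11.5, 13.2, 15.0, 16.4, 14.1)
lat <- c(47.2, 47.3, 47.9, 48.3, 48.2, 46.8)

g <- grid_groups_sketch(lon, lat,
                        cut_lon = c(9, 12, 16, 18),
                        cut_lat = c(46, 48, 50))
table(g)
```

In this toy example the grid has six cells, but only four of them contain points, so the resulting labels run from 1 to 4. A grid can therefore yield fewer groups than it has cells.
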
```{r build groups}
# build 5 groups of observations based on spatial proximity and geography
cut_lon = c(min(weatherAUT2021$lon) - 0.2, 12, 16, max(weatherAUT2021$lon) + 0.2)
cut_lat = c(min(weatherAUT2021$lat) - 0.2, 48, max(weatherAUT2021$lat) + 0.2)
groups = ssMRCD::groups_gridbased(weatherAUT2021$lon,
                                  weatherAUT2021$lat,
                                  cut_lon,
                                  cut_lat)
N = length(unique(groups))
table(groups)
```

```{r run model}
# calculate MG-GMM
model = cellMGGMM(X = data,
                  groups = groups,
                  nsteps = 100,
                  alpha = 0.5,
                  maxcond = 100)
```

```{r mixture probabilities}
# mixture probabilities
cat("Pi (in %):\n")
round(model$pi_groups * 100, 2)
```

```{r percentage of outlier}
# percentage of outliers
cat("% Outliers per group and variable:\n")
round(sapply(1:N, function(x) colMeans(1 - model$W[groups == x, ])) * 100, 2)
```

```{r residuals}
# calculate residuals
res = residuals_mggmm(X = data,
                      groups = groups,
                      Sigma = model$Sigma,
                      mu = model$mu,
                      probs = model$probs,
                      W = model$W)
```

## References

GeoSphere Austria (2022): .

Puchhammer P., Wilms I. and Filzmoser P. (2025): A smooth multi-group Gaussian Mixture Model for cellwise robust covariance estimation.