--- title: "Robust standard errors with parglm and the sandwich package and regression tables with gtsummary" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Robust standard errors with parglm and the sandwich package and regression tables with gtsummary} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = requireNamespace("sandwich", quietly = TRUE) && requireNamespace("lmtest", quietly = TRUE) ) ``` Since `parglm()` returns a standard `glm` object, it works directly with the [`sandwich`](https://cran.r-project.org/package=sandwich) package for heteroskedasticity-consistent (HC) and cluster-robust standard errors. The [`lmtest`](https://cran.r-project.org/package=lmtest) package provides `coeftest()` for displaying results with alternative covariance matrices. ## Setup ```{r setup, message = FALSE} library(parglm) library(sandwich) library(lmtest) ``` We simulate a Poisson dataset with 20 clusters of 10 observations each, where a cluster-level random effect induces within-cluster correlation. ```{r simulate} set.seed(1) n <- 200 cluster_id <- rep(1:20, each = 10) x1 <- rnorm(n) x2 <- rnorm(n) u <- rep(rnorm(20, sd = 0.5), each = 10) # cluster random effect y <- rpois(n, exp(0.5 + 0.3 * x1 - 0.2 * x2 + u)) dat <- data.frame(y = y, x1 = x1, x2 = x2, cluster = cluster_id) ``` ## Fitting the model ```{r fit} fit <- parglm(y ~ x1 + x2, data = dat, family = poisson(), control = parglm.control(nthreads = 1L)) ``` ## Standard errors The default model-based standard errors assume the Poisson variance equals the mean. They will be too small here because the cluster random effects induce overdispersion. ```{r model-based} coeftest(fit) ``` ## Heteroskedasticity-consistent (HC) standard errors `vcovHC()` computes sandwich standard errors that are robust to misspecification of the variance function. HC3 (the default) is recommended for small to moderate samples. ```{r hc} coeftest(fit, vcov = vcovHC) ``` ## Cluster-robust standard errors `vcovCL()` accounts for within-cluster correlation, which is the appropriate correction here. ```{r cluster} coeftest(fit, vcov = vcovCL, cluster = ~cluster) ``` As expected, the cluster-robust standard errors are larger than the model-based ones, reflecting the extra variability due to the cluster random effects. ## Covariance matrices directly The covariance matrices themselves are also available: ```{r vcov} vcovHC(fit, type = "HC3") vcovCL(fit, cluster = ~cluster) ``` ### Note `model = TRUE` (the default in `parglm`) must be set so that the model frame is stored, allowing `sandwich` to reconstruct the design matrix internally. ## Regression tables with gtsummary The [`gtsummary`](https://cran.r-project.org/package=gtsummary) package produces publication-ready regression tables from model objects. `parglm()` models are supported directly because they inherit from `glm`. ```{r gtsummary-setup, eval = requireNamespace("gtsummary", quietly = TRUE) && requireNamespace("broom", quietly = TRUE) && requireNamespace("broom.helpers", quietly = TRUE), message = FALSE} library(gtsummary) ``` ### Logistic regression We fit a logistic regression model with `parglm()` and pass it to `tbl_regression()`. Setting `exponentiate = TRUE` displays odds ratios with their confidence intervals. ```{r gtsummary-logistic, eval = requireNamespace("gtsummary", quietly = TRUE) && requireNamespace("broom", quietly = TRUE) && requireNamespace("broom.helpers", quietly = TRUE)} set.seed(2) n2 <- 300 x1_b <- rnorm(n2) x2_b <- rnorm(n2) y_bin <- rbinom(n2, 1, plogis(0.4 + 0.6 * x1_b - 0.4 * x2_b)) dat_b <- data.frame(y = y_bin, x1 = x1_b, x2 = x2_b) fit_logistic <- parglm(y ~ x1 + x2, data = dat_b, family = binomial(), control = parglm.control(nthreads = 1L)) suppressWarnings( tbl_regression(fit_logistic, exponentiate = TRUE) ) ``` ### Robust standard errors in a gtsummary table `tidy_parglm_robust()` is a drop-in `tidy_fun` for `tbl_regression()` that replaces the default model-based standard errors with sandwich estimates. Here we use cluster-robust standard errors for the Poisson model fitted earlier. ```{r gtsummary-robust, eval = requireNamespace("gtsummary", quietly = TRUE) && requireNamespace("broom", quietly = TRUE) && requireNamespace("broom.helpers", quietly = TRUE)} tbl_regression(fit, tidy_fun = tidy_parglm_robust) ```