--- title: "Using summclust" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{summclust} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(summclust) ``` ## Introduction The `summclust` package allows to compute leverage statistics for clustered errors and fast `CRV3(J)` variance-covariance matrices as described in [MacKinnon, J.G., Nielsen, M.Ø., Webb, M.D., 2022. Leverage, influence, and the jackknife in clustered regression models: Reliable inference using summclust](https://arxiv.org/abs/2205.03288). It is a post-estimation command and currently supports methods for objects of type `lm` (from `stats`) and `fixest` (from the `fixest` package). ## The `summclust` function ```{r example, warning = FALSE, message = FALSE, out.width="50%", out.height="50%"} library(summclust) library(lmtest) library(haven) url <- "http://www.stata-press.com/data/r9/nlswork.dta" if (httr::http_error(url)) { stop("No internet connection or data source broken. Sorry about that!") return(NULL) } else { message("downloading the 'nlswork' dataset.") nlswork <- read_dta(url) } # drop NAs at the moment nlswork <- nlswork[, c("ln_wage", "grade", "age", "birth_yr", "union", "race", "msp", "ind_code")] nlswork <- na.omit(nlswork) lm_fit <- lm( ln_wage ~ as.factor(grade) + as.factor(age) + as.factor(birth_yr) + union + race + msp, data = nlswork) summclust_res <- summclust( obj = lm_fit, cluster = ~ind_code, params = c("msp", "union") ) # CRV3-based inference - exactly matches output of summclust-stata tidy(summclust_res) summary(summclust_res) ``` To visually inspect the leverage statistics, use the `plot` method ```{r, warning = FALSE, message = FALSE, out.width="50%"} plot(summclust_res) ``` ## Using `summclust` with `coefplot` and `fixest` Note that you can also use CVR3 and CRV3J covariance matrices computed via `summclust` or its `vcov_CR3J` method with the `lmtest()` and `fixest` packages. ```{r, warning = FALSE, message=FALSE} library(lmtest) vcov3J <- vcov_CR3J( lm_fit, cluster = ~ ind_code ) all.equal( vcov3J, summclust_res$vcov ) df <- length(summclust_res$cluster) - 1 # with lmtest CRV1 <- lmtest::coeftest(lm_fit, sandwich::vcovCL(lm_fit, ~ind_code), df = df) CRV3 <- lmtest::coeftest(lm_fit, vcov3J, df = df) CRV1[c("union", "race", "msp"),] CRV3[c("union", "race", "msp"),] stats::confint(CRV1)[c("union", "race", "msp"),] stats::confint(CRV3)[c("union", "race", "msp"),] ``` ```{r} library(fixest) feols_fit <- feols( ln_wage ~ i(grade) + i(age) + i(birth_yr) + union + race + msp, data = nlswork ) # Store vcov into the fixest object feols_fit <- summary( feols_fit, vcov = vcov_CR3J(feols_fit, cluster = ~ ind_code) ) # Now it just works with fixest functions fixest::coeftable(feols_fit, keep = c("msp", "union", "race")) ``` The p-value and confidence intervals for `fixest::coeftable()` differ from `lmtest::coeftest()` and `summclust::coeftable()`. This is due to the fact that `fixest::coeftable()` uses a different degree of freedom for the t-distribution used in these calculation (I believe it uses t(N-1)).