---
title: "LeaveOutKSS Overview"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{LeaveOutKSS Overview}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Overview

`LeaveOutKSS` is an 'R' translation of the leave-out variance-component workflow for two-way fixed effects models associated with Kline, Saggio, and Solvsten (2020). The package follows the same broad logic described in the repository README and in the original 'MATLAB' vignette:

1. start from worker identifiers, firm identifiers, and an outcome;
2. restrict the sample to a connected mobility graph;
3. prune further to a leave-one-worker-out connected set;
4. optionally partial out controls;
5. compute leverage-based bias adjustments exactly or by Johnson-Lindenstrauss approximation (JLA);
6. report plug-in and bias-corrected variance components.

The examples in this package currently rely on the small bundled panel used by the repository's `01_basic_no_controls.R` example.

# Abowd, Kramarz, and Margolis (1999; AKM) Setup

The target application is the familiar AKM-style model

\[
y_{gt} = \alpha_g + \psi_{j(g,t)} + w'_{gt}\delta + \varepsilon_{gt},
\]

where $g$ indexes workers (`id`), $j(g,t)$ is the firm (`firmid`) employing worker $g$ in period $t$, and $w_{gt}$ collects observed covariates (`controls`) such as year effects. The main quantities of interest are the variance of firm effects, the covariance of worker and firm effects, and the variance of worker effects.

The Kline, Saggio, and Solvsten (KSS) correction matters because plug-in variance decompositions treat estimated fixed effects as if they were measured without error. The leave-out approach instead uses observation-specific leverage adjustments to remove the leading bias from these variance-component estimates.
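To make the bias argument concrete, the toy simulation below compares a plug-in quadratic form with its leave-out correction. It is a minimal sketch in base R and does not use `LeaveOutKSS`: it works with a generic heteroskedastic regression rather than a worker-firm design, and every object in it (`X`, `beta`, `A`, `one_draw()`, and the data-generating process) is an illustrative assumption, not package code.

```{r eval = FALSE}
# Toy illustration (hypothetical setup, not package code):
# bias of a plug-in quadratic form b' A b and its leave-out correction.
set.seed(1)
n      <- 200
X      <- cbind(1, rnorm(n), rnorm(n))  # design matrix
beta   <- c(1, 0.5, -0.25)              # true coefficients
sigma2 <- 0.5 + X[, 2]^2                # heteroskedastic error variances
A      <- diag(c(0, 1, 1))              # target: theta = beta' A beta
theta  <- drop(t(beta) %*% A %*% beta)

Sxx_inv <- solve(crossprod(X))
# Leverages P_ii = x_i' Sxx^{-1} x_i
P_ii <- rowSums((X %*% Sxx_inv) * X)
# Bias weights B_ii = x_i' Sxx^{-1} A Sxx^{-1} x_i
B_ii <- rowSums((X %*% Sxx_inv %*% A %*% Sxx_inv) * X)

one_draw <- function() {
  y <- drop(X %*% beta) + rnorm(n, sd = sqrt(sigma2))
  b <- drop(Sxx_inv %*% crossprod(X, y))
  plug_in <- drop(t(b) %*% A %*% b)
  # Leave-one-out variance estimate: y_i * (leave-out residual), where the
  # leave-out residual equals the OLS residual divided by (1 - P_ii)
  sigma2_hat <- y * (y - drop(X %*% b)) / (1 - P_ii)
  c(plug_in = plug_in, corrected = plug_in - sum(B_ii * sigma2_hat))
}

sims <- replicate(2000, one_draw())
bias <- rowMeans(sims) - theta
# The plug-in bias has expectation sum(B_ii * sigma2);
# the corrected estimator is unbiased in this design.
print(c(bias, theory_plug_in = sum(B_ii * sigma2)))
```

In this sketch the average plug-in error should sit near `sum(B_ii * sigma2)`, while the average error of the corrected estimate should be close to zero up to simulation noise. In the worker-firm setting the same logic applies, with the matrix `A` chosen to pick out the variance component of interest.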

# Bundled Example Data

```{r}
path <- system.file("extdata", "test.csv", package = "LeaveOutKSS")
dt <- data.table::fread(path, header = FALSE)
data.table::setorder(dt, V1, V3)
dim(dt)
head(dt)
```

The bundled file follows the layout used in the repository examples:

- column 1: worker identifier
- column 2: firm identifier
- column 3: year
- column 4: outcome

Before calling `leave_out_KSS()` or `leave_out_KSS_fe()`, sort the panel by worker identifier and, within worker, from earlier to later time periods.

# Main Workflow

The basic decomposition is performed by `leave_out_KSS()`.

```{r eval = FALSE}
res <- leave_out_KSS(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  leave_out_level = "matches",
  type_algorithm = "JLA",
  simulations_JLA = 5,
  paral = FALSE,
  progress = FALSE
)

print(res)
res$estimates$table
```

The routine returns an object whose main elements are:

- `res$estimates$table`: biased and bias-corrected decomposition estimates
- `res$effects`: estimated worker and firm effects in the original identifier space

If you want files, you can export them explicitly:

```{r eval = FALSE}
stem <- tempfile("leaveoutkss_")

leave_out_KSS(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  simulations_JLA = 5,
  paral = FALSE,
  csv_file = paste0(stem, ".csv"),
  txt_file = paste0(stem, ".txt"),
  progress = FALSE
)

unlink(paste0(stem, c(".csv", ".txt")))
```

# Controls

The original vignette emphasizes that controls are handled by partialling them out in the leave-out connected set and then running the decomposition on the residualized outcome. In R, one way to do this is to pass a control matrix directly.

```{r eval = FALSE}
# Year dummies; drop one column to avoid collinearity with the fixed effects
controls <- model.matrix(~ factor(dt[[3]]) - 1)
controls <- controls[, -ncol(controls), drop = FALSE]

leave_out_KSS(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  controls = controls,
  simulations_JLA = 5,
  paral = FALSE,
  progress = FALSE
)
```

If a control is more naturally supplied as a coded categorical variable, `leave_out_KSS_fe()` can expand selected columns internally:

```{r eval = FALSE}
leave_out_KSS_fe(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  controls = cbind(year = dt[[3]]),
  absorb_col = 1,
  simulations_JLA = 5,
  paral = FALSE,
  progress = FALSE
)
```

# Leaving Out Matches or Observations

The default `leave_out_level = "matches"` follows the discussion in the original vignette: it is intended to be robust to unrestricted heteroskedasticity and serial correlation within worker-firm matches. Setting `leave_out_level = "obs"` switches the correction to leaving out single person-year observations instead.

```{r eval = FALSE}
leave_out_KSS(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  leave_out_level = "obs",
  simulations_JLA = 5,
  paral = FALSE,
  progress = FALSE
)
```

# Regressing Firm Effects on Observables

The 'MATLAB' vignette also discusses linear projections of estimated firm effects on observables. In this package, that workflow is exposed through the `lincom_do`, `Z_lincom`, and `labels_lincom` arguments of `leave_out_KSS()`, which call `lincom_KSS()` internally.

```{r eval = FALSE}
# Indicator for observations in the earlier half of the sample years
early_year <- as.numeric(dt[[3]] <= median(dt[[3]], na.rm = TRUE))

leave_out_KSS(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  simulations_JLA = 5,
  paral = FALSE,
  lincom_do = 1,
  Z_lincom = early_year,
  labels_lincom = list("Early-Year Indicator"),
  progress = FALSE
)
```

# R-Squared Companion

`rsquared_comp()` compares the fit of the standard two-way fixed effects design with a saturated worker-firm interaction model.

```{r eval = FALSE}
rsquared_comp(
  y = dt[[4]],
  id = dt[[1]],
  firmid = dt[[2]],
  progress = FALSE
)
```

# Notes on Current Scope

At this stage, package documentation and examples intentionally rely on the small bundled dataset rather than the large-data workflow from the repository's `04_large_no_controls.R`. The computational shortcuts for large datasets are still reflected in the API, especially the JLA-based leverage routines, but the documentation examples focus on the small reproducible panel.

# References

Abowd, J. M., Kramarz, F., and Margolis, D. N. (1999). High wage workers and high wage firms. *Econometrica*, 67(2), 251-333.

Kline, P., Saggio, R., and Solvsten, M. (2020). Leave-out estimation of variance components. *Econometrica*, 88(5), 1859-1898.