---
title: "Introduction to the dcce Package: DCCE Estimation for Panel Data"
author: "Mustapha Wasseja"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Introduction to the dcce Package: DCCE Estimation for Panel Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
link-citations: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5,
  out.width = "100%"
)
```

# Introduction

Macro-panel datasets---large panels of countries, regions, or industries observed over many time periods---are increasingly central to empirical economic research. A pervasive feature of such panels is **cross-sectional dependence (CSD)**: the residuals of unit $i$ are correlated with the residuals of unit $j$ even after controlling for observable explanatory variables. Ignoring CSD causes standard panel estimators to produce inconsistent estimates and badly sized tests.

The `dcce` package implements the family of **Common Correlated Effects (CCE)** estimators introduced by @pesaran2006estimation and extended to dynamic panels by @chudik2015common. These estimators filter out unobserved common factors---the source of CSD---by augmenting each unit's regression with cross-sectional averages (CSAs) of the dependent variable and the regressors. The package further implements long-run estimators (CS-ARDL, CS-DL), an extensive CD test suite, the exponent of cross-sectional dependence, and information criteria for CSA lag selection.
**Key features:**

- Mean Group (MG), CCE-MG, Dynamic CCE (DCCE), Regularized CCE (rCCE)
- Long-run estimators: CS-ARDL (@chudik2016long), CS-DL, Pooled Mean Group (PMG, @shin1999ardl)
- CD test suite: Pesaran CD (@pesaran2015testing), CDw (@juodis2022), PEA (@fan2015power), CD* (@pesaran2021cdstar)
- Exponent of cross-sectional dependence (@bailey2016exponent, @bailey2019exponent)
- Information criteria for CSA selection (@margaritella2023information)
- Rank condition classifier (@devos2024rank)
- Cross-section and wild bootstrap inference
- `broom`-compatible `tidy()` and `glance()` methods

```{r load-package}
library(dcce)
library(generics)
```

# The Problem: Cross-Sectional Dependence

## What is cross-sectional dependence?

In a panel with $N$ units and $T$ time periods, suppose the data generating process is:
$$
y_{it} = \alpha_i + \boldsymbol{\beta}_i' \mathbf{x}_{it} + \varepsilon_{it},
$$
where $\varepsilon_{it}$ is the error term. Standard panel estimators assume $\mathbb{E}[\varepsilon_{it}\varepsilon_{jt}] = 0$ for $i \neq j$. In macro panels, this assumption is routinely violated because units share common shocks (global business cycles, oil price shocks, financial crises) and are connected via trade, capital flows, and technology spillovers.

## Consequences of ignoring CSD

When the errors contain unobserved common factors,
$$
\varepsilon_{it} = \boldsymbol{\lambda}_i' \mathbf{f}_t + u_{it},
$$
standard fixed-effects and first-differencing estimators are inconsistent if the regressors are also driven by $\mathbf{f}_t$---a situation called *strong* CSD. In that case, $\mathbf{x}_{it}$ and $\varepsilon_{it}$ share a common component ($\boldsymbol{\lambda}_i' \mathbf{f}_t$), inducing endogeneity. @pesaran2006estimation shows the resulting bias can be substantial.

## Testing for CSD: the `pcd_test()` function

Before estimating, we should test for CSD.
The Pesaran [-@pesaran2015testing] CD statistic is the benchmark:
$$
\text{CD} = \sqrt{\frac{2}{N(N-1)}} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \sqrt{T_{ij}}\, \hat{\rho}_{ij} \xrightarrow{d} \mathcal{N}(0,1),
$$
where $\hat{\rho}_{ij}$ is the sample correlation of the residuals of units $i$ and $j$, computed over the $T_{ij}$ periods the two units share. Under the null of no CSD, $\text{CD} \to \mathcal{N}(0,1)$ as $N, T \to \infty$.

We illustrate with the `pwt8` dataset: 93 countries, 1960--2007, adapted from @ditzen2018estimating.

```{r pwt8-load}
data(pwt8, package = "dcce")
str(pwt8)
```

Run a naive pooled OLS of GDP growth on factor inputs, ignoring CSD:

```{r cd-ols}
# Pooled OLS on complete cases
pwt_cc <- na.omit(pwt8[, c("country", "year", "d_log_rgdpo",
                           "log_hc", "log_ck", "log_ngd")])
fit_ols <- lm(d_log_rgdpo ~ log_hc + log_ck + log_ngd, data = pwt_cc)

# CD test on OLS residuals
cd_ols <- pcd_test(
  residuals(fit_ols),
  data = pwt_cc,
  unit_index = "country",
  time_index = "year",
  test = "pesaran"
)
print(cd_ols)
```

The CD statistic is far above the critical value ($\approx 1.96$), decisively rejecting the null of no cross-sectional dependence. Ignoring this leads to spurious inference.

# The Model: Heterogeneous Panels with Common Factors

## The general DGP

`dcce` works with the **heterogeneous panel model** with unobserved common factors:
$$
y_{it} = \alpha_i + \boldsymbol{\beta}_i' \mathbf{x}_{it} + \boldsymbol{\lambda}_i' \mathbf{f}_t + u_{it}, \quad i = 1,\ldots,N, \quad t = 1,\ldots,T,
$$
where:

- $\boldsymbol{\beta}_i$ are **unit-specific** slope coefficients (heterogeneous);
- $\mathbf{f}_t$ is an $(m \times 1)$ vector of **unobserved common factors**;
- $\boldsymbol{\lambda}_i$ are unit-specific factor loadings;
- $u_{it}$ is an idiosyncratic error with $\mathbb{E}[u_{it} u_{jt}] = 0$ for $i \neq j$.
The regressors follow:
$$
\mathbf{x}_{it} = \mathbf{A}_i' \mathbf{g}_t + \boldsymbol{\Gamma}_i' \mathbf{f}_t + \mathbf{v}_{it},
$$
so the common factors $\mathbf{f}_t$ drive both $y_{it}$ and $\mathbf{x}_{it}$, inducing CSD and endogeneity.

## The CCE approach

@pesaran2006estimation shows that the unobserved $\mathbf{f}_t$ can be asymptotically spanned by the cross-sectional averages:
$$
\bar{y}_t = N^{-1}\sum_i y_{it}, \quad \bar{\mathbf{x}}_t = N^{-1}\sum_i \mathbf{x}_{it}.
$$
These **CSAs** are observable proxies for the common factors. Augmenting each unit's regression with the CSAs (and their lags, for dynamic panels) removes the common factor component and restores consistency.

## Dynamic extension (DCCE)

For dynamic panels where $y_{i,t-1}$ appears as a regressor, @chudik2015common show that the CCE estimator requires CSA lags to remain consistent. The recommended number of lags is $p_T = \lfloor T^{1/3} \rfloor$ (3 for $T \approx 48$). This gives the **DCCE** estimator.

# Estimators

## Mean Group (MG)

The **Mean Group** estimator of @pesaran1995estimating runs OLS separately for each unit $i$:
$$
\hat{\boldsymbol{\beta}}_i^{\text{OLS}} = (\mathbf{X}_i' \mathbf{X}_i)^{-1} \mathbf{X}_i' \mathbf{y}_i,
$$
and averages:
$$
\hat{\boldsymbol{\beta}}^{\text{MG}} = N^{-1} \sum_{i=1}^N \hat{\boldsymbol{\beta}}_i.
$$
MG is consistent as $N, T \to \infty$ under heterogeneous slopes, but inconsistent when CSD is present via common factors. It serves as a useful benchmark.

```{r mg-pwt8}
fit_mg <- dcce(
  data = pwt8,
  unit_index = "country",
  time_index = "year",
  formula = d_log_rgdpo ~ L(log_rgdpo, 1) + log_hc + log_ck + log_ngd,
  model = "mg",
  cross_section_vars = NULL
)
print(fit_mg)
```

The `L(log_rgdpo, 1)` term is the first lag of log GDP---its MG coefficient is $(\hat\rho - 1)$, so a negative value indicates mean-reverting dynamics.
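The MG estimator defined above amounts to nothing more than unit-by-unit OLS followed by a simple average. A minimal hand-rolled sketch on a simulated toy panel (base R only, no common factors, so plain MG is valid; this is an illustration, not the package's implementation):

```r
# Hand-rolled Mean Group on a toy panel: per-unit OLS, then average
set.seed(1)
N <- 10; T_len <- 40
toy <- do.call(rbind, lapply(seq_len(N), function(i) {
  beta_i <- 0.5 + rnorm(1, sd = 0.1)  # heterogeneous slope centred on 0.5
  x <- rnorm(T_len)
  data.frame(unit = i, x = x, y = 1 + beta_i * x + rnorm(T_len, sd = 0.2))
}))
# Step 1: OLS separately for each unit
b_i <- sapply(split(toy, toy$unit),
              function(d) coef(lm(y ~ x, data = d))["x"])
# Step 2: the MG estimate is the cross-sectional average of the unit slopes
b_mg <- mean(b_i)
round(b_mg, 3)  # close to the true mean slope of 0.5
```

The package's `model = "mg"` performs exactly this two-step logic, plus the nonparametric MG variance estimator for inference.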
## CCE Mean Group (CCE-MG)

The **CCE-MG** estimator augments each unit's regression with contemporaneous CSAs:
$$
y_{it} = \alpha_i + \boldsymbol{\beta}_i' \mathbf{x}_{it} + \boldsymbol{\delta}_i' \bar{\mathbf{z}}_t + u_{it},
$$
where $\bar{\mathbf{z}}_t = (\bar{y}_t, \bar{\mathbf{x}}_t')'$. The coefficients on the CSAs absorb the unobserved factors; the structural parameters $\hat{\boldsymbol{\beta}}_i$ are then averaged over units. @pesaran2006estimation establishes consistency and asymptotic normality of the CCE-MG estimator under mild conditions.

```{r cce-pwt8}
fit_cce <- dcce(
  data = pwt8,
  unit_index = "country",
  time_index = "year",
  formula = d_log_rgdpo ~ L(log_rgdpo, 1) + log_hc + log_ck + log_ngd,
  model = "cce",
  cross_section_vars = ~ log_rgdpo + log_hc + log_ck + log_ngd,
  cross_section_lags = 0
)
print(fit_cce)
```

Comparing MG and CCE-MG shows the effect of correcting for CSD: the coefficients shift, reflecting the removal of common-factor-driven bias.

## Dynamic CCE (DCCE)

When the lagged dependent variable is included, @chudik2015common require CSA lags to achieve asymptotically unbiased estimates. The DCCE estimator augments each unit's regression with $\bar{\mathbf{z}}_t, \bar{\mathbf{z}}_{t-1}, \ldots, \bar{\mathbf{z}}_{t-p_T}$.

```{r dcce-pwt8}
fit_dcce <- dcce(
  data = pwt8,
  unit_index = "country",
  time_index = "year",
  formula = d_log_rgdpo ~ L(log_rgdpo, 1) + log_hc + log_ck + log_ngd,
  model = "dcce",
  cross_section_vars = ~ log_rgdpo + log_hc + log_ck + log_ngd,
  cross_section_lags = 3
)
print(fit_dcce)
```

The coefficient on `L(log_rgdpo, 1)` is the speed of adjustment in the error-correction framework. A value of approximately $-0.6$ implies $\hat\rho \approx 0.4$, consistent with mean-reverting dynamics in log GDP.

### Verify that DCCE removes CSD

A key diagnostic is to apply the CD test to the DCCE residuals.
If the estimator has successfully absorbed the common factors, the residuals should no longer exhibit significant CSD:

```{r dcce-cd-test}
cd_dcce <- pcd_test(fit_dcce, test = "pesaran")
print(cd_dcce)
```

After DCCE correction with three CSA lags, the CD statistic falls to an insignificant level, confirming that the common factors have been filtered out.

## Regularized CCE (rCCE)

The **regularized CCE** of @juodis2022regularized addresses the case where the number of CSA variables is large relative to $T$. It replaces the full CSA matrix with its first few principal components, regularizing the factor proxy:

```{r rcce-pwt8}
fit_rcce <- dcce(
  data = pwt8,
  unit_index = "country",
  time_index = "year",
  formula = d_log_rgdpo ~ L(log_rgdpo, 1) + log_hc + log_ck + log_ngd,
  model = "rcce",
  cross_section_vars = ~ log_rgdpo + log_hc + log_ck + log_ngd,
  cross_section_lags = 0
)
print(fit_rcce)
```

## Long-Run Estimators: CS-DL, CS-ARDL, and PMG

For researchers interested in long-run relationships, `dcce` implements three estimators. All three produce a three-block print output: **short-run** coefficients, the **adjustment** (speed of adjustment to equilibrium), and **long-run** coefficients.

### CS-DL (Cross-Sectionally Augmented Distributed Lag)

The **CS-DL** estimator of @chudik2016long uses $\Delta y_{it}$ as the dependent variable and regresses it on the **level** of $x_{it}$ (whose coefficient is the long-run effect) plus the contemporaneous and lagged first differences of $x$ (as short-run controls) plus CSAs:
$$
\Delta y_{it} = \alpha_i + w_i' x_{it} + \sum_{\ell=0}^{p_x} \phi_{i\ell}' \Delta x_{i,t-\ell} + \delta_i' \bar{\mathbf{z}}_t + u_{it}.
$$
`dcce` **automatically** constructs $\Delta y$ as the LHS and adds $\Delta x, \Delta x_{-1}, \ldots, \Delta x_{-p_x}$ when `model = "csdl"`.
You just provide the levels formula and set `csdl_xlags`:

```{r csdl-pwt8}
fit_csdl <- dcce(
  data = pwt8,
  unit_index = "country",
  time_index = "year",
  formula = log_rgdpo ~ log_ck + log_hc + log_ngd,
  model = "csdl",
  cross_section_vars = ~ log_rgdpo + log_hc + log_ck + log_ngd,
  cross_section_lags = 3,
  csdl_xlags = 3
)
print(fit_csdl)
```

The coefficients on `log_ck`, `log_hc`, and `log_ngd` in the **Long-run Estimates** block are the long-run elasticities of GDP.

### CS-ARDL (Cross-Sectionally Augmented ARDL)

The **CS-ARDL** estimator fits an ARDL$(p_y, p_x)$ model with CSAs per unit and recovers long-run coefficients and the speed of adjustment via the delta method:
$$
y_{it} = \alpha_i + \sum_{p=1}^{P_y} \phi_{ip} y_{i,t-p} + \sum_{q=0}^{P_x} \beta_{iq}' x_{i,t-q} + \delta_i' \bar{\mathbf{z}}_t + e_{it},
$$
with long-run coefficients $\theta_{ik} = \frac{\sum_q \beta_{ikq}}{1 - \sum_p \phi_{ip}}$ and adjustment speed $\varphi_i = -(1 - \sum_p \phi_{ip})$. Users specify the dynamic structure explicitly using `L()` in the formula:

```{r csardl-pwt8}
fit_csardl <- dcce(
  data = pwt8,
  unit_index = "country",
  time_index = "year",
  formula = log_rgdpo ~ L(log_rgdpo, 1) + log_hc + L(log_hc, 1) +
    log_ck + L(log_ck, 1) + log_ngd + L(log_ngd, 1),
  model = "csardl",
  cross_section_vars = ~ log_rgdpo + log_hc + log_ck + log_ngd,
  cross_section_lags = 3
)
print(fit_csardl)
```

The output shows three blocks:

1. **Short-run** coefficients from the ARDL(1,1) regression (unit-level OLS, MG average)
2. **Adjustment term** $\varphi \approx -0.69$, meaning about 69% of the deviation from equilibrium is corrected each period
3. **Long-run** elasticities recovered via the delta method, with MG standard errors

### PMG (Pooled Mean Group)

The **PMG** estimator of @shin1999ardl imposes **common long-run coefficients** across units while keeping adjustment and short-run dynamics heterogeneous.
`dcce` implements PMG via inverse-variance weighted pooling of the unit-level CS-ARDL long-run estimates---a fast, consistent alternative to the Pesaran--Shin--Smith concentrated MLE that avoids numerical optimization.

```{r pmg-pwt8}
fit_pmg <- dcce(
  data = pwt8,
  unit_index = "country",
  time_index = "year",
  formula = log_rgdpo ~ L(log_rgdpo, 1) + log_hc + L(log_hc, 1) +
    log_ck + L(log_ck, 1) + log_ngd + L(log_ngd, 1),
  model = "pmg",
  cross_section_vars = ~ log_rgdpo + log_hc + log_ck + log_ngd,
  cross_section_lags = 3
)
print(fit_pmg)
```

Compared to CS-ARDL, PMG produces **tighter** standard errors on the long-run coefficients because it exploits the homogeneity restriction.

## Comparing All Estimators

```{r compare-estimators}
models <- list(
  MG = fit_mg,
  CCE = fit_cce,
  DCCE = fit_dcce,
  CS_DL = fit_csdl,
  CS_ARDL = fit_csardl
)

# Extract coefficient on physical capital (log_ck) across models
key_var <- "log_ck"
compare <- data.frame(
  estimator = names(models),
  estimate = vapply(models, function(m) {
    b <- coef(m)
    if (key_var %in% names(b)) unname(b[key_var]) else NA_real_
  }, numeric(1)),
  std_error = vapply(models, function(m) {
    se <- m$se
    if (key_var %in% names(se)) unname(se[key_var]) else NA_real_
  }, numeric(1))
)
compare$t_stat <- compare$estimate / compare$std_error
print(compare, digits = 4)
```

**Note:** MG, CCE, and DCCE estimate a **short-run** coefficient on `log_ck` within a dynamic error-correction specification (with `L(log_rgdpo, 1)` on the RHS). CS-DL and CS-ARDL target the **long-run** elasticity of GDP with respect to capital, using a levels-in-first-difference specification. Direct comparisons across the two groups should be interpreted with care.

# The CD Test Suite

`dcce` implements five CD tests, each addressing different aspects of the CSD problem.
| Test | Reference | Description |
|------|-----------|-------------|
| `pesaran` | @pesaran2015testing | Benchmark; N(0,1) under H0 |
| `cdw` | @juodis2022 | Rademacher-weighted; robust to incidental parameters |
| `cdwplus` | Baltagi, Feng & Kao (2012) | Bias-adjusted LM with random sign weighting |
| `pea` | @fan2015power | Power-enhanced; better against sparse alternatives |
| `cdstar` | @pesaran2021cdstar | Bias-corrected for panels with strong factors |

```{r cd-all-tests}
# Run all five CD tests on DCCE residuals
set.seed(42)
cd_all <- pcd_test(fit_dcce,
                   test = c("pesaran", "cdw", "cdwplus", "pea", "cdstar"),
                   n_reps = 199)
print(cd_all)
```

**Interpretation:** The tests have different power properties:

- **Pesaran CD** is the workhorse benchmark. Insignificance ($|CD| < 1.96$) indicates that the estimator has removed the dominant common factor(s).
- **CDw** (Juodis--Reese) corrects for incidental parameter bias in dynamic panels; a value near zero confirms no systematic residual dependence.
- **PEA** is power-enhanced against **sparse** alternatives (a few strongly correlated pairs). It can remain significant even when the average pairwise correlation is near zero, so significance here need not contradict the Pesaran CD result.
- **CD\*** applies a PCA-based bias correction for panels with strong factors; borderline significance reflects the strength of the factor structure in the data.

The key diagnostic is the **Pesaran CD**: after DCCE with three lags, a value close to zero (well below $\pm 1.96$) confirms that the common factors have been successfully absorbed.

# Bootstrap Inference

Asymptotic standard errors for DCCE rest on the central limit theorem approximation $\hat{\boldsymbol{\beta}}^{\text{MG}} \approx \mathcal{N}(\boldsymbol{\beta}, V/N)$. In panels with small $N$, bootstrap confidence intervals are preferable.
`dcce` provides a `bootstrap()` function with two schemes:

- **Cross-section bootstrap** (default): resamples units with replacement, preserving time-series dynamics within each unit.
- **Wild bootstrap**: multiplies residuals by Rademacher or Mammen weights to preserve heteroskedasticity.

```{r bootstrap, eval=FALSE}
# Cross-section bootstrap (not evaluated in vignette to limit build time)
set.seed(42)
boot_res <- bootstrap(fit_dcce, type = "crosssection", reps = 499)
print(boot_res)
```

For a quick illustration on the simulated dataset (smaller $N$ and $T$):

```{r bootstrap-sim}
data(dcce_sim, package = "dcce")
fit_sim <- dcce(
  data = dcce_sim,
  unit_index = "unit",
  time_index = "time",
  formula = y ~ L(y, 1) + x,
  model = "dcce",
  cross_section_vars = ~ y + x,
  cross_section_lags = 3
)
set.seed(42)
boot_sim <- bootstrap(fit_sim, type = "crosssection", reps = 199)
print(boot_sim)
```

# Worked Example: Ditzen (2018) Growth Regression

We replicate the core results from @ditzen2018estimating, Section 7, using the `pwt8` dataset (Penn World Tables 8; $N = 93$ countries, $T = 48$ years, 1960--2007). The model is a Solow-type growth regression:
$$
\Delta \log y_{it} = \alpha_i + \phi_i \log y_{i,t-1} + \beta_{1i} \log h_{it} + \beta_{2i} \log k_{it} + \beta_{3i} \log(n_{it} + g + \delta) + \text{CSD correction} + u_{it},
$$
where $y$ is real GDP (output-side), $h$ is the human capital index, $k$ is the physical capital stock, $n + g + \delta$ aggregates population growth and depreciation, and $\phi_i = \rho_i - 1$ is the speed of adjustment.
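The mapping from the short-run parameters of this error-correction model to long-run elasticities is simple arithmetic; a sketch with made-up illustrative values (not package output):

```r
# Hypothetical short-run ECM estimates (illustrative values only)
phi  <- -0.6   # coefficient on log y_{t-1}: phi = rho - 1, so rho = 0.4
beta <-  0.12  # short-run coefficient on log k
# Implied long-run elasticity: theta = beta / (1 - rho) = beta / (-phi)
theta <- beta / (-phi)
round(theta, 2)  # 0.2
```

The CS-ARDL and CS-DL estimators in the package perform this conversion (and its delta-method standard errors) automatically.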
## Step 1: Detect CSD in naive residuals

```{r ex1-ols-cd}
dat <- na.omit(pwt8[, c("country", "year", "d_log_rgdpo", "log_rgdpo",
                        "log_hc", "log_ck", "log_ngd")])
fit_ols2 <- lm(d_log_rgdpo ~ log_hc + log_ck + log_ngd + log_rgdpo, data = dat)
cd_naive <- pcd_test(residuals(fit_ols2), data = dat,
                     unit_index = "country", time_index = "year",
                     test = "pesaran")
cat(sprintf("CD statistic (OLS): %.2f   p-value: %.4f\n",
            cd_naive$statistics$statistic[1], cd_naive$statistics$p_value[1]))
```

**Result:** The CD statistic is large and significant, confirming strong CSD.

## Step 2: Mean Group (no CSD correction)

```{r ex2-mg}
fit_mg2 <- dcce(
  data = pwt8,
  unit_index = "country",
  time_index = "year",
  formula = d_log_rgdpo ~ L(log_rgdpo, 1) + log_hc + log_ck + log_ngd,
  model = "mg",
  cross_section_vars = NULL
)
cd_mg <- pcd_test(fit_mg2, test = "pesaran")
cat(sprintf("MG: phi = %.3f, log_ck = %.3f\n",
            coef(fit_mg2)["L(log_rgdpo,1)"], coef(fit_mg2)["log_ck"]))
cat(sprintf("CD after MG: %.2f   p-value: %.4f\n",
            cd_mg$statistics$statistic[1], cd_mg$statistics$p_value[1]))
```

MG residuals still exhibit significant CSD because MG does not correct for common factors.

## Step 3: Dynamic CCE with 3 CSA lags (Ditzen 2018, Example 7.3)

```{r ex3-dcce}
fit_dcce2 <- dcce(
  data = pwt8,
  unit_index = "country",
  time_index = "year",
  formula = d_log_rgdpo ~ L(log_rgdpo, 1) + log_hc + log_ck + log_ngd,
  model = "dcce",
  cross_section_vars = ~ log_rgdpo + log_hc + log_ck + log_ngd,
  cross_section_lags = 3
)
cd_dcce2 <- pcd_test(fit_dcce2, test = "pesaran")
cat(sprintf("DCCE: phi = %.3f, log_ck = %.3f\n",
            coef(fit_dcce2)["L(log_rgdpo,1)"], coef(fit_dcce2)["log_ck"]))
cat(sprintf("CD after DCCE(3): %.2f   p-value: %.4f\n",
            cd_dcce2$statistics$statistic[1], cd_dcce2$statistics$p_value[1]))
```

After applying DCCE with three CSA lags, the CD statistic is no longer significant, indicating that the estimator has successfully absorbed the common factors.
## Step 4: Summary table

```{r ex4-summary-table}
# Coefficient table comparing MG and DCCE
vars <- c("L(log_rgdpo,1)", "log_hc", "log_ck", "log_ngd")
tab <- data.frame(
  Variable = vars,
  MG_coef = round(coef(fit_mg2)[vars], 3),
  MG_se = round(fit_mg2$se[vars], 3),
  DCCE_coef = round(coef(fit_dcce2)[vars], 3),
  DCCE_se = round(fit_dcce2$se[vars], 3),
  row.names = NULL
)
tab$MG_sig <- ifelse(abs(coef(fit_mg2)[vars] / fit_mg2$se[vars]) > 1.96, "*", "")
tab$DCCE_sig <- ifelse(abs(coef(fit_dcce2)[vars] / fit_dcce2$se[vars]) > 1.96, "*", "")
print(tab)
```

**Interpretation:**

- The **speed of adjustment** ($\hat\phi$) is negative and significant, confirming mean-reverting dynamics in log GDP. The DCCE estimate ($\approx -0.6$) implies $\hat\rho \approx 0.4$, plausible for this sample.
- **Physical capital** (`log_ck`) has a positive, significant effect in both specifications, consistent with the Solow model.
- **Population growth** (`log_ngd`) enters negatively, also in line with the Solow model.
- The shift in coefficient magnitudes from MG to DCCE reflects the bias from CSD in the MG estimator.

## Step 5: Unit-level heterogeneity

One advantage of the Mean Group approach is access to unit-level estimates.
We can examine the distribution of the speed of adjustment across countries:

```{r ex5-unit-coefs, fig.alt="Histogram of unit-level speed of adjustment coefficients"}
b_unit <- coef(fit_dcce2, type = "unit")
ar_coefs <- b_unit$estimate[b_unit$term == "L(log_rgdpo,1)"]
rho_implied <- ar_coefs + 1  # rho = phi + 1
cat(sprintf("Implied rho: mean = %.3f, median = %.3f, [min, max] = [%.3f, %.3f]\n",
            mean(rho_implied), median(rho_implied),
            min(rho_implied), max(rho_implied)))
cat(sprintf("Fraction with 0 < rho < 1 (stationarity): %.1f%%\n",
            100 * mean(rho_implied > 0 & rho_implied < 1)))
hist(
  rho_implied,
  breaks = 20,
  main = "Unit-level AR(1) coefficient (rho = phi + 1)",
  xlab = expression(hat(rho)[i]),
  col = "steelblue",
  border = "white"
)
abline(v = median(rho_implied), col = "firebrick", lwd = 2, lty = 2)
legend("topright", legend = "Median", col = "firebrick", lty = 2, lwd = 2, bty = "n")
```

The distribution is concentrated in $(0, 1)$, confirming stationary dynamics for most countries, with the median $\hat\rho \approx 0.4$.

# Auxiliary Diagnostics

## Exponent of cross-sectional dependence

The **CSD exponent** $\alpha$ of @bailey2016exponent characterizes the strength of dependence. For $\alpha = 1$ the panel exhibits strong CSD; for $\alpha < 1/2$ the CSD is weak and standard estimators remain consistent.

```{r csd-exp}
# CSD exponent of log real GDP (levels) across countries
data(pwt8)
X_lev <- do.call(rbind, by(pwt8, pwt8$country, function(d) d$log_rgdpo))
# Keep only time periods with no missing values
X_lev <- X_lev[, colSums(is.na(X_lev)) == 0, drop = FALSE]
alpha_res <- csd_exp(X_lev, use_residuals = FALSE)
print(alpha_res)
```

An estimate of $\hat\alpha \approx 0.84$ is close to 1, signalling strong CSD in log real GDP---further motivation for using DCCE. For first-differenced data (growth rates) the exponent is typically much lower because differencing strips out the common trend.
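The intuition behind $\alpha$ can be seen in a small simulation: under a strong common factor ($\alpha = 1$) the variance of the cross-sectional average stays roughly constant as $N$ grows, while for cross-sectionally independent series it decays at rate $1/N$. This is a stylized sketch of that idea, not the Bailey et al. estimator implemented in `csd_exp()`:

```r
# Variance of the cross-sectional average as N grows:
# strong factor -> roughly flat; independent units -> ~ 1/N
set.seed(123)
T_len <- 200
var_csa <- function(N, strong) {
  f <- rnorm(T_len)                                  # common factor
  X <- sapply(seq_len(N), function(i) {
    lambda <- if (strong) runif(1, 0.5, 1.5) else 0  # factor loading
    lambda * f + rnorm(T_len)                        # unit i's series
  })
  var(rowMeans(X))                                   # variance of the CSA
}
sapply(c(25, 100, 400), var_csa, strong = TRUE)   # roughly constant
sapply(c(25, 100, 400), var_csa, strong = FALSE)  # shrinks with N
```

The estimator in `csd_exp()` exploits exactly this scaling of the CSA variance in $N$ to back out $\alpha$.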
## Information criterion for CSA lag selection

@margaritella2023information propose IC and PC criteria for selecting the CSA augmentation in **static** CCE models. We compare static CCE specifications with 0--3 CSA lags, varying `cross_section_lags` while holding the set of averaged variables fixed. For **dynamic** panels, the Chudik--Pesaran rule $p_T = \lfloor T^{1/3} \rfloor$ is the standard choice for CSA lags.

```{r ic-selection}
# Compare static CCE specifications across 0--3 CSA lags
lags_to_try <- 0:3
models_ic <- lapply(lags_to_try, function(p) {
  dcce(
    data = pwt8,
    unit_index = "country",
    time_index = "year",
    formula = d_log_rgdpo ~ L(log_rgdpo, 1) + log_hc + log_ck + log_ngd,
    model = "cce",
    cross_section_vars = ~ log_rgdpo + log_hc + log_ck + log_ngd,
    cross_section_lags = p
  )
})
ic_values <- sapply(models_ic, function(m) {
  res <- ic(m, models = models_ic)
  c(IC1 = res$IC1, IC2 = res$IC2)
})
ic_table <- data.frame(lags = lags_to_try, t(ic_values))
print(ic_table, digits = 5)
cat(sprintf("IC1 selects %d CSA lag(s)\n", lags_to_try[which.min(ic_table$IC1)]))
cat(sprintf("IC2 selects %d CSA lag(s)\n", lags_to_try[which.min(ic_table$IC2)]))
```

A selected value of 0 indicates that contemporaneous CSAs suffice for the static CCE specification---consistent with the theoretical recommendation of @pesaran2006estimation. For the dynamic DCCE application above, we use 3 lags as prescribed by the Chudik--Pesaran rule $p_T = \lfloor T^{1/3} \rfloor = 3$.
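The Chudik--Pesaran rule cited above is easy to verify directly for this panel:

```r
# CSA lag rule of Chudik & Pesaran (2015): p_T = floor(T^(1/3))
T_pwt <- 48  # years in the pwt8 panel (1960-2007)
p_T <- floor(T_pwt^(1/3))
p_T  # 3 -- the value passed to cross_section_lags throughout this vignette
```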
## Rank condition

The **rank condition** of @devos2024rank checks whether the CSA variables span the factor space, a necessary condition for CCE consistency:

```{r rank-condition}
rc <- rank_condition(fit_cce)
cat(sprintf("Estimated factors (m): %d\n", rc$m))
cat(sprintf("Rank of avg loadings (g): %d\n", rc$g))
cat(sprintf("Rank condition (RC = 1 means satisfied): %d\n", rc$RC))
```

## Panel unit root test: Pesaran CIPS

The **CIPS** test of @pesaran2007simple is the workhorse panel unit root test under cross-sectional dependence. It augments unit-level Dickey-Fuller regressions with cross-sectional averages and averages the resulting CADF $t$-statistics.

```{r cips-test}
X_lev <- do.call(rbind, by(pwt8, pwt8$country, function(d) d$log_rgdpo))
X_lev <- X_lev[, colSums(is.na(X_lev)) == 0, drop = FALSE]
cips_res <- cips_test(X_lev, lags = 1, trend = FALSE)
print(cips_res)
```

A CIPS statistic below the 5% critical value of $-2.27$ rejects the null of a panel unit root.

## Slope heterogeneity: Swamy and Pesaran-Yamagata

Before deciding between MG and a pooled estimator, you should formally test whether slopes are homogeneous. The **Swamy (1970)** test and its standardised large-$N$ version due to **Pesaran & Yamagata (2008)** are implemented as `swamy_test()`:

```{r swamy-test}
swamy_test(fit_cce)
```

A large, significant statistic rejects homogeneity and justifies heterogeneous slopes (MG/CCE/DCCE). For this growth regression the test rejects decisively, confirming that growth dynamics differ across countries.

## Hausman test: MG vs pooled

A Hausman-style comparison of MG against the weighted pooled estimator:

```{r hausman-test}
hausman_test(fit_cce)
```

Under the null (homogeneous slopes) both estimators are consistent; under the alternative only MG is. Rejection of the null is another signal that the homogeneous-slope assumption is too restrictive.

## `broom` compatibility

`dcce_fit` objects are compatible with `broom::tidy()` and `broom::glance()`.
For long-run estimators, `tidy()` additionally includes rows for the adjustment speed and the long-run coefficients, tagged in the `type` column:

```{r broom-methods}
tidy(fit_dcce2)
glance(fit_dcce2)
```

```{r broom-csardl}
# For a CS-ARDL fit, tidy() returns short-run, adjustment, and long-run rows
tidy(fit_csardl)
```

## Confidence intervals

The `confint()` method supports three types: `"mg"` for the main coefficients, `"lr"` for long-run estimates, and `"adjustment"` for the speed of adjustment:

```{r confint-demo}
confint(fit_csardl, type = "lr", level = 0.90)
confint(fit_csardl, type = "adjustment")
```

## Visualising heterogeneity

The `plot()` method shows the distribution of unit-level coefficients:

```{r plot-coef, fig.alt="Histograms of unit-level coefficients for the DCCE growth regression"}
plot(fit_dcce2, which = "coef")
```

Each panel corresponds to one regressor; the dashed red line marks the MG mean. Use `which = "resid"` for a residual diagnostic plot.

# Additional Estimators (v0.2.0+)

## Augmented Mean Group (AMG)

The **AMG** estimator of @eberhardt2010productivity accounts for CSD by extracting a Common Dynamic Process (CDP) from time-dummy coefficients in a pooled first-difference regression. The CDP is then added as a nuisance regressor in each unit's OLS:

```{r amg}
fit_amg <- dcce(
  data = pwt8,
  unit_index = "country",
  time_index = "year",
  formula = d_log_rgdpo ~ log_hc + log_ck + log_ngd,
  model = "amg",
  cross_section_vars = NULL
)
coef(fit_amg)
```

## Interactive Fixed Effects (IFE, Bai 2009)

Unlike CCE (which proxies factors via cross-sectional averages), **IFE** estimates the factors and loadings directly via iterative principal components.
It produces **pooled** slope coefficients (not heterogeneous):

```{r ife}
fit_ife <- dcce(
  data = pwt8,
  unit_index = "country",
  time_index = "year",
  formula = log_rgdpo ~ log_hc + log_ck + log_ngd,
  model = "ife",
  n_factors = 2L
)
print(fit_ife)
```

## Pooled CCE (CCEP)

@pesaran2006estimation proposed two CCE estimators: the Mean Group version (CCE-MG, our `model = "cce"`) and the **Pooled** version (CCEP). CCEP constrains slopes to be identical across units, gaining efficiency when slopes are truly homogeneous:

```{r ccep}
fit_ccep <- dcce(
  data = pwt8,
  unit_index = "country",
  time_index = "year",
  formula = d_log_rgdpo ~ log_hc + log_ck + log_ngd,
  model = "ccep",
  cross_section_vars = ~ .
)
coef(fit_ccep)
```

# Additional Diagnostics (v0.3.0+)

## Panel Granger causality

The Dumitrescu-Hurlin (2012) test checks whether one variable Granger-causes another in a heterogeneous panel:

```{r granger}
gc <- granger_test(
  data = pwt8,
  unit_index = "country",
  time_index = "year",
  y = "d_log_rgdpo",
  x = "log_ck",
  lags = 1L
)
print(gc)
```

## IPS and LLC panel unit root tests

Standard (non-CSD-robust) benchmarks alongside CIPS:

```{r ips-llc}
X_lev <- do.call(rbind, by(pwt8, pwt8$country, function(d) d$log_rgdpo))
X_lev <- X_lev[, colSums(is.na(X_lev)) == 0, drop = FALSE]
panel_ur_test(X_lev, test = "ips", lags = 1)
```

## Pedroni cointegration test

```{r pedroni}
panel_coint_test(
  data = pwt8,
  unit_index = "country",
  time_index = "year",
  formula = log_rgdpo ~ log_hc + log_ck,
  test = "pedroni",
  lags = 1
)
```

## Structural break test

The sup-Wald test for an unknown break date with Andrews (1993) critical values:

```{r structural-break}
brk <- structural_break_test(
  data = pwt8,
  unit_index = "country",
  time_index = "year",
  formula = d_log_rgdpo ~ log_hc + log_ck + log_ngd,
  model = "mg",
  cross_section_vars = NULL,
  type = "unknown",
  trim = 0.20
)
print(brk)
```

# Impulse Response Functions (v0.4.0)

For dynamic models, `irf()` traces the response of the dependent variable to a
unit shock in a regressor:

```{r irf-example}
data(dcce_sim)
fit_dyn <- dcce(
  data = dcce_sim,
  unit_index = "unit",
  time_index = "time",
  formula = y ~ L(y, 1) + x,
  model = "dcce",
  cross_section_vars = ~ .,
  cross_section_lags = 3
)
ir <- irf(fit_dyn, impulse = "x", horizon = 10, boot_reps = 99, seed = 42)
print(ir)
```

# The `dcce_workflow()` Pipeline

A single function runs all recommended pre-estimation diagnostics and returns a recommended `dcce()` call:

```{r workflow-demo, eval=FALSE}
dcce_workflow(
  data = pwt8,
  unit_index = "country",
  time_index = "year",
  formula = log_rgdpo ~ log_hc + log_ck + log_ngd
)
```

# Quick-Start Guide

```{r quickstart, eval=FALSE}
library(dcce)
data(pwt8)

# 1. Detect CSD
fit_ols_qs <- lm(d_log_rgdpo ~ log_hc + log_ck + log_ngd, data = na.omit(pwt8))
pcd_test(residuals(fit_ols_qs), data = na.omit(pwt8),
         unit_index = "country", time_index = "year", test = "pesaran")

# 2. Estimate with DCCE
fit_qs <- dcce(
  data = pwt8,
  unit_index = "country",
  time_index = "year",
  formula = d_log_rgdpo ~ L(log_rgdpo, 1) + log_hc + log_ck + log_ngd,
  model = "dcce",
  cross_section_vars = ~ log_rgdpo + log_hc + log_ck + log_ngd,
  cross_section_lags = 3
)

# 3. Check residuals
pcd_test(fit_qs, test = "pesaran")

# 4. Tidy output
tidy(fit_qs)
glance(fit_qs)

# 5. Bootstrap
bootstrap(fit_qs, reps = 499)
```

# References