Title: | Two-Stage Difference-in-Differences Following Gardner (2021) |
Version: | 1.0.2 |
Description: | Estimates Two-way Fixed Effects difference-in-differences/event-study models using the approach proposed by Gardner (2021) <doi:10.48550/arXiv.2207.05943>. To avoid the problems caused by OLS estimation of the Two-way Fixed Effects model, this function first estimates the fixed effects and covariates using untreated observations and then in a second stage, estimates the treatment effects. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.2.3 |
Depends: | R (≥ 3.5.0), fixest (≥ 0.10.1) |
Imports: | data.table, SparseM, MatrixExtra, Matrix, stats, boot, broom, ggplot2, rlang, did, staggered, didimputation |
URL: | https://kylebutts.github.io/did2s/ |
Suggests: | rmarkdown, knitr, haven, testthat (≥ 3.0.0) |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2023-04-02 18:41:56 UTC; kylebutts |
Author: | Kyle Butts |
Maintainer: | Kyle Butts <kyle.butts@colorado.edu> |
Repository: | CRAN |
Date/Publication: | 2023-04-07 15:50:02 UTC |
Data from Cheng and Hoekstra (2013)
Description
State-wide panel data from 2000-2010 that has information on castle-doctrine, the so-called "stand-your-ground" laws that were implemented by 20 states.
Usage
castle
Format
A data frame with 550 rows and 5 variables:
- sid
state id, unit of observation
- year
time in panel data
- l_homicide
log of the number of homicides per capita
- effyear
year that castle doctrine is passed
- post
0/1 variable for when castle doctrine is active
- time_til
time relative to castle doctrine being passed into law
Simulated data with two treatment groups and heterogenous effects
Description
Generated using the following call:
did2s::gen_data(panel = c(1990, 2020),
g1 = 2000, g2 = 2010, g3 = 0,
te1 = 2, te2 = 1, te3 = 0,
te_m1 = 0.05, te_m2 = 0.15, te_m3 = 0)
Usage
df_het
Format
A data frame with 31000 rows and 15 variables:
- unit
individual in panel data
- year
time in panel data
- g
the year that treatment starts
- dep_var
outcome variable
- treat
T/F variable for when treatment is on
- rel_year
year relative to treatment start. Inf = never treated.
- rel_year_binned
year relative to treatment start, but <=-6 and >=6 are binned.
- unit_fe
Unit FE
- year_fe
Year FE
- error
Random error component
- te
Static treatment effect = te
- te_dynamic
Dynamic treatmet effect = te_m
- state
State that unit is in
- group
String name for group
Simulated data with two treatment groups and homogenous effects
Description
Generated using the following call:
did2s::gen_data(panel = c(1990, 2020),
g1 = 2000, g2 = 2010, g3 = 0,
te1 = 2, te2 = 2, te3 = 0,
te_m1 = 0, te_m2 = 0, te_m3 = 0)
Usage
df_hom
Format
A data frame with 31000 rows and 15 variables:
- unit
individual in panel data
- year
time in panel data
- g
the year that treatment starts
- dep_var
outcome variable
- treat
T/F variable for when treatment is on
- rel_year
year relative to treatment start. Inf = never treated.
- rel_year_binned
year relative to treatment start, but <=-6 and >=6 are binned.
- unit_fe
Unit FE
- year_fe
Year FE
- error
Random error component
- te
Static treatment effect = te
- te_dynamic
Dynamic treatmet effect = te_m
- group
String name for group
- state
State that unit is in
- weight
Weight from runif()
Calculate two-stage difference-in-differences following Gardner (2021)
Description
Calculate two-stage difference-in-differences following Gardner (2021)
Usage
did2s(
data,
yname,
first_stage,
second_stage,
treatment,
cluster_var,
weights = NULL,
bootstrap = FALSE,
n_bootstraps = 250,
return_bootstrap = FALSE,
verbose = TRUE
)
Arguments
data |
The dataframe containing all the variables |
yname |
Outcome variable |
first_stage |
Fixed effects and other covariates you want to residualize
with in first stage.
Formula following |
second_stage |
Second stage, these should be the treatment indicator(s)
(e.g. treatment variable or event-study leads/lags).
Formula following |
treatment |
A variable that = 1 if treated, = 0 otherwise |
cluster_var |
What variable to cluster standard errors. This can be IDs or a higher aggregate level (state for example) |
weights |
Optional. Variable name for regression weights. |
bootstrap |
Optional. Should standard errors be calculated using bootstrap?
Default is |
n_bootstraps |
Optional. How many bootstraps to run.
Default is |
return_bootstrap |
Optional. Logical. Will return each bootstrap second-stage estimate to allow for manual use, e.g. percentile standard errors and empirical confidence intervals. |
verbose |
Optional. Logical. Should information about the two-stage
procedure be printed back to the user?
Default is |
Value
fixest
object with adjusted standard errors
(either by formula or by bootstrap). All the methods from fixest
package
will work, including fixest::esttable
and
fixest::coefplot
Examples
Load example dataset which has two treatment groups and homogeneous treatment effects
# Load Example Dataset data("df_hom")
Static TWFE
You can run a static TWFE fixed effect model for a simple treatment indicator
static <- did2s(df_hom, yname = "dep_var", treatment = "treat", cluster_var = "state", first_stage = ~ 0 | unit + year, second_stage = ~ i(treat, ref=FALSE)) #> Running Two-stage Difference-in-Differences #> - first stage formula `~ 0 | unit + year` #> - second stage formula `~ i(treat, ref = FALSE)` #> - The indicator variable that denotes when treatment is on is `treat` #> - Standard errors will be clustered by `state` fixest::esttable(static) #> static #> Dependent Var.: dep_var #> #> treat = TRUE 2.005*** (0.0202) #> _______________ _________________ #> S.E. type Custom #> Observations 46,500 #> R2 0.47520 #> Adj. R2 0.47520 #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Event Study
Or you can use relative-treatment indicators to estimate an event study estimate
es <- did2s(df_hom, yname = "dep_var", treatment = "treat", cluster_var = "state", first_stage = ~ 0 | unit + year, second_stage = ~ i(rel_year, ref=c(-1, Inf))) #> Running Two-stage Difference-in-Differences #> - first stage formula `~ 0 | unit + year` #> - second stage formula `~ i(rel_year, ref = c(-1, Inf))` #> - The indicator variable that denotes when treatment is on is `treat` #> - Standard errors will be clustered by `state` fixest::esttable(es) #> es #> Dependent Var.: dep_var #> #> rel_year = -20 0.0043 (0.0322) #> rel_year = -19 0.0222 (0.0296) #> rel_year = -18 -0.0358 (0.0308) #> rel_year = -17 0.0043 (0.0337) #> rel_year = -16 -0.0186 (0.0353) #> rel_year = -15 -0.0045 (0.0346) #> rel_year = -14 -0.0393 (0.0384) #> rel_year = -13 0.0453 (0.0323) #> rel_year = -12 0.0324 (0.0309) #> rel_year = -11 -0.0245 (0.0349) #> rel_year = -10 -0.0017 (0.0241) #> rel_year = -9 0.0155 (0.0242) #> rel_year = -8 -0.0073 (0.0210) #> rel_year = -7 -0.0513* (0.0202) #> rel_year = -6 0.0269 (0.0237) #> rel_year = -5 0.0136 (0.0237) #> rel_year = -4 0.0381. (0.0223) #> rel_year = -3 -0.0228 (0.0284) #> rel_year = -2 0.0041 (0.0228) #> rel_year = 0 1.971*** (0.0470) #> rel_year = 1 2.050*** (0.0466) #> rel_year = 2 2.033*** (0.0441) #> rel_year = 3 1.966*** (0.0400) #> rel_year = 4 1.965*** (0.0430) #> rel_year = 5 2.030*** (0.0456) #> rel_year = 6 2.040*** (0.0447) #> rel_year = 7 1.995*** (0.0370) #> rel_year = 8 2.019*** (0.0485) #> rel_year = 9 1.955*** (0.0468) #> rel_year = 10 1.950*** (0.0455) #> rel_year = 11 2.117*** (0.0664) #> rel_year = 12 2.132*** (0.0741) #> rel_year = 13 2.019*** (0.0640) #> rel_year = 14 2.013*** (0.0522) #> rel_year = 15 1.961*** (0.0605) #> rel_year = 16 1.916*** (0.0584) #> rel_year = 17 1.938*** (0.0607) #> rel_year = 18 2.070*** (0.0666) #> rel_year = 19 2.066*** (0.0609) #> rel_year = 20 1.964*** (0.0612) #> _______________ _________________ #> S.E. type Custom #> Observations 46,500 #> R2 0.47577 #> Adj. R2 0.47533 #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# plot rel_year coefficients and standard errors fixest::coefplot(es, keep = "rel_year::(.*)")
Example from Cheng and Hoekstra (2013)
Here's an example using data from Cheng and Hoekstra (2013)
# Castle Data castle <- haven::read_dta("https://github.com/scunning1975/mixtape/raw/master/castle.dta") did2s( data = castle, yname = "l_homicide", first_stage = ~ 0 | sid + year, second_stage = ~ i(post, ref=0), treatment = "post", cluster_var = "state", weights = "popwt" ) #> Running Two-stage Difference-in-Differences #> - first stage formula `~ 0 | sid + year` #> - second stage formula `~ i(post, ref = 0)` #> - The indicator variable that denotes when treatment is on is `post` #> - Standard errors will be clustered by `state` #> OLS estimation, Dep. Var.: l_homicide #> Observations: 550 #> Weights: weights_vector #> Standard-errors: Custom #> Estimate Std. Error t value Pr(>|t|) #> post::1 0.075142 0.03538 2.12387 0.034127 * #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> RMSE: 263.4 Adj. R2: 0.052465
Estimate event-study coefficients using TWFE and 5 proposed improvements.
Description
Uses the estimation procedures recommended from Borusyak, Jaravel, Spiess (2021); Callaway and Sant'Anna (2020); Gardner (2021); Roth and Sant'Anna (2021); Sun and Abraham (2020)
Usage
event_study(
data,
yname,
idname,
gname,
tname,
xformla = NULL,
weights = NULL,
estimator = c("all", "TWFE", "did2s", "did", "impute", "sunab", "staggered")
)
plot_event_study(out, separate = TRUE, horizon = NULL)
Arguments
data |
The dataframe containing all the variables |
yname |
Variable name for outcome variable |
idname |
Variable name for unique unit id |
gname |
Variable name for unit-specific date of initial treatment (never-treated should be zero or NA) |
tname |
Variable name for calendar period |
xformla |
A formula for the covariates to include in the model.
It should be of the form |
weights |
Variable name for estimation weights. This is used in estimating Y(0) and also augments treatment effect weights |
estimator |
Estimator you would like to use. Use "all" to estimate all. Otherwise see table to know advantages and requirements for each of these. |
out |
Output from |
separate |
Logical. Should the estimators be on separate plots? Default is TRUE. |
horizon |
Numeric. Vector of length 2. First element is min and second element is max of event_time to plot |
Value
event_study
returns a data.frame of point estimates for each estimator
plot_event_study
returns a ggplot object that can be fully customized
Examples
out = event_study(
data = did2s::df_het, yname = "dep_var", idname = "unit",
tname = "year", gname = "g", estimator = "all"
)
plot_event_study(out)
Generate TWFE data
Description
Generate TWFE data
Usage
gen_data(
g1 = 2000,
g2 = 2010,
g3 = 0,
panel = c(1990, 2020),
te1 = 2,
te2 = 2,
te3 = 2,
te_m1 = 0,
te_m2 = 0,
te_m3 = 0,
n = 1500
)
Arguments
g1 |
treatment date for group 1. For no treatment, set g = 0. |
g2 |
treatment date for group 2. For no treatment, set g = 0. |
g3 |
treatment date for group 3. For no treatment, set g = 0. |
panel |
numeric vector of size 2, start and end years for panel |
te1 |
treatment effect for group 1. Will ignore for that group if g = 0. |
te2 |
treatment effect for group 1. Will ignore for that group if g = 0. |
te3 |
treatment effect for group 1. Will ignore for that group if g = 0. |
te_m1 |
treatment effect slope per year |
te_m2 |
treatment effect slope per year |
te_m3 |
treatment effect slope per year |
n |
number of individuals in sample |
Value
Dataframe of generated data
Examples
# Homogeneous treatment effect
df_hom <- gen_data(panel = c(1990, 2020),
g1 = 2000, g2 = 2010, g3 = 0,
te1 = 2, te2 = 2, te3 = 0,
te_m1 = 0, te_m2 = 0, te_m3 = 0)
# Heterogeneous treatment effect
df_het <- gen_data(panel = c(1990, 2020),
g1 = 2000, g2 = 2010, g3 = 0,
te1 = 2, te2 = 1, te3 = 0,
te_m1 = 0.05, te_m2 = 0.15, te_m3 = 0)