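Simulate data from the model Y_it = alpha_i + mu_t + ATT*(t >= G_i) + epsilon_it, where i indexes individuals, t indexes years, and G_i is the cohort (the year of treatment onset) of individual i. The ATT formula is ATTat0 + EventTime*ATTgrowth + cohort_counter*ATTcohortdiff, where EventTime = t - G_i and cohort_counter is the order of the treated cohort (first, second, etc.). In the example output below, the first cohort's ATT at event time 0 equals ATTat0, which is consistent with cohort_counter starting at 0 for the first treated cohort.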
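As a rough illustration of the ATT formula, the following base-R sketch evaluates it at the default argument values for every (cohort, event) pair observed through maxyear. The helper true_att() is hypothetical and not part of the package, and the zero-based cohort_counter is an assumption inferred from the example output, not a documented detail.

```r
# Default argument values from the Usage section
ATTat0 <- 1
ATTgrowth <- 1
ATTcohortdiff <- 0.5
cohorts <- c(2007, 2010, 2012)
maxyear <- 2013

# Hypothetical helper (not part of the package).
# Assumption: cohort_counter is 0 for the first treated cohort.
true_att <- function(cohort, event) {
  cohort_counter <- match(cohort, cohorts) - 1
  ATTat0 + event * ATTgrowth + cohort_counter * ATTcohortdiff
}

# One row per (cohort, event) pair with event time 0 through maxyear - cohort
grid <- do.call(rbind, lapply(cohorts, function(g) {
  data.frame(cohort = g, event = 0:(maxyear - g))
}))
grid$ATTge <- true_att(grid$cohort, grid$event)
print(grid)
```

Under these assumptions the ATTge column matches the cohort-level rows of true_ATT shown in the Examples section.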
Usage
SimDiD(
seed = 1,
sample_size = 100,
cohorts = c(2007, 2010, 2012),
ATTat0 = 1,
ATTgrowth = 1,
ATTcohortdiff = 0.5,
anticipation = 0,
minyear = 2003,
maxyear = 2013,
idvar = 1,
yearvar = 1,
shockvar = 1,
indivAR1 = FALSE,
time_covars = FALSE,
clusters = FALSE,
markets = FALSE,
randomNA = FALSE,
missingCohorts = NULL
)
Arguments
- seed
Set the random seed. Default is seed=1.
- sample_size
Number of individuals. Default is sample_size=100.
- cohorts
Vector of years at which treatment onset occurs. Default is cohorts=c(2007,2010,2012).
- ATTat0
Treatment effect at event time 0. Default is 1.
- ATTgrowth
Increment in the ATT for each event time after 0. Default is 1.
- ATTcohortdiff
Increment in the ATT for each successive treated cohort. Default is 0.5.
- anticipation
Number of years before a cohort's treatment onset during which 50% treatment effects are allowed. Default is anticipation=0.
- minyear
Minimum calendar year to include in the data. Default is minyear=2003.
- maxyear
Maximum calendar year to include in the data. Default is maxyear=2013.
- idvar
Variance of individual fixed effects (alpha_i). Default is idvar=1.
- yearvar
Variance of year effects (mu_t). Default is yearvar=1.
- shockvar
Variance of idiosyncratic shocks (epsilon_it). Default is shockvar=1.
- indivAR1
If TRUE, each individual's shocks follow an AR(1) process. Default is FALSE.
- time_covars
If TRUE, add two time-varying covariates, called "X1" and "X2". Default is FALSE.
- clusters
If TRUE, add 10 randomly assigned clusters, with cluster-specific AR(1) shocks. Default is FALSE.
- markets
If TRUE, add 10 randomly assigned markets, with market-specific shocks that are systematically greater for markets that are treated earlier. Default is FALSE.
- randomNA
If TRUE, randomly set some values of the outcome variable to missing (NA). Default is FALSE.
- missingCohorts
If set to a particular cohort (or vector of cohorts), all of the outcomes for that cohort at event time -1 will be set to missing. Default is NULL.
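To show how the variance and ATT arguments relate to the model components, here is a minimal base-R sketch of the data-generating process. It is not SimDiD's actual implementation: the mean-zero normal draws, the uniform random cohort assignment, the zero-based cohort counter, and the tiny sample size are all assumptions made for illustration, and options such as anticipation, indivAR1, clusters, and markets are ignored.

```r
# Hypothetical sketch of the documented model; not the package's code.
set.seed(1)
sample_size <- 6
cohorts <- c(2007, 2010, 2012)
years <- 2003:2013
idvar <- 1; yearvar <- 1; shockvar <- 1
ATTat0 <- 1; ATTgrowth <- 1; ATTcohortdiff <- 0.5

# Assumption: effects are mean-zero normal draws with the stated variances.
alpha <- rnorm(sample_size, sd = sqrt(idvar))     # individual effects alpha_i
mu <- rnorm(length(years), sd = sqrt(yearvar))    # year effects mu_t
G <- sample(cohorts, sample_size, replace = TRUE) # cohort G_i per individual

# Balanced panel: one row per (id, year), year varying fastest
panel <- expand.grid(year = years, id = seq_len(sample_size))
panel$cohort <- G[panel$id]
panel$event <- panel$year - panel$cohort

# ATT applies only from treatment onset (event >= 0)
counter <- match(panel$cohort, cohorts) - 1
panel$ATT <- (panel$event >= 0) *
  (ATTat0 + panel$event * ATTgrowth + counter * ATTcohortdiff)

# Outcome: Y_it = alpha_i + mu_t + ATT*(t >= G_i) + epsilon_it
epsilon <- rnorm(nrow(panel), sd = sqrt(shockvar))
panel$Y <- alpha[panel$id] + mu[panel$year - min(years) + 1] + panel$ATT + epsilon

head(panel)
```

The printed outcomes will not match SimDiD's, since the package's random draws and any level shifts are not specified in this documentation.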
Value
A list of two data.tables. The first, simdata, contains the simulated data with variables (id, year, cohort, Y), where Y is the outcome variable. The second, true_ATT, contains the true ATT values, both at the (event, cohort) level and by event, averaged across cohorts.
Examples
# simulate data with default options
SimDiD()
#> $simdata
#> id year cohort Y
#> <int> <int> <num> <num>
#> 1: 1 2003 2012 8.058406
#> 2: 1 2004 2012 12.348703
#> 3: 1 2005 2012 7.549438
#> 4: 1 2006 2012 9.731058
#> 5: 1 2007 2012 11.269052
#> ---
#> 1096: 100 2009 2010 7.071765
#> 1097: 100 2010 2010 12.349172
#> 1098: 100 2011 2010 13.647523
#> 1099: 100 2012 2010 13.490701
#> 1100: 100 2013 2010 12.650051
#>
#> $true_ATT
#> cohort event ATTge
#> <char> <num> <num>
#> 1: 2007 0 1.000000
#> 2: 2007 1 2.000000
#> 3: 2007 2 3.000000
#> 4: 2007 3 4.000000
#> 5: 2007 4 5.000000
#> 6: 2007 5 6.000000
#> 7: 2007 6 7.000000
#> 8: 2010 0 1.500000
#> 9: 2010 1 2.500000
#> 10: 2010 2 3.500000
#> 11: 2010 3 4.500000
#> 12: 2012 0 2.000000
#> 13: 2012 1 3.000000
#> 14: Average 0 1.506757
#> 15: Average 1 2.506757
#> 16: Average 2 3.255102
#> 17: Average 3 4.255102
#> 18: Average 4 5.000000
#> 19: Average 5 6.000000
#> 20: Average 6 7.000000
#> cohort event ATTge
#>
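The "Average" rows above are not simple means of the cohort-level values, which suggests they are weighted by cohort size. The sketch below checks this with base R. The cohort sizes (24, 25, 25) are an assumption: they are not reported in the output and were chosen here because they reproduce the printed averages.

```r
# Cohort-level true ATT values copied from the output above
att <- data.frame(
  cohort = c(rep(2007, 7), rep(2010, 4), rep(2012, 2)),
  event  = c(0:6, 0:3, 0:1),
  ATTge  = c(1:7, 1.5, 2.5, 3.5, 4.5, 2, 3)
)

# Assumption: hypothetical cohort sizes, inferred rather than reported
n <- c(`2007` = 24, `2010` = 25, `2012` = 25)
att$n <- n[as.character(att$cohort)]

# Size-weighted average of ATTge across the cohorts observed at each event time
avg <- sapply(split(att, att$event), function(d) weighted.mean(d$ATTge, d$n))
round(avg, 6)
```

With these sizes the weighted means agree with the printed Average rows to six decimal places. From event time 4 onward only the 2007 cohort is observed, so the average equals that cohort's ATT.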