
Simulate data from the model Y_it = alpha_i + mu_t + ATT*(t >= G_i) + epsilon_it, where i indexes individuals, t indexes years, and G_i is individual i's cohort (the year of treatment onset). The ATT is given by ATTat0 + EventTime*ATTgrowth + cohort_counter*ATTcohortdiff, where EventTime = t - G_i and cohort_counter is the order of the treated cohort counted from 0 (0 for the first cohort, 1 for the second, and so on).
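To make the ATT formula concrete, the sketch below tabulates the implied true ATT for every post-treatment (cohort, event) cell. The helper `true_att_grid()` is hypothetical and not part of the package. It assumes cohorts are ordered by onset year, that cohort_counter starts at 0, and that event times run from 0 up to maxyear - cohort.

```r
# Hypothetical helper, not part of the package: tabulates the ATT implied by
#   ATT = ATTat0 + EventTime*ATTgrowth + cohort_counter*ATTcohortdiff
# for every post-treatment (cohort, event) cell observed up to maxyear.
true_att_grid <- function(cohorts = c(2007, 2010, 2012), ATTat0 = 1,
                          ATTgrowth = 1, ATTcohortdiff = 0.5,
                          maxyear = 2013) {
  cohorts <- sort(cohorts)
  rows <- lapply(seq_along(cohorts), function(k) {
    g <- cohorts[k]
    if (g > maxyear) return(NULL)      # cohort never observed as treated
    event <- 0:(maxyear - g)           # event times 0, 1, ..., maxyear - g
    counter <- k - 1                   # 0 for the first treated cohort (assumed)
    data.frame(cohort = g, event = event,
               ATTge = ATTat0 + event * ATTgrowth + counter * ATTcohortdiff)
  })
  do.call(rbind, rows)
}

att <- true_att_grid()
att[att$cohort == 2010, ]   # ATTge = 1.5, 2.5, 3.5, 4.5 for events 0 to 3
```

With the default arguments this yields the same 13 cohort-level values as the non-Average rows of true_ATT in the Examples, e.g. 1 + 2*1 + 1*0.5 = 3.5 for cohort 2010 at event time 2.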

Usage

SimDiD(
  seed = 1,
  sample_size = 100,
  cohorts = c(2007, 2010, 2012),
  ATTat0 = 1,
  ATTgrowth = 1,
  ATTcohortdiff = 0.5,
  anticipation = 0,
  minyear = 2003,
  maxyear = 2013,
  idvar = 1,
  yearvar = 1,
  shockvar = 1,
  indivAR1 = FALSE,
  time_covars = FALSE,
  clusters = FALSE,
  markets = FALSE,
  randomNA = FALSE,
  missingCohorts = NULL
)

Arguments

seed

Set the random seed. Default is seed=1.

sample_size

Number of individuals. Default is sample_size=100.

cohorts

Vector of years at which treatment onset occurs. Default is cohorts=c(2007,2010,2012).

ATTat0

Treatment effect at event time 0. Default is 1.

ATTgrowth

Increment in the ATT for each event time after 0. Default is 1.

ATTcohortdiff

Increment in the ATT for each successive treated cohort. Default is 0.5.

anticipation

Number of years before treatment onset during which units receive 50% of the treatment effect (anticipation). Default is anticipation=0.

minyear

Minimum calendar year to include in the data. Default is minyear=2003.

maxyear

Maximum calendar year to include in the data. Default is maxyear=2013.

idvar

Variance of individual fixed effects (alpha_i). Default is idvar=1.

yearvar

Variance of year effects (mu_t). Default is yearvar=1.

shockvar

Variance of idiosyncratic shocks (epsilon_it). Default is shockvar=1.

indivAR1

If TRUE, each individual's shocks follow an AR(1) process. Default is FALSE.
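For readers unfamiliar with AR(1) shocks, here is a generic sketch of the process eps_t = rho * eps_{t-1} + u_t in base R. It is not SimDiD's internal code: the function name `simulate_ar1()`, the persistence `rho = 0.5`, and the innovation variance `shock_var = 1` are illustrative assumptions.

```r
# Generic AR(1) sketch: eps_t = rho * eps_{t-1} + u_t, with u_t ~ N(0, shock_var).
# rho and shock_var are illustrative assumptions, not SimDiD's internal values.
simulate_ar1 <- function(n_periods, rho = 0.5, shock_var = 1) {
  stopifnot(n_periods >= 1, abs(rho) < 1)
  eps <- numeric(n_periods)
  # start from the stationary distribution, whose variance is shock_var / (1 - rho^2)
  eps[1] <- rnorm(1, sd = sqrt(shock_var / (1 - rho^2)))
  for (p in seq_len(n_periods)[-1]) {
    # each period keeps a fraction rho of last period's shock plus a fresh draw
    eps[p] <- rho * eps[p - 1] + rnorm(1, sd = sqrt(shock_var))
  }
  eps
}

set.seed(1)
# one 11-year shock path (2003-2013) for each of 5 individuals, one per column
shocks <- replicate(5, simulate_ar1(11))
dim(shocks)
```

Because each period inherits part of the previous shock, an individual's outcomes are serially correlated, which is the situation where clustering standard errors by individual matters.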

time_covars

If TRUE, add two time-varying covariates, named "X1" and "X2". Default is FALSE.

clusters

If TRUE, add 10 randomly assigned clusters with cluster-specific AR(1) shocks. Default is FALSE.

markets

If TRUE, add 10 randomly assigned markets with market-specific shocks that are systematically larger for markets that are treated earlier. Default is FALSE.

randomNA

If TRUE, randomly set some values of the outcome variable to missing (NA). Default is FALSE.

missingCohorts

If set to a particular cohort (or vector of cohorts), all of the outcomes for that cohort at event time -1 will be set to missing. Default is NULL.
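Since event time -1 for cohort g is calendar year g - 1, the effect of missingCohorts can be pictured with the base-R sketch below. The toy panel (three individuals, one per default cohort, with placeholder outcomes) is a hypothetical illustration, not output from SimDiD.

```r
# Hypothetical illustration of missingCohorts: for each listed cohort g,
# the outcome in year g - 1 (event time -1) is set to NA.
set.seed(1)
panel <- expand.grid(id = 1:3, year = 2003:2013)
panel$cohort <- c(2007, 2010, 2012)[panel$id]   # one individual per cohort
panel$Y <- rnorm(nrow(panel))                   # placeholder outcomes

missingCohorts <- 2010
at_event_minus1 <- panel$cohort %in% missingCohorts &
  panel$year - panel$cohort == -1
panel$Y[at_event_minus1] <- NA

panel[is.na(panel$Y), ]   # only id 2 (cohort 2010) in year 2009
```

With the package loaded, the corresponding call would be SimDiD(missingCohorts = 2010), which is useful for checking how an estimator behaves when its reference period (event time -1) is unavailable for some cohorts.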

Value

A list with two data.tables. The first, simdata, contains the simulated data with variables (id, year, cohort, Y), where Y is the outcome variable. The second, true_ATT, contains the true ATT values (ATTge), both at the (cohort, event) level and by event time averaged across cohorts (the rows with cohort "Average").

Examples

# simulate data with default options
SimDiD()
#> $simdata
#>          id  year cohort         Y
#>       <int> <int>  <num>     <num>
#>    1:     1  2003   2012  8.058406
#>    2:     1  2004   2012 12.348703
#>    3:     1  2005   2012  7.549438
#>    4:     1  2006   2012  9.731058
#>    5:     1  2007   2012 11.269052
#>   ---                             
#> 1096:   100  2009   2010  7.071765
#> 1097:   100  2010   2010 12.349172
#> 1098:   100  2011   2010 13.647523
#> 1099:   100  2012   2010 13.490701
#> 1100:   100  2013   2010 12.650051
#> 
#> $true_ATT
#>      cohort event    ATTge
#>      <char> <num>    <num>
#>  1:    2007     0 1.000000
#>  2:    2007     1 2.000000
#>  3:    2007     2 3.000000
#>  4:    2007     3 4.000000
#>  5:    2007     4 5.000000
#>  6:    2007     5 6.000000
#>  7:    2007     6 7.000000
#>  8:    2010     0 1.500000
#>  9:    2010     1 2.500000
#> 10:    2010     2 3.500000
#> 11:    2010     3 4.500000
#> 12:    2012     0 2.000000
#> 13:    2012     1 3.000000
#> 14: Average     0 1.506757
#> 15: Average     1 2.506757
#> 16: Average     2 3.255102
#> 17: Average     3 4.255102
#> 18: Average     4 5.000000
#> 19: Average     5 6.000000
#> 20: Average     6 7.000000
#>      cohort event    ATTge
#>
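The page does not say how the "Average" rows are weighted. The printed values are consistent with cohort-size-weighted means of the cohort-level ATTs, and the check below reproduces two of them. The cohort sizes 24, 25, and 25 are inferred from the printed averages under that weighting assumption; they are not reported by SimDiD.

```r
# Event-time averages as cohort-size-weighted means of the cohort-level ATTs.
# Cohort sizes are inferred from the printed output (an assumption).
att_event0 <- c(`2007` = 1.0, `2010` = 1.5, `2012` = 2.0)
att_event2 <- c(`2007` = 3.0, `2010` = 3.5)   # cohort 2012 never reaches event 2
n_cohort   <- c(`2007` = 24, `2010` = 25, `2012` = 25)

round(weighted.mean(att_event0, n_cohort), 6)                       # 1.506757
round(weighted.mean(att_event2, n_cohort[names(att_event2)]), 6)    # 3.255102
```

This also explains why the averages at event times 4 to 6 equal the 2007 values exactly: only the 2007 cohort is observed that long after onset before maxyear = 2013.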