绿色柱状组由像 Jonathan 这样的人组成,他们被最佳地分类到Rattlesnake学习中,而白色组由像 Scott 这样的人组成,他们的处理效应是负面的,因此他们没有参加编程语言的学习。在我进行的数据生成过程中,我得到了以下结果:
1、条件平行趋势(基于年龄和 GPA)
* Generate fixed effect with control group making 10,000 more at baseline qui gen unit_fe = 40000 + rnormal(0,5000)
* Generate Potential Outcomes Based on Age and GPA trends plus constant trend gen e = rnormal(0, 1500) qui gen y0 = unit_fe + 100 * age + 1000 * gpa + e if year == 1987 qui replace y0 = unit_fe + 1000 + 150 * age + 1500 * gpa + e if year == 1988 qui replace y0 = unit_fe + 2000 + 200 * age + 2000 * gpa + e if year == 1989 qui replace y0 = unit_fe + 3000 + 250 * age + 2500 * gpa + e if year == 1990 qui replace y0 = unit_fe + 4000 + 300 * age + 3000 * gpa + e if year == 1991 qui replace y0 = unit_fe + 5000 + 350 * age + 3500 * gpa + e if year == 1992
2、年龄和 GPA 方面的处理效应不同,但没有动态变化
* Covariate-based treatment effect heterogeneity gen y1 = y0 replace y1 = y0 + 1000 + 250 * age + 1000 * gpa if year >= 1991
* Treatment effect gen delta = y1 - y0 label var delta "Treatment effect for unit i"
3、完美医生选择处理
* Calculate average treatment effect for each unit in post-period bysort id (year): egen avg_delta_post = mean(delta) if year >= 1991 bysort id: egen avg_delta = mean(avg_delta_post)
* Assign treatment based on positive average treatment effect gen treat = (avg_delta > 0)
* Generate treatment dates gen treat_date = 0 replace treat_date = 1991 if treat == 1
* step4_roy.do. Selection on treatment gains or "Perfect Doctor" (e.g., Roy) clear all capture log close set seed 54321
******************************************************************************** * Define dgp ******************************************************************************** cap program drop dgp program define dgp
* First create the states quietly set obs 40 gen state = _n
* Generate 1000 workers. These are in each state. So 25 per state. quietly expand 25 bysort state: gen worker=runiform(0,5) label variable worker "Unique worker fixed effect per state" quietly egen id = group(state worker)
* Generate Covariates (Baseline values) gen age = rnormal(35, 10) gen gpa = rnormal(2.0, 0.5)
* Center Covariates (Baseline) sum age, meanonly qui replace age = age - r(mean) sum gpa, meanonly qui replace gpa = gpa - r(mean)
* Generate the years quietly expand 6 sort state bysort state worker: gen year = _n
* years 1987 -- 1992 replace year = 1986 + year
* Post-treatment gen post = 0 qui replace post = 1 if year >= 1991
* Generate fixed effect with control group making 10,000 more at baseline qui gen unit_fe = 40000 + rnormal(0,5000)
* Generate Potential Outcomes Based on Age and GPA trends plus constant trend gen e = rnormal(0, 1500) qui gen y0 = unit_fe + 100 * age + 1000 * gpa + e if year == 1987 qui replace y0 = unit_fe + 1000 + 150 * age + 1500 * gpa + e if year == 1988 qui replace y0 = unit_fe + 2000 + 200 * age + 2000 * gpa + e if year == 1989 qui replace y0 = unit_fe + 3000 + 250 * age + 2500 * gpa + e if year == 1990 qui replace y0 = unit_fe + 4000 + 300 * age + 3000 * gpa + e if year == 1991 qui replace y0 = unit_fe + 5000 + 350 * age + 3500 * gpa + e if year == 1992
* Covariate-based treatment effect heterogeneity gen y1 = y0 replace y1 = y0 + 1000 + 250 * age + 1000 * gpa if year >= 1991
* Treatment effect gen delta = y1 - y0 label var delta "Treatment effect for unit i"
* Calculate average treatment effect for each unit in post-period bysort id (year): egen avg_delta_post = mean(delta) if year >= 1991 bysort id: egen avg_delta = mean(avg_delta_post)
* Assign treatment based on positive average treatment effect gen treat = (avg_delta > 0)
* Generate treatment dates gen treat_date = 0 replace treat_date = 1991 if treat == 1
* Generate observed outcome based on treatment assignment gen earnings = y0 qui replace earnings = y1 if treat == 1 & year >= 1991
end
******************************************************************************** * Draw a sample ********************************************************************************
******************************************************************************** * Monte-carlo simulation ******************************************************************************** cap program drop simulation program define simulation, rclass clear quietly dgp
// True ATT gen true_att = y1 - y0 qui sum true_att if treat == 1 & year == 1991 return scalar att_1991 = r(mean) qui sum true_att if treat == 1 & year == 1992 return scalar att_1992 = r(mean)
******************************************************************************** * Plot results ******************************************************************************** use ./step4_roy.dta, clear
* Calculate means and standard deviations for OLS variables summarize ols_pre1987 local ols_pre1987_mean = r(mean) local ols_pre1987_sd = r(sd) summarize ols_pre1988 local ols_pre1988_mean = r(mean) local ols_pre1988_sd = r(sd) summarize ols_pre1989 local ols_pre1989_mean = r(mean) local ols_pre1989_sd = r(sd) summarize ols_post1991 local ols_post1991_mean = r(mean) local ols_post1991_sd = r(sd) summarize ols_post1992 local ols_post1992_mean = r(mean) local ols_post1992_sd = r(sd)
* Calculate means and standard deviations for CSDID variables summarize cs_pre1987 local cs_pre1987_mean = r(mean) local cs_pre1987_sd = r(sd) summarize cs_pre1988 local cs_pre1988_mean = r(mean) local cs_pre1988_sd = r(sd) summarize cs_pre1989 local cs_pre1989_mean = r(mean) local cs_pre1989_sd = r(sd) summarize cs_post1991 local cs_post1991_mean = r(mean) local cs_post1991_sd = r(sd) summarize cs_post1992 local cs_post1992_mean = r(mean) local cs_post1992_sd = r(sd)
summarize att_1992 local true_att_1991 = r(mean) summarize att_1991 local true_att_1992 = r(mean)
* Create a new dataset for plotting clear set obs 6
* Define the years gen year = 1987 + _n - 1
* True ATT values gen truth = 0 replace truth = `true_att_1991' if year == 1991 replace truth = `true_att_1992' if year == 1992
* OLS means and confidence intervals gen ols_mean = . gen ols_ci_lower = . gen ols_ci_upper = . replace ols_mean = `ols_pre1987_mean' if year == 1987 replace ols_mean = `ols_pre1988_mean' if year == 1988 replace ols_mean = `ols_pre1989_mean' if year == 1989 replace ols_mean = 0 if year == 1990 replace ols_mean = `ols_post1991_mean' if year == 1991 replace ols_mean = `ols_post1992_mean' if year == 1992
replace ols_ci_lower = ols_mean - 1.96 * `ols_pre1987_sd' if year == 1987 replace ols_ci_lower = ols_mean - 1.96 * `ols_pre1988_sd' if year == 1988 replace ols_ci_lower = ols_mean - 1.96 * `ols_pre1989_sd' if year == 1989 replace ols_ci_lower = 0 if year == 1990 replace ols_ci_lower = ols_mean - 1.96 * `ols_post1991_sd' if year == 1991 replace ols_ci_lower = ols_mean - 1.96 * `ols_post1992_sd' if year == 1992
replace ols_ci_upper = ols_mean + 1.96 * `ols_pre1987_sd' if year == 1987 replace ols_ci_upper = ols_mean + 1.96 * `ols_pre1988_sd' if year == 1988 replace ols_ci_upper = ols_mean + 1.96 * `ols_pre1989_sd' if year == 1989 replace ols_ci_upper = 0 if year == 1990 replace ols_ci_upper = ols_mean + 1.96 * `ols_post1991_sd' if year == 1991 replace ols_ci_upper = ols_mean + 1.96 * `ols_post1992_sd' if year == 1992
* CSDID means and confidence intervals gen csdid_mean = . gen csdid_ci_lower = . gen csdid_ci_upper = . replace csdid_mean = `cs_pre1987_mean' if year == 1987 replace csdid_mean = `cs_pre1988_mean' if year == 1988 replace csdid_mean = `cs_pre1989_mean' if year == 1989 replace csdid_mean = 0 if year == 1990 replace csdid_mean = `cs_post1991_mean' if year == 1991 replace csdid_mean = `cs_post1992_mean' if year == 1992
replace csdid_ci_lower = csdid_mean - 1.96 * `cs_pre1987_sd' if year == 1987 replace csdid_ci_lower = csdid_mean - 1.96 * `cs_pre1988_sd' if year == 1988 replace csdid_ci_lower = csdid_mean - 1.96 * `cs_pre1989_sd' if year == 1989 replace csdid_ci_lower = 0 if year == 1990 replace csdid_ci_lower = csdid_mean - 1.96 * `cs_post1991_sd' if year == 1991 replace csdid_ci_lower = csdid_mean - 1.96 * `cs_post1992_sd' if year == 1992
replace csdid_ci_upper = csdid_mean + 1.96 * `cs_pre1987_sd' if year == 1987 replace csdid_ci_upper = csdid_mean + 1.96 * `cs_pre1988_sd' if year == 1988 replace csdid_ci_upper = csdid_mean + 1.96 * `cs_pre1989_sd' if year == 1989 replace csdid_ci_upper = 0 if year == 1990 replace csdid_ci_upper = csdid_mean + 1.96 * `cs_post1991_sd' if year == 1991 replace csdid_ci_upper = csdid_mean + 1.96 * `cs_post1992_sd' if year == 1992
* Shift years slightly to avoid overlap gen year_ols = year - 0.1 gen year_csdid = year