除了Roth的文章外,最近几年,效力分析也在Top期刊的应用研究中越来越常见,例如,Black et al.(2022,JPubE)、Dench et al.(2024,JPubE)。关于效力分析更详细的技术讲解,请参见Sylvain Chabé-Ferret(2024)的计量课本《Statistical Tools for Causal Inference》第七章。
下面的内容主要来自于Nick Huntington-Klein(2020):《Simulation for Power Analysis》。
* Set the number of observations we want in our simulated data clear
* Create our data * Note 0 and 1 are the default start and end of runiform; I want that, so I don't put anything inside runiform() g X = runiform()
* Now create Y based on X * Don't forget to add additional noise! * The *true effect* of X on Y is .2 g Y = .2*X + rnormal(0, 3)
第二步:执行分析
假设我们计划获得回归中的稳健标准误:
* reg is short for regress reg Y X, robust
* And if we're just interested in significance, say at the 95% level, * we can compare the p-value to .05 and store the result as a local variable (i.e. just store the single number, not as a regular Stata variable) local sig = 2*ttail(e(df_r),abs(_b[X]/_se[X])) <= .05
* di is short for display di `sig'
第三步:重复上述过程
forvalues i = 1(1)500 { * Set the number of observations we want in our simulated data clear set obs 1000
* Create our data g X = runiform() g Y = .2*X + rnormal(0, 3)
* Run analysis quietly qui reg Y X, robust
* Pull out results local sig = 2*ttail(e(df_r),abs(_b[X]/_se[X])) <= .05 di `sig' local coef = _b[X] di `coef' }
clear * We want a number of observations equal to the number of times we will simulate the data set obs 500
* Blank variables g coef_results = . g sig_results = .
* Let's wrap the whole thing in quietly{} - we don't need the output! quietly { forvalues i = 1(1)500 {
* Preserve our results data set so we can instantly come back to it preserve
* Set the number of observations we want in our simulated data * Now we're no longer in the results data set; we're in the simulated-data data set clear set obs 1000
* Create our data g X = runiform() g Y = .2*X + rnormal(0, 3)
* Run analysis quietly qui reg Y X, robust
* Pull out results local sig = 2*ttail(e(df_r),abs(_b[X]/_se[X])) <= .05 local coef = _b[X]
* Now restore to come back to results restore
* And store our results in the results variables, using "in" to tell it which row to store the data in replace coef_results = `coef' in `i' replace sig_results = `sig' in `i' } }
* summ is short for summarize summ sig_results
结果如下:
结果显示,6%的统计效力(均值)。
第五步:更多的结果
我们还可以考察系数的分布:
tw kdensity coef_results, xti("Coefficient on X") yti("Density") lc(red)
image.png
X系数的密度分布
label def sig 0 "Not Significant" 1 "Significant" label values sig_results sig graph bar, over(sig_results) yti("Percentage")
foreach effect in .4 .8 1.2 1.6 2 { foreach sample_size in 1000 { quietly { clear * We want a number of observations equal to the number of times we will simulate the data set obs 500
* Blank variables g sig_results = .
* Let's wrap the whole thing in quietly{} - we don't need the output! forvalues i = 1(1)500 {
preserve
clear * NOTICE I'm putting the "sample_size" variable from above here set obs `sample_size'
* Create our data g X = runiform() * and the "effect" variable here g Y = `effect'*X + rnormal(0, 3)
* Run analysis quietly reg Y X, robust
* Pull out results local sig = 2*ttail(e(df_r),abs(_b[X]/_se[X])) <= .05 local coef = _b[X]
* Now restore to come back to results restore
* And store our results in the results variables, using "in" to tell it which row to store the data in replace sig_results = `sig' in `i' }
summ sig_results * since we're inside a quietly{} we need a noisily to see this noisily di "With a sample size of `sample_size' and an effect of `effect', the mean of sig_results is " r(mean) } } }