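Second, I hope participants will build a working picture of Stata's basic architecture, knowing what Stata can do and how to do it, so as to lay a broad and solid foundation for later study.
Leafing through papers in top journals, we seem to know every method they use. On reflection, what sets these papers apart is that their ideas or angles are usually distinctive and are supported by appropriately chosen methods. The key lies in research design, which current econometrics textbooks rarely cover. This workshop therefore has two emphases: on the one hand, I will explain the fundamentals thoroughly rather than rush through them; on the other, each topic comes with 2-3 classic papers that show these methods applied sensibly.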
The content is arranged to progress from the simple to the more advanced, step by step.
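Sessions A0-A1 cover, in order, basic Stata usage, data processing, programming, and visualization. These topics require little background in econometrics, yet they pay off considerably in empirical skill and efficiency. In these sessions I will use a published paper as a running example to explain Stata's basic syntax and to discuss key issues in data processing, such as handling outliers and string variables. In my own experience, data-processing skill directly determines how efficiently empirical work gets done, and whether outliers and similar issues are handled properly directly affects the robustness of a paper's results. Most people underrate this, but it is crucial. Quite a few students who had finished the advanced course came back to retake the introductory one after realizing exactly this.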
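As a rough illustration of the outlier handling mentioned above, the sketch below winsorizes a variable, that is, it caps extreme values at chosen percentiles instead of dropping them. It is written in Python only to make the arithmetic explicit; the function names, the sample `roa` values, and the linear-interpolation percentile rule are all assumptions made for this example, and Stata's own percentile conventions may differ slightly.

```python
def percentile(sorted_vals, p):
    """Percentile p (0-100) of pre-sorted data, using linear interpolation."""
    if not sorted_vals:
        raise ValueError("need at least one observation")
    k = (len(sorted_vals) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def winsorize(values, lower=1, upper=99):
    """Cap values below the lower percentile and above the upper percentile."""
    s = sorted(values)
    lo_cut = percentile(s, lower)
    hi_cut = percentile(s, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]


# Hypothetical return-on-assets series with two implausible observations.
roa = [0.02, 0.03, 0.05, 0.04, 0.06, 0.03, 0.04, 0.05, -3.50, 4.20]

# With only 10 observations, 10/90 cutoffs are used so the caps actually bind.
roa_w = winsorize(roa, lower=10, upper=90)

print("before:", min(roa), max(roa))
print("after: ", min(roa_w), max(roa_w))
```

Winsorizing keeps the sample size intact, which is why it is often preferred to trimming when every observation belongs to a panel unit; reporting results under more than one cutoff is a simple robustness check.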
Cameron, A. C., D. L. Miller, 2015, A practitioner’s guide to cluster-robust inference, Journal of Human Resources, 50 (2): 317-372. -Link-, -PDF-
Correia, S. 2016. reghdfe, Linear Models with High-Dimensional Fixed Effects: An Efficient and Feasible Estimator. Working Paper. -PDF-, Examples
A6. Replicating a paper from a top journal
Akcigit, U., J. Grigsby, T. Nicholas, S. Stantcheva, 2022, Taxation and innovation in the twentieth century, The Quarterly Journal of Economics, 137 (1): 329-385. -Link-, -PDF-, -Appendix-, -cited-, -Replication-
Sherman M G, Tookes H E. Female representation in the academic finance profession. Journal of Finance, 2022, 77(1): 317-365. -Link-, -cited-, -PDF-, -Replication-
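Recent top-journal papers place growing emphasis on model uncertainty: for example, control variables can be good or bad, the relationship may be nonlinear, and competing models have to be compared on their merits. This calls for a range of tests that rule out confounding factors and plausible-sounding but mistaken arguments, so that the paper's conclusions exclude alternative explanations and carry a clear economic interpretation. This topic covers the basic principles of hypothesis testing, model selection and comparison tests, and robustness checks, which are hard to do well. Alongside the test methods and commands, the emphasis is on how to interpret their economic meaning, how to choose a suitable test, and how to present and analyze the results appropriately. Later topics show how variants of these tests are applied flexibly under specific model specifications.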
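Joint tests of coefficients: Wald, LR, and LM tests
test, testparm, lincom, nlcom, testnl
Collecting and presenting results
Model comparison: nested and non-nested models
R2 decomposition and contribution analysis
Tests of coefficient differences: Chow test, SUR, bootstrap, permutation test
Endogeneity tests, robustness checks, placebo tests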
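To make the idea of a coefficient-difference test concrete, here is a minimal sketch of a permutation test for the gap between two groups' OLS slopes. It is an illustration in Python rather than the Stata workflow taught in the session; the data, the function names, and the choice of 999 reshuffles are assumptions made for this example.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def slope_gap(x, y, g):
    """Slope in group 1 minus slope in group 0."""
    a = [(xi, yi) for xi, yi, gi in zip(x, y, g) if gi == 1]
    b = [(xi, yi) for xi, yi, gi in zip(x, y, g) if gi == 0]
    return ols_slope(*zip(*a)) - ols_slope(*zip(*b))


def permutation_pvalue(x, y, g, reps=999, seed=42):
    """Share of label reshuffles whose slope gap is at least as large as observed."""
    rng = random.Random(seed)
    observed = slope_gap(x, y, g)
    labels = list(g)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(labels)  # break the link between group label and data
        if abs(slope_gap(x, y, labels)) >= abs(observed):
            extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (extreme + 1) / (reps + 1)


# Hypothetical data: slope 2.0 in group 1 and 0.5 in group 0.
x = list(range(1, 9)) * 2
y = [2.0 * xi for xi in x[:8]] + [0.5 * xi for xi in x[8:]]
g = [1] * 8 + [0] * 8

observed, pval = permutation_pvalue(x, y, g)
print("observed slope gap:", observed)
print("permutation p-value:", pval)
```

The test asks how often a gap this large would appear if group membership were unrelated to the slope. Unlike a Chow test, it does not rely on a distributional assumption for the errors, at the cost of simulation noise in the p-value.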
References:
Hansen, B. E. 2021. Econometrics. Princeton University Press. Data and Contents, PDF. Chap 9.
Yan, G., & Chen, Q. (2023). synth2: Synthetic control method with placebo tests, robustness test, and visualization. The Stata Journal, 23(3), 597–624. Link, PDF, Google.
Replication papers
Ye, D., Y. K. Ng, Y. Lian, 2015, Culture and happiness, Social Indicators Research, 123 (2): 519-547. -Link-, -PDF-, -cited-, -Replication-
Akcigit, U., J. Grigsby, T. Nicholas, S. Stantcheva, 2022, Taxation and innovation in the twentieth century, The Quarterly Journal of Economics, 137 (1): 329-385. -Link-, -PDF-, -Appendix-, -cited-, -Replication-
Lee, C.-C., Feng, Y., & Peng, D. (2022). A green path towards sustainable development: The impact of low-carbon city pilot on energy transition. Energy Economics, 115, 106343. Link (rep), PDF, Google. -Replication-
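B2. IV and GMM (3 hours)
The idea behind IV is not complicated, but finding an instrument that convinces referees often comes down to luck, and in cross-sectional analysis this is indeed the case. With the rapid development of panel data models, approaches to constructing instruments have changed considerably; "using what is at hand" and "differential response" are both very useful strategies. GMM is the standard method for estimating dynamic panels and is also an important estimation method in areas such as investment equations and DSGE models. Building on causal diagrams, this session introduces the basic ideas of IV, 2SLS, and GMM, supplemented by hands-on Stata replications of several classic papers.
Principles of IV and 2SLS estimation
Principles of GMM estimation
Hypothesis tests: endogeneity, exclusion restriction
Application 1: dynamic panel data models
Application 2: Lasso-IV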
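The logic of IV and 2SLS can be sketched with one endogenous regressor and one instrument. The example below is an illustration in Python, not the Stata implementation used in the session. All data are constructed by hand: the true coefficient is 2, the unobserved confounder `u` enters both `x` and `y`, and the instrument `z` is built to be exactly uncorrelated with `u`. The variable and function names are assumptions made for this example.

```python
def mean(v):
    return sum(v) / len(v)


def cov_sum(a, b):
    """Sum of cross-products of deviations from the mean (unscaled covariance)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    return cov_sum(x, y) / cov_sum(x, x)


# Constructed data: z is the instrument, u the unobserved confounder.
z = [-1, -1, 1, 1, -1, -1, 1, 1]
u = [-1, 1, -1, 1, -1, 1, -1, 1]
x = [zi + ui for zi, ui in zip(z, u)]          # x is endogenous: it contains u
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]  # true effect of x on y is 2

# OLS is biased because x is correlated with the error term 3*u.
b_ols = slope(x, y)

# IV (Wald) estimator: cov(z, y) / cov(z, x).
b_iv = cov_sum(z, y) / cov_sum(z, x)

# The same number via two stages: regress x on z, then y on fitted x.
pi1 = slope(z, x)
pi0 = mean(x) - pi1 * mean(z)
x_hat = [pi0 + pi1 * zi for zi in z]
b_2sls = slope(x_hat, y)

print("OLS :", b_ols)   # 3.5, biased upward by the confounder
print("IV  :", b_iv)    # 2.0
print("2SLS:", b_2sls)  # 2.0
```

With a single instrument the IV ratio and the two-stage estimate coincide. Note that running the second stage by hand gives the right coefficient but the wrong standard errors, which is one reason to rely on a dedicated estimation command in practice.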
References:
Hansen, B. E. 2021. Econometrics. Princeton University Press. Data and Contents, PDF. Chap 12-13.
Lal, A., Lockhart, M., Xu, Y., & Zu, Z. (2024). How Much Should We Trust Instrumental Variable Estimates in Political Science? Practical Advice Based on 67 Replicated Studies. Political Analysis, 1–20. Link, PDF, -Appendix-, -Replication-, PDF with appendix (260 pages)
Replication documents
Akcigit, U., J. Grigsby, T. Nicholas, S. Stantcheva, 2022, Taxation and innovation in the twentieth century, The Quarterly Journal of Economics, 137 (1): 329-385. -Link-, -PDF-, -Appendix-, -cited-, -Replication-
Acemoglu, D., & Restrepo, P. (2017). Secular Stagnation? The Effect of Aging on Economic Growth in the Age of Automation. American Economic Review, 107(5), 174–179. Link, PDF, -PDF2-, Google.
Bai, J. (2009). Panel Data Models With Interactive Fixed Effects. Econometrica, 77(4), 1229–1279. Link (rep), PDF, Google.
Bai, J., Liao, Y., & Yang, J. (2015). Unbalanced Panel Data Models with Interactive Effects. In The Oxford Handbook of Panel Data, ed. B. H. Baltagi, 149–170. Oxford: Oxford University Press. Link, PDF, Google.
Norkutė, M., Sarafidis, V., Yamagata, T., & Cui, G. (2021). Instrumental variable estimation of dynamic linear panel data models with defactored regressors and a multifactor error structure. Journal of Econometrics, 220(2), 416–446. Link, PDF, Google.
Cui, G., Norkutė, M., Sarafidis, V., & Yamagata, T. (2021). Two-stage instrumental variable estimation of linear panel data models with interactive effects. The Econometrics Journal, 25(2), 340–361. Link, PDF, Google. -Replication-
Kripfganz, S., & Sarafidis, V. (2021). Instrumental-variable estimation of large-T panel-data models with common factors. The Stata Journal, 21(3), 659–686. Link, PDF, Google. -cited-
Arkhangelsky D, Athey S, Hirshberg D A, et al. Synthetic difference-in-differences[J]. American Economic Review, 2021, 111(12): 4088-4118. Link, -PDF- -Replicate- -Github-
Ditzen, J., & Reese, S. (2023). xtnumfac: A battery of estimators for the number of common factors in time series and panel-data models. The Stata Journal, 23(2), 438–454. Link, PDF, Google. github
Sul, D. Panel data econometrics: Common factor analysis for empirical researchers[M]. 2019. -Link-, -PDF-, Book-review, Codes-Stata/Gauss/Matlab, R-codes-readme
Huang, W., Wang, Y., & Zhou, L. (2024). Identify latent group structures in panel data: The classifylasso command. The Stata Journal, 24(1), 46–71. Link, PDF, Google.
Yan, G., & Chen, Q. (2022). rcm: A command for the regression control method. The Stata Journal, 22(4), 842–883. Link, PDF, Google.
Cattaneo, M. D., Crump, R. K., Farrell, M. H., & Feng, Y. (2024). On Binscatter. American Economic Review, 114(5), 1488–1514. Link, PDF, Appendix, Google, -Replication-, github, Slides
Cattaneo, Crump, Farrell and Feng (2024): Binscatter Regressions. Stata Journal, Forthcoming.
Du, K., Zhang, Y., & Zhou, Q. (2020). Fitting partially linear functional-coefficient panel-data models with Stata. The Stata Journal, 20(4), 976–998. Link, PDF, Google. -cited-, -Github-
Replication documents
Akcigit, U., J. Grigsby, T. Nicholas, S. Stantcheva, 2022, Taxation and innovation in the twentieth century, The Quarterly Journal of Economics, 137 (1): 329-385. -Link-, -PDF-, -Appendix-, -cited-, -Replication-
Chen, Y., S. Shi, Y. Tang, 2019, Valuing the urban hukou in China: Evidence from a regression discontinuity design for housing prices, Journal of Development Economics, 141: 102381. -Link-, -PDF-
Du, K., Cheng, Y., & Yao, X. (2021). Environmental regulation, green technology innovation, and industrial structure upgrading: The road to the green transformation of Chinese cities. Energy Economics, 98, 105247. Link (rep), PDF, -Replication-, Google.
Du, C., Cao, Y., Ling, Y., Jin, Z., Wang, S., & Wang, D. (2024). Does manufacturing agglomeration promote green productivity growth in China? Fresh evidence from partially linear functional-coefficient models. Energy Economics, 131, 107352. Link (rep), PDF, Google. -Replication-
Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, and Whitney Newey. 2017. "Double/Debiased/Neyman Machine Learning of Treatment Effects." American Economic Review, 107 (5): 261-265. -Link-, -PDF-, -Replication-R, -2-
Chernozhukov, V., D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, W. Newey, J. Robins, 2018, Double/debiased machine learning for treatment and structural parameters, The Econometrics Journal, 21 (1): C1-C68. -Link-, -PDF-, Replication
Ahrens, A., Hansen, C. B., Schaffer, M. E., & Wiemann, T. (2024). ddml: Double/debiased machine learning in Stata. The Stata Journal, 24(1), 3–45. Link, PDF, Google.
Ahrens, A., Hansen, C. B., & Schaffer, M. E. (2023). pystacked: Stacking generalization and machine learning in Stata. The Stata Journal, 23(4), 909–931. Link, PDF, Google.
Ahrens, A., Hansen, C. B., & Schaffer, M. E. (2020). lassopack: Model selection and prediction with regularized regression in Stata. The Stata Journal, 20(1), 176–235. Link, PDF, Google. -PDF-. The package has been updated to 2024.2; see ssc des lassopack
Ahrens, A., Hansen, C. B., Schaffer, M. E., & Wiemann, T. (2024). Model averaging and double machine learning. arXiv Working Paper. Link, PDF
Dallakyan, A. (2022). graphiclasso: Graphical lasso for learning sparse inverse-covariance matrices. The Stata Journal, 22(3), 625–642. Link, PDF, Google.
Chiang, H. D., Kato, K., Ma, Y., & Sasaki, Y. (2022). Multiway Cluster Robust Double/Debiased Machine Learning. Journal of Business & Economic Statistics, 40(3), 1046–1056. Link, PDF, Google. Blog post
Dhar, D., Jain, T., & Jayachandran, S. (2022). Reshaping Adolescents’ Gender Attitudes: Evidence from a School-Based Experiment in India. American Economic Review, 112(3), 899–927. Link (rep), PDF, Appendix, Google. -Replication-Stata, -cited-
Invariant causal prediction (Kook et al., 2024)
Doubly-valid/doubly-sharp sensitivity analysis (Dorn et al., 2024)
Model-assisted sensitivity analysis (Tan, 2024)
Exclusion restriction tests (Goldsmith et al., 2022)
Parallel trends tests
Traditional parallel trends tests
Event study (Freyaldenhoven et al., 2021; Roth, 2022)
Treeffuser (Beltran-Velez et al., 2024)
Placebo tests
New paradigms for DID
TWFE and standard DID (Wooldridge, 2021)
DID in the Neyman-orthogonal framework (including TWFE, DRDID, DML, GRF, Npcausal) (Sant'Anna & Zhao, 2020; Kennedy et al., 2023)
DID in the matching framework (including PSM-DID, SDID, WGAN) (Athey et al., 2021)
Multi-period DID (including CSDID, dynamic IPW, RIPW) (Arkhangelsky et al., 2021; Callaway & Sant'Anna, 2021; Goodman-Bacon, 2021; van den Berg & Gerard, 2022)
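The standard 2x2 DID estimate and a simple pre-period placebo check, both listed above, can be sketched from group-by-period means. The sketch below is in Python for transparency and is not one of the estimators named in the list; the records, the period coding (-2 and -1 before treatment, 0 after), and the function name `did` are assumptions made for this example.

```python
from statistics import mean


def did(rows, pre, post):
    """Difference-in-differences from the four group-by-period means."""
    def cell(treated, period):
        return mean(r["y"] for r in rows
                    if r["treated"] == treated and r["period"] == period)

    change_treated = cell(1, post) - cell(1, pre)
    change_control = cell(0, post) - cell(0, pre)
    return change_treated - change_control


# Hypothetical panel: two units per group, periods -2 and -1 are pre-treatment.
rows = [
    {"treated": 1, "period": -2, "y": 9},  {"treated": 1, "period": -2, "y": 11},
    {"treated": 1, "period": -1, "y": 10}, {"treated": 1, "period": -1, "y": 12},
    {"treated": 1, "period": 0,  "y": 15}, {"treated": 1, "period": 0,  "y": 17},
    {"treated": 0, "period": -2, "y": 7},  {"treated": 0, "period": -2, "y": 9},
    {"treated": 0, "period": -1, "y": 8},  {"treated": 0, "period": -1, "y": 10},
    {"treated": 0, "period": 0,  "y": 9},  {"treated": 0, "period": 0,  "y": 11},
]

# Treatment effect: last pre-period versus the post-period.
effect = did(rows, pre=-1, post=0)

# Placebo: pretend treatment started one period early; under parallel
# pre-trends this "effect" should be close to zero.
placebo = did(rows, pre=-2, post=-1)

print("DID estimate:", effect)    # 4
print("placebo DID :", placebo)   # 0
```

A placebo estimate near zero is consistent with parallel pre-trends but does not prove that trends would have stayed parallel after treatment, which is the assumption the DID estimate actually rests on.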
References:
Boileau, P., Leng, N., Hejazi, N. S., Van Der Laan, M., & Dudoit, S. (2024). A nonparametric framework for treatment effect modifier discovery in high dimensions. Journal of the Royal Statistical Society Series B: Statistical Methodology. Link, PDF, Google.
Beltran-Velez, N., Grande, A. A., Nazaret, A., Kucukelbir, A., & Blei, D. (2024). Treeffuser: Probabilistic Predictions via Conditional Diffusions with Gradient-Boosted Trees (Version 2). arXiv. Link (rep), PDF, Google.
Chang, H., Middleton, J. A., & Aronow, P. M. (2024). Exact Bias Correction for Linear Adjustment of Randomized Controlled Trials. Econometrica, 92(5), 1503–1519. Link (rep), PDF, Google.
Di Giuli, A., & Laux, P. A. (2022). The effect of media-linked directors on financing and external governance. Journal of Financial Economics, 145(2), 103–131. Link (rep), PDF, Google.
Dorn, J., Guo, K., & Kallus, N. (2024). Doubly-Valid/Doubly-Sharp Sensitivity Analysis for Causal Inference with Unmeasured Confounding. Journal of the American Statistical Association, 1–12. Link, PDF, Google.
Guo, X., Li, R., Liu, J., & Zeng, M. (2022). High-Dimensional Mediation Analysis for Selecting DNA Methylation Loci Mediating Childhood Trauma and Cortisol Stress Reactivity. Journal of the American Statistical Association, 117(539), 1110–1121. Link, PDF, Google.
Guo, Z., Ćevid, D., & Bühlmann, P. (2022). Doubly debiased lasso: High-dimensional inference under hidden confounding. Annals of Statistics, 50(3), 1320. Link, PDF, Google.
Imai, K., Keele, L., & Yamamoto, T. (2010). Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science, 25(1), 51–71. Link, PDF, Google.
Kook, L., Saengkyongam, S., Lundborg, A. R., Hothorn, T., & Peters, J. (2024). Model-Based Causal Feature Selection for General Response Types. Journal of the American Statistical Association, 1–12. Link, PDF, -PDF2-, Google.
Lin, Y., Windmeijer, F., Song, X., & Fan, Q. (2024). On the instrumental variable estimation with many weak and invalid instruments. Journal of the Royal Statistical Society Series B: Statistical Methodology, qkae025. Link, PDF, Google.
Ouyang, J., Tan, K. M., & Xu, G. (2023). High-dimensional inference for generalized linear models with hidden confounding. The Journal of Machine Learning Research, 24(1), 14030–14090. Link, PDF, Google.
Zhou, X. (2022). Semiparametric Estimation for Causal Mediation Analysis with Multiple Causally Ordered Mediators. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(3), 794–821. Link, PDF, Google.
Zhou, X., & Yamamoto, T. (2023). Tracing causal paths from experimental and observational data. The Journal of Politics, 85(1), 250–265. Link (rep), PDF, Appendix, Google.