Hal Varian is Chief Economist, Google Inc., Mountain View, California, and Emeritus Professor of Economics, University of California, Berkeley, California.
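The article opens by introducing readers to some methods and software for working with data, along with how large IT companies handle it, which is quite useful. For example, large datasets with millions of records call for SQL, and data cleaning can be done with OpenRefine and DataWrangler.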
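To make the SQL point concrete, here is a minimal sketch of letting a relational database do the aggregation instead of a spreadsheet. It uses Python's built-in sqlite3 module with an in-memory database. The table name obs, its columns, and the helper summarize_by_group are hypothetical, chosen only for illustration; the article does not prescribe a specific database or schema.

```python
import sqlite3


def summarize_by_group(rows):
    """Load (group, value) pairs into SQLite and return (group, count, mean) per group."""
    conn = sqlite3.connect(":memory:")  # in-memory database; a real dataset would live on disk
    try:
        # Hypothetical schema: one row per observation.
        conn.execute("CREATE TABLE obs (grp TEXT, value REAL)")
        conn.executemany("INSERT INTO obs (grp, value) VALUES (?, ?)", rows)
        # The database engine does the grouping, so Python never loops over raw rows.
        cursor = conn.execute(
            "SELECT grp, COUNT(*), AVG(value) FROM obs GROUP BY grp ORDER BY grp"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("a", 1.0), ("a", 3.0), ("b", 10.0)]
    print(summarize_by_group(sample))
```

The same query shape carries over to millions of rows, since only the per-group summary is returned to Python.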
That said, econometrics and machine learning are of course different. In the author's view:
Data analysis in statistics and econometrics can be broken down into four categories: 1) prediction, 2) summarization, 3) estimation, and 4) hypothesis testing. Machine learning is concerned primarily with prediction.
[...]
Machine learning specialists are often primarily concerned with developing high-performance computer systems that can provide useful predictions in the presence of challenging computational constraints.
[...]
Data science, a somewhat newer term, is concerned with both prediction and summarization, but also with data manipulation, visualization, and other similar tasks.