The Bayesian community should really start going to ICLR. We really
should have started going years ago. Some of us actually have.
For too long we Bayesians have, quite arrogantly, dismissed deep
neural networks as unprincipled, dumb black boxes that lack elegance. We
said that highly over-parametrised models fitted via maximum likelihood
can't possibly work: they will overfit, won't generalise, and so on. We
touted our Bayesian nonparametric models instead: Chinese restaurants,
Indian buffets, Gaussian processes. And, when things started looking
really dire for us Bayesians, we even formed an alliance with the kernel
people, who used to be our mortal enemies just years before because they
like convex optimisation. Surely, nonparametric models like kernel
machines are a principled way to build models with an effectively
infinite number of parameters. Any model with infinitely many parameters
should be strictly better than any large but finite parametric model, right?
Well, we have been proven wrong.
But maybe not, actually. We Bayesians also have a not-so-secret
super-weapon: we can take algorithms that work well, reinterpret them as
approximations to some form of Bayesian inference, and voila, we can
claim credit for the success of an entire field of machine learning as a
special case of Bayesian machine learning. We are the Borg of machine
learning: we will eventually assimilate all other successful areas of
machine learning and make them perfect. Resistance is futile.
We did this before: L1 regularisation is just MAP estimation with
sparsity-inducing priors; support vector machines are just the wrong way
to train Gaussian processes. David Duvenaud and I even snatched herding
away from Max Welling and Alex Smola when we established that herding is
just Bayesian quadrature done slightly wrong.
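To recall how the first of those reinterpretations goes, here is the standard textbook argument (the Laplace prior below is the usual choice, spelled out for illustration rather than taken from the post). The L1-regularised maximum likelihood estimate is
\[
\hat{w} \;=\; \arg\min_w \Big[ -\log p(\mathcal{D}\mid w) + \lambda \lVert w \rVert_1 \Big],
\]
while MAP estimation under an i.i.d. Laplace prior \(p(w_i) = \tfrac{\lambda}{2}\, e^{-\lambda |w_i|}\) maximises
\[
\log p(\mathcal{D}\mid w) + \sum_i \log p(w_i) \;=\; \log p(\mathcal{D}\mid w) - \lambda \lVert w \rVert_1 + \text{const},
\]
so the two estimates coincide, the constant having no effect on the optimum.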
But so far, we haven't found a way to claim credit for all of
deep learning. Some of us tried to come in through the back door with
Bayesian neural networks. It helps somewhat that Yann LeCun himself has
written a paper on the topic. Yarin managed to claim that dropout is just
variational inference done wrong. Still, Bayesian neural networks have
been merely complementary to existing successes: we could not yet claim
that deep networks trained with stochastic gradient descent are Bayesian.
Well, fellow Bayesians, our wait may be over.
Link: http://www.inference.vc/everything-that-works-works-because-its-bayesian-2/