The degradation (of training accuracy) indicates that not all systems are similarly easy to optimize. …… (the introduction of identity mappings is omitted here) …… In this paper, we address the degradation problem by introducing a deep residual learning framework.
”
See that? "Address the degradation problem". That is the most direct goal of ResNet: solving the degradation problem.
In theory, if training could reach the global optimum, ResNet would not bring any improvement; the problem is that deep learning training never reaches a global optimum, and deeper networks, because of these optimization difficulties, may not even reach a good local optimum.
“
The degradation (of training accuracy) indicates that not all systems are similarly easy to optimize. Let us consider a shallower architecture and its deeper counterpart that adds more layers onto it. There exists a solution by construction to the deeper model: the added layers are identity mapping, and the other layers are copied from the learned shallower model. The existence of this constructed solution indicates that a deeper model should produce no higher training error than its shallower counterpart. But experiments show that our current solvers on hand are unable to find solutions that are comparably good or better than the constructed solution (or unable to do so in feasible time).
”
So why do deeper models degrade? The original sentence: "But experiments show that our current solvers on hand are unable to find solutions that are comparably good or better than the constructed solution (or unable to do so in feasible time)." Put plainly, it is a training-optimization problem: the solver simply cannot learn such parameters.
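To make this "solution by construction" concrete, here is a minimal PyTorch sketch (not from the paper; the Linear layers and sizes are illustrative assumptions). It appends an identity-initialized layer to a shallow network and checks that the deeper network computes exactly the same function, so its training error can be no higher:

```python
# A minimal sketch of the "solution by construction": stack an extra
# identity-initialized layer onto a learned shallow model; the deeper
# model then computes exactly the same function.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stands in for a trained shallow model (sizes are arbitrary).
shallow = nn.Sequential(nn.Linear(64, 64), nn.ReLU(),
                        nn.Linear(64, 64), nn.ReLU())

# Extra layer whose weights are the identity. Because the shallow model
# ends in a ReLU, its outputs are nonnegative, so ReLU(I x) = x holds.
extra = nn.Linear(64, 64)
with torch.no_grad():
    extra.weight.copy_(torch.eye(64))
    extra.bias.zero_()

deeper = nn.Sequential(shallow, extra, nn.ReLU())

x = torch.randn(8, 64)
print(torch.allclose(shallow(x), deeper(x)))  # True: identical outputs
```

The point of the paper's observation is that such parameters exist, yet SGD in practice fails to find anything comparably good when the extra layers are trained from scratch.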
Well then, can't we just make it easier for the model to learn that kind of solution?
An identity layer is hard to learn, but is zero hard to learn?
“
To the extreme, if an identity mapping were optimal, it would be easier to push the residual to zero than to fit an identity mapping by a stack of nonlinear layers.
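In other words, instead of asking a stack of layers to learn the desired mapping H(x) directly, a residual block learns the residual F(x) = H(x) - x and outputs F(x) + x. Below is a minimal sketch of this idea (my own simplification, using plain Linear layers instead of the paper's conv/BN stack): once the residual weights are pushed to zero, the block becomes an exact identity.

```python
# A minimal residual block in the spirit of the paper (simplified:
# Linear layers stand in for the paper's conv/BN stack).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # F(x): the residual function the block actually has to learn.
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                               nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # H(x) = F(x) + x: if the optimal H is the identity, the solver
        # only needs to push F toward zero rather than fit an identity
        # with a stack of nonlinear layers.
        return self.f(x) + x

# Zeroing the residual weights turns the block into an exact identity:
block = ResidualBlock(64)
with torch.no_grad():
    for p in block.f.parameters():
        p.zero_()

x = torch.randn(8, 64)
print(torch.allclose(block(x), x))  # True
```

This is exactly the sense in which "pushing the residual to zero" is easier: zero is a trivial target for the weights (weight decay even pulls them there by default), whereas an identity mapping is an awkward function for nonlinear layers to represent.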